-
Notifications
You must be signed in to change notification settings - Fork 33
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Stability fixes #204
Stability fixes #204
Conversation
What a pleasant surprise! I'll start looking through the code but I'll probably have questions about why certain changes fixes some of those issues. Looking forward to Starfish! |
@@ -140,7 +152,7 @@ void port_disable(PortData* p) { | |||
|
|||
void port_send_status(PortData* p, u8 d) { | |||
if (p->reply_len >= BRIDGE_BUF_SIZE) { | |||
bridge_disable_chan(p->chan); | |||
port_error(p); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
So the reason you're reverting these changes is to make the codebase more adaptable for another project? This will freeze up the coprocessor if the bridge state machine hits an undefined state, correct?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Oops, included the wrong version of this patch. port_error
should call bridge_disable_chan
on T2, so as to not change the current behavior. Just meant to abstract it out into a function to avoid merge conflicts going back and forth because starfish doesn't have bridge_*
.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
And these functions might eventually want to go in a port-platform.c
, and also abstract out IO operations for non-SAMD21 hardware like Reach.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
👍 makes sense to me
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
So just to summarize, it looks like there were several issues here:
Does that sound right? All in all, this looks like a major improvement to the code base. Once you swap the right commit in for the |
to ease compatibility between T2 and starfish
The length field is 8 bits, for a maximum size of 255.
These interrupt handlers defaulted to highest priority, so they could have interrupted the other handlers, leading to data races and corrupted state.
The TCC prescaler was set to 1, not 1024, because the line meant to set CTRLA.ENABLE also cleared the rest ot CTRLA, including the prescaler field. However, a period of 65536 at 48MHz gives 732.4Hz, a reasonable PWM frequency. The TCC1 ISR failed to clear the interrupt flag, meaning that it re- executed as soon as it exited. Thus, the `interpolate` function generated a very high frequency sine wave as fast as it could calculate. The obeserved frequency on the LED was due to aliasing when this wave was sampled at the 732Hz period of the TCC, where CC0BUF is copied to CC0. Instead, this sets the pattern period deliberately. Division and modulo by 2^16 is much faster than by 2^16-1 because it can be done with a bit shift and mask.
If these functions start another USB transaction, and it completes before exiting the function, the interrupt flag was falsely cleared, and the ISR never re-triggered, which could lead to the transfer hanging.
The high-priority bridge ISR modifies CHID, and if that preempts these critical sections, they would accidentally configure the bridge DMA channel instead of their own.
Increase the delay in spid and make the bridge code more resilient against the EVSYS ISR preempting the DMA ISR when the bridge gets a new transaction before finishing processing the last one. There could be a better solution with more invasive changes to tell the SoC when it is allowed to poll. This fix doesn't guarantee correctness, and costs performance, but substantially improves stability when using all three bridge pipes simultaneously.
I pulled this locally, rebuilt and flashed with dfu-util. I then ran the following script, which wants to poll all available analog inputs every 100ms (this is relatively basic compared to what user programs might expect to be able to accomplish). "use strict";
const tessel = require("tessel");
const MAX_INTERVAL_MS = 100;
function constrain(value, low, high) {
return Math.max(Math.min(value, high), low);
}
function timeout(a, b) {
const diff = MAX_INTERVAL_MS - (b - a);
return constrain(diff, 0, MAX_INTERVAL_MS);
}
function analogRead(port, pin, callback) {
let start = Date.now();
const poll = _ => {
tessel.port[port].pin[pin].analogRead((error, data) => {
const now = Date.now();
callback(data);
setTimeout(poll, timeout(start, now));
start = now;
});
tessel.port[port].sock.write(new Buffer([24, pin]));
};
poll();
}
analogRead("A", 4, value => {
console.log("A4", value);
});
analogRead("A", 7, value => {
console.log("A7", value);
});
analogRead("B", 2, value => {
console.log("B2", value);
});
analogRead("B", 3, value => {
console.log("B3", value);
});
analogRead("B", 4, value => {
console.log("B4", value);
});
analogRead("B", 5, value => {
console.log("B5", value);
});
analogRead("B", 6, value => {
console.log("B6", value);
});
analogRead("B", 7, value => {
console.log("B7", value);
}); The result is consistent failure:
If I eliminate port A, and only read port B:
I'm going to adapt these for I2C and run similar tests, but I wanted to report my findings so far |
Did you update the SPI daemon on the SoC from this branch? |
Honestly, I'm not even sure how to answer that. Here is exactly what I did, step by step:
If there was some other update process that I needed to do, then I can say confidently that I did not. |
Just tested your code on my Tessel and saw it fail the same way. Looking at the code for
|
Ok, will do that and follow up with I2C version of the tests |
@kevinmehall just to be clear - you're suggesting that we need to make the ADC read in the SAMD21 enable an interrupt and start the read, then service the handler later rather than have a synchronous call, correct? |
@kevinmehall after updating the spid as you mention above, I've run the following script to test I2C on both ports: "use strict";
const tessel = require("tessel");
const MAX_INTERVAL_MS = 100;
function constrain(value, low, high) {
return Math.max(Math.min(value, high), low);
}
function timeout(a, b) {
const diff = MAX_INTERVAL_MS - (b - a);
return constrain(diff, 0, MAX_INTERVAL_MS);
}
function i2cRead(port, address, register, length, callback) {
var start = Date.now();
var i2c = new tessel.port[port].I2C(address);
var poll = _ => {
i2c.transfer(register, length, (error, rx) => {
var now = Date.now();
callback(rx);
setTimeout(poll, timeout(start, now));
start = now;
});
};
poll();
}
var peripherals = [
// Read the temperature coefficients from a BME280
{ port: "A", address: 0x77, register: 0x88, length: 24 },
// Read 3 axis compass data from HMC5983
// (Expect nonesense values because the unit isn't being
// configured prior to the read. We just want data anyway...)
{ port: "B", address: 0x1E, register: 0x03, length: 6 },
];
peripherals.forEach(p => {
i2cRead(p.port, p.address, p.register, p.length, buffer => {
console.log(`i2cRead (${p.port}, ${p.address}, ${p.register}, ${p.length}):
<${Array.from(buffer, v => ('00' + v.toString(16)).slice(-2)).join(", ")}>
`);
});
}); It's been running now for a very long time (I forgot to check the exact time that I started it) |
@johnnyman727 Correct |
❤️ 👍 looking forward to see this getting merged |
Working on ADC interrupts here: #207 |
@kevinmehall heads up, @johnnyman727 and I just discovered that my test code above had an "artifact" within that was causing it to fail with this patch. The line: tessel.port[port].sock.write(new Buffer([24, pin])); Was causing duplicate tessel.port[port].sock.write(new Buffer([24, pin]));
tessel.port[port].enqueue({
size: 2,
callback: (error, data) => {
//...
}
}); Whenever I changed it back to |
LGTM! @johnnyman727 land at will! |
Also, @kevinmehall I just want to say thanks and great work! |
👏 👏 👏 👏 |
See individual commit messages for details.
My primary test was running the accelerometer demo on port A and ambient demo on port B with
t2 run --usb
, which fails within seconds on the current release, and hasn't failed yet with these changes.Fixes #183, Fixes #169 (at least three of these bugs could lead to that kind of corrupted response)
Possibly #128
Possibly #166