Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Stability fixes #204

Merged
merged 12 commits into from
Sep 2, 2016
Merged

Stability fixes #204

merged 12 commits into from
Sep 2, 2016

Conversation

kevinmehall
Copy link
Member

See individual commit messages for details.

My primary test was running the accelerometer demo on port A and ambient demo on port B with t2 run --usb, which fails within seconds on the current release, and hasn't failed yet with these changes.

Fixes #183, Fixes #169 (at least three of these bugs could lead to that kind of corrupted response)
Possibly #128
Possibly #166

@johnnyman727
Copy link
Contributor

What a pleasant surprise! I'll start looking through the code but I'll probably have questions about why certain changes fixes some of those issues. Looking forward to Starfish!

@@ -140,7 +152,7 @@ void port_disable(PortData* p) {

void port_send_status(PortData* p, u8 d) {
if (p->reply_len >= BRIDGE_BUF_SIZE) {
bridge_disable_chan(p->chan);
port_error(p);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

So the reason you're reverting these changes is to make the codebase more adaptable for another project? This will freeze up the coprocessor if the bridge state machine hits an undefined state, correct?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Oops, included the wrong version of this patch. port_error should call bridge_disable_chan on T2, so as to not change the current behavior. Just meant to abstract it out into a function to avoid merge conflicts going back and forth because starfish doesn't have bridge_*.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

And these functions might eventually want to go in a port-platform.c, and also abstract out IO operations for non-SAMD21 hardware like Reach.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

👍 makes sense to me

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@johnnyman727
Copy link
Contributor

So just to summarize, it looks like there were several issues here:

  • The port code was too implementation-specific
  • There was an off by one error in the buffer size that likely caused Pin.analogRead intermittently unresponsive #166
  • The interrupts were potentially being serviced in a non-ideal order, causing memory corruption when particular buffers were overwritten unexpectedly
  • The USB transfer logic wasn't clearing interrupt flags properly, possible causing hang ups
  • The MediaTek was possibly polling the SAMD21 too fast, causing buffers to be overwritten unexpectedly.

Does that sound right?

All in all, this looks like a major improvement to the code base. Once you swap the right commit in for the invalid function, I'll run a smoke test on my T2 and merge.

to ease compatibility between T2 and starfish
The length field is 8 bits, for a maximum size of 255.
These interrupt handlers defaulted to highest priority, so they could
have interrupted the other handlers, leading to data races and corrupted
state.
The TCC prescaler was set to 1, not 1024, because the line meant to set
CTRLA.ENABLE also cleared the rest ot CTRLA, including the prescaler
field. However, a period of 65536 at 48MHz gives 732.4Hz, a reasonable
PWM frequency.

The TCC1 ISR failed to clear the interrupt flag, meaning that it re-
executed as soon as it exited. Thus, the `interpolate` function
generated a very high frequency sine wave as fast as it could calculate.
The obeserved frequency on the LED was due to aliasing when this wave was
sampled at the 732Hz period of the TCC, where CC0BUF is copied to CC0.
Instead, this sets the pattern period deliberately.

Division and modulo by 2^16 is much faster than by 2^16-1 because it can
be done with a bit shift and mask.
If these functions start another USB transaction, and it completes
before exiting the function, the interrupt flag was falsely cleared, and
the ISR never re-triggered, which could lead to the transfer hanging.
The high-priority bridge ISR modifies CHID, and if that preempts these
critical sections, they would accidentally configure the bridge DMA
channel instead of their own.
Increase the delay in spid and make the bridge code more resilient
against the EVSYS ISR preempting the DMA ISR when the bridge gets a new
transaction before finishing processing the last one.

There could be a better solution with more invasive changes to tell the
SoC when it is allowed to poll. This fix doesn't guarantee correctness,
and costs performance, but substantially improves stability when using
all three bridge pipes simultaneously.
@rwaldron
Copy link
Contributor

I pulled this locally, rebuilt and flashed with dfu-util. I then ran the following script, which wants to poll all available analog inputs every 100ms (this is relatively basic compared to what user programs might expect to be able to accomplish).

"use strict";

const tessel = require("tessel");
const MAX_INTERVAL_MS = 100;

function constrain(value, low, high) {
  return Math.max(Math.min(value, high), low);
}

function timeout(a, b) {
  const diff = MAX_INTERVAL_MS - (b - a);
  return constrain(diff, 0, MAX_INTERVAL_MS);
}

function analogRead(port, pin, callback) {
  let start = Date.now();
  const poll = _ => {
    tessel.port[port].pin[pin].analogRead((error, data) => {
      const now = Date.now();
      callback(data);
      setTimeout(poll, timeout(start, now));
      start = now;
    });

    tessel.port[port].sock.write(new Buffer([24, pin]));
  };

  poll();
}

analogRead("A", 4, value => {
  console.log("A4", value);
});

analogRead("A", 7, value => {
  console.log("A7", value);
});

analogRead("B", 2, value => {
  console.log("B2", value);
});

analogRead("B", 3, value => {
  console.log("B3", value);
});

analogRead("B", 4, value => {
  console.log("B4", value);
});

analogRead("B", 5, value => {
  console.log("B5", value);
});

analogRead("B", 6, value => {
  console.log("B6", value);
});

analogRead("B", 7, value => {
  console.log("B7", value);
});

The result is consistent failure:

rwaldron in ~/robotz/t2-analog-read-pin-number
$ t2 run polling-all-analog-inputs.js
INFO Looking for your Tessel...
INFO Connected to ash.
INFO Building project.
INFO Writing project to RAM on ash (4.096 kB)...
INFO Deployed.
INFO Running polling-all-analog-inputs.js...
A4 0.871337890625
A7 0.73974609375
/usr/lib/node/tessel-export.js:299
          throw new Error('Received unexpected response with no commands pending: ' + byte);
          ^

Error: Received unexpected response with no commands pending: 132
    at null.<anonymous> (/usr/lib/node/tessel-export.js:299:17)
    at emitNone (events.js:67:13)
    at Socket.emit (events.js:166:7)
    at emitReadable_ (_stream_readable.js:419:10)
    at emitReadable (_stream_readable.js:413:7)
    at readableAddChunk (_stream_readable.js:164:13)
    at Socket.Readable.push (_stream_readable.js:111:10)
    at Pipe.onread (net.js:531:20)
rwaldron in ~/robotz/t2-analog-read-pin-number
$ t2 run polling-all-analog-inputs.js
INFO Looking for your Tessel...
INFO Connected to ash.
INFO Building project.
INFO Writing project to RAM on ash (4.096 kB)...
INFO Deployed.
INFO Running polling-all-analog-inputs.js...
A4 0.864501953125
A7 0.7421875
/usr/lib/node/tessel-export.js:299
          throw new Error('Received unexpected response with no commands pending: ' + byte);
          ^

Error: Received unexpected response with no commands pending: 132
    at null.<anonymous> (/usr/lib/node/tessel-export.js:299:17)
    at emitNone (events.js:67:13)
    at Socket.emit (events.js:166:7)
    at emitReadable_ (_stream_readable.js:419:10)
    at emitReadable (_stream_readable.js:413:7)
    at readableAddChunk (_stream_readable.js:164:13)
    at Socket.Readable.push (_stream_readable.js:111:10)

rwaldron in ~/robotz/t2-analog-read-pin-number
$ t2 run polling-all-analog-inputs.js
INFO Looking for your Tessel...
INFO Connected to ash.
INFO Building project.
INFO Writing project to RAM on ash (4.096 kB)...
INFO Deployed.
INFO Running polling-all-analog-inputs.js...
A4 0.864013671875
A7 0.736083984375
/usr/lib/node/tessel-export.js:299
          throw new Error('Received unexpected response with no commands pending: ' + byte);
          ^

Error: Received unexpected response with no commands pending: 132
    at null.<anonymous> (/usr/lib/node/tessel-export.js:299:17)
    at emitNone (events.js:67:13)
    at Socket.emit (events.js:166:7)
    at emitReadable_ (_stream_readable.js:419:10)
    at emitReadable (_stream_readable.js:413:7)
    at readableAddChunk (_stream_readable.js:164:13)
    at Socket.Readable.push (_stream_readable.js:111:10)
    at Pipe.onread (net.js:531:20)
rwaldron in ~/robotz/t2-analog-read-pin-number
$ t2 run polling-all-analog-inputs.js
INFO Looking for your Tessel...
INFO Connected to ash.
INFO Building project.
INFO Writing project to RAM on ash (4.096 kB)...
INFO Deployed.
INFO Running polling-all-analog-inputs.js...
A4 0.87060546875
A7 0.742431640625
/usr/lib/node/tessel-export.js:299
          throw new Error('Received unexpected response with no commands pending: ' + byte);
          ^

Error: Received unexpected response with no commands pending: 132
    at null.<anonymous> (/usr/lib/node/tessel-export.js:299:17)
    at emitNone (events.js:67:13)
    at Socket.emit (events.js:166:7)
    at emitReadable_ (_stream_readable.js:419:10)
    at emitReadable (_stream_readable.js:413:7)
    at readableAddChunk (_stream_readable.js:164:13)
    at Socket.Readable.push (_stream_readable.js:111:10)
    at Pipe.onread (net.js:531:20)

If I eliminate port A, and only read port B:

rwaldron in ~/robotz/t2-analog-read-pin-number
$ t2 run polling-all-analog-inputs.js
INFO Looking for your Tessel...
INFO Connected to ash.
INFO Building project.
INFO Writing project to RAM on ash (4.096 kB)...
INFO Deployed.
INFO Running polling-all-analog-inputs.js...
B2 0.8369140625
B3 0.6708984375
B4 0.843505859375
B5 0.704833984375
B6 0.88330078125
B7 0.77197265625
/usr/lib/node/tessel-export.js:299
          throw new Error('Received unexpected response with no commands pending: ' + byte);
          ^

Error: Received unexpected response with no commands pending: 132
    at null.<anonymous> (/usr/lib/node/tessel-export.js:299:17)
    at emitNone (events.js:67:13)
    at Socket.emit (events.js:166:7)
    at emitReadable_ (_stream_readable.js:419:10)
    at emitReadable (_stream_readable.js:413:7)
    at readableAddChunk (_stream_readable.js:164:13)
    at Socket.Readable.push (_stream_readable.js:111:10)
    at Pipe.onread (net.js:531:20)
rwaldron in ~/robotz/t2-analog-read-pin-number
$ t2 run polling-all-analog-inputs.js
INFO Looking for your Tessel...
INFO Connected to ash.
INFO Building project.
INFO Writing project to RAM on ash (4.096 kB)...
INFO Deployed.
INFO Running polling-all-analog-inputs.js...
B2 0.837646484375
B3 0.66943359375
B4 0.843505859375
B5 0.70654296875
B6 0.882080078125
B7 0.775390625
/usr/lib/node/tessel-export.js:299
          throw new Error('Received unexpected response with no commands pending: ' + byte);
          ^

Error: Received unexpected response with no commands pending: 132
    at null.<anonymous> (/usr/lib/node/tessel-export.js:299:17)
    at emitNone (events.js:67:13)
    at Socket.emit (events.js:166:7)
    at emitReadable_ (_stream_readable.js:419:10)
    at emitReadable (_stream_readable.js:413:7)
    at readableAddChunk (_stream_readable.js:164:13)
    at Socket.Readable.push (_stream_readable.js:111:10)
    at Pipe.onread (net.js:531:20)
rwaldron in ~/robotz/t2-analog-read-pin-number
$ t2 run polling-all-analog-inputs.js
INFO Looking for your Tessel...
INFO Connected to ash.
INFO Building project.
INFO Writing project to RAM on ash (4.096 kB)...
INFO Deployed.
INFO Running polling-all-analog-inputs.js...
B2 0.833251953125
B3 0.666748046875
B4 0.837890625
B5 0.70556640625
B6 0.88671875
B7 0.776123046875
/usr/lib/node/tessel-export.js:299
          throw new Error('Received unexpected response with no commands pending: ' + byte);
          ^

Error: Received unexpected response with no commands pending: 132
    at null.<anonymous> (/usr/lib/node/tessel-export.js:299:17)
    at emitNone (events.js:67:13)
    at Socket.emit (events.js:166:7)
    at emitReadable_ (_stream_readable.js:419:10)
    at emitReadable (_stream_readable.js:413:7)
    at readableAddChunk (_stream_readable.js:164:13)
    at Socket.Readable.push (_stream_readable.js:111:10)
    at Pipe.onread (net.js:531:20)

I'm going to adapt these for I2C and run similar tests, but I wanted to report my findings so far

@kevinmehall
Copy link
Member Author

Did you update the SPI daemon on the SoC from this branch?

@rwaldron
Copy link
Contributor

Did you update the SPI daemon on the SoC from this branch?

Honestly, I'm not even sure how to answer that. Here is exactly what I did, step by step:

  1. git fetch upstream
  2. git checkout -b stability upstream/stability && git merge master
  3. make clean; make
  4. Disconnect Tessel; reconnect holding button to enter bootloader mode
  5. dfu-util -aFlash -d 1209:7551 -D build/firmware.bin
  6. When that's done, I'm in the habit of doing a full disconnect/reconnect power cycle, so I did that.
  7. Move to test project and t2 run polling-all-analog-inputs.js; repeat this step as necessary

If there was some other update process that I needed to do, then I can say confidently that I did not.

@kevinmehall
Copy link
Member Author

Just tested your code on my Tessel and saw it fail the same way. Looking at the code for ANALOG_READ, it's blocking waiting for the sample in port_begin_cmd, which will make that completion handler take way too long, even with these changes. That needs to be made to use interrupts.

spid build instructions are here. scp the resulting binary to Tessel, then t2 root and mv it over /usr/bin/spid, and /etc/init.d/spid restart. Or, point the OpenWrt build to this commit, and build a full image.

@rwaldron
Copy link
Contributor

Ok, will do that and follow up with I2C version of the tests

@johnnyman727
Copy link
Contributor

@kevinmehall just to be clear - you're suggesting that we need to make the ADC read in the SAMD21 enable an interrupt and start the read, then service the handler later rather than have a synchronous call, correct?

@rwaldron
Copy link
Contributor

rwaldron commented Aug 26, 2016

@kevinmehall after updating the spid as you mention above, I've run the following script to test I2C on both ports:

"use strict";
const tessel = require("tessel");
const MAX_INTERVAL_MS = 100;

function constrain(value, low, high) {
  return Math.max(Math.min(value, high), low);
}

function timeout(a, b) {
  const diff = MAX_INTERVAL_MS - (b - a);
  return constrain(diff, 0, MAX_INTERVAL_MS);
}

function i2cRead(port, address, register, length, callback) {
  var start = Date.now();
  var i2c = new tessel.port[port].I2C(address);

  var poll = _ => {
    i2c.transfer(register, length, (error, rx) => {
      var now = Date.now();
      callback(rx);

      setTimeout(poll, timeout(start, now));
      start = now;
    });
  };
  poll();
}

var peripherals = [
  // Read the temperature coefficients from a BME280
  { port: "A", address: 0x77, register: 0x88, length: 24 },
  // Read 3 axis compass data from HMC5983
  // (Expect nonesense values because the unit isn't being
  // configured prior to the read. We just want data anyway...)
  { port: "B", address: 0x1E, register: 0x03, length: 6 },
];

peripherals.forEach(p => {
  i2cRead(p.port, p.address, p.register, p.length, buffer => {
    console.log(`i2cRead (${p.port}, ${p.address}, ${p.register}, ${p.length}):
      <${Array.from(buffer, v => ('00' + v.toString(16)).slice(-2)).join(", ")}>
    `);
  });
});

It's been running now for a very long time (I forgot to check the exact time that I started it)

@kevinmehall
Copy link
Member Author

@johnnyman727 Correct

@joaomdmoura
Copy link

❤️ 👍 looking forward to see this getting merged

@johnnyman727
Copy link
Contributor

Working on ADC interrupts here: #207

@rwaldron
Copy link
Contributor

rwaldron commented Sep 2, 2016

@kevinmehall heads up, @johnnyman727 and I just discovered that my test code above had an "artifact" within that was causing it to fail with this patch. The line:

    tessel.port[port].sock.write(new Buffer([24, pin]));

Was causing duplicate ANALOG_READ commands to be sent, which was the direct cause of failure. To explain why that even existed, there was an earlier version of this test code which looked like this:

tessel.port[port].sock.write(new Buffer([24, pin]));
tessel.port[port].enqueue({
  size: 2,
  callback: (error, data) => {
   //...
  }
});

Whenever I changed it back to analogRead, I clearly forgot/overlooked/whatever the write. We've made the correction to the test code and the failures are eliminated.

@rwaldron
Copy link
Contributor

rwaldron commented Sep 2, 2016

LGTM! @johnnyman727 land at will!

@johnnyman727 johnnyman727 merged commit fa5c31a into master Sep 2, 2016
@rwaldron
Copy link
Contributor

rwaldron commented Sep 2, 2016

Also, @kevinmehall I just want to say thanks and great work!

@joaomdmoura
Copy link

👏 👏 👏 👏

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants