WARP Project Forums - Wireless Open-Access Research Platform

jyhng · 2009-Apr-17 20:35:19

Hi,

According to the changelog for v12.0, you guys have stopped using interrupts. What's the reason for using registers instead of interrupts?

Thanks,
Joshua

murphpo · 2009-Apr-18 17:45:08

We had two reasons. The first was performance, namely the latency between a PHY event and the MAC code's reaction to it. Servicing an interrupt requires a few levels of function calls, which increased this latency. The second reason was debuggability. In refdes11, the MAC code would (very) occasionally fall into a state where it stopped processing interrupts. We traced the problem to an ISR never returning. The bizarre part was that this occurred even when interrupts were supposedly disabled. Removing interrupts completely got rid of this, and it makes debugging easier in general.

vick · 2009-Apr-20 16:32:40

Hi Patrick,

We're seeing a similar problem right now. It is more pronounced at high packet rates through the board, from the Ethernet interface to the wireless interface and vice versa.

When you say you 'traced the problem to an ISR never returning', did you manage to narrow it down to a specific ISR/device (e.g., radio_rx or ethernet_tx) or does it happen with similar likelihood on any ISR/device?

Regards,
vick

murphpo · 2009-Apr-20 20:56:47

I mostly observed the debugging; Chris did the hard parts (and can fill in anything I get wrong).

We used the software debugger in the Xilinx SDK. When the node got "stuck" (when it stopped processing any interrupts), the debugger claimed the processor was stalled in the interrupt vectors, in the code for the PowerPC's non-critical exception. For reasons we never figured out, the interrupt controller was constantly asserting this signal (the xps_intc drives the PPC non-critical exception input), even when none of the peripheral interrupts were asserted. We double checked our use of the intc's driver, but couldn't find any instance of an interrupt not being cleared, or anything else to explain the exception being constantly asserted.

Now, it's entirely possible (likely, even) the error really is in our use of the intc driver and not the intc itself. But after a lot of debugging, we decided it was worth the hassle of recoding things to use polling.

vick · 2009-Apr-21 05:09:26

Hi Patrick,

Thanks for the explanation. This sounds extremely similar to what we're seeing here too.

When our node gets "stuck" with packets flying through in both directions, the Xilinx SDK debugger (frequently) shows the sequence: main -> XEmac_FifoSend -> XEmac_FifoSend -> XAssert. While I'm still figuring out how to interpret the debugger output, I don't think the double XEmac_FifoSend call should occur and so I began suspecting the plb_emac pcore/driver of doing something strange. However, your alternative view that the problem could be interrupt-controller related could be equally (more?) likely and also explain why we end up at an XAssert.

We've also started investigating a polling-mode alternative. In our case, the peripherals still generate interrupts. However, the IC does not in turn interrupt the CPU (as it usually would). Instead, we run a tight 'while (1)' loop that scans the IC status register to determine which peripherals, if any, have triggered an interrupt, and services them. Is that your approach to poll-mode as well, or do you have an alternative method?

Regards,
vick
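
The loop described in this post can be sketched roughly as follows. This is a minimal host-side model, not the poster's code and not the Xilinx intc driver API: `intc_regs_t`, `poll_entry_t`, `intc_poll_once` and `count_event` are hypothetical names, and the status register is modelled as a plain struct field rather than a memory-mapped address.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define NUM_SOURCES 4u

/* Hypothetical model of the interrupt controller's status register.
   On hardware this would be a memory-mapped register, not a struct in RAM. */
typedef struct {
    volatile uint32_t status;   /* bit n set = peripheral n has a pending interrupt */
} intc_regs_t;

typedef void (*handler_t)(void *ctx);

typedef struct {
    handler_t fn;   /* service routine for this source, or NULL if unused */
    void *ctx;      /* argument passed to the service routine */
} poll_entry_t;

/* Example service routine for illustration: counts how often it was called. */
static void count_event(void *ctx)
{
    (*(unsigned *)ctx)++;
}

/* One pass of the polling loop: read the status register once, service every
   pending source that has a handler, then clear its bit. Clearing the bit
   stands in for writing the controller's acknowledge register.
   Returns the number of sources serviced. */
static unsigned intc_poll_once(intc_regs_t *regs, const poll_entry_t table[NUM_SOURCES])
{
    assert(regs != NULL && table != NULL);

    uint32_t pending = regs->status;
    unsigned serviced = 0;

    for (unsigned bit = 0; bit < NUM_SOURCES; bit++) {
        uint32_t mask = (uint32_t)1 << bit;
        if ((pending & mask) == 0 || table[bit].fn == NULL) {
            continue;   /* not pending, or no handler registered */
        }
        table[bit].fn(table[bit].ctx);
        regs->status &= ~mask;
        serviced++;
    }
    return serviced;
}
```

The main program would then call `intc_poll_once()` from inside its `while (1)` loop. Because the handlers run from the main loop rather than from an ISR, a handler can no longer preempt a driver call that is already in progress.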

murphpo · 2009-Apr-21 07:47:40

Interesting. The stack trace for a "stuck" node in our case didn't have the EMAC calls in it (are you using XPS v9.1? That looks like a plb_ethernet call). This trace suggests one of the assertions in XEmac_FifoSend() fails. I don't know exactly what happens when an assertion fails; maybe a critical exception that halts the processor? Here are the assertions (from $EDK91/sw/XilinxProcessorIPLib/drivers/emac_v1_01_a/src):

Code:

XASSERT_NONVOID(InstancePtr != XNULL);
XASSERT_NONVOID(BufPtr != XNULL);
XASSERT_NONVOID(ByteCount > XEM_HDR_SIZE);   /* send at least 1 byte */
XASSERT_NONVOID(InstancePtr->IsReady == XCOMPONENT_IS_READY);

If the function is getting called twice but with bogus arguments the second time, it might explain the halt.
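
One way to test that hypothesis would be to mirror the four checks in a small helper and call it before each send, so that a bad argument is reported instead of tripping the assertion. The sketch below is only an illustration: `check_send_args`, `send_check_t` and `HDR_SIZE` are hypothetical names, the ready flag is passed in as a plain boolean rather than read from the driver instance, and the value 14 is the standard Ethernet header length, assumed here to match XEM_HDR_SIZE.

```c
#include <stddef.h>

/* Ethernet header length (6 + 6 + 2 bytes); assumed to match XEM_HDR_SIZE. */
#define HDR_SIZE 14u

/* Hypothetical result codes, one per assertion listed above. */
typedef enum {
    SEND_ARGS_OK = 0,
    SEND_NULL_INSTANCE,   /* InstancePtr == XNULL */
    SEND_NULL_BUFFER,     /* BufPtr == XNULL */
    SEND_TOO_SHORT,       /* ByteCount <= header size, i.e. no payload */
    SEND_NOT_READY        /* instance not marked ready */
} send_check_t;

/* Applies the same four conditions, in the same order, as the assertions,
   but returns a code instead of halting. */
static send_check_t check_send_args(const void *instance, const void *buf,
                                    unsigned byte_count, int is_ready)
{
    if (instance == NULL) {
        return SEND_NULL_INSTANCE;
    }
    if (buf == NULL) {
        return SEND_NULL_BUFFER;
    }
    if (byte_count <= HDR_SIZE) {
        return SEND_TOO_SHORT;
    }
    if (!is_ready) {
        return SEND_NOT_READY;
    }
    return SEND_ARGS_OK;
}
```

Logging the returned code just before the real send call would show which of the four conditions, if any, is violated on the call that hangs.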

In our polling-only design (refdes v12.1), we poll each peripheral (PHY, Timer, User I/O) explicitly. We removed the xps_intc core completely (probably unnecessary, but definitely satisfying). The underlying pcores still have their interrupt outputs. They're just floating in this version, and we poll their register equivalents.
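
As a rough illustration of polling each peripheral's register directly, with no interrupt controller in between, consider the sketch below. It is not the refdes v12.1 code: the register layout, bit names and counters (`node_regs_t`, `PHY_RX_GOOD`, `poll_peripherals`, and so on) are all hypothetical, and the registers are modelled as struct fields so the example runs on a host.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical status bits, standing in for each pcore's register
   equivalent of its interrupt output. */
#define PHY_RX_GOOD   0x1u
#define PHY_RX_BAD    0x2u
#define TIMER_DONE    0x1u
#define USERIO_BUTTON 0x1u

/* Hypothetical register block; on hardware these would be memory-mapped. */
typedef struct {
    volatile uint32_t phy_status;
    volatile uint32_t timer_status;
    volatile uint32_t userio_status;
} node_regs_t;

/* Event counters standing in for the real MAC-level reactions. */
typedef struct {
    unsigned rx_good, rx_bad, timeouts, buttons;
} node_counts_t;

/* One pass over all peripherals. The PHY is checked first because PHY
   events are the latency-critical ones. Each event is acknowledged by
   clearing its bit, which stands in for the real clear operation. */
static void poll_peripherals(node_regs_t *r, node_counts_t *c)
{
    assert(r != NULL && c != NULL);

    uint32_t phy = r->phy_status;
    if (phy & PHY_RX_GOOD) {
        c->rx_good++;
        r->phy_status &= ~PHY_RX_GOOD;
    }
    if (phy & PHY_RX_BAD) {
        c->rx_bad++;
        r->phy_status &= ~PHY_RX_BAD;
    }
    if (r->timer_status & TIMER_DONE) {
        c->timeouts++;
        r->timer_status &= ~TIMER_DONE;
    }
    if (r->userio_status & USERIO_BUTTON) {
        c->buttons++;
        r->userio_status &= ~USERIO_BUTTON;
    }
}
```

The main loop then reduces to calling `poll_peripherals()` repeatedly. The worst-case reaction time to a PHY event is bounded by the duration of one pass through the loop, which is easier to reason about than nested ISR entry and exit.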

vick · 2009-Apr-21 08:45:13

Hi again,

Sorry for not giving you any background on the setup earlier, but yes your assumptions are right; we're using XPS v9.1 with plb_ethernet.

As I mentioned, we don't seem to have problems with uni-directional traffic. Either XEmac_FifoSend or XEmac_FifoRecv running on its own doesn't seem to be a problem (we're testing with a UDP traffic-generator application running on an external PC). The hang only occurs when we generate two independent UDP streams going through the board in opposite directions (the setup is PC-board-air-board-PC, and each PC fires a UDP stream at the other through this chain).

Thus, the suspicion was that in the midst of XEmac_FifoSend, an incoming Ethernet frame arrives, causing XEmac_FifoRecv to be called from the emac ISR. On returning from the ISR to XEmac_FifoSend, 'something got confused', and it gets stuck. Or at least something akin to this is happening. However, this is still just speculation, since we haven't been able to nail down the cause with more certainty.

Thanks for the help. If we don't figure this out soon, perhaps we'll start investigating the polling option more closely.

Regards,
vick

Reference Design v12.0 Interrupts
