WARP Project Forums - Wireless Open-Access Research Platform

#1 2019-Jan-14 11:02:16

hubmun
Member
Registered: 2017-May-09
Posts: 20

DRE error when using both Ethernet ports

Hey,

I'm currently using the 802.11 reference design and have changed it so that both Ethernet ports receive and transmit packets to/from radio transmissions. That works well in general (ETH A is interrupt-driven as per default, ETH B is polled frequently for receptions), but it sometimes throws an error saying that the Data Realignment Engine (DRE) is not available for Port A to transmit a packet over Ethernet. The exact output is:

"Error set buf addr 90005178 with 0 and FFFFFFF, 90005178"

It is printed by XAxiDma_BdSetBufAddr(), which is called within the wlan_platform_ethernet_send function:

int XAxiDma_BdSetBufAddr(XAxiDma_Bd* BdPtr, u32 Addr)
{
    u32 HasDRE;
    u8 WordLen;

    HasDRE = XAxiDma_BdRead(BdPtr, XAXIDMA_BD_HAS_DRE_OFFSET);
    WordLen = HasDRE & XAXIDMA_BD_WORDLEN_MASK;

    if (Addr & (WordLen - 1)) {
        if ((HasDRE & XAXIDMA_BD_HAS_DRE_MASK) == 0) {
            xil_printf("Error set buf addr %x with %x and %x,"
            " %x\r\n",Addr, HasDRE, (WordLen - 1),
            Addr & (WordLen - 1));

            return XST_INVALID_PARAM;
        }
    }

    XAxiDma_BdWrite(BdPtr, XAXIDMA_BD_BUFA_OFFSET, Addr);

    return XST_SUCCESS;

}


I'm trying to understand what the DRE actually is and why this error occurs sometimes - is it a resource shared between both ETH ports and the error occurs if one of them has the DRE and the other one would need it?

BR

Hubertus


#2 2019-Jan-14 11:41:56

murphpo
Administrator
From: Mango Communications
Registered: 2006-Jul-03
Posts: 5159

Re: DRE error when using both Ethernet ports

The axi_dma core optionally implements the data-realignment engine to support packet buffer descriptors which point to memory addresses that are not aligned to 8 bytes (i.e. non-64-bit-aligned addresses). The reference code relies on this in various places; for example when creating an Ethernet packet for a wireless reception, the Ethernet DMA accesses the Rx packet buffer directly and must be able to use packets at arbitrary offsets.

The two axi_dma instances in the 802.11 Ref Design for WARP v3 both include the DRE logic (via the 'C_INCLUDE_MM2S_DRE' and 'C_INCLUDE_S2MM_DRE' params of the two axi_dma instances in system.mhs). I'm not sure why the axi_dma driver would conclude the DRE was not implemented; you'll have to dig into the C code to determine why the 'HasDRE & XAXIDMA_BD_HAS_DRE_MASK' check is failing despite both cores including DRE logic.


#3 2019-Jan-14 13:59:54

hubmun
Member
Registered: 2017-May-09
Posts: 20

Re: DRE error when using both Ethernet ports

Hey Murphpo,

Thanks for your reply! Do you have any idea where to start testing or checking for what causes HasDRE to be 0? Is it somehow possible that the parallel usage of ETH A and B causes this issue, or are they fully separate from each other?

BR

Hubertus

Last edited by hubmun (2019-Jan-14 14:00:37)


#4 2019-Jan-14 15:18:40

murphpo
Administrator
From: Mango Communications
Registered: 2006-Jul-03
Posts: 5159

Re: DRE error when using both Ethernet ports

My best guess is your custom C code is corrupting fields in the Eth DMA Buffer Descriptor (BD) data structure. You can trace the XAXIDMA_BD_HAS_DRE_OFFSET field in the BD all the way back to the XPAR_ETH_A_DMA_INCLUDE_MM2S_DRE/XPAR_ETH_A_DMA_INCLUDE_S2MM_DRE macros defined in xparameters.h. In the reference hardware project the MM2S/S2MM_DRE macros are set for both Eth DMAs. You can verify this in your project's xparameters.h (wlan_bsp_cpu_high/mb_high/include/).

Assuming your xparameters.h has all four _DRE macros set to 1, the next step would be to verify that the BDs have the HAS_DRE flag set immediately after the BDs are created by XAxiDma_BdRingCreate(). The DMA driver and our MAC C code will not deassert this flag. If the flags are set in all BDs after BdRingCreate(), but are de-asserted later when the BDs are used, it's most likely that some other code is improperly writing BD memory.


#5 2019-Jan-15 07:06:51

hubmun
Member
Registered: 2017-May-09
Posts: 20

Re: DRE error when using both Ethernet ports

Ok, so you don't think that the AXI_DMA block might set this value to zero, due to an error or similar? In xparameters.h all values are set correctly (i.e. as per default) and, as I said, it works well for quite some time (> 30 mins) before this error appears. I've checked that while it works the HasDRE value is non-zero. I'm having a bit of trouble finding any reason why any other part of the code should overwrite the single TX BD that is used in the design for forwarding wireless Rx to Ethernet A.

Maybe as a fix, do you think resetting the BD or the DMA itself might be a solution?


#6 2019-Jan-15 10:23:33

murphpo
Administrator
From: Mango Communications
Registered: 2006-Jul-03
Posts: 5159

Re: DRE error when using both Ethernet ports

Code:

"Error set buf addr 90005178 with 0 and FFFFFFF, 90005178"

Are you certain this was the UART output? I can't figure out how this output is possible, as it implies WordLen = 0x10000000, but that should be impossible for a u8 variable.

The 90005178 address is ok; that's 376 bytes into the fifth Rx packet buffer (0x5178 = 5*4096 + 376).

In a good BD XAxiDma_BdRead(BdPtr, XAXIDMA_BD_HAS_DRE_OFFSET); should return 0x104, so WordLen = 0x04 and WordLen-1 = 0x03. These values make sense for the alignment-checking code that follows.
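
The arithmetic in this post can be checked on a host PC. The sketch below is not driver code: the two mask values are assumptions picked so that 0x104 decodes to "DRE present, 4-byte word" (the real definitions live in the Xilinx axidma driver headers), the helper names are hypothetical, and the 0x90000000 base and 4096-byte buffer spacing are taken from the numbers quoted in this thread.

```c
#include <stdint.h>

/* Assumed mask values (not copied from the driver headers). */
#define ASSUMED_HAS_DRE_MASK 0xF00u
#define ASSUMED_WORDLEN_MASK 0x0FFu

/* Values quoted in this thread: Rx packet buffers start at 0x90000000
 * and are spaced 4096 bytes apart. */
#define RX_PKT_BUF_BASE 0x90000000u
#define RX_PKT_BUF_SIZE 4096u

/* Mirrors the (WordLen - 1) expression in XAxiDma_BdSetBufAddr().
 * WordLen is a u8, but C promotes it to int before the subtraction,
 * so WordLen == 0 yields -1, which a %x format shows as all F's
 * for a 32-bit int. */
uint32_t align_mask(uint32_t has_dre_field)
{
    uint8_t word_len = (uint8_t)(has_dre_field & ASSUMED_WORDLEN_MASK);
    return (uint32_t)(word_len - 1);
}

/* Returns 1 if the alignment check would reject this address. */
int would_reject(uint32_t has_dre_field, uint32_t addr)
{
    uint32_t misaligned = addr & align_mask(has_dre_field);
    return misaligned != 0 && (has_dre_field & ASSUMED_HAS_DRE_MASK) == 0;
}

/* Index of the Rx packet buffer that contains addr. */
uint32_t rx_buf_index(uint32_t addr)
{
    return (addr - RX_PKT_BUF_BASE) / RX_PKT_BUF_SIZE;
}

/* Byte offset of addr inside its Rx packet buffer. */
uint32_t rx_buf_offset(uint32_t addr)
{
    return (addr - RX_PKT_BUF_BASE) % RX_PKT_BUF_SIZE;
}
```

For a field value of 0x104 the mask is 0x3 and 0x90005178 passes the check. For a field value of 0 the mask becomes 0xFFFFFFFF (eight F's), the masked address equals the address itself, and the check fails. That would match the reported line only if it actually contained eight F's rather than seven.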

Maybe as a fix, do you think resetting the BD or the DMA itself might be a solution?

I wouldn't recommend this as the fix. It is unlikely that the DRE bit is the only corruption in the BDs. Whatever is corrupting the BDs is still there and could cause unpredictable behavior in other parts of the DMA driver/hardware.

We'd be happy to help with additional debugging, but we need a better understanding of exactly what changes you've made to the design. Have you modified the hardware design? Have you customized how the AUX BRAM memory is used (this is where the ETH A BDs are stored)? How did you implement the interface between the wireless bridge and the ETH B DMA?


#7 2019-Jan-16 07:18:48

hubmun
Member
Registered: 2017-May-09
Posts: 20

Re: DRE error when using both Ethernet ports

Hey,   

Are you certain this was the UART output? I can't figure out how this output is possible, as it implies WordLen = 0x10000000, but that should be impossible for a u8 variable.

Pretty sure, yes :) although I might have forgotten an F in the FFFFFFF string in the output. In any case, the HasDRE value is what bothers me more.

I have not modified the hardware design, no. But I did change quite a few things in the CPU High code. The wlan_platform_ethernet_send(...) function, however, is the same as in the original design.

I had one idea what might cause this issue:

As I said, I'm using both Ethernet interfaces to receive and transmit data. Depending on the frame content, frames arriving at ETH B are sometimes forwarded to ETH port A (using the same wlan_platform_ethernet_send(...) function as for packets from wireless Rx). So in some situations a frame from ETH B is sent out on ETH A: that is the first call of wlan_platform_ethernet_send(...). During the processing of frames on ETH B (where I reuse functions from the EXP framework) all interrupts are disabled. Immediately after a frame from ETH B has been processed and forwarded to ETH A, the interrupts are switched on again. Now a wireless Rx interrupt might lead to another call of wlan_platform_ethernet_send(...) very shortly after the first one. Do you see any issue if this function is called twice, with only a few tens of microseconds between each other? Could this lead to a BD corruption somehow? I mean they will both use the same BD for this purpose - maybe also the DMA isn't fast enough and might corrupt it?


#8 2019-Jan-16 09:33:31

murphpo
Administrator
From: Mango Communications
Registered: 2006-Jul-03
Posts: 5159

Re: DRE error when using both Ethernet ports

Do you see any issue if this function is called twice, with only a few tens of microseconds between each other? Could this lead to a BD corruption somehow? I mean they will both use the same BD for this purpose - maybe also the DMA isn't fast enough and might corrupt it?

Ah, this is the likely explanation. The axi_dma driver requires BDs be submitted to hardware in the same order they are allocated. The docs are not clear what happens if this is violated. The alloc/submit order could definitely be violated if wlan_platform_ethernet_send() is called outside an ISR, an interrupt then fires, and the ISR calls wlan_platform_ethernet_send() again before the first call has finished.

One way to test this theory would be to add a global variable to w3_eth.c to check whether wlan_platform_ethernet_send() is ever interrupted and re-entered.

Code:

int reent_flag = 0;

...

int wlan_platform_ethernet_send(u8* pkt_ptr, u32 length) {

	if(reent_flag) {
	  xil_printf("ERROR: re-entered wlan_platform_ethernet_send!\n");
	  while(1) {}
	} else {
	  reent_flag = 1;
	}
...

	reent_flag = 0;
	return 0;
}

It would be interesting to know whether this check asserts at roughly the same interval as you've observed the corrupt BD.

In any case I would suggest disabling interrupts around calls to wlan_platform_ethernet_send() that occur outside an ISR.
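
A minimal host-side sketch of that suggestion follows. It is only a model: stub_interrupt_stop(), stub_interrupt_restore() and stub_ethernet_send() are hypothetical stand-ins for wlan_mac_high_interrupt_stop(), wlan_mac_high_interrupt_restore_state() and wlan_platform_ethernet_send(), and the INTERRUPTS_ENABLED name is an assumption. On hardware the wrapper would call the real functions instead.

```c
#include <stdint.h>

/* Model of the MAC's interrupt state (names partly assumed). */
typedef enum { INTERRUPTS_DISABLED, INTERRUPTS_ENABLED } interrupt_state_t;
interrupt_state_t interrupt_state = INTERRUPTS_ENABLED;

/* Records what the send stub observed, so the model can be inspected. */
interrupt_state_t state_seen_by_send = INTERRUPTS_ENABLED;

/* Stand-in for the stop call: disable and return the previous state. */
interrupt_state_t stub_interrupt_stop(void)
{
    interrupt_state_t prev = interrupt_state;
    interrupt_state = INTERRUPTS_DISABLED;
    return prev;
}

/* Stand-in for the restore call: put back whatever state was saved. */
void stub_interrupt_restore(interrupt_state_t prev)
{
    interrupt_state = prev;
}

/* Stand-in for the Ethernet send; a real one would program the Tx BD. */
int stub_ethernet_send(const uint8_t *pkt_ptr, uint32_t length)
{
    (void)pkt_ptr;
    (void)length;
    state_seen_by_send = interrupt_state;
    return 0;
}

/* Wrapper for callers outside an ISR: no interrupt can run, and so no
 * ISR can re-enter the send, between stop and restore. */
int send_from_main_context(const uint8_t *pkt_ptr, uint32_t length)
{
    interrupt_state_t prev = stub_interrupt_stop();
    int status = stub_ethernet_send(pkt_ptr, length);
    stub_interrupt_restore(prev);
    return status;
}
```

Keeping the stop/restore pair inside one wrapper makes it harder for a new call site in main() to forget the critical section.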


#9 2019-Jan-17 02:11:30

hubmun
Member
Registered: 2017-May-09
Posts: 20

Re: DRE error when using both Ethernet ports

Hey, yes, I was experiencing that issue at the beginning. It does cause this error, and much more frequently, so I guess interrupts occurred while wlan_platform_ethernet_send() was running in a non-ISR context. I then put an interrupt block around the polling of the ETH B interface in the main while-loop. That helped, but as I said the error still occurred after maybe 1 hour of runtime:

Code:

wlan_exp_ip_udp_buffer recv_buffer; // global variable for frames from ETH B

frame_recv_return_size = rapid_eth_recv_frame(ETH_B_MAC, &recv_buffer);
if (frame_recv_return_size > 0) {
    current_state = wlan_mac_high_interrupt_stop();
    // Checks whether the frame from ETH B needs to be transmitted to ETH A;
    // if yes, wlan_platform_ethernet_send() is called.
    check_frame_from_ETH_B();
    eth_free_recv_buffers(ETH_B_MAC, recv_buffer.descriptor, 0x1);
    wlan_usleep(30); // I introduced this waiting period here to get rid of this error
    wlan_mac_high_interrupt_restore_state(current_state);
} else if (frame_recv_return_size == -1) {
    xil_printf("Error at ETH B\n\r");
}
wlan_usleep(20); // not sure if that is needed

As you can see I then introduced this wlan_usleep command after the check_frame_from_ETH_B() call - this actually helps somehow as now the error does appear much later (less often?), maybe after 3 hours of runtime.

Do you think I should add another wlan_usleep before the call of check_frame_from_ETH_B()?

Or do you see a possibility that this wlan_platform_ethernet_send() within the non-ISR check_frame_from_ETH_B() is still interrupted by a radio rx in a rare case?

BR

Hubertus

Last edited by hubmun (2019-Jan-17 03:58:42)


#10 2019-Jan-17 08:52:42

hubmun
Member
Registered: 2017-May-09
Posts: 20

Re: DRE error when using both Ethernet ports

Hey,

I was just testing your suggestion using the reent_flag. It is indeed happening (so the UART output is ERROR: re-entered wlan_platform_ethernet_send()), although I don't understand why, because as you can see I disabled interrupts before entering the critical part in the non-ISR routine. Do you have any idea why the function might still be called twice?

BR

Hubertus


#11 2019-Jan-17 14:22:25

murphpo
Administrator
From: Mango Communications
Registered: 2006-Jul-03
Posts: 5159

Re: DRE error when using both Ethernet ports

You'll have to be more specific about how behaviors changed. After adding the reent_flag code, does the reent_flag error occur immediately? Or does it occur rarely (i.e. on the same ~30 minute time scale as the XAxiDma_BdSetBufAddr error you first described)?

When an error occurs is it always in XAxiDma_BdSetBufAddr? And when this error occurs is the value of Addr always in the 0x90000000 address block (Rx packet buffers)? One possibility is that the error occurs in a different call to XAxiDma_BdSetBufAddr.

What context calls rapid_eth_recv_frame() (i.e. is it called inside an ISR or via the main() poll of wlan_exp)?


#12 2019-Jan-18 04:37:45

hubmun
Member
Registered: 2017-May-09
Posts: 20

Re: DRE error when using both Ethernet ports

Hey,

sorry if that was not clear in my last reply.

After adding the reent_flag code, does the reent_flag error occur immediately? Or does it occur rarely (i.e. on the same ~30 minute time scale as the XAxiDma_BdSetBufAddr error you first described)?

It happens like it did initially, so after a certain runtime of maybe 30 mins or more. As I used the block of code you suggested below the board then of course hung up in the while(1) loop after printing the message on the UART output.

When an error occurs is it always in XAxiDma_BdSetBufAddr? And when this error occurs is the value of Addr always in the 0x90000000 address block (Rx packet buffers)? One possibility is that the error occurs in a different call to XAxiDma_BdSetBufAddr.

Yes it is (at least the UART output matches the pattern from that function). The address is always in the 0x90000000 block, so it is indeed a valid rx packet buffer. I think the reent_flag code output seems to be a hint that it is indeed the wlan_platform_ethernet_send() where BdSetBufAddr is called. Nevertheless I guess (although I haven't checked it yet) that the error is actually produced in  status  = XAxiDma_BdRingAlloc(tx_ring_ptr, 1, &cur_bd_ptr); if the single tx bd is already used then NULL is returned, but this is not checked in the program, only after XAxiDma_BdSetBufAddr has been called.

What context calls rapid_eth_recv_frame() (i.e. is it called inside an ISR or via the main() poll of wlan_exp)?

The block of code below is running in the while(1) loop in the main-function of CPU high:

Code:

while (1) {
    frame_recv_return_size = rapid_eth_recv_frame(ETH_B_MAC, &recv_buffer);
    if (frame_recv_return_size > 0) {
        current_state = wlan_mac_high_interrupt_stop();
        check_frame_from_ETH_B();
        eth_free_recv_buffers(ETH_B_MAC, recv_buffer.descriptor, 0x1);
        wlan_usleep(30); // I introduced this waiting period here to get rid of this error
        wlan_mac_high_interrupt_restore_state(current_state);
    } else if (frame_recv_return_size == -1) {
        xil_printf("Error at ETH B\n\r");
    }
    wlan_usleep(20); // not sure if that is needed
}

Of course rapid_eth_recv_frame(ETH_B_MAC, &recv_buffer) is called outside the interrupt-safe area, but it should only work on ETH B resources (BDs and rings). And it definitely does not call the wlan_platform_ethernet_send() function.

I was now testing the code again using the following block in wlan_platform_ethernet_send() to avoid any re-calling of wlan_platform_ethernet_send() and without any wlan_usleep(30) after finishing eth_free_recv_buffers(ETH_B_MAC, recv_buffer.descriptor, 0x1):

Code:

	if(reent_flag) {
		reent_flag = 0;
		reent_counter++;
		return 0;
	} else {
		reent_flag = 1;
	}

The reent-counter shows that during a long-run test of several hours this issue happened 6 times but in general the problem is resolved (although now 1 packet from radio rx is dropped every time the reent event occurs).


BR

Hubertus


#13 2019-Jan-18 11:20:14

murphpo
Administrator
From: Mango Communications
Registered: 2006-Jul-03
Posts: 5159

Re: DRE error when using both Ethernet ports

Thank you for the extra info. I'm baffled by the behavior you're observing.

One remote possibility is that wlan_platform_ethernet_send() is returning early via one of the return -1; calls, but these all have associated xil_printf() outputs. I'm guessing you don't see any of those "ERROR: " prints?

Any chance you changed num_tx_bd = 1; to >1 in w3_wlan_platform_ethernet_init()? The rest of  the code assumes exactly 1 Tx BD for ETH A.

I think the reent_flag code output seems to be a hint that it is indeed the wlan_platform_ethernet_send() where BdSetBufAddr is called. Nevertheless I guess (although I haven't checked it yet) that the error is actually produced in  status  = XAxiDma_BdRingAlloc(tx_ring_ptr, 1, &cur_bd_ptr); if the single tx bd is already used then NULL is returned, but this is not checked in the program, only after XAxiDma_BdSetBufAddr has been called.

This is a good theory - our code should check the status of XAxiDma_BdRingAlloc() to avoid referencing a NULL pointer. If this is occurring, the XAxiDma_BdSetBufAddr() function is actually looking at a memory region near 0x0 and not at a corrupt BD.
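
The missing check can be illustrated with a small host-side model. This is a sketch under stated assumptions, not the reference code: the model_* types and functions are hypothetical, the ring holds exactly one Tx BD as in the design discussed here, and the alloc function returns NULL when that BD is busy, following the behavior described in the quoted theory.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical model of a Tx ring with exactly one buffer descriptor. */
typedef struct {
    uint32_t buf_addr;
    int      in_use;
} model_bd_t;

typedef struct {
    model_bd_t bd;
} model_ring_t;

/* Returns 0 and a BD pointer on success, -1 and NULL if the BD is busy. */
int model_ring_alloc(model_ring_t *ring, model_bd_t **bd_out)
{
    if (ring->bd.in_use) {
        *bd_out = NULL;
        return -1;
    }
    ring->bd.in_use = 1;
    *bd_out = &ring->bd;
    return 0;
}

/* Marks the single BD as available again (transmission finished). */
void model_ring_free(model_ring_t *ring)
{
    ring->bd.in_use = 0;
}

/* Checks the alloc result before touching the BD, so a busy ring is
 * reported to the caller instead of dereferencing a NULL pointer. */
int model_send(model_ring_t *ring, uint32_t addr)
{
    model_bd_t *bd = NULL;

    if (model_ring_alloc(ring, &bd) != 0 || bd == NULL) {
        return -1; /* caller may drop the packet or retry later */
    }
    bd->buf_addr = addr;
    return 0;
}
```

In this model a second send issued while the BD is still in use fails cleanly and leaves the first descriptor untouched.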


#14 2019-Jan-21 09:28:47

hubmun
Member
Registered: 2017-May-09
Posts: 20

Re: DRE error when using both Ethernet ports

Hey,

One remote possibility is that wlan_platform_ethernet_send() is returning early via one of the return -1; calls, but these all have associated xil_printf() outputs. I'm guessing you don't see any of those "ERROR: " prints?

Yes, you are right: first, there would have been other UART outputs, and second, I also set reent_flag = 0; before every possible return -1; caused by an error condition.

Any chance you changed num_tx_bd = 1; to >1 in w3_wlan_platform_ethernet_init()? The rest of  the code assumes exactly 1 Tx BD for ETH A.

No it is still num_tx_bd = 1.

Have you ever experienced any issues with the interrupt handling, e.g. some rare conditions where current_state = wlan_mac_high_interrupt_stop(); does not lead to a total stopping of all interrupts or specifically the Ethernet interrupt (like an undetected situation where it is not allowed to stop them or they will be re-enabled due to some special circumstances)?

BR

Hubertus


#15 2019-Jan-22 15:53:08

murphpo
Administrator
From: Mango Communications
Registered: 2006-Jul-03
Posts: 5159

Re: DRE error when using both Ethernet ports

Have you ever experienced any issues with the interrupt handling, e.g. some rare conditions where current_state = wlan_mac_high_interrupt_stop(); does not lead to a total stopping of all interrupts or specifically the Ethernet interrupt (like an undetected situation where it is not allowed to stop them or they will be re-enabled due to some special circumstances)?

No, we've always observed consistent behavior from the interrupt controller and handling of interrupts by its driver.

I've been trying to reproduce this behavior with a design modified to mimic the code you're using. So far I have not observed any interrupted/re-entered calls to wlan_platform_ethernet_send().

Can you try one more experiment? Modify wlan_mac_high_interrupt_stop() in wlan_mac_high.c to remove the if() before calling XIntc_Stop(), like:

Code:

	interrupt_state_t curr_state = interrupt_state;
	XIntc_Stop(&InterruptController);
	interrupt_state = INTERRUPTS_DISABLED;
	return curr_state;

It's a (very weak) theory, but that if is the one place I see that a call to wlan_mac_high_interrupt_stop() could skip the call to XIntc_Stop(), so if somehow the InterruptController.IsStarted flag became inconsistent with the actual intc hardware, interrupts could be left enabled after that function runs (like I said, it's a very weak theory).


#16 2019-Jan-23 08:30:14

hubmun
Member
Registered: 2017-May-09
Posts: 20

Re: DRE error when using both Ethernet ports

Sure - I will test it and let you know (probably beginning of next week). Thanks!


#17 2019-Jan-28 02:44:56

hubmun
Member
Registered: 2017-May-09
Posts: 20

Re: DRE error when using both Ethernet ports

Hey murphpo,

I modified the wlan_mac_high_interrupt_stop() function the following way:

inline interrupt_state_t wlan_mac_high_interrupt_stop(){
    interrupt_state_t curr_state = interrupt_state;
    u32 tmp_is_ready = InterruptController.IsReady;
    u32 tmp_is_started = InterruptController.IsStarted;

    if(InterruptController.IsReady && InterruptController.IsStarted){
        XIntc_Stop(&InterruptController);
    }
    else{
        XIntc_Stop(&InterruptController);
        xil_printf("Ready: %d, Started: %d, interruptstate: %d\n\r", tmp_is_ready,tmp_is_started, interrupt_state);
    }
    interrupt_state = INTERRUPTS_DISABLED;
    return curr_state;
}

On the UART I quite regularly get the output:

Ready: <high integer, never 0> , Started: 0, interruptstate: 1

This output appears maybe every 10s.

and at some point I get:

Ready: <high integer, never 0> , Started: 0, interruptstate: 0

Once this has appeared, it then shows up very frequently (more than once per second), which obviously destroys the runtime behavior and also stops the whole packet processing on the boards.

Actually I don't really know how to interpret that output... do you have an idea?

BR

Hubertus


#18 2019-Jan-28 07:39:31

murphpo
Administrator
From: Mango Communications
Registered: 2006-Jul-03
Posts: 5159

Re: DRE error when using both Ethernet ports

I would expect this code to print a lot, as there are nested stop/restore calls in the MAC code (that's why we use stop/restore, not stop/start). I'm more curious if a call to wlan_mac_high_interrupt_stop() ever fails to actually stop the interrupt controller because of inconsistent state between the MAC software and intc driver. I've tried this with the (otherwise unmodified) reference code and saw no change in behavior.
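
The difference between stop/restore and stop/start under nesting can be shown with a small host-side model. All names here are hypothetical and nothing touches the real intc driver; the sketch only illustrates why an inner critical section must restore the saved state rather than unconditionally re-enable interrupts.

```c
/* Hypothetical model of the software-side interrupt state. */
typedef enum { MODEL_INTR_DISABLED, MODEL_INTR_ENABLED } model_state_t;

model_state_t model_state = MODEL_INTR_ENABLED;

/* State observed by the outer section right after the inner one ends. */
model_state_t state_after_inner = MODEL_INTR_ENABLED;

/* Disable interrupts and hand back whatever state was active before. */
model_state_t model_stop(void)
{
    model_state_t prev = model_state;
    model_state = MODEL_INTR_DISABLED;
    return prev;
}

/* Put back the saved state; a nested caller restores "disabled". */
void model_restore(model_state_t prev)
{
    model_state = prev;
}

/* A helper that protects its own work with a critical section. */
void inner_critical(void)
{
    model_state_t prev = model_stop();
    /* ... work that must not be interrupted ... */
    model_restore(prev);
}

/* An outer critical section that calls the helper. With stop/start the
 * helper would re-enable interrupts here; with stop/restore it cannot. */
void outer_critical(void)
{
    model_state_t prev = model_stop();
    inner_critical();
    state_after_inner = model_state; /* still disabled in this model */
    model_restore(prev);
}
```

In this model a stop call made while interrupts are already disabled is the normal nested case, which is one reason a diagnostic print on that path would be expected to fire often.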
