Hi forum,
I'm running a home-made 802.3 ethernet bridge application based on a combination of modified wlan_mac_high_ibss and wlan_mac_low_nomac (originally discussed in this forum thread). Overall, this bridge application works fine but I've recently discovered an issue with 'quick' transmissions.
At the SRC board, five fixed-size 100-byte UDP packets from a PC are received on the ethernet interface and transmitted verbatim over the air (coax). At a second board, DST, these five radio transmissions are received and transmitted out its ethernet interface to a destination PC.
I've observed that when these UDP packets arrive 'quickly' at SRC, fewer than 5 packets are received at the destination PC. If I slow the UDP inter-packet time down at the source PC, e.g., from 100us to 400us, then all 5 packets are received OK. If I then increase the UDP packet size from, say, 100 bytes to 200 bytes, I have losses again. And if I increase the inter-packet time once more, from 400us to 600us, all packets are received again [N.B.: these byte and usec values are examples and serve only to illustrate].
My conclusion from this observation is that at the transmitter board SRC, wired ethernet packets are "bumping" into previous packets which are in mid-radio transmission. Packet A arrives on ethernet and is being transmitted out the radio. As it is in mid transmission, packet B arrives and "collides" with the outgoing A.
The code snippet where I do this transmission is:
    queue_sel = MANAGEMENT_QID;
    if((qlen = queue_num_queued(queue_sel)) < MAX_TX_QUEUE_LEN){
        // Put the packet in the queue
        enqueue_after_tail(queue_sel, curr_tx_queue_element_p);
    } else {
        // Packet was not successfully enqueued
        xil_printf("Exceed qlen: %d\n", qlen);
        return 0;
    }
I've tried different queues (other than MANAGEMENT_QID) but that doesn't help. I also never see the printf() message so I'm not exceeding the queue length with my 5 UDP packets.
I suspect that the problem is the enqueue_after_tail() function is not blocking. It effectively copies the packet for CPU low transmission and returns. It could potentially be called again quickly and repeat this copying - while CPU low hasn't transmitted the first packet.
Ideally, I'd like to call something like:
while (wlan_mac_get_status() & WLAN_MAC_STATUS_MASK_TX_PHY_ACTIVE) {}
either immediately before or after enqueue_after_tail() to block subsequent transmission attempts, but this function (from wlan_mac_nomac.c) is not available to CPU high.
Would anyone have any idea if my interpretation/conclusions are correct? Is there a mechanism in place to send occasional back-to-back received ethernet frames out the radio with the NOMAC application?
--vick
I think the possible explanations come down to a few things:
1) That the packets aren't actually being emitted from your PC at the rate you expect. You can rule this in or out by having your PC run Wireshark on its Ethernet interface to make sure that you actually see every expected transmission.
2) That packets are being received via Ethernet, but for some reason aren't being transmitted. You can rule this in or out by using wlan_exp to retrieve the Tx Packet Counts. Specifically, is "data_num_tx_packets_total" 5 packets larger after your PC sends its 5 packets?
3) That the packets are being transmitted, but are not being successfully received for some reason. You can rule this in or out with a laptop running Wireshark in monitor mode. If the laptop sees every packet as an 802.11 frame, then you know they were all sent and the problem is on the Rx side. Note: I know you said that you are using coax and not going over the air. We've found that most laptops are sensitive enough to pick up the receptions just through leakage near the equipment. It should work fine if you just place your monitor laptop in close proximity to your kits.
Performing those debugging steps should help narrow down where the losses are occurring.
vick wrote:
My conclusion from this observation is that at the transmitter board SRC, wired ethernet packets are "bumping" into previous packets which are in mid-radio transmission. Packet A arrives on ethernet and is being transmitted out the radio. As it is in mid transmission, packet B arrives and "collides" with the outgoing A.
The code snippet where I do this transmission is:
Code:
    queue_sel = MANAGEMENT_QID;
    if((qlen = queue_num_queued(queue_sel)) < MAX_TX_QUEUE_LEN){
        // Put the packet in the queue
        enqueue_after_tail(queue_sel, curr_tx_queue_element_p);
    } else {
        // Packet was not successfully enqueued
        xil_printf("Exceed qlen: %d\n", qlen);
        return 0;
    }
I've tried different queues (other than MANAGEMENT_QID) but that doesn't help. I also never see the printf() message so I'm not exceeding the queue length with my 5 UDP packets.
I suspect that the problem is the enqueue_after_tail() function is not blocking. It effectively copies the packet for CPU low transmission and returns. It could potentially be called again quickly and repeat this copying - while CPU low hasn't transmitted the first packet.
That's not quite right. enqueue_after_tail() manipulates a doubly linked list to insert the queue entry and then calls the application callback that maps on to poll_tx_queues() in the IBSS. That function won't return until a packet has been completely copied into CPU_LOW for transmission. If you keep tracing the function calls down into the framework where the copy is occurring, you can see that the framework waits on the CDMA operation to finish before continuing. You can call poll_tx_queues() as rapidly as you want and you won't step on the previous dequeue.
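To make the "insert into a doubly linked list, then call the application callback" sequence concrete, here is a minimal standalone sketch. Every name in it (the `sketch_` types and functions, the call counter) is hypothetical and invented for illustration; it is not the framework's actual queue code, and it leaves out the blocking copy into CPU_LOW entirely.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical types for illustration; not the framework's definitions. */
typedef struct sketch_elem {
    struct sketch_elem *prev;
    struct sketch_elem *next;
} sketch_elem_t;

typedef struct {
    sketch_elem_t *head;
    sketch_elem_t *tail;
    unsigned       length;
    void         (*on_enqueue)(void); /* stands in for poll_tx_queues() */
} sketch_queue_t;

/* Counts callback invocations so the behavior can be observed. */
unsigned sketch_poll_calls = 0;

void sketch_poll_tx_queues(void) {
    sketch_poll_calls++;
}

void sketch_queue_init(sketch_queue_t *q, void (*on_enqueue)(void)) {
    assert(q != NULL);
    q->head = NULL;
    q->tail = NULL;
    q->length = 0;
    q->on_enqueue = on_enqueue;
}

/* Link the element in at the tail, then notify the application. */
void sketch_enqueue_after_tail(sketch_queue_t *q, sketch_elem_t *e) {
    assert(q != NULL && e != NULL);
    e->next = NULL;
    e->prev = q->tail;
    if (q->tail != NULL) {
        q->tail->next = e;
    } else {
        q->head = e; /* queue was empty */
    }
    q->tail = e;
    q->length++;
    if (q->on_enqueue != NULL) {
        q->on_enqueue();
    }
}

/* Unlink and return the oldest element, or NULL if the queue is empty. */
sketch_elem_t *sketch_dequeue_from_head(sketch_queue_t *q) {
    assert(q != NULL);
    sketch_elem_t *e = q->head;
    if (e == NULL) {
        return NULL;
    }
    q->head = e->next;
    if (q->head != NULL) {
        q->head->prev = NULL;
    } else {
        q->tail = NULL; /* queue is now empty */
    }
    e->next = NULL;
    e->prev = NULL;
    q->length--;
    return e;
}
```

The point of the sketch is only the ordering: the enqueue itself is pointer manipulation, and the callback runs once per enqueue, so calling it rapidly does not by itself lose or overwrite earlier entries.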
vick wrote:
Ideally, I'd like to call something like:
Code:
    while (wlan_mac_get_status() & WLAN_MAC_STATUS_MASK_TX_PHY_ACTIVE) {}
either immediately before or after enqueue_after_tail() to block subsequent transmission attempts, but this function (from wlan_mac_nomac.c) is not available to CPU high.
That behavior isn't right. It's intended that CPU_HIGH can be dequeuing a packet into a Tx packet buffer while a transmission is currently active. That's why there are multiple Tx packet buffers. In the poll_tx_queues() function, the high-level MACs all ask the framework how many Tx packet buffers are currently available to be filled. CPU_HIGH has to keep them filled so that one is ready to go immediately after CPU_LOW finishes a transmission. It's the only way we are able to hit a minimum contention window transmission of a new packet immediately following the completion of a transmission of the previous packet. There just isn't time to dequeue and copy a packet down to CPU_LOW in the short window following a transmission.
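The idea of filling one Tx packet buffer while another is on the air can be sketched as a small state machine. This is a toy model only: the buffer count of 2, the three states, and all `sketch_` names are assumptions made for illustration and do not reflect the framework's real buffer management or its mailbox handshakes.

```c
#include <assert.h>

#define SKETCH_NUM_TX_BUFS 2 /* assumed count, for illustration only */

typedef enum { BUF_EMPTY = 0, BUF_READY, BUF_ACTIVE } sketch_buf_state_t;

typedef struct {
    sketch_buf_state_t state[SKETCH_NUM_TX_BUFS];
    int fill_idx; /* next buffer the "CPU_HIGH" side will fill      */
    int tx_idx;   /* next buffer the "CPU_LOW" side will transmit   */
} sketch_tx_bufs_t;

/* "CPU_HIGH" side: copy a packet into the next buffer if it is empty.
 * Works even while another buffer is ACTIVE. Returns the buffer index,
 * or -1 if no buffer is available. */
int sketch_fill_one(sketch_tx_bufs_t *b) {
    int idx = b->fill_idx;
    if (b->state[idx] != BUF_EMPTY) {
        return -1;
    }
    b->state[idx] = BUF_READY;
    b->fill_idx = (idx + 1) % SKETCH_NUM_TX_BUFS;
    return idx;
}

/* "CPU_LOW" side: start transmitting the next buffer if it is ready.
 * Returns the buffer index, or -1 if nothing is ready. */
int sketch_start_tx(sketch_tx_bufs_t *b) {
    int idx = b->tx_idx;
    if (b->state[idx] != BUF_READY) {
        return -1;
    }
    b->state[idx] = BUF_ACTIVE;
    return idx;
}

/* "CPU_LOW" side: transmission done, hand the buffer back. */
void sketch_finish_tx(sketch_tx_bufs_t *b) {
    assert(b->state[b->tx_idx] == BUF_ACTIVE);
    b->state[b->tx_idx] = BUF_EMPTY;
    b->tx_idx = (b->tx_idx + 1) % SKETCH_NUM_TX_BUFS;
}
```

Filling and transmitting both advance round-robin, so packets leave in the order they were filled, and a second packet can already be READY at the moment the first transmission finishes.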
It's an intentional design choice for us that CPU_HIGH is largely walled off from the instantaneous actions of the PHY. With very few exceptions, the two processors handle their tasks independently and don't really care what the other is doing at any point in time. Any dependence between them is handled asynchronously with mailbox handshakes. One notable exception to this philosophy is the way beacon transmissions are handled with the AP and DCF. Those are kind of a nightmare because they require their contents being updated live with CPU_HIGH state (like queue occupancy for the Traffic Indication Map), but also have extremely strict timing requirements on when they are supposed to hit the medium (which is the job of CPU_LOW). So that implementation is a hybrid CPU_HIGH/CPU_LOW design that is more complicated than I'd like, but it's the best way to adhere to the standard.
Hi Chris,
Thanks for the detailed response. Your description of how CPUs HIGH and LOW enqueue and dequeue packets independently of each other is pretty much how I'd have expected the design to work (and was what I gathered when digging through the code as well).
I ran a variant of the 3 points you suggested, just to verify that things are behaving the way I believed them to be.
For 1), I blasted my UDP stream between the two PCs connected via ethernet cable, and confirmed that both transmitter and receiver are able to transmit/receive these microsecond-separated packets (i.e., zero losses at receiver).
I don't have wlan_exp in place for 2), but did a variant of suggestion 3) to achieve a similar result. I used a 3rd WARP board running a custom "RSSI-sampler" design to sniff the (almost private) Wi-Fi channel my SRC and DST boards are communicating on and plotted the results below:
I'm now transmitting five 1118-octet 802.3 frames over the air every 700us (left plot) and every 500us (right plot). The sample points are 100us apart and the vertical axis shows RSSI readings.
The left plot shows five distinct packets. Each packet consists of 3-4 sample points, resulting in a total of ~18 "high" RSSI values. In this case the receiver PC successfully obtained all 5 UDP packets.
The right plot also shows ~18 "high" RSSI values. However, they appear more bunched together. For the measurement on the right the receiver PC only successfully obtained 3 UDP packets (of the 5 transmitted). The 1st, 2nd and 4th packets were successfully received.
Since the number of "high" RSSI points is identical in both plots (i.e., 18), I believe that the five packets are indeed being transmitted in both cases. However, the issue now seems to be that they sometimes appear to be concatenated (viewed with my 100us sample-rate resolution), which might be the problem.
After sending a packet out the air, CPU low finds the queue still occupied (due to the rapidly arriving ethernet packets) and so immediately sends out the next packet. Could this process in CPU low be so quick that the second packet appears combined or distorted? Or perhaps the tx-radio needs more time to "fire up" again? The receiver appears to understand the 1st, 2nd and 4th packets, but not the 'appended' 3rd and 5th.
Reminder: I'm running NOMAC on CPU LOW so there are no xIFS involved to give the transmitter any reprieve. So, perhaps the more general question is: how would NOMAC handle a "full-buffer" type transmission scenario?
Thanks,
vick
Addendum:
I've now modified the handle_tx_pkt_buf_ready() function in wlan_mac_low_nomac/wlan_mac_nomac.c to look like:
    int handle_tx_pkt_buf_ready(u8 pkt_buf){
        if( wlan_mac_low_prepare_frame_transmit(pkt_buf) == 0 ){
            frame_transmit(pkt_buf);
            wlan_mac_low_finish_frame_transmit(pkt_buf);
            /* XXX VIX_KLUDGE: Let radio 'rest' a bit before returning;
               experiments show back-to-back transmissions getting lost
               at the receiver. N.B.: trial-n-error shows 23us pause OK.
               Using 30us to be safe. */
            wlan_usleep(30);
            /* XXX */
            return 0;
        } else {
            return -1;
        }
    }
and this setup works much better now. I'm generating bursts of 90 1518-byte 802.3 frames at my sending PC with ~5us inter-packet times, and they're going over the air and being received at the destination PC successfully (Limited to 90 since larger bursts overrun the transmit buffers of course; also works for smaller 100-octet packets).
As the code comment says, it seems to perform OK for usleep(23) and above. Is it possible that this 23us is similar to a (SIFS+slots) value which is why this issue doesn't commonly appear for DCF operation?
As a final note, can anyone suggest a cleaner way of achieving this (back-to-back) transmission "properly"?
--vick
The problem isn't that the transmissions are too fast in an absolute sense. It's that they are too fast for our receiver in certain SNR regimes (described below). The waveforms that are produced are perfectly valid without that sleep you added. Side note: those elongated RSSI durations from your plots just look that way because the 100us sampling interval was too slow to capture the brief idle time between transmissions. In my experiments, a fully backlogged NOMAC Tx produces back-to-back packets with only 27 usec of idle time between them. I'm actually not sure why you are seeing "bursty" transmissions where sometimes the packets are dequeued back-to-back and other times there is a delay. If you speed up your transmissions even more and send more than 5, do you eventually see a single long RSSI event?
In any case, the issue is actually on the Rx side. I've been able to reproduce the problem only at high Rx powers. How much attenuation do you have between your Tx and Rx over your coax? Our running theory is that this is related to the AGC and/or packet detection. At high powers, it takes the AGC longer to recover back to the maximum gain setting and get ready for the next reception. The amount of time it takes to recover is sufficient for the timings in the DCF, but the aggressive non-standard timings of a NOMAC transmitter are a bit too fast when powers are high. In my tests, I see packet losses with backlogged NOMAC transmissions for Rx powers greater than about -55 dBm (i.e. less than 70dB of attenuation when using the default 15dBm Tx power). Here's some data to back that up:
-55 dBm of Rx Power, MCS 0, NONHT, 1200-byte MAC Payloads
The above plot shows how the Packet Delivery Rate changes according to the LTG interval (i.e. the interval between enqueued packets). The flat region in the Inter-Packet Timing subplot is where the queue is effectively backlogged. In all cases, ~100% of packets were properly delivered.
However,
-25 dBm of Rx Power, MCS 0, NONHT, 1200-byte MAC Payloads
At higher Rx powers, the aforementioned AGC settling issue causes problems for backlogged NOMAC traffic. Only 75% of packets are getting through. I think this is the issue you are seeing.
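The relation between Tx power, attenuation and the roughly -55 dBm threshold quoted above is simple dB arithmetic, sketched below. The function names are hypothetical, the threshold is just the approximate figure from these tests rather than a specification, and the 40 dB example is inferred from the -25 dBm case assuming the default 15 dBm Tx power.

```c
#include <assert.h>

/* Approximate Rx power above which losses were seen with backlogged
 * NOMAC traffic in the tests described in this thread (not a spec). */
#define SKETCH_AGC_RISK_THRESHOLD_DBM (-55.0)

/* Rx power in dBm for a given Tx power (dBm) and path attenuation (dB). */
double sketch_rx_power_dbm(double tx_power_dbm, double attenuation_db) {
    assert(attenuation_db >= 0.0);
    return tx_power_dbm - attenuation_db;
}

/* Returns 1 if the Rx power is in the regime where losses were observed. */
int sketch_agc_at_risk(double rx_power_dbm) {
    return rx_power_dbm > SKETCH_AGC_RISK_THRESHOLD_DBM;
}
```

With a 15 dBm transmitter, 70 dB of attenuation lands exactly on the threshold, while 40 dB (about -25 dBm at the receiver) is well inside the problematic regime.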
So, there are two viable solutions:
1) If you aren't already, operate in a lower Rx power regime by adding attenuation.
2) Slow down your transmissions. I wouldn't put the delay in "handle_tx_pkt_buf_ready()". That means that even if a single packet comes down the pipe you are producing an unnecessary delay. You only need a minimum delay between packets. Instead, put the wlan_usleep on Line 288 after the call to wlan_mac_low_send_low_tx_details(). This will realize the same behavior for backlogged traffic, but if the traffic is bursty and the next packet doesn't get dequeued until after the amount of your delay, then no behavior is changed.
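The "you only need a minimum delay between packets" idea can also be written as a small helper that computes how much of the gap is still outstanding. Note this is a timestamp-based variant offered as a sketch, not the fixed wlan_usleep placement described above; the function name and its parameters are hypothetical, and the 30 usec value used in the usage note is simply the one from earlier in this thread.

```c
#include <assert.h>
#include <stdint.h>

/* Returns how many microseconds still need to elapse before the next
 * transmission may start, given when the previous one ended.
 * Returns 0 if the minimum gap has already passed, so isolated
 * packets are never delayed. */
uint32_t sketch_remaining_gap_usec(uint64_t last_tx_end_usec,
                                   uint64_t now_usec,
                                   uint32_t min_gap_usec) {
    /* Assumes a monotonic timestamp source. */
    assert(now_usec >= last_tx_end_usec);
    uint64_t elapsed = now_usec - last_tx_end_usec;
    if (elapsed >= min_gap_usec) {
        return 0;
    }
    return (uint32_t)(min_gap_usec - elapsed);
}
```

For example, with a 30 usec minimum gap, a packet that becomes ready 5 usec after the previous transmission ended would wait another 25 usec, while one that arrives 4 ms later would not wait at all.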
Finally, I really recommend you use wlan_exp for this stuff. It's a useful debugging and experiment design tool. The script I wrote to generate the above plots is available here.
Hi Chris,
Firstly, many thanks for taking the effort of carrying out experiments to replicate and investigate the issue I reported.
To answer your question, I don't see one long transmission even when my tx PC sends 90 1518-octet frames 1us apart. I transmitted 90 back-to-back frames and would expect the tx buffer to be backlogged, but the figure shows about 30 dips. Since these dips appear periodic, they could be an artifact of my analyser's 100us sampling occasionally coinciding with the 27us idle times that should be there, which would explain the figure below. I wouldn't dwell on this too long.
I'm not sure I completely understood how to interpret the curves in your post (should the 'ms' axis labels be 'us'?). However, I understood your reasoning that the problem could be at the receiver end, relating to the AGC's recovery. I can further support this hypothesis: I managed to reduce losses at my receiver for these fast transmissions by lowering my sender's transmit power from 15dBm down to 0dBm, and to eliminate losses altogether (for 90-packet bursts) at even lower powers. I'm testing over-the-air transmissions with the boards ~1m apart.
However, of the two solutions you proposed, I've chosen the 2nd as the "lesser of two evils". I've now moved my wlan_usleep() call to your recommended point in the frame_transmit() function (which coincidentally was my initial placing!).
On a more general note, this is a fix which is definitely needed for NOMAC. It has always been impossible to communicate with TCP over NOMAC, and I'd always attributed this to collisions over-the-air due to bidirectional DATA-ACK traffic and no medium-access control. However, it's clear now that the issue is not collisions, but rather these losses which I've reported due to back-to-back transmissions as TCP ramps up.
As an example of this, a 1MB file transfer using 'scp' between two PCs via a pair of boards looks like this with the 1.7.5-RELEASE NOMAC:
    demo@terror:~> scp 10.10.129.103:data/1m /dev/null
    1m                100%  977KB  11.4KB/s   01:26
With the introduction of the wlan_usleep(30) in the frame_transmit() function it now looks like this:
    demo@terror:~> scp 10.10.129.103:data/1m /dev/null
    1m                100%  977KB 976.6KB/s   00:01
In short, an excruciatingly unusable minute and a half vs. 1 second! A 10MB file takes 4 seconds and a 100MB file takes 44 seconds (~2.5MB/s).
Since NOMAC could be used for various (non-DCF/WiFi) research scenarios it would be nice if there was a solution which didn't include manual intervention in future releases, e.g., wlan_usleep(AGC_RECOVERY_USEC) in frame_transmit() which would make it TCP-friendly at least.
In conclusion:
chunter wrote:
Finally, I really recommend you use wlan_exp for this stuff. It's a useful debugging and experiment design tool. The script I wrote to generate the above plots is available here.
I can sense your frustration with my reluctance to use the experimental tools in place - apologies for that!
Thanks!
vick