source: ReferenceDesigns/w3_802.11/python/wlan_exp/docs/source/log_overview.rst

Last change on this file was 6320, checked in by chunter, 5 years ago

1.8.0 release wlan-exp

.. _log_overview:

.. include:: globals.rst

##################
Event Log Overview
##################

The 802.11 Reference Design implements a logging framework which records any
user-specified event in the nodes of an experimental network. The 802.11
Reference Design implements many useful log entry types, including Tx packets,
Rx packets and low-level MAC re-transmissions. Users can create additional
entry types to suit their research application.

The basic flow for using log data in an experiment is:

#. Retrieve log data from one or more 802.11 Reference Design nodes
#. Generate an index of each node's log data
#. Filter the index to select the required subset of log entry types
#. Convert the log data and filtered index into structured arrays of log entries
#. Process the log entries to calculate the statistics required for the experiment

Log data retrieval (step 1) is implemented in the ``log_get_all_new()`` method.

Log index generation and filtering (steps 2-3) and entry processing (steps 4-5) are described below.


**********
Components
**********

Log Data
========

The term ``log_data`` refers to a `bytearray` of raw log data retrieved from an
802.11 Reference Design node. The ``log_data`` is a tightly packed array of log
entries, each composed of an entry header and arbitrary entry payload. The array
of log data is a byte-for-byte copy of the log data retrieved from the node's
DRAM.

The log data format is documented in the :ug:`user guide <wlan_exp/log>`.

In Python scripts, log data is retrieved using the log methods implemented in
`wlan_exp.node`.


Raw Log Index
=============

Log data can be quite large, often many gigabytes for a long trial. Re-parsing
the full log data array to summarize its contents would be expensive. A more
efficient approach is to generate an index describing the contents of each log
data array at the time the array is retrieved, then save this index with the
log data for easier processing in the future.

We call this index the **raw log index**. The raw log index contains the
location of each log entry in the log data and the entry type ID of that entry.

For example, consider the log data array illustrated below.

.. figure:: _images/wlan_exp_log_layout.png
    :align: center
    :alt: Log data example

    Example 112-byte log data array with 5 entries of 3 different entry types.

The blue areas show the log entry headers. Each header starts with a delimiter
value, followed by a sequence number, the log entry type ID and the log entry
length. Following the header is the log entry payload itself, illustrated as
red, green and yellow here.

This log data contains 5 log entries of 3 distinct types:

* Two entries of type ID 10 (red) at byte offsets 8 and 88
* Two entries of type ID 214 (green) at byte offsets 36 and 76
* One entry of type ID 3 (yellow) at byte offset 56

The raw log index for this log data would be the dictionary::

    {10:  [8, 88],
     214: [36, 76],
     3:   [56]
    }

In actual experiments the log data and corresponding index will be **much**
larger. We have successfully tested these tools on log data with tens of
millions of entries (on a 64-bit machine, of course).

.. note::  Notice that the dictionary keys are integer entry type IDs. This is
    by design, as it allows the raw log index to be generated using only the
    log data itself, with no dependence on the formats of the log entries
    themselves. The integer IDs will be translated into names in the log index
    filtering step, described below.

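The index generation described above can be sketched with the Python standard
library alone. This is an illustration only, not the ``gen_raw_log_index``
implementation: the 8-byte little-endian header layout, the ``0xACED``
delimiter value and all helper names are assumptions made for this example
(the actual format is documented in the user guide). The payload lengths are
chosen so that the sketch reproduces the 112-byte example above::

    import struct

    # Assumed header layout: delimiter, sequence number, entry type ID and
    # payload length, each a 16-bit little-endian integer (8 bytes total).
    SKETCH_HEADER = struct.Struct("<HHHH")
    SKETCH_DELIMITER = 0xACED  # assumed value, for illustration only


    def build_example_log_data(entries):
        """Pack (entry_type_id, payload_length) pairs into a bytearray."""
        data = bytearray()
        for seq, (type_id, length) in enumerate(entries):
            data += SKETCH_HEADER.pack(SKETCH_DELIMITER, seq, type_id, length)
            data += bytes(length)  # zero-filled placeholder payload
        return data


    def sketch_raw_log_index(log_data):
        """Walk the headers and record each payload offset by entry type ID."""
        index = {}
        offset = 0
        while offset + SKETCH_HEADER.size <= len(log_data):
            delimiter, _seq, type_id, length = SKETCH_HEADER.unpack_from(log_data, offset)
            if delimiter != SKETCH_DELIMITER:
                raise ValueError("no entry header at byte offset %d" % offset)
            payload_offset = offset + SKETCH_HEADER.size
            index.setdefault(type_id, []).append(payload_offset)
            offset = payload_offset + length  # jump to the next header
        return index


    example_log_data = build_example_log_data(
        [(10, 20), (214, 12), (3, 12), (214, 4), (10, 24)])
    example_index = sketch_raw_log_index(example_log_data)

Only the headers are inspected, never the payloads, which is why the index
can be built without knowing the format of any entry type.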

Tools
=====

The ``log.util.gen_raw_log_index(log_data)`` method will read a raw log data
array and generate the raw log index.

The ``log.util_hdf.log_data_to_hdf5(log_data, filename)`` method will by
default create and save the raw log index when saving log data to an HDF5 file.
To disable the creation of the raw log index, pass ``gen_index=False`` to the
method.

The ``log.util_hdf.hdf5_to_log_index(filename)`` method will return the raw
log index previously saved to an HDF5 file.


Archiving Log Data
==================

Log data retrieved from an 802.11 Reference Design node will initially be
stored in RAM as a bytearray. In most experiments it is useful to write the
log data to a file for archival and future processing.

We recommend storing log data in `HDF5 files <http://www.hdfgroup.org/HDF5/>`_
using the `h5py package <http://docs.h5py.org/en/latest/index.html>`_. The HDF5
format is open, fast, well documented and supported by a wide variety of tools.


HDF5 Log Data Format
--------------------

The HDF5 format is built from two types of objects:

* **Dataset** - an array of homogeneous data with arbitrary dimensions
* **Group** - a named level of hierarchy which can contain datasets and other groups

Datasets and groups can also store **attributes**. Datasets and attributes
retain their data types and dimensions when written to HDF5 files. The h5py
package uses numpy arrays and datatypes as the Python interface to the
underlying HDF5 data.

One important concept is the **root group**. Every HDF5 file has a root group
named ``'/'``. Named datasets and groups can be added to the root group to
build more complex hierarchy. Sub-groups have names, forming Unix-like paths
to datasets and other groups, always starting with the root group ``'/'``.

The h5py package supports building HDF5 files with arbitrary hierarchy. We
define a simple HDF5 hierarchy for storing 802.11 Reference Design log data in
an HDF5 group. We call this group format a ``wlan_exp_log_data_container``.
When an HDF5 group is used as a ``wlan_exp_log_data_container`` it must have
the format illustrated below::

    wlan_exp_log_data_container (HDF5 group):
           |- Attributes:
           |      |- 'wlan_exp_log'         (1,)      bool
           |      |- 'wlan_exp_ver'         (3,)      uint32
           |      |- <user provided attributes>
           |- Datasets:
           |      |- 'log_data'             (1,)      voidN  (where N is the size of the data in bytes)
           |- Groups (optional):
                  |- 'raw_log_index'
                         |- Datasets:
                            (dtype depends on whether the largest offset in raw_log_index is < 2^32)
                                |- <int>    (N1,)     uint32/uint64
                                |- <int>    (N2,)     uint32/uint64
                                |- ...

The elements of this format are:

* ``wlan_exp_log`` attribute: must be present with boolean value True
* ``wlan_exp_ver`` attribute: 3-tuple of integers recording the
  `(major, minor, rev)` version of the wlan_exp package that wrote the file
* ``log_data`` dataset: the raw bytearray retrieved from the 802.11 Reference
  Design node, stored as a scalar value using the HDF5 opaque type
* ``raw_log_index`` sub-group (optional): if present, must be a group with one
  dataset per log entry type, where each dataset contains the array of integers
  indicating the location of each log entry in the ``log_data``. This
  group-of-datasets encodes the dictionary-of-arrays normally used to represent
  the ``raw_log_index``.
* User provided attributes: additional attributes provided at the time of file
  creation. The ``log.util_hdf`` methods store these attributes when supplied
  by the user code. These can be useful to store additional experiment-specific
  details about the log data (e.g. date/time of the experiment, physical
  location of the nodes, etc.).

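The mapping from a ``raw_log_index`` dictionary to the group-of-datasets
layout can be sketched without h5py. The helper below is hypothetical and
writes no file; it only reports the dataset name, dtype and shape each entry
type would get. Using the decimal string of the entry type ID as the dataset
name is an assumption based on the ``<int>`` placeholder in the diagram::

    def sketch_index_datasets(raw_log_index):
        """Describe the datasets that would encode raw_log_index."""
        # One dtype for the whole index, chosen from the largest offset present.
        largest = max(
            (max(offsets) for offsets in raw_log_index.values() if offsets),
            default=0)
        dtype = "uint32" if largest < 2**32 else "uint64"
        # One 1-D dataset per entry type, named after the integer type ID.
        return {str(type_id): (dtype, (len(offsets),))
                for type_id, offsets in raw_log_index.items()}


    small_layout = sketch_index_datasets({10: [8, 88], 214: [36, 76], 3: [56]})
    large_layout = sketch_index_datasets({10: [8, 2**32]})

An index only needs the 64-bit dtype once the log data grows past 4 GiB,
which keeps the stored index compact for smaller trials.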

Writing Log Data Files
----------------------

The ``log_data_to_hdf5(log_data, filename)`` method will create an HDF5 file
with name ``filename`` for the supplied ``log_data`` bytearray. This method
will automatically generate and store a raw log index for the ``log_data``.

The ``log_data_to_hdf5`` method will create an HDF5 file with a single
``log_data`` array (i.e. with log data from a single node) stored in the root
group.


Reading Log Data Files
----------------------

The ``hdf5_to_log_data(filename)`` method will read a ``log_data`` array from
the HDF5 file named ``filename``. The format of the returned array is identical
to the bytearray retrieved from an 802.11 Reference Design node and can be used
wherever the original ``log_data`` array would have been used.

The ``hdf5_to_log_index(filename)`` method will read a raw log index from the
HDF5 file named ``filename``. The dictionary returned will be identical to
re-generating the index from scratch (i.e. by calling
``log.util.gen_raw_log_index(hdf5_to_log_data(filename))``). Retrieving the
raw log index from an HDF5 file is typically much faster than re-generating
the index from the log data.


Filtered Log Indexes
====================

In most cases the log data retrieved from a node will contain entries that are
not required for a particular analysis. User scripts can select a subset of
entry types for further processing by filtering the raw log index, then
passing the log data and filtered index to downstream tools for further
parsing.  Filtering the log index can be much faster than filtering the log
data itself, especially for multi-gigabyte log data arrays.

Log index filtering is implemented in the ``log.util.filter_log_index(...)``
method.

The ``filter_log_index`` method takes a raw log index, stored as a dictionary,
as input and produces a new log index, which is also a dictionary. The method
implements two processes:

* Translation of dictionary keys
* Selection of a subset of entry types to include in the output dictionary

Entry Type Translation
----------------------

Raw log indexes use integer entry type IDs as dictionary keys. These IDs are
taken directly from the log data itself, which allows index generation even if
the corresponding entry types are not understood by the wlan_exp Python code.
But remembering these "magic" numbers is inconvenient when building analysis
scripts.

The ``filter_log_index`` output dictionary uses entry type names as keys
[#entry_type_names]_.

For example, assume the following log entry type definitions::

    ENTRY_TYPE_RX_OFDM = 10
    ENTRY_TYPE_RX_DSSS = 11
    ENTRY_TYPE_TX_HIGH = 20

    entry_rx_ofdm = WlanExpLogEntryType(name='RX_OFDM', entry_type_id=ENTRY_TYPE_RX_OFDM)
    entry_rx_dsss = WlanExpLogEntryType(name='RX_DSSS', entry_type_id=ENTRY_TYPE_RX_DSSS)
    entry_tx_high = WlanExpLogEntryType(name='TX_HIGH', entry_type_id=ENTRY_TYPE_TX_HIGH)

    # Entry type fields omitted for clarity - actual field definitions are required!

And a raw log index with multiple instances of each entry type::

    >>> my_raw_log_index
    {10: [7724, 8116, 8428, 9716],
     11: [3572, 4468, 6900],
     20: [144, 336, 528, 720, 912, 1104, 1296, 1488]}

Using the ``filter_log_index`` method to translate the entry type keys will give::

    >>> log_index = filter_log_index(my_raw_log_index)
    >>> log_index
    {RX_OFDM: [7724, 8116, 8428, 9716],
     RX_DSSS: [3572, 4468, 6900],
     TX_HIGH: [144, 336, 528, 720, 912, 1104, 1296, 1488]}

Notice that the lists of log entry locations are unchanged; only the dictionary
keys have been replaced. Now this index can be accessed by entry type name::

    >>> log_index['TX_HIGH']
    [144, 336, 528, 720, 912, 1104, 1296, 1488]

.. [#entry_type_names] Technically, ``filter_log_index`` uses *instances* of
    the ``wlan_exp.log.entry_types.WlanExpLogEntryType`` class as keys in its
    output dictionary. The ``wlan_exp.log.entry_types.WlanExpLogEntryType.__repr__``
    method returns the entry type name. The class itself overloads the ``__eq__``
    and ``__hash__`` methods so an instance will "match" its name when the name
    is used to access a dictionary.

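The key-matching behavior described in the footnote can be sketched with a
small stand-in class. ``EntryTypeSketch`` is hypothetical and is not the
``WlanExpLogEntryType`` implementation; it only shows how overriding
``__hash__``, ``__eq__`` and ``__repr__`` lets an instance used as a
dictionary key be looked up by its name::

    class EntryTypeSketch:
        """Stand-in for an entry type object used as a dictionary key."""

        def __init__(self, name, entry_type_id):
            self.name = name
            self.entry_type_id = entry_type_id

        def __repr__(self):
            # Printing a dictionary shows the bare entry type name.
            return self.name

        def __hash__(self):
            # Hash like the name so both land in the same dictionary slot.
            return hash(self.name)

        def __eq__(self, other):
            if isinstance(other, EntryTypeSketch):
                return self.name == other.name
            if isinstance(other, str):
                return self.name == other
            return NotImplemented


    tx_high = EntryTypeSketch("TX_HIGH", 20)
    sketch_index = {tx_high: [144, 336, 528]}
    by_name = sketch_index["TX_HIGH"]   # lookup by string finds the instance key
    shown = repr(sketch_index)

Because the instance hashes and compares equal to its name, scripts can keep
using plain strings while the keys still carry the full entry type definition.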

Entry Type Filtering
--------------------

The ``log.util.filter_log_index`` method has three additional arguments which
are used to construct the output dictionary:

* ``include_only``: List of entry type names to keep in output
* ``exclude``: List of entry type names to exclude from output
* ``merge``: Dictionary of entry type names to merge together in output

The filter follows a few basic rules:

#. If the ``include_only`` argument is present the ``exclude`` argument will be ignored
#. Every requested output key in the ``include_only`` argument will be present in the output dictionary, even if its list of log entry locations is empty
#. An instance of the ``wlan_exp.log.entry_types.WlanExpLogEntryType`` class must already exist for each entry type included in the output

One caveat when using ``merge`` is that, while the underlying data of the entry
does not change, the offsets to access the data within that entry will now
follow the merged entry type.  This means that entries should only be merged
if one is a strict subset of another. For example, the ``TX_HIGH`` and the
``TX_HIGH_LTG`` entries are almost identical; the ``TX_HIGH_LTG`` entries
have the same fields in the same order, but have additional fields defined
after the ``TX_HIGH`` definition ends.  The ``TX_HIGH`` entry is a strict
subset of the ``TX_HIGH_LTG`` entry. Therefore, it is possible to merge
the ``TX_HIGH_LTG`` and ``TX_HIGH`` entries together and treat all of them
like ``TX_HIGH`` entries:  ``merge={'TX_HIGH': ['TX_HIGH', 'TX_HIGH_LTG']}``.
By doing this, all ``TX_HIGH_LTG`` entries will now be processed as part of
any processing on ``TX_HIGH`` entries.  However, the opposite merge should not
be done: ``merge={'TX_HIGH_LTG': ['TX_HIGH', 'TX_HIGH_LTG']}``, since
``TX_HIGH`` entries do not have the extra LTG fields and will return garbage
data if used.

The following code snippets illustrate this include/exclude/merge behavior::

    >>> my_raw_log_index
    {10: [7724, 8116, 8428, 9716],
     11: [3572, 4468, 6900],
     20: [144, 336, 528, 720, 912, 1104, 1296, 1488],
     21: [10743, 11091]}

    >>> log_index = filter_log_index(my_raw_log_index)
    >>> log_index
    {RX_OFDM:     [7724, 8116, 8428, 9716],
     RX_DSSS:     [3572, 4468, 6900],
     TX_HIGH:     [144, 336, 528, 720, 912, 1104, 1296, 1488],
     TX_HIGH_LTG: [10743, 11091]}

    >>> log_index = filter_log_index(my_raw_log_index, include_only=['TX_HIGH', 'RX_OFDM'])
    >>> log_index
    {RX_OFDM: [7724, 8116, 8428, 9716],
     TX_HIGH: [144, 336, 528, 720, 912, 1104, 1296, 1488]}

    >>> log_index = filter_log_index(my_raw_log_index, exclude=['TX_HIGH', 'TX_HIGH_LTG'])
    >>> log_index
    {RX_OFDM: [7724, 8116, 8428, 9716],
     RX_DSSS: [3572, 4468, 6900]}

    >>> log_index = filter_log_index(my_raw_log_index, include_only=['TX_HIGH', 'RX_OFDM', 'NODE_INFO'])
    >>> log_index
    {RX_OFDM: [7724, 8116, 8428, 9716],
     TX_HIGH: [144, 336, 528, 720, 912, 1104, 1296, 1488],
     NODE_INFO: []}

    >>> log_index = filter_log_index(my_raw_log_index, include_only=['TX_HIGH'], merge={'TX_HIGH': ['TX_HIGH', 'TX_HIGH_LTG']})
    >>> log_index
    {TX_HIGH: [144, 336, 528, 720, 912, 1104, 1296, 1488, 10743, 11091]}

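The rules above can be mimicked in a few lines of plain Python. This is a
sketch under stated assumptions, not the ``filter_log_index`` source: it uses
plain strings as keys, a hypothetical ``SKETCH_NAMES`` table in place of the
registered entry types, applies ``merge`` before ``include_only`` and
``exclude``, and silently skips unknown type IDs. The ID given to
``NODE_INFO`` is a placeholder::

    # Hypothetical ID-to-name table standing in for the registered entry types.
    SKETCH_NAMES = {
        10: "RX_OFDM",
        11: "RX_DSSS",
        20: "TX_HIGH",
        21: "TX_HIGH_LTG",
        1: "NODE_INFO",  # placeholder ID, chosen for this example only
    }


    def sketch_filter_log_index(raw_index, include_only=None, exclude=None, merge=None):
        """Translate integer keys to names, then merge and select entry types."""
        named = {SKETCH_NAMES[type_id]: list(offsets)
                 for type_id, offsets in raw_index.items()
                 if type_id in SKETCH_NAMES}

        # Merge: collect the offsets of every source type under the target name.
        for target, sources in (merge or {}).items():
            merged = []
            for name in sources:
                merged.extend(named.get(name, []))
            named[target] = sorted(merged)

        # include_only wins over exclude; requested keys are always present.
        if include_only is not None:
            unknown = set(include_only) - set(SKETCH_NAMES.values())
            if unknown:
                raise KeyError("unknown entry types: %s" % sorted(unknown))
            return {name: named.get(name, []) for name in include_only}

        excluded = set(exclude or [])
        return {name: offsets for name, offsets in named.items()
                if name not in excluded}


    sketch_raw_index = {
        10: [7724, 8116, 8428, 9716],
        11: [3572, 4468, 6900],
        20: [144, 336, 528, 720, 912, 1104, 1296, 1488],
        21: [10743, 11091],
    }
    merged_index = sketch_filter_log_index(
        sketch_raw_index,
        include_only=["TX_HIGH"],
        merge={"TX_HIGH": ["TX_HIGH", "TX_HIGH_LTG"]})

Only lists of offsets are copied or concatenated here; the log data itself is
never touched, which is what makes index filtering cheap.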


Processing Log Data
===================

After log data is retrieved and the log index is generated, there are many
possible tool flows to parse and process the log entries. A few recommended
processing flows are described below and implemented in our examples. This is
not an exhaustive or static list; it will evolve as we and our users
find new ways to use data produced by the 802.11 Reference Design logging
framework.


NumPy Structured Arrays
-----------------------

The `NumPy package <http://www.numpy.org/>`_ provides many tools for
processing large datasets. One very useful NumPy resource is
`structured arrays <http://docs.scipy.org/doc/numpy/user/basics.rec.html#structured-arrays-and-record-arrays>`_.

The ``wlan_exp.log.util.log_data_to_np_arrays(log_data, log_index)`` method
will process a log data array with its corresponding index and return a
dictionary of NumPy structured arrays. The dictionary will have one key-value
pair per log entry type in the ``log_index`` dictionary. Each dictionary value
will be a NumPy structured array.

The names and data types of each field for a log entry type are defined by
that type's WlanExpLogEntryType instance. The formats for log entry types
implemented in the 802.11 Reference Design are defined in the
``wlan_exp.log.entry_types`` module.

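The conversion step can be illustrated with a standard-library stand-in that
uses ``struct`` and ``namedtuple`` in place of NumPy structured arrays. It is
not how ``log_data_to_np_arrays`` works internally; the ``TX_SKETCH`` entry
type, its three fields and the zero-filled 8-byte headers are all invented
for this example::

    import struct
    from collections import namedtuple

    # Hypothetical entry layout: uint64 timestamp, uint16 length, uint8 rate
    # and one pad byte (12 bytes, little-endian).
    SketchTxEntry = namedtuple("SketchTxEntry", "timestamp length rate")
    SKETCH_TX_FORMAT = struct.Struct("<QHBx")


    def sketch_log_data_to_records(log_data, log_index, formats):
        """Decode the entries listed in log_index into lists of named records."""
        records = {}
        for name, offsets in log_index.items():
            record_type, layout = formats[name]
            records[name] = [
                record_type._make(layout.unpack_from(log_data, offset))
                for offset in offsets]
        return records


    # Build a tiny log: placeholder header, then the packed payload.
    sketch_log_data = bytearray()
    sketch_offsets = []
    for timestamp, length, rate in [(1000, 1500, 6), (2500, 64, 54)]:
        sketch_log_data += bytes(8)                 # placeholder entry header
        sketch_offsets.append(len(sketch_log_data)) # payload starts here
        sketch_log_data += SKETCH_TX_FORMAT.pack(timestamp, length, rate)

    sketch_records = sketch_log_data_to_records(
        sketch_log_data,
        {"TX_SKETCH": sketch_offsets},
        {"TX_SKETCH": (SketchTxEntry, SKETCH_TX_FORMAT)})
    total_bytes = sum(entry.length for entry in sketch_records["TX_SKETCH"])

The result has the same shape as the dictionary described above, one key per
entry type with field access by name, but NumPy structured arrays will be far
more efficient for real log sizes.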