| 1 | .. _log_overview: |
---|
| 2 | |
---|
| 3 | .. include:: globals.rst |
---|
| 4 | |
---|
| 5 | ################## |
---|
| 6 | Event Log Overview |
---|
| 7 | ################## |
---|
| 8 | |
---|
| 9 | The 802.11 Reference Design implements a logging framework which records any |
---|
| 10 | user-specified event in the nodes of an experimental network. The 802.11 |
---|
| 11 | Reference Design implements many useful log entry types, including Tx packets, |
---|
| 12 | Rx packets and low-level MAC re-transmissions. Users can create additional |
---|
| 13 | entry types to suit their research application. |
---|
| 14 | |
---|
| 15 | The basic flow for using log data in an experiment is: |
---|
| 16 | |
---|
| 17 | #. Retrieve log data from one or more 802.11 Reference Design nodes |
---|
| 18 | #. Generate an index of each node's log data |
---|
| 19 | #. Filter the index to select the required subset of log entry types |
---|
| 20 | #. Convert the log data and filtered index into structured arrays of log entries |
---|
| 21 | #. Process the log entries to calculate the statistics required for the experiment |
---|
| 22 | |
---|
| 23 | Log data retrieval (step 1) is implemented in the ``log_get_all_new()`` method. |
---|
| 24 | |
---|
| 25 | Log index generation and filtering (steps 2-3) and entry processing (steps 4-5) are described below. |
---|
| 26 | |
---|
| 27 | |
---|
| 28 | ********** |
---|
| 29 | Components |
---|
| 30 | ********** |
---|
| 31 | |
---|
| 32 | Log Data |
---|
| 33 | ======== |
---|
| 34 | |
---|
| 35 | The term ``log_data`` refers to a `bytearray` of raw log data retrieved from an |
---|
| 36 | 802.11 Reference Design Node. The ``log_data`` is a tightly packed array of log |
---|
| 37 | entries, each composed of an entry header and arbitrary entry payload. The array |
---|
| 38 | of log data is a byte-for-byte copy of the log data retrieved from the node's |
---|
| 39 | DRAM. |
---|
| 40 | |
---|
| 41 | The log data format is documented in the :ug:`user guide <wlan_exp/log>`. |
---|
| 42 | |
---|
| 43 | In Python scripts, log data is retrieved using the log methods implemented in |
---|
| 44 | `wlan_exp.node`. |
---|
| 45 | |
---|
| 46 | |
---|
| 47 | Raw Log Index |
---|
| 48 | ============= |
---|
| 49 | |
---|
| 50 | Log data can be quite large, often many gigabytes for a long trial. Re-parsing |
---|
| 51 | the full log data array to summarize its contents would be expensive. A more |
---|
| 52 | efficient approach is to generate an index describing the contents of each log |
---|
| 53 | data array at the time the array is retrieved, then save this index with the |
---|
| 54 | log data for easier processing in the future. |
---|
| 55 | |
---|
| 56 | We call this index the **raw log index**. The raw log index contains the |
---|
| 57 | location of each log entry in the log data and the entry type ID of that entry. |
---|
| 58 | |
---|
| 59 | For example, consider the log data array illustrated below. |
---|
| 60 | |
---|
| 61 | .. figure:: _images/wlan_exp_log_layout.png |
---|
| 62 | :align: center |
---|
| 63 | :alt: Log data example |
---|
| 64 | |
---|
| 65 | Example 112-byte log data array with 5 entries of 3 different entry types. |
---|
| 66 | |
---|
| 67 | The blue areas show the log entry headers. Each header starts with a delimiter |
---|
| 68 | value, followed by a sequence number, the log entry type ID and the log entry |
---|
| 69 | length. Following the header is the log entry payload itself, illustrated as |
---|
| 70 | red, green and yellow here. |
---|
| 71 | |
---|
| 72 | This log data contains 5 log entries of 3 distinct types: |
---|
| 73 | |
---|
| 74 | * Two entries of type ID 10 (red) at byte offsets 8 and 88 |
---|
| 75 | * Two entries of type ID 214 (green) at byte offsets 36 and 76 |
---|
| 76 | * One entry of type ID 3 (yellow) at byte offset 56 |
---|
| 77 | |
---|
| 78 | The raw log index for this log data would be the dictionary:: |
---|
| 79 | |
---|
| 80 | {10: [8, 88], |
---|
| 81 | 214: [36, 76], |
---|
| 82 | 3: [56] |
---|
| 83 | } |
---|
| 84 | |
---|
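The example index above can be reproduced with a short standalone sketch. This is not the ``gen_raw_log_index`` implementation. The header layout (four little-endian 16-bit fields), the delimiter value and the helper names ``make_entry`` and ``gen_raw_index_sketch`` are assumptions made for illustration only; the real layout is described in the user guide. The payload sizes are chosen so the offsets match the 112-byte example.

```python
import struct

# Hypothetical header layout (assumption): four little-endian uint16 fields.
HEADER = struct.Struct('<HHHH')   # delimiter, sequence number, type ID, payload length
DELIMITER = 0xACED                # placeholder delimiter value (assumption)


def make_entry(seq, type_id, payload):
    """Pack one log entry: an 8-byte header followed by the payload bytes."""
    return HEADER.pack(DELIMITER, seq, type_id, len(payload)) + payload


def gen_raw_index_sketch(log_data):
    """Map each entry type ID to the byte offsets of its payloads."""
    index = {}
    offset = 0
    while offset + HEADER.size <= len(log_data):
        delim, _seq, type_id, length = HEADER.unpack_from(log_data, offset)
        if delim != DELIMITER:
            raise ValueError('bad delimiter at byte %d' % offset)
        payload_offset = offset + HEADER.size
        index.setdefault(type_id, []).append(payload_offset)
        # Jump straight to the next header; payload contents are never parsed.
        offset = payload_offset + length
    return index


# Rebuild the 112-byte example: (type ID, payload size) for each of the 5 entries.
entries = [(10, 20), (214, 12), (3, 12), (214, 4), (10, 24)]
log_data = bytearray()
for seq, (type_id, size) in enumerate(entries):
    log_data += make_entry(seq, type_id, bytes(size))

raw_log_index = gen_raw_index_sketch(log_data)
print(len(log_data), raw_log_index)
```

The loop only reads headers, which is why the index can be built without knowing the format of any entry payload.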
| 85 | In actual experiments the log data and corresponding index will be **much** |
---|
| 86 | larger. We have successfully tested these tools on log data with tens of |
---|
| 87 | millions of entries (on a 64-bit machine, of course). |
---|
| 88 | |
---|
| 89 | .. note:: Notice that the dictionary keys are integer entry type IDs. This is |
---|
| 90 | by design, as it allows the raw log index to be generated using only the |
---|
| 91 | log data itself, with no dependence on the formats of the log entries |
---|
| 92 | themselves. The integer IDs will be translated into names in the log index |
---|
| 93 | filtering step, described below. |
---|
| 94 | |
---|
| 95 | |
---|
| 96 | Tools |
---|
| 97 | ===== |
---|
| 98 | |
---|
| 99 | The ``log.util.gen_raw_log_index(log_data)`` method will read a raw log data |
---|
| 100 | array and generate the raw log index. |
---|
| 101 | |
---|
| 102 | The ``log.util_hdf.log_data_to_hdf5(log_data, filename)`` method will by |
---|
| 103 | default create and save the raw log index when saving log data to an HDF5 file. |
---|
| 104 | To disable the creation of the raw log index, pass ``gen_index=False`` to the |
---|
| 105 | method. |
---|
| 106 | |
---|
| 107 | The ``log.util_hdf.hdf5_to_log_index(filename)`` method will return the raw |
---|
| 108 | log index previously saved to an HDF5 file. |
---|
| 109 | |
---|
| 110 | |
---|
| 111 | Archiving Log Data |
---|
| 112 | ================== |
---|
| 113 | |
---|
| 114 | Log data retrieved from an 802.11 Reference Design node will initially be |
---|
| 115 | stored in RAM as a bytearray. In most experiments it is useful to write the |
---|
| 116 | log data to a file for archival and future processing. |
---|
| 117 | |
---|
| 118 | We recommend storing log data in `HDF5 files <http://www.hdfgroup.org/HDF5/>`_ |
---|
| 119 | using the `h5py package <http://docs.h5py.org/en/latest/index.html>`_. The HDF5 |
---|
| 120 | format is open, fast, well documented and supported by a wide variety of tools. |
---|
| 121 | |
---|
| 122 | |
---|
| 123 | HDF5 Log Data Format |
---|
| 124 | -------------------- |
---|
| 125 | |
---|
| 126 | The HDF5 format is built from two types of objects: |
---|
| 127 | |
---|
| 128 | * **Dataset** - an array of homogeneous data with arbitrary dimensions |
---|
| 129 | * **Group** - a named level of hierarchy which can contain datasets and other groups |
---|
| 130 | |
---|
| 131 | Datasets and groups can also store **attributes**. Datasets and attributes |
---|
| 132 | retain their data types and dimensions when written to HDF5 files. The h5py |
---|
| 133 | package uses numpy arrays and datatypes as the Python interface to the |
---|
| 134 | underlying HDF5 data. |
---|
| 135 | |
---|
| 136 | One important concept is the **root group**. Every HDF5 file has a root group |
---|
| 137 | named ``'/'``. Named datasets and groups can be added to the root group to |
---|
| 138 | build more complex hierarchy. Sub-groups have names, forming Unix-like paths |
---|
| 139 | to datasets and other groups, always starting with the root group ``'/'``. |
---|
| 140 | |
---|
| 141 | The h5py package supports building HDF5 files with arbitrary hierarchy. We |
---|
| 142 | define a simple HDF5 hierarchy for storing 802.11 Reference Design log data in |
---|
| 143 | an HDF5 group. We call this group format a ``wlan_exp_log_data_container``. |
---|
| 144 | When an HDF5 group is used as a ``wlan_exp_log_data_container`` it must have |
---|
| 145 | the format illustrated below:: |
---|
| 146 | |
---|
| 147 | wlan_exp_log_data_container (HDF5 group): |
---|
| 148 | |- Attributes: |
---|
| 149 | | |- 'wlan_exp_log' (1,) bool |
---|
| 150 | | |- 'wlan_exp_ver' (3,) uint32 |
---|
| 151 | | |- <user provided attributes> |
---|
| 152 | |- Datasets: |
---|
| 153 | | |- 'log_data' (1,) voidN (where N is the size of the data in bytes) |
---|
| 154 | |- Groups (optional): |
---|
| 155 | |- 'raw_log_index' |
---|
| 156 | |- Datasets: |
---|
| 157 | (dtype depends if largest offset in raw_log_index is < 2^32) |
---|
| 158 | |- <int> (N1,) uint32/uint64 |
---|
| 159 | |- <int> (N2,) uint32/uint64 |
---|
| 160 | |- ... |
---|
| 161 | |
---|
| 162 | The elements of this format are: |
---|
| 163 | |
---|
| 164 | * ``wlan_exp_log`` attribute: must be present with boolean value True |
---|
| 165 | * ``wlan_exp_ver`` attribute: 3-tuple of integers recording the |
---|
| 166 | `(major, minor, rev)` version of the wlan_exp package that wrote the file |
---|
| 167 | * ``log_data`` dataset: the raw bytearray retrieved from the 802.11 Reference |
---|
| 168 | Design node, stored as a scalar value using the HDF5 opaque type |
---|
| 169 | * ``raw_log_index`` sub-group (optional): if present, must be a group with one |
---|
| 170 | dataset per log entry type, where each dataset contains the array of integers |
---|
| 171 | indicating the location of each log entry in the ``log_data``. This |
---|
| 172 | group-of-datasets encodes the dictionary-of-arrays normally used to represent |
---|
| 173 | the ``raw_log_index``. |
---|
| 174 | * User provided attributes: additional attributes provided at the time of file |
---|
| 175 | creation. The ``log.util_hdf`` methods store these attributes when supplied |
---|
| 176 | by the user code. These can be useful to store additional experiment-specific |
---|
| 177 | details about the log data (e.g. date/time of the experiment or physical |
---|
| 178 | location of the nodes). |
---|
| 179 | |
---|
| 180 | |
---|
| 181 | Writing Log Data Files |
---|
| 182 | ---------------------- |
---|
| 183 | |
---|
| 184 | The ``log_data_to_hdf5(log_data, filename)`` method will create an HDF5 file |
---|
| 185 | with name ``filename`` for the supplied ``log_data`` bytearray. This method |
---|
| 186 | will automatically generate and store a raw log index for the ``log_data``. |
---|
| 187 | |
---|
| 188 | The ``log_data_to_hdf5`` method will create an HDF5 file with a single |
---|
| 189 | ``log_data`` array (i.e. with log data from a single node) stored in the root |
---|
| 190 | group. |
---|
| 191 | |
---|
| 192 | |
---|
| 193 | Reading Log Data Files |
---|
| 194 | ---------------------- |
---|
| 195 | |
---|
| 196 | The ``hdf5_to_log_data(filename)`` method will read a ``log_data`` array from |
---|
| 197 | the HDF5 file named ``filename``. The format of the returned array is identical |
---|
| 198 | to the bytearray retrieved from an 802.11 Reference Design node and can be used |
---|
| 199 | wherever the original ``log_data`` array would have been used. |
---|
| 200 | |
---|
| 201 | The ``hdf5_to_log_index(filename)`` method will read a raw log index from the |
---|
| 202 | HDF5 file named ``filename``. The dictionary returned will be identical to |
---|
| 203 | re-generating the index from scratch (i.e. by calling |
---|
| 204 | ``log.util.gen_raw_log_index(hdf5_to_log_data(filename))``). Retrieving the |
---|
| 205 | raw log index from an HDF5 file is typically much faster than re-generating |
---|
| 206 | the index from the log data. |
---|
| 207 | |
---|
| 208 | |
---|
| 209 | Filtered Log Indexes |
---|
| 210 | ==================== |
---|
| 211 | |
---|
| 212 | In most cases the log data retrieved from a node will contain entries that are |
---|
| 213 | not required for a particular analysis. User scripts can select a subset of |
---|
| 214 | entry types for further processing by filtering the raw log index, then |
---|
| 215 | passing the log data and filtered index to downstream tools for further |
---|
| 216 | parsing. Filtering the log index can be much faster than filtering the log |
---|
| 217 | data itself, especially for multi-gigabyte log data arrays. |
---|
| 218 | |
---|
| 219 | Log index filtering is implemented in the ``log.util.filter_log_index(...)`` |
---|
| 220 | method. |
---|
| 221 | |
---|
| 222 | The ``filter_log_index`` method takes a raw log index, stored as a dictionary, |
---|
| 223 | as input and produces a new log index, which is also a dictionary. The method |
---|
| 224 | implements two processes: |
---|
| 225 | |
---|
| 226 | * Translation of dictionary keys |
---|
| 227 | * Selection of a subset of entry types to include in the output dictionary |
---|
| 228 | |
---|
| 229 | Entry Type Translation |
---|
| 230 | ---------------------- |
---|
| 231 | |
---|
| 232 | Raw log indexes use integer entry type IDs as dictionary keys. These IDs are |
---|
| 233 | taken directly from the log data itself, which allows index generation even if |
---|
| 234 | the corresponding entry types are not understood by the wlan_exp Python code. |
---|
| 235 | But remembering these "magic" numbers is inconvenient when building analysis |
---|
| 236 | scripts. |
---|
| 237 | |
---|
| 238 | The ``filter_log_index`` output dictionary uses entry type names as keys |
---|
| 239 | [#entry_type_names]_. |
---|
| 240 | |
---|
| 241 | For example, assume the following log entry type definitions:: |
---|
| 242 | |
---|
| 243 | ENTRY_TYPE_RX_OFDM = 10 |
---|
| 244 | ENTRY_TYPE_RX_DSSS = 11 |
---|
| 245 | ENTRY_TYPE_TX_HIGH = 20 |
---|
| 246 | |
---|
| 247 | entry_rx_ofdm = WlanExpLogEntryType(name='RX_OFDM', entry_type_id=ENTRY_TYPE_RX_OFDM) |
---|
| 248 | entry_rx_dsss = WlanExpLogEntryType(name='RX_DSSS', entry_type_id=ENTRY_TYPE_RX_DSSS) |
---|
| 249 | entry_tx_high = WlanExpLogEntryType(name='TX_HIGH', entry_type_id=ENTRY_TYPE_TX_HIGH) |
---|
| 250 | |
---|
| 251 | # Entry type fields omitted for clarity - actual field definitions are required! |
---|
| 252 | |
---|
| 253 | And a raw log index with multiple instances of each entry type:: |
---|
| 254 | |
---|
| 255 | >>> my_raw_log_index |
---|
| 256 | {10: [7724, 8116, 8428, 9716], |
---|
| 257 | 11: [3572, 4468, 6900], |
---|
| 258 | 20: [144, 336, 528, 720, 912, 1104, 1296, 1488]} |
---|
| 259 | |
---|
| 260 | Using the ``filter_log_index`` method to translate the entry type keys will give:: |
---|
| 261 | |
---|
| 262 | >>> log_index = filter_log_index(my_raw_log_index) |
---|
| 263 | >>> log_index |
---|
| 264 | {RX_OFDM: [7724, 8116, 8428, 9716], |
---|
| 265 | RX_DSSS: [3572, 4468, 6900], |
---|
| 266 | TX_HIGH: [144, 336, 528, 720, 912, 1104, 1296, 1488]} |
---|
| 267 | |
---|
| 268 | Notice that the lists of log entry locations are unchanged, only the dictionary |
---|
| 269 | keys have been replaced. Now this index can be accessed by entry type name:: |
---|
| 270 | |
---|
| 271 | >>> log_index['TX_HIGH'] |
---|
| 272 | [144, 336, 528, 720, 912, 1104, 1296, 1488] |
---|
| 273 | |
---|
| 274 | .. [#entry_type_names] Technically, ``filter_log_index`` uses *instances* of |
---|
| 275 | the ``wlan_exp.log.entry_types.WlanExpLogEntryType`` class as keys in its |
---|
| 276 | output dictionary. The ``wlan_exp.log.entry_types.WlanExpLogEntryType.__repr__`` |
---|
| 277 | method returns the entry type name. The class itself overrides the ``__eq__`` |
---|
| 278 | and ``__hash__`` methods so an instance will "match" its name when the name |
---|
| 279 | is used to access a dictionary. |
---|
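The key-matching behavior described in the footnote can be illustrated with a small standalone class. This is a minimal sketch and not the actual ``WlanExpLogEntryType`` code; the class name ``EntryTypeSketch`` and its attributes are hypothetical. It shows how hashing and comparing by name lets a plain string find an object key in a dictionary.

```python
class EntryTypeSketch(object):
    """Hypothetical stand-in for an entry type object usable as a dict key."""

    def __init__(self, name, entry_type_id):
        self.name = name
        self.entry_type_id = entry_type_id

    def __repr__(self):
        # Printing a dictionary shows the bare name for each key.
        return self.name

    def __hash__(self):
        # Hash like the name so a plain string lands in the same bucket.
        return hash(self.name)

    def __eq__(self, other):
        # Compare equal to either a bare name string or another instance.
        if isinstance(other, str):
            return self.name == other
        if isinstance(other, EntryTypeSketch):
            return self.name == other.name
        return NotImplemented


tx_high = EntryTypeSketch('TX_HIGH', 20)
log_index = {tx_high: [144, 336, 528]}

print(log_index)              # keys print as names via __repr__
print(log_index['TX_HIGH'])   # lookup by name string finds the instance key
```

Both ``__hash__`` and ``__eq__`` are needed: the hash selects the dictionary slot and the equality check confirms the match.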
| 280 | |
---|
| 281 | |
---|
| 282 | Entry Type Filtering |
---|
| 283 | -------------------- |
---|
| 284 | |
---|
| 285 | The ``log.util.filter_log_index`` method has three additional arguments which |
---|
| 286 | are used to construct the output dictionary: |
---|
| 287 | |
---|
| 288 | * ``include_only``: List of entry type names to keep in output |
---|
| 289 | * ``exclude``: List of entry type names to exclude from output |
---|
| 290 | * ``merge``: Dictionary of entry type names to merge together in output |
---|
| 291 | |
---|
| 292 | The filter follows a few basic rules: |
---|
| 293 | |
---|
| 294 | #. If the ``include_only`` argument is present the ``exclude`` argument will be ignored |
---|
| 295 | #. Every requested output key in the ``include_only`` argument will be present in the output dictionary, even if its list of log entry locations is empty |
---|
| 296 | #. An instance of the ``wlan_exp.log.entry_types.WlanExpLogEntryType`` class must be previously created for each entry type included in the output |
---|
| 297 | |
---|
| 298 | One caveat when using ``merge`` is that, while the underlying data of the entry |
---|
| 299 | does not change, the offsets to access the data within that entry will now |
---|
| 300 | follow the merged entry type. This means that entries should only be merged |
---|
| 301 | if one is a strict sub-set of another. For example, the ``TX_HIGH`` and the |
---|
| 302 | ``TX_HIGH_LTG`` entries are almost identical; the ``TX_HIGH_LTG`` entries |
---|
| 303 | have the same fields in the same order, but have additional fields defined |
---|
| 304 | after the ``TX_HIGH`` definition ends. The ``TX_HIGH`` entry is a strict |
---|
| 305 | sub-set of the ``TX_HIGH_LTG`` entry. Therefore, it is possible to merge |
---|
| 306 | the ``TX_HIGH_LTG`` and ``TX_HIGH`` entries together and treat all of them |
---|
| 307 | like ``TX_HIGH`` entries: ``merge={'TX_HIGH': ['TX_HIGH', 'TX_HIGH_LTG']}``. |
---|
| 308 | By doing this, all ``TX_HIGH_LTG`` entries will now be processed as part of |
---|
| 309 | any processing on ``TX_HIGH`` entries. However, the opposite merge should not |
---|
| 310 | be done: ``merge={'TX_HIGH_LTG': ['TX_HIGH', 'TX_HIGH_LTG']}`` since |
---|
| 311 | ``TX_HIGH`` entries do not have the extra LTG fields and will return garbage |
---|
| 312 | data if used. |
---|
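The merge caveat can be demonstrated with the standard ``struct`` module. The field layouts below are hypothetical and far simpler than the real ``TX_HIGH`` and ``TX_HIGH_LTG`` definitions; they only preserve the relationship that matters here, namely that the larger type is the smaller type plus extra trailing fields. The ``next_header`` bytes are likewise a made-up stand-in for whatever follows the entry in the log data.

```python
import struct

# Hypothetical layouts (assumption): TX_HIGH_LTG = TX_HIGH fields + one extra field.
TX_HIGH = struct.Struct('<QH')        # timestamp, length
TX_HIGH_LTG = struct.Struct('<QHI')   # timestamp, length, ltg_id

# One entry of each type packed back to back, followed by unrelated bytes
# standing in for the start of the next log entry header.
next_header = b'\xed\xac\x05\x00'
log_data = TX_HIGH_LTG.pack(1000, 64, 7) + TX_HIGH.pack(2000, 128) + next_header
ltg_offset, plain_offset = 0, TX_HIGH_LTG.size

# Safe merge: read both entries with the smaller TX_HIGH layout.
as_tx_high = [TX_HIGH.unpack_from(log_data, off) for off in (ltg_offset, plain_offset)]

# Unsafe merge: the larger layout reads past the end of the plain TX_HIGH
# entry, so the extra field is filled from the bytes that happen to follow.
garbage = TX_HIGH_LTG.unpack_from(log_data, plain_offset)

print(as_tx_high)
print(garbage)
```

In the unsafe case no error is raised; the shared fields still look correct and only the extra field is wrong, which makes this mistake easy to miss.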
| 313 | |
---|
| 314 | The following code snippets illustrate this include/exclude/merge behavior:: |
---|
| 315 | |
---|
| 316 | >>> my_raw_log_index |
---|
| 317 | {10: [7724, 8116, 8428, 9716], |
---|
| 318 | 11: [3572, 4468, 6900], |
---|
| 319 | 20: [144, 336, 528, 720, 912, 1104, 1296, 1488], |
---|
| 320 | 21: [10743, 11091]} |
---|
| 321 | |
---|
| 322 | >>> log_index = filter_log_index(my_raw_log_index) |
---|
| 323 | >>> log_index |
---|
| 324 | {RX_OFDM: [7724, 8116, 8428, 9716], |
---|
| 325 | RX_DSSS: [3572, 4468, 6900], |
---|
| 326 | TX_HIGH: [144, 336, 528, 720, 912, 1104, 1296, 1488], |
---|
| 327 | TX_HIGH_LTG: [10743, 11091]} |
---|
| 328 | |
---|
| 329 | >>> log_index = filter_log_index(my_raw_log_index, include_only=['TX_HIGH', 'RX_OFDM']) |
---|
| 330 | >>> log_index |
---|
| 331 | {RX_OFDM: [7724, 8116, 8428, 9716], |
---|
| 332 | TX_HIGH: [144, 336, 528, 720, 912, 1104, 1296, 1488]} |
---|
| 333 | |
---|
| 334 | >>> log_index = filter_log_index(my_raw_log_index, exclude=['TX_HIGH', 'TX_HIGH_LTG']) |
---|
| 335 | >>> log_index |
---|
| 336 | {RX_OFDM: [7724, 8116, 8428, 9716], |
---|
| 337 | RX_DSSS: [3572, 4468, 6900]} |
---|
| 338 | |
---|
| 339 | >>> log_index = filter_log_index(my_raw_log_index, include_only=['TX_HIGH', 'RX_OFDM', 'NODE_INFO']) |
---|
| 340 | >>> log_index |
---|
| 341 | {RX_OFDM: [7724, 8116, 8428, 9716], |
---|
| 342 | TX_HIGH: [144, 336, 528, 720, 912, 1104, 1296, 1488], |
---|
| 343 | NODE_INFO: []} |
---|
| 344 | |
---|
| 345 | >>> log_index = filter_log_index(my_raw_log_index, include_only=['TX_HIGH'], merge={'TX_HIGH': ['TX_HIGH', 'TX_HIGH_LTG']}) |
---|
| 346 | >>> log_index |
---|
| 347 | {TX_HIGH: [144, 336, 528, 720, 912, 1104, 1296, 1488, 10743, 11091]} |
---|
| 348 | |
---|
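The include/exclude/merge rules can also be written out as a few lines of plain Python. The sketch below is not the ``filter_log_index`` implementation: the ``ENTRY_NAMES`` table and the function name ``filter_index_sketch`` are hypothetical, plain strings are used as keys instead of entry type objects, and the choices to skip unknown IDs, apply ``merge`` before ``include_only`` and sort merged offsets are assumptions chosen to reproduce the outputs shown above.

```python
# Hypothetical ID-to-name table; the real tools derive names from entry type objects.
ENTRY_NAMES = {10: 'RX_OFDM', 11: 'RX_DSSS', 20: 'TX_HIGH', 21: 'TX_HIGH_LTG'}


def filter_index_sketch(raw_index, include_only=None, exclude=None, merge=None):
    """Translate integer keys to names, merge, then include or exclude."""
    # Translate keys, skipping IDs with no known entry type (assumption).
    index = {ENTRY_NAMES[k]: list(v) for k, v in raw_index.items() if k in ENTRY_NAMES}

    # Merge: replace the listed source types with one combined, sorted list.
    for target, sources in (merge or {}).items():
        combined = sorted(off for name in sources for off in index.get(name, []))
        for name in sources:
            index.pop(name, None)
        index[target] = combined

    if include_only is not None:
        # include_only takes priority over exclude; missing names get empty lists.
        return {name: index.get(name, []) for name in include_only}
    return {k: v for k, v in index.items() if k not in (exclude or [])}


raw = {10: [7724, 8116, 8428, 9716],
       11: [3572, 4468, 6900],
       20: [144, 336, 528, 720, 912, 1104, 1296, 1488],
       21: [10743, 11091]}

merged = filter_index_sketch(raw, include_only=['TX_HIGH'],
                             merge={'TX_HIGH': ['TX_HIGH', 'TX_HIGH_LTG']})
print(merged)
```

Only the offset lists are copied or concatenated. The log data itself is never touched, which is why filtering the index stays cheap even for very large logs.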
| 349 | |
---|
| 350 | |
---|
| 351 | Processing Log Data |
---|
| 352 | =================== |
---|
| 353 | |
---|
| 354 | After log data is retrieved and the log index is generated, there are many |
---|
| 355 | possible tool flows to parse and process the log entries. A few recommended |
---|
| 356 | processing flows are described below and implemented in our examples. This is |
---|
| 357 | not an exhaustive or static list; it will evolve as we and our users |
---|
| 358 | find new ways to use data produced by the 802.11 Reference Design logging |
---|
| 359 | framework. |
---|
| 360 | |
---|
| 361 | |
---|
| 362 | NumPy Structured Arrays |
---|
| 363 | ----------------------- |
---|
| 364 | |
---|
| 365 | The `NumPy package <http://www.numpy.org/>`_ provides many tools for |
---|
| 366 | processing large datasets. One very useful NumPy resource is |
---|
| 367 | `structured arrays <http://docs.scipy.org/doc/numpy/user/basics.rec.html#structured-arrays-and-record-arrays>`_. |
---|
| 368 | |
---|
| 369 | The ``wlan_exp.log.util.log_data_to_np_arrays(log_data, log_index)`` method |
---|
| 370 | will process a log data array with its corresponding index and return a |
---|
| 371 | dictionary of NumPy structured arrays. The dictionary will have one key-value |
---|
| 372 | pair per log entry type in the ``log_index`` dictionary. Each dictionary value |
---|
| 373 | will be a NumPy structured array. |
---|
| 374 | |
---|
| 375 | The names and data types of each field for a log entry type are defined by |
---|
| 376 | that type's WlanExpLogEntryType instance. The formats for log entry types |
---|
| 377 | implemented in the 802.11 Reference Design are defined in the |
---|
| 378 | ``wlan_exp.log.entry_types`` module. |
---|
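To show what a structured array of log entries looks like, the sketch below builds one by hand from a byte array and a list of offsets. It does not call ``log_data_to_np_arrays`` and is not its implementation. The ``tx_high_dtype`` field names and types are hypothetical, and the fake log data uses 4 filler bytes in place of real entry headers.

```python
import numpy as np

# Hypothetical TX_HIGH layout (assumption): packed little-endian fields, 12 bytes total.
tx_high_dtype = np.dtype([('timestamp', '<u8'), ('length', '<u2'),
                          ('rate', 'u1'), ('power', 'i1')])

# Build a small fake log: 4 filler bytes before each entry payload.
records = [(1000, 64, 6, 10), (2000, 128, 12, 15), (3000, 1500, 54, 8)]
log_data = bytearray()
offsets = []
for rec in records:
    log_data += b'\x00' * 4
    offsets.append(len(log_data))          # payload offset, as stored in a log index
    log_data += np.array([rec], dtype=tx_high_dtype).tobytes()

# Gather the bytes for each indexed entry, then view them as one structured array.
size = tx_high_dtype.itemsize
packed = b''.join(bytes(log_data[off:off + size]) for off in offsets)
tx_high = np.frombuffer(packed, dtype=tx_high_dtype)

# Fields are accessed by name and behave like ordinary NumPy columns.
print(tx_high['length'].mean())
print(tx_high[tx_high['rate'] > 6]['timestamp'])
```

Once the entries are in this form, per-field statistics and boolean masks operate on whole columns without any Python-level loop over entries.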
| 379 | |
---|