I was working on getting fritzcap to emit a list of interfaces so I could specify which one to capture.
For that, I needed to parse the output of http://fritz.box/capture.lua, which consists of HTML fragments built from th and button tags.
For each consecutive pair of a [WayBack] th tag and the first [WayBack] button tag following it, I needed:
- content of the th tag
- content of the value attribute of the button tag having a type="submit" attribute and a name="start" attribute
So before starting to work on it, I created [WayBack] In order to fix #5, print a list of available interfaces to potentially capture from · Issue #6 · jpluimers/fritzcap
The goal was to get a series of key/value pairs:
4-138 = AP2 (2.4 + 5 GHz, ath1) - Interface 1
4-137 = AP2 (2.4 + 5 GHz, ath1) - Interface 0
4-132 = AP (2.4 GHz, ath0) - Interface 1
4-131 = AP (2.4 GHz, ath0) - Interface 0
4-129 = HW (2.4 GHz, wifi0) - Interface 0
4-128 = WLAN Management Traffic - Interface 0a
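In Python terms, such a series of key/value pairs maps naturally onto a list of (key, value) tuples. The sketch below prints such a list in the key = value form shown above. It assumes the key is the button's value attribute and the value is the th text, and the function name format_pairs is hypothetical.

```python
from typing import Iterable, Tuple


def format_pairs(pairs: Iterable[Tuple[str, str]]) -> str:
    """Render (key, value) pairs as one "key = value" line per pair (hypothetical helper)."""
    return "\n".join(f"{key} = {value}" for key, value in pairs)


# Two of the pairs listed above: button value as key, th text as value.
pairs = [
    ("4-138", "AP2 (2.4 + 5 GHz, ath1) - Interface 1"),
    ("4-137", "AP2 (2.4 + 5 GHz, ath1) - Interface 0"),
]
print(format_pairs(pairs))
```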
So I built a class descending from [WayBack] HTMLParser — Simple HTML and XHTML parser that ships with the [WayBack] Python standard libraries.
If in the future I need more complex HTML parsing, then these links will help me choose more feature-rich parsers:
Back to the HTMLParser descendant in interfaces_dumper.py, which can be condensed to the code below.
handle_data is called for the text following both start tags and end tags. The th text only arrives in data after the th start tag (after the th end tag, data is empty or just whitespace), so you need to keep track of both last_start_tag and last_end_tag.
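A small tracing parser makes that call order visible. This is a sketch for illustration only: the class name CallTracer and the one-row fragment are made up, and the fragment merely imitates the th/button layout described earlier.

```python
from html.parser import HTMLParser


class CallTracer(HTMLParser):
    """Records every callback so the order of calls becomes visible (illustrative only)."""

    def __init__(self) -> None:
        super().__init__()
        self.calls = []

    def handle_starttag(self, tag, attrs):
        self.calls.append(("start", tag))

    def handle_endtag(self, tag):
        self.calls.append(("end", tag))

    def handle_data(self, data):
        self.calls.append(("data", data))


# Hypothetical fragment imitating one th/button row.
fragment = (
    '<tr><th>AP (2.4 GHz, ath0) - Interface 0</th>\n'
    '<td><button type="submit" name="start" value="4-131">Start</button></td></tr>'
)
tracer = CallTracer()
tracer.feed(fragment)
tracer.close()
for call in tracer.calls:
    print(call)
```

After ("end", "th"), handle_data fires again with the "\n" between </th> and <td>. At that moment last_start_tag is still "th", so a parser checking only the start tag would overwrite the th text with that whitespace.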
handle_endtag maintains last_end_tag to help handle_data.
handle_starttag maintains last_start_tag to help handle_data and also handles the button behaviour.
- The button is only relevant if it has type="submit", name="start" and a value attribute, in that order.
- Output is in data, which is an array of key/value pairs.
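A minimal sketch of such an HTMLParser descendant, following the description above, might look like this. It is not the interfaces_dumper.py code: the class name InterfacesParser, the pending_label attribute and the sample markup are hypothetical, and the sample only imitates the th/button layout. Two choices deliberately differ from the description: the button attributes are looked up by name instead of requiring the order type, name, value, and every start tag resets last_end_tag so that two consecutive th elements are both picked up. It also assumes a th holds plain text without nested tags.

```python
from html.parser import HTMLParser
from typing import List, Optional, Tuple


class InterfacesParser(HTMLParser):
    """Collects (button value, th text) pairs; hypothetical sketch, not interfaces_dumper.py."""

    def __init__(self) -> None:
        super().__init__()
        self.last_start_tag: Optional[str] = None
        self.last_end_tag: Optional[str] = None
        self.pending_label: Optional[str] = None  # th text still waiting for its button
        self.data: List[Tuple[str, str]] = []  # output: (key, value) pairs

    def handle_starttag(self, tag: str, attrs: List[Tuple[str, Optional[str]]]) -> None:
        self.last_start_tag = tag
        self.last_end_tag = None  # a new element opened, so forget the previous end tag
        if tag != "button" or self.pending_label is None:
            return
        # Assumption: match attributes by name, so their order in the markup is irrelevant.
        attributes = dict(attrs)
        value = attributes.get("value")
        if attributes.get("type") == "submit" and attributes.get("name") == "start" and value:
            self.data.append((value, self.pending_label))
            self.pending_label = None  # only the first matching button after a th counts

    def handle_endtag(self, tag: str) -> None:
        self.last_end_tag = tag

    def handle_data(self, data: str) -> None:
        # Only text seen after <th> opened and before any end tag is a label;
        # data after </th> (such as whitespace) is ignored.
        if self.last_start_tag == "th" and self.last_end_tag is None:
            text = data.strip()
            if text:
                self.pending_label = text


# Hypothetical markup imitating the capture page layout.
sample = """
<table>
<tr><th>AP (2.4 GHz, ath0) - Interface 1</th>
<td><button type="submit" name="start" value="4-132">Start</button>
<button type="submit" name="stop" value="4-132">Stop</button></td></tr>
<tr><th>AP (2.4 GHz, ath0) - Interface 0</th>
<td><button type="submit" name="start" value="4-131">Start</button></td></tr>
</table>
"""
parser = InterfacesParser()
parser.feed(sample)
parser.close()
print(parser.data)
```

With this sample, parser.data ends up as [("4-132", "AP (2.4 GHz, ath0) - Interface 1"), ("4-131", "AP (2.4 GHz, ath0) - Interface 0")]; the stop button in the first row is skipped because its name is not start.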