I was working on getting fritzcap to emit a list of interfaces so I could specify which one to capture from.
For that I needed to parse the output of http://fritz.box/capture.lua, which consists of HTML fragments like below.
What I needed, for each consecutive pair of a `th` tag and the first `button` tag following it, was:
- the content of the `th` tag
- the content of the `value` attribute of the `button` tag that has a `type="submit"` attribute and a `name=start` attribute
So before starting to work on it, I created an issue: "In order to fix #5, print a list of available interfaces to potentially capture from" (Issue #6 · jpluimers/fritzcap).
The goal was to get a series of key/value pairs:
4-138 = AP2 (2.4 + 5 GHz, ath1) - Interface 1
4-137 = AP2 (2.4 + 5 GHz, ath1) - Interface 0
4-132 = AP (2.4 GHz, ath0) - Interface 1
4-131 = AP (2.4 GHz, ath0) - Interface 0
4-129 = HW (2.4 GHz, wifi0) - Interface 0
4-128 = WLAN Management Traffic - Interface 0a
So I built a class descending from `HTMLParser` (the "Simple HTML and XHTML parser"), which ships with the Python standard library.
If in the future I need more complex HTML parsing, then these links will help me choose a more feature-rich parser:
Back to the `HTMLParser` descendant in `interfaces_dumper.py`, which can basically be condensed down to the code below.
`handle_data` is called for the text that follows both start tags and end tags. The `th` value in `data` only arrives after the start tag (the `data` that follows the end tag is empty or just whitespace), so you need to keep track of both `last_start_tag` and `last_end_tag`.
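To see that call order for yourself, a small tracing parser helps. This is only a sketch: it uses the Python 3 module `html.parser` (in Python 2 the class lives in the `HTMLParser` module), and both the `EventTracer` name and the HTML fragment are made up for illustration, so the real `capture.lua` markup may well look different.

```python
from html.parser import HTMLParser


class EventTracer(HTMLParser):
    """Records every parser callback so the call order can be inspected."""

    def __init__(self):
        super().__init__()
        self.events = []

    def handle_starttag(self, tag, attrs):
        self.events.append(("start", tag))

    def handle_endtag(self, tag):
        self.events.append(("end", tag))

    def handle_data(self, data):
        self.events.append(("data", data))


# Hypothetical fragment, not taken from a real FRITZ!Box page.
tracer = EventTracer()
tracer.feed("<tr><th>Interface 0</th>\n<td>x</td></tr>")
tracer.close()
for event in tracer.events:
    print(event)
```

The `data` event carrying `Interface 0` comes right after the `th` start event, and the next `data` event (right after the `th` end event) only holds the newline.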
`handle_endtag` maintains `last_end_tag` to help `handle_data`.
`handle_starttag` maintains `last_start_tag` to help `handle_data` and also handles the `button` behaviour.
- The `button` is only relevant if it has `type="submit"` and `name="start"` and a `value` attribute, in that order.
- Output is in `data`, which is an array of `key`/`value` pairs.
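The three handlers described above could be sketched roughly like this. It is not the actual `interfaces_dumper.py`: the class name `InterfacesDumper`, the `pending_label` field, the reset of `last_end_tag` on every start tag and the `SAMPLE` fragment are all my own assumptions for illustration, and it targets Python 3 (`html.parser`) rather than the Python 2 `HTMLParser` module.

```python
from html.parser import HTMLParser


class InterfacesDumper(HTMLParser):
    """Pairs each th text with the value of the first matching button after it."""

    def __init__(self):
        super().__init__()
        self.data = []             # collected (key, value) pairs
        self.last_start_tag = None
        self.last_end_tag = None
        self.pending_label = None  # th text still waiting for its button

    def handle_starttag(self, tag, attrs):
        self.last_start_tag = tag
        # Assumption: reset here so handle_data can tell "inside" from "after".
        self.last_end_tag = None
        if tag == "button" and self.pending_label is not None:
            # attrs is a list of (name, value) tuples in source order.
            if (len(attrs) >= 3
                    and attrs[0] == ("type", "submit")
                    and attrs[1] == ("name", "start")
                    and attrs[2][0] == "value"):
                self.data.append((attrs[2][1], self.pending_label))
                self.pending_label = None  # only the first button per th counts

    def handle_endtag(self, tag):
        self.last_end_tag = tag

    def handle_data(self, data):
        # Keep text seen right after <th>; skip text that follows an end tag.
        if self.last_start_tag == "th" and self.last_end_tag is None:
            self.pending_label = data.strip()


# Made-up fragment for illustration only; the real page may differ.
SAMPLE = """
<tr><th>AP (2.4 GHz, ath0) - Interface 0</th>
<td><button type="submit" name="start" value="4-131">Start</button>
<button type="submit" name="stop" value="4-131">Stop</button></td></tr>
<tr><th>HW (2.4 GHz, wifi0) - Interface 0</th>
<td><button type="submit" name="start" value="4-129">Start</button></td></tr>
"""

dumper = InterfacesDumper()
dumper.feed(SAMPLE)
dumper.close()
for key, value in dumper.data:
    print(f"{key} = {value}")
```

Checking the first three attributes by position is the most literal reading of "in that order"; a more forgiving variant would turn `attrs` into a `dict` and ignore the order.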