The Wiert Corner – irregular stream of stuff

Jeroen W. Pluimers on .NET, C#, Delphi, databases, and personal interests

Archive for November 29th, 2018

Some links that might help migrating from Mantis to GitLab

Posted by jpluimers on 2018/11/29

I might give a few of these a shot:

–jeroen

Posted in Development, DVCS - Distributed Version Control, GitLab, Software Development, Source Code Management

Parsing simple html in Python

Posted by jpluimers on 2018/11/29

Was working to get fritzcap to emit a list of interfaces so I could specify which one to capture.

For that I needed to parse the output of http://fritz.box/capture.lua which consists of HTML fragments like below.

What I needed, for each consecutive pair of a [WayBack] th tag and the first [WayBack] button tag that follows it:

  • content of the th tag
  • content of the value attribute of the button tag having a type="submit" attribute and name="start" attribute

So before starting to work on it, I created [WayBack] In order to fix #5, print a list of available interfaces to potentially capture from · Issue #6 · jpluimers/fritzcap

The goal was to get a series of key/value pairs:

4-138 = AP2 (2.4 + 5 GHz, ath1) - Interface 1
4-137 = AP2 (2.4 + 5 GHz, ath1) - Interface 0
4-132 = AP (2.4 GHz, ath0) - Interface 1
4-131 = AP (2.4 GHz, ath0) - Interface 0
4-129 = HW (2.4 GHz, wifi0) - Interface 0
4-128 = WLAN Management Traffic - Interface 0a

So I built a class descending from [WayBack] HTMLParser - Simple HTML and XHTML parser that ships with the [WayBack] Python standard libraries.

If I need more complex HTML parsing in the future, these links will help me choose a more feature-rich parser:

Back to the HTMLParser descendant in interfaces_dumper.py which can basically be condensed down to the code below.

  • handle_data is called for the text that follows either a start tag or an end tag. The th text only arrives right after the th start tag (at the time of the end tag the data is empty), so you need to keep track of both last_start_tag and last_end_tag.
  • handle_endtag maintains last_end_tag to help handle_data.
  • handle_starttag maintains last_start_tag to help handle_data and also handles the button behaviour.
    • The button is only relevant if it has type="submit" and name="start" and a value attribute in that order.
    • Output is in data which is an array of key/value pairs.
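
As a rough illustration of the three handlers described above, here is a minimal sketch. It is not the code from interfaces_dumper.py: the class name InterfacesParser, the last_th_text field and the SAMPLE markup are all made up for this example, and the real capture.lua output will look different. It also assumes Python 3 (html.parser), whereas the Python 2 module was named HTMLParser, and it looks the attributes up by name instead of checking their order.

```python
# Python 3 import; in Python 2 this was `from HTMLParser import HTMLParser`.
from html.parser import HTMLParser


class InterfacesParser(HTMLParser):
    """Hypothetical sketch: collects (button value, th text) pairs."""

    def __init__(self):
        super().__init__()
        self.last_start_tag = None
        self.last_end_tag = None
        self.last_th_text = None  # th text waiting for its first start button
        self.data = []            # list of (key, value) pairs

    def handle_starttag(self, tag, attrs):
        self.last_start_tag = tag
        self.last_end_tag = None  # a new element opened, so no end tag seen yet
        if tag == "button" and self.last_th_text is not None:
            attributes = dict(attrs)
            # Simplification: attributes are matched by name, not by order.
            if (attributes.get("type") == "submit"
                    and attributes.get("name") == "start"
                    and "value" in attributes):
                self.data.append((attributes["value"], self.last_th_text))
                # Only the first matching button after a th counts.
                self.last_th_text = None

    def handle_endtag(self, tag):
        self.last_end_tag = tag

    def handle_data(self, data):
        # Text belongs to a th only while that th is still open.
        if self.last_start_tag == "th" and self.last_end_tag is None:
            text = data.strip()
            if text:
                self.last_th_text = text


# Made-up fragment, only loosely modelled on the description in this post.
SAMPLE = """
<table>
<tr><th>AP (2.4 GHz, ath0) - Interface 1</th>
<td><button type="submit" name="start" value="4-132">Start</button>
<button type="submit" name="stop" value="4-132">Stop</button></td></tr>
<tr><th>AP (2.4 GHz, ath0) - Interface 0</th>
<td><button type="submit" name="start" value="4-131">Start</button></td></tr>
</table>
"""

parser = InterfacesParser()
parser.feed(SAMPLE)
parser.close()
for key, value in parser.data:
    print("%s = %s" % (key, value))
```

Resetting last_end_tag on every start tag is a choice made for this sketch: it keeps the "is the th still open" check in handle_data down to a single comparison.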

Read the rest of this entry »

Posted in Development, Fritz!, Fritz!Box, fritzcap, Internet, Power User, Python, Scripting, Software Development

 