The Wiert Corner – irregular stream of stuff

Jeroen W. Pluimers on .NET, C#, Delphi, databases, and personal interests

Archive for November 29th, 2018

Some links that might help migrating from Mantis to GitLab

Posted by jpluimers on 2018/11/29

I might give a few of these a shot:

–jeroen

Posted in Development, DVCS - Distributed Version Control, GitLab, Software Development, Source Code Management

Parsing simple html in Python

Posted by jpluimers on 2018/11/29

Was working to get fritzcap to emit a list of interfaces so I could specify which one to capture.

For that I needed to parse the output of http://fritz.box/capture.lua which consists of HTML fragments like below.

What I needed, for each consecutive pair of a [WayBack] th tag and the first [WayBack] button tag following it, was:

  • content of the th tag
  • content of the value attribute of the button tag having a type="submit" attribute and a name="start" attribute

So before starting to work on it, I created [WayBack] In order to fix #5, print a list of available interfaces to potentially capture from · Issue #6 · jpluimers/fritzcap.

The goal was to get a series of key/value pairs:

4-138 = AP2 (2.4 + 5 GHz, ath1) - Interface 1
4-137 = AP2 (2.4 + 5 GHz, ath1) - Interface 0
4-132 = AP (2.4 GHz, ath0) - Interface 1
4-131 = AP (2.4 GHz, ath0) - Interface 0
4-129 = HW (2.4 GHz, wifi0) - Interface 0
4-128 = WLAN Management Traffic - Interface 0a

So I built a class descending from [WayBack] HTMLParser — Simple HTML and XHTML parser, which ships with the [WayBack] Python standard libraries.

If in the future I need more complex HTML parsing, then these links will help me choose more feature-rich parsers:

Back to the HTMLParser descendant in interfaces_dumper.py which can basically be condensed down to the code below.

  • handle_data is called for the text that follows both start tags and end tags. The th text only arrives after the start tag (the data that follows the end tag is empty or whitespace), so you need to keep track of both last_start_tag and last_end_tag.
  • handle_endtag maintains last_end_tag to help handle_data.
  • handle_starttag maintains last_start_tag to help handle_data and also handles the button behaviour.
    • The button is only relevant if it has type="submit", name="start" and a value attribute, in that order.
    • Output is in data which is an array of key/value pairs.
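
The handler interplay described above can be sketched roughly as follows. This is a minimal illustration, not the actual code from interfaces_dumper.py: it assumes Python 3's html.parser module, and the names InterfacesParser, parse_interfaces, last_th_text and SAMPLE_HTML, as well as the sample markup itself, are made up for the example.

```python
from html.parser import HTMLParser

# Hypothetical fragment, only loosely modelled on the description above;
# the real capture.lua output may well look different.
SAMPLE_HTML = """
<table>
<tr><th>AP (2.4 GHz, ath0) - Interface 1</th>
<td><button type="submit" name="start" value="4-132">Start</button>
<button type="submit" name="stop" value="4-132">Stop</button></td></tr>
<tr><th>AP (2.4 GHz, ath0) - Interface 0</th>
<td><button type="submit" name="start" value="4-131">Start</button></td></tr>
</table>
"""


class InterfacesParser(HTMLParser):
    """Collects (button value, th text) pairs into self.data."""

    def __init__(self):
        super().__init__()
        self.last_start_tag = None
        self.last_end_tag = None
        self.last_th_text = None  # th text waiting for its start button
        self.data = []            # list of (key, value) pairs

    def handle_starttag(self, tag, attrs):
        self.last_start_tag = tag
        self.last_end_tag = None
        if tag != "button" or self.last_th_text is None:
            return
        # attrs is a list of (name, value) tuples in source order.
        names = [name for name, _ in attrs]
        if names[:3] != ["type", "name", "value"]:
            return
        if attrs[0][1] == "submit" and attrs[1][1] == "start":
            self.data.append((attrs[2][1], self.last_th_text))
            self.last_th_text = None  # only the first matching button counts

    def handle_endtag(self, tag):
        self.last_end_tag = tag

    def handle_data(self, data):
        # Only text directly inside an open th is of interest; text that
        # follows an end tag (usually whitespace) is skipped.
        if self.last_start_tag == "th" and self.last_end_tag is None:
            text = data.strip()
            if text:
                self.last_th_text = text


def parse_interfaces(html):
    """Return the list of (key, value) pairs found in html."""
    parser = InterfacesParser()
    parser.feed(html)
    parser.close()
    return parser.data


if __name__ == "__main__":
    for key, value in parse_interfaces(SAMPLE_HTML):
        print("{} = {}".format(key, value))
```

Checking the attribute order strictly mirrors the "in that order" remark; if the order turns out not to matter, converting attrs with dict(attrs) and looking up the three names would be a simpler alternative.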


Posted in Development, Fritz!, Fritz!Box, fritzcap, Internet, Power User, Python, Scripting, Software Development