The Wiert Corner – irregular stream of stuff

Jeroen W. Pluimers on .NET, C#, Delphi, databases, and personal interests


Archive for the ‘Python’ Category

GitHub – pastpages/savepagenow: A simple Python wrapper for archive.org’s “Save Page Now” capturing service

Posted by jpluimers on 2021/03/11

This makes it way easier to save WayBack content:

[WayBack] GitHub – pastpages/savepagenow: A simple Python wrapper for archive.org’s “Save Page Now” capturing service

A poor man's alternative is the bash script below, from [WayBack] Saving of public Google+ content at the Internet Archive’s Wayback Machine by the Archive Team has begun : plexodus:

For Linux, MacOS / OSX, BSD, and other Unix-like operating systems (including Android with Termux, or Windows, with a Unix/Linux environment), the following script (I’ve saved this as archive-url) will archive the requested URL:

#!/bin/bash
# archive-url
# Archive selected URL at the Internet Archive

curl -s -I -H "Accept: application/json" "https://web.archive.org/save/${1}" |
grep '^x-cache-key:' | sed "s,https,&://,; s,\(${1}\).*$,\1,"

Save that to your execution path (I’ve chosen ~/bin; you might use /usr/local/bin or another location on your $PATH), make it executable, and invoke it as, say (again referring to the G+MM homepage):

$ archive-url https://plus.google.com/communities/112164273001338979772

If you have a list of URLs in a file (or pipelined from command output), you can request all of them to be archived with a single command. I’m using xargs here to run ten simultaneous requests from the file gplus-urllist:

xargs -I{} -P 10 archive-url {} < gplus-urllist

I’ve run this on over 10,000 URLs over a modest residential broadband connection in a hair over two hours.

Note that such requests trigger an archive by the Internet Archive from one of its archiving nodes; you’re not sending the page to the Archive yourself. In particular, archival from regions defaulting to another language may result in the Google+ site content (but not posts or comments) being in a different language. I’ve frequently seen my pages turning up in Japanese, for instance.
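For those who prefer to stay in Python, the ten-at-a-time approach can be sketched with a thread pool. This is only a sketch: archive_all and its capture parameter are names made up for illustration, and the function that actually talks to the Internet Archive is passed in rather than assumed.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, Dict, Iterable


def _attempt(capture: Callable[[str], str], url: str) -> str:
    """Run one capture and turn any failure into a readable result string."""
    try:
        return capture(url)
    except Exception as error:  # one failed URL should not stop the batch
        return "failed: {}".format(error)


def archive_all(urls: Iterable[str],
                capture: Callable[[str], str],
                workers: int = 10) -> Dict[str, str]:
    """Call capture(url) for every non-blank line, `workers` at a time.

    `capture` is a hypothetical hook: any callable that takes a URL and
    returns the archived URL (or raises on failure).
    """
    targets = [line.strip() for line in urls if line.strip()]
    with ThreadPoolExecutor(max_workers=workers) as pool:
        outcomes = pool.map(lambda url: _attempt(capture, url), targets)
        return dict(zip(targets, outcomes))
```

Assuming savepagenow exposes capture(url) as its README describes, something like archive_all(open("gplus-urllist"), savepagenow.capture) should behave much like the xargs line; I have not verified that against the current release.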

–jeroen

Posted in bash, Development, Python, Scripting, Software Development

Python: saving a web page to a jpeg image file by using the Google base64url encoded screenshot of it

Posted by jpluimers on 2021/02/19

As a follow-up on Still looking for base64url decoding tools, both on-line and for MacOS homebrew: this is in Python, works on MacOS, Linux and Windows, and can be integrated in a web page.

It is based on the ideas in [WayBack] Python-Twitter-Hacks/websiteScreenshot.py at master · edent/Python-Twitter-Hacks · GitHub, which was more like a code snippet with hard coded literals.

It downloads a jpeg web-site screenshot using the Google PageSpeed API V1, which generates the screenshot as a base64url encoded blob inside a JSON structure.

Python has no dedicated base64url type, but the standard library's base64.urlsafe_b64decode handles the URL-safe alphabet once padding is restored, and the concept is fairly straightforward: [WayBack] RFC 4648 – The Base16, Base32, and Base64 Data Encodings: Base 64 Encoding with URL and Filename Safe Alphabet, which allows data to be passed inside URLs without resorting to [WayBack] Percent-encoding – Wikipedia.
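A minimal sketch of the decoding step, not the code from the gist: it relies on base64.urlsafe_b64decode from the standard library, which accepts the "-" and "_" alphabet but is strict about "=" padding, so the padding is restored first. The names decode_base64url and save_screenshot are made up for this example.

```python
import base64


def decode_base64url(data: str) -> bytes:
    """Decode base64url text (RFC 4648 section 5), tolerating missing padding."""
    # The encoded length must be a multiple of 4; add back any stripped '='.
    padding = "=" * (-len(data) % 4)
    return base64.urlsafe_b64decode(data + padding)


def save_screenshot(data: str, path: str) -> None:
    """Write the decoded bytes to `path`, for instance a .jpg file."""
    with open(path, "wb") as handle:
        handle.write(decode_base64url(data))
```

A JPEG starts with the bytes FF D8 FF, which is why base64url-encoded screenshots tend to begin with "_9j_".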

My changes work, but are by no means in canonical form or idiomatic Python. I have a long way to go to reach that level of Python.

So I forked the repository and fixed the script, basing it on Python 3.

I might make it V2 compatible in the future. More information on V2 in [WayBack] Google APIs Explorer: Services > PageSpeed Insights API v2 > pagespeedonline.pagespeedapi.runpagespeed

Content is in the below gist.

–jeroen


Posted in Development, Python, Scripting, Software Development

Making it dead simple to implement @haveibeenpwnd in your applications, including strength warning if found in @troyhunt’s password collection.

Posted by jpluimers on 2020/12/02

I wasn’t aware that Troy Hunt created an API [WayBack] for [WayBack] Have I Been Pwned: Check if your email has been compromised in a data breach.

He did, as I noticed through [WayBack] Michelangelo van Dam on Twitter: “Making it dead simple to implement @haveibeenpwnd in my applications, including strength warning if found in @troyhunt’s password collection. Check out to try it out yourself. #ImproveSecurity #haveibeenpwnd”.

There are in fact plenty of other packages, web-sites and apps using the API as seen on [WayBack] Have I Been Pwned: API consumers.

Many people ask “if it is safe” (often assuming passwords are sent in the clear, or hashes are sent in full; my fear is that those people implement security somewhere).

It is safe:
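A rough Python sketch of why, based on my understanding of the Pwned Passwords range API (not on the PHP package's code): the password is hashed with SHA-1 locally, only the first five hex characters are sent to https://api.pwnedpasswords.com/range/<prefix>, and the response lists the remaining suffixes as SUFFIX:COUNT lines that are matched on your own machine. The function names are made up for this example, and the network call is deliberately left out.

```python
import hashlib
from typing import Tuple


def split_sha1(password: str, prefix_length: int = 5) -> Tuple[str, str]:
    """Hash locally; return (prefix to send, suffix that never leaves the machine)."""
    digest = hashlib.sha1(password.encode("utf-8")).hexdigest().upper()
    return digest[:prefix_length], digest[prefix_length:]


def breach_count(suffix: str, range_body: str) -> int:
    """Look up `suffix` in a range response made of SUFFIX:COUNT lines."""
    for line in range_body.splitlines():
        candidate, _, count = line.strip().partition(":")
        if candidate.upper() == suffix:
            return int(count)
    return 0  # suffix not listed: no known breach for this password
```

So neither the password nor its full hash is sent; the service only learns a five-character prefix that is shared by many unrelated hashes.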

PHP source is at [WayBack] GitHub – DragonBe/hibp: A composer package to verify if a password was previously used in a breach using Have I Been Pwned API.

There is also a [WayBack] composer package at [WayBack] dragonbe/hibp – Packagist.

A really cool thing about it is this:

This project was also the subject of my talk [WayBack] Mutation Testing with Infection where the code base was not only covered by unit tests, but also was subjected to Mutation Testing using [WayBack] Infection to ensure no coding mistakes could slip into the codebase.

Apart from the tests, the most important source is at [WayBack] hibp/Hibp.php at master · DragonBe/hibp · GitHub


–jeroen

Posted in Development, Mobile Development, PHP, Python, Scripting, Software Development, Web Development

Brew reminder to self

Posted by jpluimers on 2020/08/05

From the update process:

==> Caveats
==> hub
Bash completion has been installed to:
  /usr/local/etc/bash_completion.d

zsh completions have been installed to:
  /usr/local/share/zsh/site-functions
==> python
Python has been installed as
  /usr/local/bin/python3

Unversioned symlinks `python`, `python-config`, `pip` etc. pointing to
`python3`, `python3-config`, `pip3` etc., respectively, have been installed into
  /usr/local/opt/python/libexec/bin

If you need Homebrew's Python 2.7 run
  brew install python@2

You can install Python packages with
  pip3 install <package>
They will install into the site-package directory
  /usr/local/lib/python3.7/site-packages

See: https://docs.brew.sh/Homebrew-and-Python
==> youtube-dl
Bash completion has been installed to:
  /usr/local/etc/bash_completion.d

zsh completions have been installed to:
  /usr/local/share/zsh/site-functions
==> mpv
zsh completions have been installed to:
  /usr/local/share/zsh/site-functions
==> node
Bash completion has been installed to:
  /usr/local/etc/bash_completion.d

–jeroen

Posted in Apple, Development, Home brew / homebrew, Power User, Python, Scripting, Software Development

pip install --user and your path

Posted by jpluimers on 2020/06/09

I’ve added this to my ~/.bashrc so stuff installed by pip install --user is accessible from interactive shells:

# set PATH so it includes user's private python "pip --user" bin if it exists
if [ -d "$HOME/.local/bin" ] ; then
    PATH="$PATH:$HOME/.local/bin"
fi

The addition is at the end of the path. It is a choice: it means machine installs take precedence over user installs. That’s usually what I want. For more considerations (including non-interactive shells), see [WayBack] bash – How to correctly add a path to PATH? – Unix & Linux Stack Exchange.

The --user installs do not affect the full system, nor other users.
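To check from Python where --user scripts end up, and whether that directory is on the PATH, here is a small sketch. It assumes a POSIX layout where scripts land in the bin directory under the user base (on Windows they live elsewhere); user_bin_dir and is_on_path are names made up for this example.

```python
import os
import site


def user_bin_dir() -> str:
    """Directory where `pip install --user` puts scripts on POSIX systems."""
    # site.getuserbase() is typically ~/.local on Linux.
    return os.path.join(site.getuserbase(), "bin")


def is_on_path(directory: str, path_value: str) -> bool:
    """True if `directory` is one of the entries in a PATH-style string."""
    wanted = os.path.normpath(directory)
    entries = (entry for entry in path_value.split(os.pathsep) if entry)
    return any(os.path.normpath(entry) == wanted for entry in entries)


if __name__ == "__main__":
    target = user_bin_dir()
    print(target, "on PATH:", is_on_path(target, os.environ.get("PATH", "")))
```

Running python3 -m site --user-base prints the same base directory from the shell.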


–jeroen

Posted in Development, Python, Scripting, Software Development

 