The Wiert Corner – irregular stream of stuff

Jeroen W. Pluimers on .NET, C#, Delphi, databases, and personal interests

  • My badges

  • Twitter Updates

  • My Flickr Stream

  • Pages

  • All categories

  • Enter your email address to subscribe to this blog and receive notifications of new posts by email.

    Join 2,483 other followers

Archive for the ‘Python’ Category

Twitter thread by thread by @0xdade; More unicode shit: zero width space and a zero width nonjoiner in filenames

Posted by jpluimers on 2021/09/22

[WayBack] Thread by @0xdade: Today I learned that you can put zero width spaces in file names on Linux. Have fun. I’m playing with this because punycode/IDN is fascinati…

Today I learned that you can put zero width spaces in file names on Linux. Have fun.

I’m playing with this because punycode/IDN is fascinating, and I wanted to know what happened when I started shoving unicode in the path portion of the url, which isn’t part of how browsers try to protect URLs, as far as I can tell

wiki.mozilla.org/IDN_Display_Al…

I think it’s more entertaining to have a file that is named *only* a zero width space, but I think using them throughout a filename is better to break tab completion and not stand out too much. A filename that is just blank looks strange in ls output.
Thank goodness adduser is looking out for our best interests.
Oooh this one is pretty subtle.
Just about pissed myself with this one.

Not related to the terminal fun, but related to zero width characters:

You can:
– Break url previews https://0xda​​​​​​.​de
– @​0xdade without tagging
– Make a word like system​d not searchable twitter.com/search?q=from%…

Okay but back to command line crap. I really like this one. Create a directory named .[ZWS]

One thing that is cool about using zero width spaces is that “ls” has a flag, “-b”, that is meant to escape non-graphic characters. Inserting a newline, for instance, would be escaped to \n. But the zero width space is technically a graphic character, so nothing happens.

Fun.

Have no fear, though. It’s not unbeatable. It’s only fun if the language and LC settings are set to support utf-8. If you set LC_ALL=C or whatever that isn’t utf-8, then it looks like this.

Putting a link to this tweet here so that I don’t lose it again in the future.

dade@0xdade

My god, it is beautiful. I mean except all the whitespace I can’t get rid of before the command lmao.

View image on Twitter
But on the other hand if you just have a search for the zws, then whatever you find is probably worth investigating. 
I guess I’ll start the hashtag before @QW5kcmV3 does for #irresponsibleutf8 🤭😏😂 

And these tweets:

[WayBack] Thread by @Plazmaz: @0xdade Was doing some real fucking around with urls recently: gist.github.com/Plazmaz/565a5c… (was gonna flesh it out more but didn’t find…:

mentions Was doing some real fucking around with urls recently:
mentions This one is my fave:
‘⁄’ (\u2044)
or
‘∕’ (\u2215)
Allow for this:
google.com⁄search⁄query⁄.example.com
google.com⁄search⁄query⁄@example.com 

[WayBack] url-screwiness.md · GitHub:

This is a list of methods for messing with urls. These are often useful for bypassing filters, SSRF, or creating convincing links that are difficult to differentiate from legitimate urls.

And a bit of documentation links:

–jeroen

 

Posted in *nix, .NET, C#, Development, NTFS, Power User, Python, Scripting, Software Development, Windows | Leave a Comment »

Overview of Client Libraries · Internet Archive

Posted by jpluimers on 2021/09/14

Besides manual upload at [Archive.is] Upload to Internet Archive, there are also automated ways of uploading content.

One day I need this to archive pages or sites into the WayBack machine: [WayBack] Overview of Client Libraries · Internet Archive (most of which is Python based):

Overview of Client Libraries


The Internet Archive and its community have developed several tools to give developers more control the Archive’s content and services:

`internetarchive` Command Line Tool (Python, CLI)


The internetarchive tool by Jake enables programatic access to archive.org item metadata and bulk upload of content to the Internet Archive.

Download instructions are available at [WayBack] https://github.com/jjjake/internetarchive/

Read documentation at [WayBackhttps://internetarchive.readthedocs.io/en/latest/

`openlibrary-client` Client Library (Python, CLI)


The openlibrary-client is the equivalent of the internetarchive tool for OpenLibrary. It provides developers with programatic access to Book edition and author metadata, as well as the ability to create new works.

Download instructions and documentation are available at [WayBackhttps://github.com/internetarchive/openlibrary-client

`warc` Client Library (Python)


WARC (Web ARChive) is a file format for storing web crawls (learn more at: [WayBackhttp://bibnum.bnf.fr/WARC)

This warc library makes it very easy to work with WARC files.

Download instructions and documentation are available at [WayBackhttps://github.com/internetarchive/warc

Related:

Via: [WayBack] Uploading to the Internet Archive – Archiveteam

Uploading to archive.org

[WayBack] Upload any content you manage to preserve! Registering takes a minute.

Tools

The are three main methods to upload items to Internet Archive programmatically:

Don’t use FTP upload, try to keep your items below 400 GiB size, add plenty of metadata.

Wayback machine save page now

Many scripts have been written to use the live proxy:

Torrent upload

Torrent upload, useful if you need resume (for huge files or because your bandwidth is insufficient for upload in one go):

  • Just create the item, make a torrent with your files in it, name it like the item, and upload it to the item.
  • archive.org will connect to you and other peers via a Transmission daemon and keep downloading all the contents till done;
  • For a command line tool you can use e.g. mktorrent or buildtorrent, example: mktorrent -a udp://tracker.publicbt.com:80/announce -a udp://tracker.openbittorrent.com:80 -a udp://tracker.ccc.de:80 -a udp://tracker.istole.it:80 -a http://tracker.publicbt.com:80/announce -a http://tracker.openbittorrent.com/announce "DIRECTORYTOUPLOAD" ;
  • You can then seed the torrent with one of the many graphical clients (e.g. Transmission) or on the command line (Transmission and rtorrent are the most popular; btdownloadcurses reportedly doesn’t work with udp trackers.)
  • archive.org will stop the download if the torrent stalls for some time and add a file to your item called “resume.tar.gz”, which contains whatever data was downloaded. To resume, delete the empty file called IDENTIFIER_torrent.txt; then, resume the download by re-deriving the item (you can do that from the Item Manager.) Make sure that there are online peers with the data before re-deriving and don’t delete the torrent file from the item.

Formats

Formats: anything, but:

  • Sites should be uploaded in [WayBackWARC format;
  • Audio, video, [WayBackbooks and other prints are supported from [WayBack] a number of formats;
  • For .tar and .zip files archive.org offers an online browser to search and download the specific files one needs, so you probably want to use either unless you have good reasons (e.g. if 7z or bzip2 reduce the size tenfold).

This [WayBackunofficial documentation page explains various of the special files found in every item.

Upload speed

Quite often, it’s hard to use your full bandwidth to/from the Internet Archive, which can be frustrating. The bottleneck may be temporary (check the current [WayBacknetwork speed and [WayBacks3 errors) but also persistent, especially if your network is far (e.g. transatlantic connections).

If your connection is slow or unreliable and you’re trying to upload a lot of data, it’s strongly recommended to use the bittorrent method (see above).

Some users with Gigabit upstream links or more, on common GNU/Linux operating systems (such as [WayBackAlpine), have had some success in increasing their upload speed by using more memory on [WayBackTCP congestion control and telling the kernel to live with higher latency and lower responsiveness, as in this example:

# sysctl net.core.rmem_default=8388608 net.core.rmem_max=8388608 net.ipv4.tcp_rmem="32768 131072 8388608" net.core.wmem_default=8388608 net.core.wmem_max=8388608 net.ipv4.tcp_wmem="32768 131072 8388608" net.core.default_qdisc=fq net.ipv4.tcp_congestion_control=bbr
# sysctl kernel.sched_min_granularity_ns=1000000000 kernel.sched_latency_ns=1000000000 kernel.sched_migration_cost_ns=2147483647 kernel.sched_rr_timeslice_ms=100 kernel.sched_wakeup_granularity_ns=1000000000

–jeroen

Posted in Development, Internet, InternetArchive, Power User, Python, Scripting, Software Development, WayBack machine | Leave a Comment »

Coming Back to Old Problems: How I Finally Wrote a Sudoku Solving Algorithm – DEV Community 👩‍💻👨‍💻

Posted by jpluimers on 2021/09/02

It is always fun to see how Sudoku solving algorithms are created and implemented. This is no exception: [WayBack] Coming Back to Old Problems: How I Finally Wrote a Sudoku Solving Algorithm – DEV Community 👩‍💻👨‍💻

(backtracking image from Wikimedia commons)

For a visual Sudoku solver, I usually take [WayBack] Sudoku Solver by Andrew Stuart. Shows the logic behind solving Sudoku square by square which is part of [WayBack] SudokuWiki.org – Getting Started having many visual explanations on how to solve these puzzles, for instance:

It’s a kind of sudo ku, but visually and never failed me solve one.

–jeroen

Posted in Algorithms, Development, Python, Scripting, Software Development | Leave a Comment »

Fixing hg.exe “ImportError: DLL load failed: %1 is not a valid Win32 application.”

Posted by jpluimers on 2021/07/21

If you get the below error when running hg.exe, then you are mixing a 64-bit Mercurial with 32-bit dependencies:

C:\>hg --version
Traceback (most recent call last):
  File "hg", line 43, in 
  File "hgdemandimport\demandimportpy2.pyc", line 150, in __getattr__
  File "hgdemandimport\demandimportpy2.pyc", line 94, in _load
  File "hgdemandimport\demandimportpy2.pyc", line 43, in _hgextimport
  File "mercurial\dispatch.pyc", line 22, in 
  File "hgdemandimport\demandimportpy2.pyc", line 248, in _demandimport
  File "hgdemandimport\demandimportpy2.pyc", line 43, in _hgextimport
  File "mercurial\i18n.pyc", line 28, in 
  File "hgdemandimport\demandimportpy2.pyc", line 150, in __getattr__
  File "hgdemandimport\demandimportpy2.pyc", line 94, in _load
  File "hgdemandimport\demandimportpy2.pyc", line 43, in _hgextimport
  File "mercurial\encoding.pyc", line 24, in 
  File "mercurial\policy.pyc", line 101, in importmod
  File "mercurial\policy.pyc", line 63, in _importfrom
  File "hgdemandimport\demandimportpy2.pyc", line 164, in __doc__
  File "hgdemandimport\demandimportpy2.pyc", line 94, in _load
  File "hgdemandimport\demandimportpy2.pyc", line 43, in _hgextimport
  File "mercurial\cext\parsers.pyc", line 12, in 
  File "mercurial\cext\parsers.pyc", line 10, in __load
ImportError: DLL load failed: %1 is geen geldige Win32-toepassing.

The equivalent English error is [WayBack] ImportError: DLL load failed: %1 is not a valid Win32 application.” – Google Search.

The problem is the bitness of hg.exe: [WayBack] python – Error while installing Mercurial on IIS7 64bit: “DLL Load Failed: %1 is not a valid Win32 application” – Stack Overflow

You can quickly figure out the bitness of hg.exe:

C:\>where hg
C:\Program Files\Mercurial\hg.exe

C:\>sigcheck "C:\Program Files\Mercurial\hg.exe"

Sigcheck v2.72 - File version and signature viewer
Copyright (C) 2004-2019 Mark Russinovich
Sysinternals - www.sysinternals.com

c:\program files\mercurial\hg.exe:
        Verified:       Unsigned
        Link date:      17:49 9-7-2019
        Publisher:      n/a
        Company:        n/a
        Description:    Fast scalable distributed SCM (revision control, version control) system
        Product:        mercurial
        Prod version:   5.0.2
        File version:   5.0.2
        MachineType:    64-bit

Forcing x86 of Mercurial

Since I use chocolatey for most my installs, I forced x86 the Chocolatey way:

So after these:

choco uninstall --yes hg
choco install --yes --force86 hg

I got this signature check:

C:\>sigcheck "C:\Program Files (x86)\Mercurial\hg.exe"

Sigcheck v2.72 - File version and signature viewer
Copyright (C) 2004-2019 Mark Russinovich
Sysinternals - www.sysinternals.com

c:\program files (x86)\mercurial\hg.exe:
        Verified:       Unsigned
        Link date:      17:50 9-7-2019
        Publisher:      n/a
        Company:        n/a
        Description:    Fast scalable distributed SCM (revision control, version control) system
        Product:        mercurial
        Prod version:   5.0.2
        File version:   5.0.2
        MachineType:    32-bit

–jeroen

Posted in Development, DVCS - Distributed Version Control, Encoding, Mercurial/Hg, Python, Scripting, Software Development, Source Code Management | Leave a Comment »

GitHub – jjjake/internetarchive: A Python and Command-Line Interface to Archive.org

Posted by jpluimers on 2021/06/16

On my list of things to play with: [WayBack] GitHub – jjjake/internetarchive: A Python and Command-Line Interface to Archive.org.

Via:

Related:

  • [WayBack] The Internet Archive Python Library — Internet Archive item APIs 1.8.5 documentation
  • [WayBack] Command-Line Interface — Internet Archive item APIs 1.8.5 documentation
  • [WayBack] Quickstart — Internet Archive item APIs 1.8.5 documentation, including:

    Configuring

    Certain functionality of the internetarchive Python library requires your archive.org credentials. Your IA-S3 keys are required for uploading, searching, and modifying metadata, and your archive.org logged-in cookies are required for downloading access-restricted content and viewing your task history. To automatically create a config file with your archive.org credentials, you can use the ia command-line tool:

    $ ia configure
    Enter your archive.org credentials below to configure 'ia'.
    
    Email address: user@example.com
    Password:
    
    Config saved to: /home/user/.config/ia.ini
    

    Your config file will be saved to $HOME/.config/ia.ini, or $HOME/.ia if you do not have a .configdirectory in $HOME. Alternatively, you can specify your own path to save the config to via ia --config-file '~/.ia-custom-config' configure.

    If you have a netc file with your archive.org credentials in it, you can simply run ia configure --netrc. Note that Python’s netrc library does not currently support passphrases, or passwords with spaces in them, and therefore not currently suported here.

–jeroen

Read the rest of this entry »

Posted in Development, Internet, InternetArchive, Power User, Python, Scripting, Software Development, WayBack machine | Leave a Comment »

 
%d bloggers like this: