

Exporting your Twitter content, converting to Markdown and getting the image alt-texts (thanks @isotopp/@HBeckPDX/@weiglemc for the info and @kcgreenn/@dreamjar for the comic!)

Posted by jpluimers on 2022/11/12

This is fine #Twitter (illustration inspired by KC Green; creation video below)

(Edit 20221114: script for high-res images; more tweets from Jan) (Edit 20221116: hat-tip to Sam) (Edit 20221120: archiving t.co links by Michele Weigle) (Edit 20221122: added article by Johan van der Knijff) (Edit 20221128: Taupe export tool by Mike Hucka)

Time to be prepared:

The below will help you export your Twitter content (Tweets, DMs, media), perform some conversions on it, and optionally delete (parts of) your content.

Important: keep your Twitter account afterwards (to prevent someone from creating a new account with the same handle).

This is fine #Twitter creation video

The above illustration is based on the famous [Wayback/Archive] Gunshow – On Fire 6-panel comic by [Wayback/Archive] kcg (@kcgreenn), of which most people know the meme based on the first two panels:

[Wayback/Archive] Matt Carlson 🇺🇸🏳️‍🌈#BLM✊🏿✊🏾✊🏽 (@dreamjar) adapted it on November 5th 2022 in [Wayback/Archive] Matt Carlson 🇺🇸🏳️‍🌈#BLM✊🏿✊🏾✊🏽 on Twitter: “@BriannaWu @SEP “This is fine”” as a response to [Wayback/Archive] Brianna Wu on Twitter: “1/ Here’s everything we know about the mass Twitter advertising exodus in plain language. So, Elno acquires Twitter and advertisers are FREAKED. Would be a good time to reassure them, right? Except Twitter’s Chief Revenue officer @SEP quits on Friday. Literally walks out.”

He published the above “making of” video a week later at [Wayback/Archive] Matt Carlson 🇺🇸🏳️‍🌈#BLM✊🏿✊🏾✊🏽 on Twitter: “@jpluimers @BriannaWu @SEP I redrew this based on, and directly inspired by, KC Green’s original image. All credit goes to him. KC Green’s original “this is fine dog” was like the patron saint of Twitter when I worked there from 2014 to 2017, a time when everything felt like it was on 🔥. “.

(A large version of this video is at the bottom of this blog post)

Oh and: [Wayback/Archive] Sam on Twitter: “@waDNR @JimSycurity Found the source of this, with help from @jpluimers! /HT @dreamjar and @kcgreenn”

Similar article by Johan

A few days ago, [Wayback/Archive] Johan van der Knijff @bitsgalore@digipres.club (@bitsgalore) wrote another great article on various aspects of migration: [Wayback/Archive] How to preserve your personal Twitter archive, which he keeps updating. Recommended reading!

Via [Wayback/Archive] Johan van der Knijff @bitsgalore@digipres.club on Twitter: “@jpluimers @weiglemc @Felienne FWIW I also wrote this, which covers similar ground, but provides some more additional context (and also adds FediFinder to preserve the followers/followees/lists graphs): … I only found out about your earlier post just now, so I added some references to it!”.

[Wayback/Archive] How to preserve your personal Twitter archive

Johan is also on Mastodon: [Wayback/Archive] Johan van der Knijff (@bitsgalore@digipres.club) – digipres.club.

Exporting tweets and converting them to markdown

I was really glad Isotopp was already figuring this out before I had a chance to. Thanks Kris!

  1. [Wayback/Archive] Download an archive of your data / Twitter, wait for the confirmation (sometimes 24+ hours) that the download is ready, then download it from twitter.com/settings/your_twitter_data/data.
  2. Download or clone [Wayback/Archive] timhutton/twitter-archive-parser: Python code to parse a Twitter archive and output in various ways.
  3. Ensure Python 3 is installed.
  4. Unzip the download in a new directory.
  5. Open a command-prompt in that directory.
  6. Run python parser.py (likely you need to replace parser.py with the full path to it).
  7. Optionally (but recommended): download high-resolution versions of your images
  8. Optionally upload the markdown and other files to a web-server for publishing.
  9. Keep your Twitter account (so nobody steals your Twitter handle).

It already converts to Markdown with embedded images and links, replaces t.co URLs with their original versions, and work is in progress to add more features.

A separate script downloads high-resolution versions of your images.

This means the most important step is the first: in the future more export features will likely become available.

Extracting the alt-text from the exported tweets

[Wayback/Archive] Alt Text Archiver (which, like TweetDelete a few sections further below, requires you to upload your archive)

When parsing fails, you might need this Python script to extract, from the tweets.js inside your Twitter archive, only the tweets having media: [Wayback/Archive] tweets-media.py (via [Wayback/Archive] Ada Lovecraft on Twitter: “@HBeckPDX of course!”).
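If you want to roll your own filter, here is a minimal Python sketch of the idea (not the linked tweets-media.py script itself); the field names window.YTD.tweets.part0, tweet and extended_entities are my assumptions about the archive layout at the time of writing, so check them against your own export:

import json

def load_tweets_js(path):
    # tweets.js is JavaScript, not plain JSON: strip the
    # "window.YTD.tweets.part0 = " prefix before parsing.
    with open(path, encoding="utf-8") as f:
        raw = f.read()
    return json.loads(raw[raw.index("["):])

def tweets_with_media(entries):
    # Keep only the tweets that reference media (photos, videos, GIFs).
    selected = []
    for entry in entries:
        tweet = entry.get("tweet", entry)
        if tweet.get("extended_entities", {}).get("media"):
            selected.append(tweet)
    return selected

if __name__ == "__main__":
    print(json.dumps(tweets_with_media(load_tweets_js("tweets.js")), indent=2))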


Twitter bookmarks

Another story is getting the bookmarks, as they are not contained in the downloaded archive. For those, you need to download the bookmarks separately, which:

  1. is a separate JSON file
  2. does not require waiting
  3. requires manual steps to download

The script is at [Wayback/Archive] Python code to get text and link of the bookmarked tweets and save in markdown (which has various versions as the file format changed over time; at the time of writing you need get_twitter_bookmarks_v3.py).

[Wayback/Archive] Exporting your Twitter bookmarks in markdown file explains the how and why of the script and how to download the bookmarks JSON data.

The script does not convert the t.co links to their final destination; that is explained in these tweets (a minimal Python sketch follows the list):

  1. [Wayback/Archive] Kris on Twitter: “The Twitter export does not include your bookmarks. @uschebit found divyajyotiuk.hashnode.dev/exporting-your-twitter-bookmarks-in-markdown-file and the approach with gist.github.com/divyajyotiuk/9fb29c046e1dfcc8d5683684d7068efe#file-get_twitter_bookmarks_v3-py worked for me, but it is not trivial.”
  2. [Wayback/Archive] Kris on Twitter: “The bookmarks then still contain t.co links, and you still have to resolve those. The Python package urlexpander can do that (but it is heavily oversized for this), or you do a response = requests.get() and then grab response.url.”
  3. [Wayback/Archive] Janek Bevendorff on Twitter: “@isotopp Better: requests.head() without following redirects and then read the Location header.”
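A minimal sketch of what Kris and Janek describe above, assuming the third-party requests package (the GET fallback mirrors the response.url suggestion):

import requests

def resolve_tco(url, timeout=10.0):
    # HEAD request without following redirects; the Location header holds the target.
    response = requests.head(url, allow_redirects=False, timeout=timeout)
    location = response.headers.get("Location")
    if location:
        return location
    # Fallback: follow redirects with a full GET and take the final URL.
    return requests.get(url, allow_redirects=True, timeout=timeout).url

print(resolve_tco("https://t.co/example"))  # hypothetical t.co link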

Optional: delete (parts of) your Twitter content

I think deleting DMs is a manual task. Since they often contain private information, it is something you really need to consider carefully: Twitter no longer has a CISO, which means privacy is even less guaranteed than in the past.

With that in mind and living in Europe, you can also try the GDPR way: [Wayback/Archive] Michael Veale is @mikarv@someone.elses.computer on Twitter: “Twitter is now an insecure platform, haemmorhaging security experts. It could haemmorhage your DMs, through leak or sale. It’s no hard guarantee, but your best chance to delete them is with #GDPR rights. I’ve written a blog on how: …”

[Wayback/Archive] Deleting DMs from Twitter using the GDPR

You can delete Tweets using [Wayback/Archive] TweetDelete – Easily delete your old tweets.

Kris explained this a while ago:

  1. [Wayback/Archive] Kris on Twitter: “As a Tweet Delete Subscriber, you can upload a tweets.js file to delete more than the 3000 most recent Tweets. isotopp@chaos.social can!”

    Tweet Delete Upload Interface Screenshot, busy uploading.

     

  2. [Wayback/Archive] Kris on Twitter: “Guess who else is overloaded and currently vacations in Bad Gateway?”
  3. [Wayback/Archive] Kris on Twitter: “Tweet Delete Upload is broken with Firefox, and works only in Chrome. It accepts a tweet.js zipped into a tweets.zip, so recompressing the original file helps a lot.”

Archiving t.co redirect links in the Wayback Machine

Archived Tweets contain many t.co redirect links which will die when Twitter dies.

[Wayback/Archive] Michele Weigle (@weiglemc) created a script and instructions on how to archive your Twitter t.co links in the Wayback Machine.

The instructions start at [Wayback/Archive] Michele Weigle on Twitter: “If you’ve downloaded your Twitter archive, note that the “Your archive.html” page renders links in your tweets as t .co links. If Twitter dies, so does the t .co redirection service. See below to further preserve the links in your tweets. “ and are threaded at [Wayback/Archive] Thread by @weiglemc on Thread Reader App:


Use this awk line to grab all of the t.co URLs in your tweets.js file:
awk -F '\"' '/\"url\" :/ {print $4}' tweets.js
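If you prefer Python over awk, a rough equivalent could look like the sketch below; it assumes tweets.js contains lines like "url" : "https://t.co/…", which is what the awk command above matches:

import re

with open("tweets.js", encoding="utf-8") as f:
    # Grab every "url" : "https://t.co/..." value, like the awk one-liner above.
    urls = re.findall(r'"url"\s*:\s*"(https?://t\.co/[^"]+)"', f.read())

for url in urls:
    print(url)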

I learned that if you have accounts on both the Wayback Machine and Google Docs, the Internet Archive has a service to amend a Google Sheets spreadsheet that has URLs in the first column with a second column containing the archived URLs. How cool is that?!

There are even more Wayback Machine services you can find via [Wayback/Archive] site:archive.org/services -site:archive.org/services/borrow – Google Search (language search parameter via [Wayback/Archive] How to restrict a Google search to results of a specific language? – Web Applications Stack Exchange (thanks [Wayback/Archive] ZygD)).


Converting much of the archive to CSV

Many post-processing tools prefer CSV over JSON, so this tool is very useful:

[Wayback/Archive] mhucka/taupe: Taupe takes a downloaded Twitter archive ZIP file, extracts the URLs corresponding to tweets, retweets, replies, quote tweets, and liked tweets, and outputs the results in a comma-separated values (CSV) format that you can use with other software tools.

[Wayback/Archive] Mike Hucka (@mhucka) wrote it and mentioned it to me in

[Wayback/Archive] Mike Hucka – @mhucka@scholar.social on Twitter: “@jpluimers @waybackmachine @weiglemc You may also be interested in a small program I wrote for extracting URLs from Twitter archives: github.com/mhucka/taupe”

Thanks!

Lots of thanks for this download conversion script to [Wayback/Archive] timhutton (Tim Hutton) / [Wayback/Archive] Tim Hutton (@tim_hutton) (for now) / [Wayback/Archive] Tim Hutton (@timhutton@mathstodon.xyz) – Mathstodon (for the future).

Later I found out that Jan also explained how to download and convert in [Wayback/Archive] @jwildeboer@social.wildeboer.net on Twitter: “Waiting for this to finish, put it in a safe place and then I will start deleting my tweets. There’s a nice script out there to markdownify that archive. I might put the results up someplace so my tweets can still be found.”


One response shows a user who was lucky that it took only 26 hours for the download to become available: [Wayback/Archive] Marcus Schwarz 🇪🇺 on Twitter: “@jwildeboer Took 26 hrs until the archive was ready to download for me. Maybe the service is overloaded at the moment^^”

Via

  1. [Wayback/Archive] Kris on Twitter: “In case you have trouble parsing the last two Retweets: Run. Do not have assets on Twitter. Do not have dependencies on Twitter. Keep the account to lock the handle, if you want, but activate Plan B right now. Do not wait, do not hesitate.”
  2. [Wayback/Archive] Casey Newton on Twitter: “According to messages shared in Twitter Slack, Twitter’s CISO, chief privacy office, and chief compliance officer all resigned last night. An employee says it will be up to engineers to “self-certify compliance with FTC requirements and other laws.””
  3. [Wayback/Archive] Lea Kissner on Twitter: “I’ve made the hard decision to leave Twitter. I’ve had the opportunity to work with amazing people and I’m so proud of the privacy, security, and IT teams and the work we’ve done. I’m looking forward to figuring out what’s next, starting with my reviews for @USENIXSecurity 😁”
  4. [Wayback/Archive] Geoff Bowser on Twitter: “The idea of engineers self-certifying compliance with an FTC consent decree jumped out to me as patently absurd. So I found and read the consent decree. This 🧵 discusses how this policy violates that decree and why I believe these people had no option but to resign. 1/15”
  5. Jan Wildeboer after converting his archive too:
    1. [Wayback/Archive] @jwildeboer@social.wildeboer.net on Twitter: “Using [1] it took around 35 seconds to go through my 2.7GB #Twitter archive and generate 169 .md files containing 102917 tweets with 15926 media files referenced. It took #Jekyll 21 seconds to generate a website with static files from that. Nice! [1]”
    2. [Wayback/Archive] @jwildeboer@social.wildeboer.net on Twitter: “It’s being updated and extended as we speak. With the current version it correctly handles tweets with multiple images and generates #jekyll friendly filenames.”
    3. [Wayback/Archive] @jwildeboer@social.wildeboer.net on Twitter: “So thanks a gazillion, @_tim_hutton_ who is also already on the other side as …”
    4. [Wayback/Archive] @jwildeboer@social.wildeboer.net on Twitter: “I need to work on the css to make it all look shiny and stuff, but the basics are in place to make sure my twitter history is conserved in an open, transparent and cookie/tracker free way :)”


–jeroen


Large “This is fine #Twitter” creation video

It is also saved in the Wayback Machine in case Twitter dies:
