The Wiert Corner – irregular stream of stuff

Jeroen W. Pluimers on .NET, C#, Delphi, databases, and personal interests

Happy “check your backups day”; does your restore process work? And how is the rest of your admin process doing?

Posted by jpluimers on 2018/02/01

Today is [WayBack] Check your backups Day!, started by @CyberShambles to commemorate the @Gitlab outage on 20170201.

Please check your restoration process now, as people screw up and accidents happen (I know first-hand from a client).

Why isn’t this date on January 31st? Long story short: the failure started on that date, but restoration took most of 20170201. So February 1st it is.

Others will follow, and GitLab wasn’t alone: a few days earlier, soup.io had to restore a 2015 database backup.

It all comes back to

Nobody wants backup.

Everybody wants restore.

which made it to the 2008 [WayBack] adminzen.org – The Admin Zen and has been attributed to various people, including [WayBack] Kristian Köhntopp and [WayBack] Martin Seeger, who told Kristian Köhntopp that it was coined by Sun’s Michael Nagorsnik at one of the early [WayBack] NuBIT. Martin was there; he knows (:

The oldest mention of the phrase I could find was in 2006 by Volker Bir at [WayBack] Spy Sheriff – so how do people get infected w/ this thing?.

Keeping clients in the loop

Since soup.io hosts its updates blog on its own platform, the restore meant that the post just before [Archive.is] Update after crash ;) – Soup Updates was, somewhat ironically, the mid-2015 [WayBack] Give us your money! – Soup Updates. Usually dogfooding is a good thing, though.

During such a downtime, it is crucial to stay in touch through alternative channels. Soup.io didn’t do a good job on their Twitter account: they only announced the “update after crash”, not that they were down, why, or what progress they were making.

The WayBack machine also has no access to updates.soup.io: their [WayBack] robots.txt blocks it because of how they redirect through /remotes. Luckily, Archive.is doesn’t care about that and has updates.soup.io archived as recently as the end of 2015.

GitLab did a much better job on their GitLabStatus account.

Postmortems and organisational culture

Everybody can screw up, and usually a severe outage happens even when everybody tries to do the right thing. The only way to learn from it is to have [WayBack] Blameless PostMortems and a Just Culture – Code as Craft.

These postmortems are invaluable as you will fail, despite using everything from [WayBack] AdminZen:

The Admin Zen

Keep it up and running.

  1. [WayBack] Know your tools.
  2. [WayBack] Anticipate.
  3. [WayBack] Expect problems.
  4. [WayBack] Design it.
  5. [WayBack] Scale.
  6. [WayBack] Backup.
  7. [WayBack] Communicate.
  8. [WayBack] Document.

Grab the [WayBack] PDF or [WayBack] PNG printout

It isn’t without reason that you find a lot of WayBack or Archive.is links on my blog: hopefully they will serve as a backup when any of the original links disappear.

It can happen to anyone

I had a similar issue at a client where I was hired to fix bugs in their server-side software. When accepting, I asked them whether they had a good backup-restore procedure; they confirmed they did, and that it was outsourced.

Though I was not supposed to perform any infrastructure tasks, I had made manual backups of the production database and some of the most important files while investigating production performance. Days after that, the RAID array of the main production server collapsed, and they found out that restoring from the outsourced backups only produced a copy that was 3 months old.

Manually stitching everything together took more than a week. About a week later, they had hourly backups on different hardware and daily transfers to an external offline location.
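
A stale backup like that 3-month-old one can be caught early with a freshness check. Below is a minimal sketch, not what this client ended up using: has_recent_backup is a hypothetical helper name, and the path and threshold in the usage comment are made up. It only proves that a recent file exists; it does not replace an actual test restore.

```shell
#!/usr/bin/env bash
# Sketch only: has_recent_backup is a hypothetical helper, not part of any tool.
# Succeeds when backup_dir contains at least one regular file that was
# modified less than max_age_minutes ago.
has_recent_backup() {
    local backup_dir=$1 max_age_minutes=$2
    local recent
    # "-mmin -N" matches files modified less than N minutes ago.
    recent=$(find "$backup_dir" -type f -mmin "-$max_age_minutes" 2>/dev/null | head -n 1)
    [ -n "$recent" ]
}

# Example (made-up path and threshold): warn when nothing is newer than 2 hours.
# has_recent_backup /srv/backups/db 120 || echo "No recent backup found!" >&2
```

Run from cron or a monitoring job, a check like this would flag backups that silently stopped long before anyone needs them.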

The GitLab story

Back to the GitLab story: some great write-ups on the story are in the links below. Recommended reading.

Oh: and the reference pointing me to soup.io: [WayBack] Post like it is 2015 – The Isoblog.

One of the issues they want to solve is to make clear on which system they are working.

Part of that could be using liquidprompt with these ~/.config/liquidpromptrc (or ~/.liquidpromptrc) settings:

LP_HOSTNAME_ALWAYS=1
LP_ENABLE_SSH_COLORS=1

That will always show the hostname and have different prompt colours on each host.
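
Without liquidprompt, a rough approximation in plain bash is to derive the hostname colour from a checksum of the hostname. This is a sketch under assumptions: host_colour_code is a hypothetical helper, the six-colour palette is an arbitrary choice, and it is not meant to mirror how liquidprompt works internally.

```shell
#!/usr/bin/env bash
# Sketch only: host_colour_code is a hypothetical helper, not a liquidprompt function.
# Maps a hostname to one of six ANSI foreground colour codes (31..36),
# so the same host always gets the same colour.
host_colour_code() {
    local host=$1 crc
    # cksum prints "<crc> <byte count>"; keep only the CRC.
    crc=$(printf '%s' "$host" | cksum | cut -d ' ' -f 1)
    echo $(( 31 + crc % 6 ))
}

# Bash prompt: hostname in its per-host colour, then the working directory.
colour=$(host_colour_code "${HOSTNAME:-$(uname -n)}")
PS1='\[\e['"$colour"'m\]\h\[\e[0m\]:\w\$ '
```

With only six colours, two hosts can end up with the same one, so the hostname itself stays in the prompt as the real identifier.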

Other references

–jeroen
