
Today is [WayBack] Check your backups Day! It was started by @CyberShambles and is dedicated to the @Gitlab outage on 20170201.
Please check your restoration process now, as people screw up and accidents happen (I know first-hand from a client).
Why isn’t this date on January 31st? Long story short: the failure started on that date, but restoration took most of 20170201. So February 1st it is.
GitLab wasn’t alone, and others will follow: a few days earlier, soup.io had to restore a 2015 database backup.
It all comes back to
Nobody wants backup.
Everybody wants restore.
which made it into the 2008 [WayBack] adminzen.org – The Admin Zen and has been attributed to various people, including [WayBack] Kristian Köhntopp and [WayBack] Martin Seeger, who told Kristian Köhntopp that it was coined by Sun’s Michael Nagorsnik at one of the early [WayBack] NuBIT. Martin was there; he knows (:
The oldest mention of the phrase I could find was in 2006 by Volker Bir at [WayBack] Spy Sheriff – so how do people get infected w/ this thing?.
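Checking a restore does not have to be elaborate. For a plain file tree, one option is to restore into a scratch directory and compare checksums against the source. The sketch below is only an illustration under that assumption: the function names are made up for this example, and it does not fit a live database, where the data changes between backup and check.

```python
import hashlib
from pathlib import Path


def file_digest(path: Path) -> str:
    """Return the SHA-256 hex digest of a file, read in 1 MiB chunks."""
    digest = hashlib.sha256()
    with path.open("rb") as handle:
        for chunk in iter(lambda: handle.read(1 << 20), b""):
            digest.update(chunk)
    return digest.hexdigest()


def tree_digests(root: Path) -> dict[str, str]:
    """Map each file's path (relative to root) to its digest."""
    return {
        p.relative_to(root).as_posix(): file_digest(p)
        for p in sorted(root.rglob("*"))
        if p.is_file()
    }


def compare_restore(source: Path, restored: Path) -> dict[str, list[str]]:
    """Report files missing from, extra in, or changed in the restored tree."""
    src, dst = tree_digests(source), tree_digests(restored)
    return {
        "missing": sorted(src.keys() - dst.keys()),
        "extra": sorted(dst.keys() - src.keys()),
        "changed": sorted(k for k in src.keys() & dst.keys() if src[k] != dst[k]),
    }
```

Calling `compare_restore(Path("/data"), Path("/scratch/restore"))` (both paths are hypothetical) returns three lists; a restore you can trust leaves all three empty.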
Keeping clients in the loop
Since soup.io hosts their updates blog on their own platform, the restore resulted in the post prior to [Archive.is] Update after crash ;) – Soup Updates, sort of ironically, being the mid-2015 [WayBack] Give us your money! – Soup Updates. Usually dogfooding is a good thing, though.
During such a downtime, it is crucial to stay in touch through alternative channels. Soup.io didn’t do a good job on their Twitter account: they only announced the “update after crash”, not that they were down, why, or how the restore was progressing.
They also deny the WayBack machine access to updates.soup.io through [WayBack] robots.txt because of how they redirect through /remotes, but luckily Archive.is doesn’t care about that and has more recent copies of updates.soup.io archived, the latest from the end of 2015.
GitLab did a much better job on their GitLabStatus account.
Postmortems and organisation culture
Everybody can screw up, and usually a severe outage happens even when everybody tries to do the right thing. The only way to learn from it is to have [WayBack] Blameless PostMortems and a Just Culture – Code as Craft.