Pi Massacre

This site is hosted on a Raspberry Pi 3, as I like to tinker with these things. For fear of breaking it, I don’t use this particular Raspberry Pi for anything but the WordPress server this site runs on.

However, I update/upgrade the underlying operating system on a regular basis so that it’s running the latest fixes, for peace of mind. This includes running sudo rpi-update to update the firmware to the latest and greatest. Today I had a panic-inducing incident:

Updated like normal, rebooted, waited a few minutes to check that engineervsheep.com came back online (like it always does)… nothing.

Ok, tried logging in via ssh… nothing.

Tried pinging server internal IP… nothing.

Realised I don’t think I ever created a backup of the SD card, or even my WordPress database… oh shit, this happens to other people.

Rebooted again, as I’d noticed that some of the other RPis I have round the house sometimes need another hard reboot after a firmware update… repeated all of the above… nothing.

I was starting to panic. I mean, I can set it up again, but losing the effort I’d put into the content, with no backup, was the real bugger. What a dumbarse, kicking myself!

I jumped on the google, and it seems this happens from time to time. Firmware commits seem to cause some issues, or things just get naturally corrupted. I had also blindly updated several other RPis before doing the update on the WordPress server, and had not checked that those rebooted successfully. So when I tried to log into those machines I got the same radio silence from VNC/ssh/ping. It seemed I’d hosed every RPi except the one I had not updated yet, a Pi massacre… shit shit shit.

You know how it goes, worst case scenario gets even worse. Bloody corrupt firmware or something I guess, don’t they test this stuff… grrr. Multiple machines suffering the same fate was too much of a coincidence for it to be anything but the firmware.

In the last day there had been a few commits; perhaps they were the cause of the issue I was seeing?

Onto google. Apparently the firmware isn’t a hardware flash thing like a PC’s firmware or BIOS, so it’s not the hardware that’s borked, it’s the boot partition on the SD card.

Seems the boot partition holds the firmware files, and the boot partition is FAT32, so thankfully it can be read in Windows. Because it’s software in nature, I was guessing that I might be able to fix this. A few searches later, it was looking as simple as copying off the files that were there, reformatting the partition, and then copying the files back, replacing some of the critical boot files with ones from the official rpi-firmware repository.
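The copy-off, reformat, copy-back routine described above can be sketched in Python. This is only a rough illustration under stated assumptions, not the procedure actually used (which was done by hand in Windows): the function names and directory arguments are hypothetical, the reformatting step itself is left to the operating system's own tools, and `dirs_exist_ok` needs Python 3.8 or later.

```python
import shutil
from pathlib import Path


def back_up_boot_files(boot_dir, backup_dir):
    """Copy everything under boot_dir into backup_dir; return the files copied."""
    boot_dir = Path(boot_dir)
    backup_dir = Path(backup_dir)
    # copytree refuses to overwrite an existing directory,
    # which protects an earlier backup from being clobbered.
    shutil.copytree(boot_dir, backup_dir)
    return sorted(
        p.relative_to(backup_dir).as_posix()
        for p in backup_dir.rglob("*")
        if p.is_file()
    )


def restore_boot_files(backup_dir, boot_dir, fresh_dir, replace_names):
    """Copy the backup onto the (reformatted) partition, then overwrite
    the named files with fresh copies taken from fresh_dir."""
    backup_dir = Path(backup_dir)
    boot_dir = Path(boot_dir)
    fresh_dir = Path(fresh_dir)
    shutil.copytree(backup_dir, boot_dir, dirs_exist_ok=True)
    replaced = []
    for name in replace_names:
        source = fresh_dir / name
        if source.is_file():  # skip names that were not downloaded
            shutil.copy2(source, boot_dir / name)
            replaced.append(name)
    return replaced
```

Here `fresh_dir` stands for wherever the freshly downloaded firmware files were saved. Keeping the backup untouched means the whole thing can be retried if the fresh files turn out to be the problem.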

Clearly it worked, because here I am, having lost nothing except time. I guess it was either one of those random things, or they actually reverted some change, because the latest files grabbed from the official repository got me back in business.

In the hope it helps someone else at some point, I found this advice by the user dom back in 2013 to be the most helpful, as it saved me from my stupidity. In my case I was missing the start.elf file, but I copied across all the minimum files they recommended (bootcode.bin, start.elf, fixup.dat and kernel.img) and crossed my fingers. I had no idea if this latest version fixed the issue, or was the cause of the issue. Waited, logged in via ssh, success!
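A quick way to spot a missing file like that start.elf is to check the boot partition against the list above. The following is a minimal sketch, assuming the partition is mounted somewhere readable. The function name is made up for illustration, and the file list is simply the one quoted in this post, which other Pi models or firmware versions may not match.

```python
from pathlib import Path

# File names taken from the post; treat the list as an assumption
# for any other Pi model or firmware version.
MINIMUM_BOOT_FILES = ("bootcode.bin", "start.elf", "fixup.dat", "kernel.img")


def missing_boot_files(boot_dir, required=MINIMUM_BOOT_FILES):
    """Return the required files that are absent or empty in boot_dir."""
    boot_dir = Path(boot_dir)
    missing = []
    for name in required:
        candidate = boot_dir / name
        # A zero-byte file is as useless for booting as a missing one.
        if not candidate.is_file() or candidate.stat().st_size == 0:
            missing.append(name)
    return missing
```

An empty list only means the files are present and non-empty; it says nothing about whether their contents are valid.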

Wait for web server to come up… nothing.

Try to start the web server, and it can’t start due to some configuration error. What the hell, it was working, don’t tell me it has truly corrupted something. My heart sinks once again, no backups = screwed. I don’t have any other Linux machines that could read the second (ext4) partition on the SD card either (though I’m sure I could virtual machine something up to do so).

Turns out, via google, that the cryptic error message I got meant I had inadvertently/intentionally turned off IPv6 somehow a while ago when setting up the domain name, but had not done the same in part of the web server config, and probably hadn’t rebooted the server or restarted nginx since then (though I swear I had done so several times). Fixed that and it was up and running, no further issues.
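For anyone chasing a similar mismatch, here is a small sketch that scans nginx config text for IPv6 listen directives. It assumes the leftover IPv6 reference was a bracketed listen address such as `listen [::]:80;`, which is a guess, since the post does not record the exact line or error. The function name is hypothetical.

```python
import re

# Matches uncommented lines such as "listen [::]:80;", where the
# listen address is an IPv6 address in square brackets.
IPV6_LISTEN = re.compile(r"^\s*listen\s+\[[0-9a-fA-F:.]*\]")


def find_ipv6_listen_lines(config_text):
    """Return (line_number, line) pairs for IPv6 listen directives."""
    hits = []
    for number, line in enumerate(config_text.splitlines(), start=1):
        if IPV6_LISTEN.match(line):
            hits.append((number, line.strip()))
    return hits
```

Feed it the contents of each config file in turn. Lines starting with `#` are ignored because the pattern requires `listen` to be the first thing on the line.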

It took an hour or so to sort out, and I still have a number of dormant RPis around the house, but at least now I know what the fix is, and it’s easy to deal with. Just another hour or so of my life I won’t get back, damn computers.

Probably should start thinking about backing up something, though to be fair the issue wasn’t strictly related to this, but I guess it would have helped if things were truly stuffed as I’d feared.

Also noted on the firmware repository that RPi4 is out, cool. Must look into that another day, probably should go do some work.

Must resist sudo rpi-update in future…
