Worldwide MS Outage

Worldwide outage this morning.

A faulty update from cybersecurity provider CrowdStrike is knocking affected PCs and servers offline, forcing them into a recovery boot loop so machines can’t start properly.


I was thinking about how the CrowdStrike outage could have been handled better on workers’ individual computers.

While the bug only affected Windows PCs, Linux has some examples of features that would have made recovery easier.

openSUSE creates Btrfs snapshots before every update. These snapshots are accessible from the bootloader menu. So, if openSUSE won’t boot, you can roll back to the last snapshot and use your computer normally.
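
To make the snapshot-then-roll-back idea concrete, here is a rough toy model in Python. It is only a sketch of the concept, not how openSUSE or Btrfs actually implement it: the `ToySystem` class, the `boots()` check, and the `sensor.sys` file name are all made up for illustration.

```python
from dataclasses import dataclass, field


@dataclass
class ToySystem:
    """Toy model only: this is not a real Btrfs or openSUSE API."""
    files: dict = field(default_factory=dict)
    snapshots: list = field(default_factory=list)

    def boots(self) -> bool:
        # Hypothetical health check: the system "boots" unless a file is marked BROKEN.
        return "BROKEN" not in self.files.values()

    def update(self, changes: dict) -> None:
        # Save a copy of the current state before applying the update.
        self.snapshots.append(dict(self.files))
        self.files.update(changes)

    def rollback(self) -> None:
        # Restore the most recent pre-update snapshot.
        if not self.snapshots:
            raise RuntimeError("no snapshot to roll back to")
        self.files = self.snapshots.pop()


system = ToySystem(files={"sensor.sys": "v1"})  # hypothetical file name
system.update({"sensor.sys": "BROKEN"})         # a faulty update arrives
if not system.boots():
    system.rollback()                           # like picking the old snapshot at boot
print(system.files)
```

The point is that the fix is a single rollback step per machine rather than a manual repair of whatever the update broke.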

One weakness of this system is that it won’t work if the bootloader itself becomes corrupted. In that case you’ll have to use a rescue disc.

Vanilla OS is planning a feature that takes things one step further. The idea is to keep two independent filesystems and update them independently, which would also protect the bootloader. This is similar to the way Android works.
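
As a rough illustration of the two-filesystem idea, here is a toy Python model. It assumes the update is written to whichever copy is not currently running, and that the system only switches over if the new copy passes a health check. The `ABDevice` class, the slot names, and the `healthy` flag are hypothetical and are not taken from Vanilla OS or Android.

```python
class ABDevice:
    """Toy A/B model: names and the health check are hypothetical."""

    def __init__(self, version: str) -> None:
        self.slots = {"A": version, "B": version}
        self.active = "A"

    def inactive(self) -> str:
        return "B" if self.active == "A" else "A"

    def apply_update(self, version: str, healthy: bool) -> str:
        # Write the update to the inactive slot; the running slot is untouched.
        target = self.inactive()
        self.slots[target] = version
        # Switch only if the new slot passes the (hypothetical) boot check.
        if healthy:
            self.active = target
        return self.active


device = ABDevice("v1")
device.apply_update("v2-bad", healthy=False)   # faulty update: stay on slot A
after_bad = (device.active, device.slots[device.active])
device.apply_update("v3", healthy=True)        # good update: switch to slot B
after_good = (device.active, device.slots[device.active])
print(after_bad, after_good)
```

In this sketch a bad update never touches the copy you are running, so the machine keeps booting the old version.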

Even with the bootloader protected, the disk itself could become corrupted, or some other issue could cause the device to fail. So, no system is perfect.

But in this case, where a software component updated itself, snapshots would still have helped. You could at least have rebooted into a previous snapshot, which is much easier than manually fixing every machine.


That’s the overriding reason why I use this system plus onsite and offsite backups. My whole setup takes up a fair amount of space, but it’s worth it for the peace of mind :smile:
