
Catastrophic Internet Failure

Gord_in_Toronto (Penultimate Amazing · Joined Jul 22, 2006 · 26,460 messages)
This is being discussed in the Computers and the Internet sub-forum as a technical issue.

But the impact is far wider than that.

Global IT chaos persists as Crowdstrike boss admits outage could take time to fix

I think this illustrates a severe vulnerability in that civilization has come to depend on the Internet working. If it stops, everything stops.

From the link:

Tesla and X boss Elon Musk earlier branded today's outage as the "biggest IT fail ever" - but is he right?

In terms of immediate impact on people, it’s hard to think of a worse one. No other incident has affected such a broad swathe of industry and society.

The most recent mega outage was when Meta, the company that owns Facebook, WhatsApp and Instagram, fell over in 2021. That affected billions of social media users as well as millions of businesses.

But this Crowdstrike outage is on another level. The closest case we’ve had is all the way back in 2017 when two deliberate cyberattacks took hundreds of thousands of computers offline, and had a massive impact on NHS services.

But again, this incident has potentially affected many more computers and businesses around the world. The true test of whether Musk is right will be how long it takes for normality to return, and how much the clean-up will cost.

There are some bad actors who might be able to trigger a worse failure. If whole chunks go down for days, the cost in dollars and lives would be considerable.

:scared:
 
First-hand report from Mrs. Mike! It's brought the chemistry lab at the biggest hospital in the area to all but a grinding halt, which is bad if you are suffering a medical emergency that requires lab tests.
 
If we are looking for a valid reason why cash must never be eliminated, we just found one.

I love Star Trek, but always thought the "we got rid of money" bit was really stupid.
As to the incident, civilization has always been dependent on a couple of industries which, if they suffer disaster, mean disaster for the whole society. For a long time, a bad harvest meant posting the "famine imminent" messages.
 
But a lot of damage was done.
I should have known we would hear from the Linux fanboys.

Just correcting your obvious error.

Microsoft had a major Azure outage.

Crowdstrike pushed an update that broke many WINDOWS systems.

Especially those that rely on external hosts being available for the computer to boot.

Linux and the internet were not affected.
 
It's interesting that Microsoft are advising that virtual machines may be recovered by rebooting them 15 times.

Speculation is rife that the corrupted .sys file becomes more corrupted with each boot, until it fails so badly the system is able to boot without being able to read it.

I'm intrigued that it is not possible to replace the corrupted file via Azure management consoles, but I haven't had to use Azure for more than two years now, so am hopelessly out of touch.
 
Just correcting your obvious error.

Microsoft had a major Azure outage.

Crowdstrike pushed an update that broke many WINDOWS systems.

Especially those that rely on external hosts being available for the computer to boot.

Linux and the internet were not affected.

Absolutely. The thread title is completely wrong.

I had no idea there was a "major internet outage," since the internet (DNS, POP3, HTTPS, etc.) was working with no problems at all.

Also, anyone running MacOS, Android, or iOS, or anyone running Windows but without Crowdstrike (which I had never heard of until today), wasn't affected.
 
Absolutely. The thread title is completely wrong.

I had no idea there was a "major internet outage," since the internet (DNS, POP3, HTTPS, etc.) was working with no problems at all.

Also, anyone running MacOS, Android, or iOS, or anyone running Windows but without Crowdstrike (which I had never heard of until today), wasn't affected.

But it DID affect banking systems, airlines, television networks, hospitals, retail outlets and government agencies. That is far more serious than just a few punters not being able to get their internet fix.
 
... and now scammers are trying to take advantage of the outage

https://www.nzherald.co.nz/nz/crowd...rs-take-advantage/A3MJIFYNXVFQ7CXGLE26B7CUMI/

Opportunistic “malicious cyber actors” are trying to take advantage of the global IT outage to rip off unsuspecting users online, the National Cyber Security Centre (NCSC) says.

The Government’s cyber intelligence agency today warned Kiwis to be vigilant as individuals and organisations’ IT systems are slowly returned to normal...

"The NCSC has no information to indicate these [outage] issues are related to malicious cyber security activity", a spokesperson for the centre said today.

"However, there has been an observed increase in phishing referencing this outage as opportunistic malicious cyber actors seek to take advantage of the situation. We encourage organisations and individuals to be alert to this activity."
 
Don't know why the downed systems have to be manually fixed one-by-one, taking hours or even days.

At least 20 years ago I was using software that could bulk-update (or indeed, bulk-image) dozens of servers at one time, including streaming in scripted updates. All via the server management console (all servers have one). So a bit of scripting to wipe the affected files in Safe Mode should be very simple. All affected servers are going to need rebooting anyway. So a couple of reboots later - maybe 30 mins? - should have seen the vast majority of them back in action. Of course there would have been a few weirdo one-of-a-kind stragglers, but they could have been done manually alongside the automated fix run.
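
For the curious, here's a minimal sketch (Python, purely illustrative) of the kind of cleanup the scripting above refers to, assuming the widely reported workaround of deleting the faulty CrowdStrike channel file(s) once a machine is up in Safe Mode. The install path and filename pattern are taken from public remediation notes and may differ on a given system.

Code:
import glob
import os

# Default CrowdStrike sensor directory on Windows (assumption: standard install path).
DRIVER_DIR = os.path.expandvars(r"%SystemRoot%\System32\drivers\CrowdStrike")
# Filename pattern of the faulty channel file, per public remediation notes.
BAD_PATTERN = "C-00000291*.sys"

def remove_bad_channel_files(dry_run: bool = True) -> list[str]:
    """Find and (optionally) delete the suspect channel files; return what was matched."""
    matches = glob.glob(os.path.join(DRIVER_DIR, BAD_PATTERN))
    for path in matches:
        if dry_run:
            print(f"Would delete: {path}")
        else:
            os.remove(path)
            print(f"Deleted: {path}")
    return matches

if __name__ == "__main__":
    # Run with dry_run=True first to see what would be removed.
    remove_bad_channel_files(dry_run=True)

In practice this would be wrapped in whatever management tooling can reach the boxes in Safe Mode, which was exactly the sticking point for machines with BitLocker or no out-of-band console access.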

Which raises the questions:

1) Why was this fix sent out without proper testing? Because obviously it was. Surely a global release would be tested thoroughly beforehand? Crowdstrike and Microsoft can afford at least a handful of sandbox test platforms to smash releases against a few walls and see what breaks. So what happened to the test regime?

2) Why did these major organisations allow MS-backed updates WITHOUT testing them themselves? It seems they just blithely rolled them out on production platforms with no prior screening. Do I think this will change very soon? I sure hope so...
 
It's interesting that Microsoft are advising that virtual machines may be recovered by rebooting them 15 times.

Speculation is rife that the corrupted .sys file becomes more corrupted with each boot, until it fails so badly the system is able to boot without being able to read it.

I'm intrigued that it is not possible to replace the corrupted file via Azure management consoles, but I haven't had to use Azure for more than two years now, so am hopelessly out of touch.

No, there is a mechanism: if the machine fails to boot 15 times, it reloads the last known working driver configuration.
 
Honestly, it's hard to imagine how it could become more corrupt than already is, without getting elected to the Senate or something ;)
 
Just to kinda duplicate what I wrote in the computers and internet thread, the file doesn't even contain any code that could do anything to cause even more corruption, or really do anything. It's just a long block of 00 bytes, including the DLL file header. (A driver is just a renamed DLL.) It's actually that corrupt header that causes the loader to crash when trying to load it. No code in the driver itself was even executed at this point. Except since it's a system driver, loaded by the kernel with kernel rights, this causes the kernel to fail.

TL;DR version for the everyman: it IS an exploit/attack on the Windows DLL loader, using a malformed file. It just needs to be a system driver to work, and an update to an anti-virus product provided exactly that setup.
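
If anyone wants to see the point being made, here's a rough Python sketch: a Windows driver is a PE image (same format as a DLL) and should start with the "MZ" magic bytes, while the bad channel file was reportedly nothing but zeroes. The filename below is just a placeholder, not the real path.

Code:
def inspect_driver(path: str) -> None:
    """Report whether a file has the PE 'MZ' magic and whether it is all zero bytes."""
    with open(path, "rb") as f:
        data = f.read()
    has_pe_magic = data[:2] == b"MZ"            # a valid PE/DLL/driver starts with "MZ"
    all_zero = len(data) > 0 and not any(data)  # True if every byte is 0x00
    print(f"{path}: PE magic={'yes' if has_pe_magic else 'NO'}, "
          f"all-zero={'yes' if all_zero else 'no'}, size={len(data)} bytes")

if __name__ == "__main__":
    inspect_driver("C-00000291-suspect.sys")  # placeholder name, not a real path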
 
As I explained in the computers and internet thread, it might actually have been a cyber attack.

Having been a professional tester for 7 years, it is hard to imagine that getting through testing.
Then I remembered some of the idiocy the build teams committed when "packaging" the final build. They were supposed to report any errors to the appropriate team to fix, and then the fixed code would go back through the test cycle. Unfortunately, there's always someone who knows better.
 
