
Multiple airlines disrupted due to Microsoft Azure outage

crazytony
36 replies
11h39m

This rollercoaster is not over yet. There's a crowdstrike issue causing windows machines/servers to brick globally and this industry is heavily windows dependent. It may or may not be related to the Azure issue but it's suspicious to me.

https://www.reddit.com/r/crowdstrike/comments/1e6vmkf/bsod_e...

bun_terminator
14 replies
10h38m

I have never heard of crowdstrike. Is that some kind of antivirus? How is that related to PCs not booting? And why does it affect so many PCs if I've never heard of it? I'm so confused

shrikant
10 replies
10h32m

It's enterprise anti-malware that [in addition to other bits] has a client component installed on all PCs in the corporate network. An update to that client component (called an "endpoint") is causing those Windows machines to BSOD.

It's unlikely you'd have heard of it unless you've worked at a large enterprise that runs primarily Microsoft IT.

Crowdstrike does have Mac/Linux "endpoints" also (IIRC) but I'm unsure if they're affected as well.

vladvasiliu
2 replies
9h2m

Crowdstrike does have Mac/Linux "endpoints" also (IIRC) but I'm unsure if they're affected as well.

We have this crap running on our computers, and only Windows boxes seem affected.

On Linux this isn't running in kernel mode (our kernels are too up-to-date) and we don't seem to have any issue there.

Haven't heard anything about macs though.

red_Seashell_32
1 replies
8h50m

MacOS seems to be fine (or I was too late to get an update)

jurmous
0 replies
8h40m

MacOS does not allow kernel extensions anymore luckily

jachee
2 replies
9h52m

My company MacBook with the falcon client does not seem to be affected by this.

jurmous
0 replies
8h39m

MacOS does not allow kernel extensions anymore so these kinds of crashes cannot happen. The falcon client on Mac hooks into another layer

Klonoar
0 replies
9h43m

The problem is seemingly specifically in the Windows driver, you're unlikely to see an issue if you're not running Windows.

bun_terminator
1 replies
10h30m

I've never seen a non-Windows machine tbh. But our IT just sent out an update that we don't use crowdstrike. Strange that I never heard of it if it's so widespread. But thanks

bravetraveler
0 replies
10h21m

You'll see this software more in highly regulated areas. Think Government, finance, travel. It exists mainly to check a compliance box.

The Windows claim is a little misleading. We used Linux where I last encountered this. I expect Windows is where problems are manifesting this time; BSOD and kernel panics with this aren't new!

CrowdStrike seemingly came out of nowhere but has existed for a while... I think it's suspicious.

Have we not learned from SolarWinds and company? The vendors become part of your posture. Consolidating far too much

andyjohnson0
1 replies
8h19m

Crowdstrike does have Mac/Linux "endpoints" also (IIRC) but I'm unsure if they're affected as well.

The problem seems to be in a device driver installed by Crowdstrike - so I'm guessing whatever the bug is, it's specific to their Windows product.

vladvasiliu
0 replies
8h11m

Windows complains about some page fault or something in a file named csagent.sys. On my machine this file hasn't changed in several days, but the issue only happened this morning like for everyone else.

This looks suspiciously like a case of "let's download random crap from the web and run it in kernel space. what could possibly go wrong?"

shadow28
2 replies
10h34m

They make security software that is really popular in various industries.

moogly
0 replies
8h39m

Baffling name. Sounds more apt for a DDoS service.

davidmurdoch
0 replies
9h6m

They make malware that steals funds from corporations (willingly!) so these corps can tick a security checkbox for some certification investors have been told is paramount; it's just disguised as security software.

mirekrusin
8 replies
11h21m

Europe will wake up to a flood of problems as well. This needs to be at the top of HN. We are experiencing multiple issues here in the EU.

This issue feels extremely widely spread.

Maybe we don't hear ppl complaining because we're offline? :)

javaunsafe2019
2 replies
9h31m

Berlin airport is down

nicbou
0 replies
9h25m

A few others as well, it seems.

vesinisa
1 replies
9h6m

Maybe they have rolled back the update and Windows boxes in Europe are no longer pulling it?

vladvasiliu
0 replies
8h14m

They did around 8:30 CEST (6:30 GMT) as I understand it. Some of our servers managed to unbork themselves after a number of boot loops, but not all.

surfingdino
1 replies
8h48m

Ryanair is unable to check in passengers online. You can check in at the airport.

ben_w
0 replies
6h37m

I had to physically stand in a queue for about 8 hours for a Ryanair customer support desk in an airport when the airport runway was closed by 1-2cm of snow.

I forget the exact timing and can't be bothered to look up my notes, but it was something like 11pm to 7am at the origin airport for a flight that was supposed to have landed at the destination around 8pm, as we were also stuck on the runway for an hour or so and even getting that far had been delayed.

The replacement flight the next day was also cancelled even though the airport was open.

I ended up taking a ferry and a train, and that was still simultaneously faster than the next available Ryanair replacement flight and cheaper than any other provider on short notice. Fortunately I had an understanding boss who didn't mind me arriving 4 days later than expected, and also a place to crash for free while working out the best route home.

geek_at
0 replies
8h25m

it's already noon in central europe and yes everything's fucked. except for the linux powered companies

dagaci
5 replies
10h54m

On the wireless they are reporting a bad Crowdstrike update and a major Azure failover in central USA as separate events, are they the same or different?

SSLy
3 replies
10h40m

They seem either unrelated, or the Azure one was caused by CrowdStrike.

pantulis
2 replies
10h21m

Too many black swans the same day, I'd guess Azure is running Crowdstrike software.

gliptic
1 replies
9h53m

Azure having problems is not a black swan event.

pantulis
0 replies
9h49m

Fair point.

dagaci
0 replies
10h28m

A whole lot of people are running Crowdstrike in the cloud and on local PCs. A Crowdstrike update last night caused a Windows kernel panic. Azure/Crowdstrike personnel have rolled back the update in the cloud. Local IT people will have to revert it from local machines manually.

maeil
1 replies
9h51m

Almost certainly Azure using Crowdstrike on Windows in one way or another.

Not surprising that AWS and GCP don't seem to be hit as they wouldn't run anything on Windows, unlike Azure, who I'm sure are forced to do so under MS' infamous interdepartmental structure.

raverbashing
0 replies
6h56m

Ugh.

Though I can imagine there's an Azure market for "Citrix server" kinda thing in the cloud

(or maybe it's SaaS - Solitaire as a Service)

classified
0 replies
8h9m

Companies using Windoze for anything touching customer business should get sued by their customers.

_the_inflator
13 replies
9h49m

The dawn of the hybrid model: on premise will be back soon.

The impact is vast: imagine being blocked by a hostile administration in this way. Disaster recovery of this magnitude is like a global pandemic.

AI won’t save us. Network topology and admins will have their comeback.

medion
5 replies
8h47m

It’s a general political trend too: de-globalization. Everyone sold the idea of globalisation and off-premise ethereal globalized cloud services. Both good when they work, but a total disaster when they don’t.

Decentralization is the way.

zmgsabst
2 replies
8h39m

There’s a reason that I admire the federal system of the US:

For all the US’s problems, devolving critical functions to layers of differing granularity has proven surprisingly robust to many faults.

I suspect we’ll see economic equivalents, where critical functions are spread around at various scales. We don’t need to be either totally globalized or totally domestic.

Yizahi
1 replies
8h30m

Do you admire the federated system of Germany too? They famously have many departments operating semi-autonomously, causing immense friction in adopting any new changes. Do you admire the federated system of the EU, where countries can run unchecked for years and it is very hard to fix issues in any specific member country?

I'm not criticizing any country or union here, and not praising them. I'm merely highlighting that maybe federation on its own is not the main cause of the success of the USA and there are some other more important factors at play.

zmgsabst
0 replies
8h6m

When you centralize updates you get the outage which is the subject of discussion.

Or the Four Pests campaign.

Yizahi
1 replies
8h35m

Except that anything local will be more expensive than anything globally scaled, and the same people complaining about globalization "suddenly" don't want to pay more for the same.

It's not like alternatives to MS services haven't existed for decades. There are smaller and more people-friendly hosting providers, email services, file shares, office software etc. The problem is that people complaining about MS services don't want to use them or pay for them.

jrs235
0 replies
5h11m

Yup. Resiliency and redundancy cost a little bit more money.

DanielHB
3 replies
9h39m

I dunno, I have been around and I have never seen on-prem infra being more reliable than your average cloud.

The only difference is that when on-prem goes down you can shout at your infra engineers, when cloud goes down you shout at your enterprise cloud representative. The first is more effective, but even with that it still doesn't achieve the same reliability and disaster recovery as your average cloud provider.

tankenmate
0 replies
9h30m

With small events cloud engineers scale better, with large events local engineers scale better.

rcxdude
0 replies
9h38m

It also has the property of being less correlated with other failures, which may or may not be an advantage.

ocdtrekkie
0 replies
9h17m

I have never seen an on-premise environment down as much as Azure is.

seydor
1 replies
9h40m

Unless attacks take down infrastructure regularly, we won't go back to a decentralized model. The internet itself was created decentralized to withstand a war.

KineticLensman
0 replies
9h34m

No, the ‘internet’ was created to allow US defence researchers to access shared computers more easily. It adopted the packet switched model that had been developed theoretically to support command and control applications, but which was never actually implemented by the extant C2 providers (in a dentist’s waiting room now so don’t have links)

totallywrong
0 replies
5h39m

I wish. Only (mostly small) tech savvy companies might maybe make that move at some point. Herd mentality and short term convenience have already won that battle. AI will only add to that since for 99.9% of people LLMs and cloud are synonyms.

protosam
11 replies
11h39m

The article is pay walled. Seems like this would be the fault of the airlines though. There is a reason to be distributed between different geographic areas.

tanelpoder
10 replies
11h34m

But if the Azure outage is due to Windows machines crashing because of the currently ongoing CrowdStrike crash/reboot loop issue, then such servers might end up being down in all regions. Looks like there might be some advanced lessons to be learned about blast radius here...

switch007
7 replies
11h19m

Azure VM host machines are running CrowdStrike...??

tanelpoder
2 replies
10h29m

Probably not, but maybe they rely on some Active Directory server that is running it?

mnw21cam
0 replies
9h15m

And this explains why the (Linux) HPC that I'm using is having so many troubles today, and keeps complaining that I don't exist.

jaggederest
0 replies
9h32m

VM storage is probably on Windows Server, plus AD. I'd bet out of band management is all in the impact zone too. Might be back to someone pushing physical switches and hooking up a KVM.

bugbuddy
1 replies
10h48m

Maybe because Windows Defender Advanced Threat Protection is an enormous resource hog that scans every byte of memory and storage accessed by the Hypervisor and performs a quadratic time computation on the data? I am just guessing because my “fastest” Windows laptop CPU money could buy feels like a hot smelting furnace and a sloth at the same time when I use VMWare Workstation. What the &$@* is it scanning the VMWare guests for?

pjc50
0 replies
10h2m

Oh, Crowdstrike is also a massive resource hog.

pjc50
0 replies
10h2m

IF they're running windows, they'll be running it.

mgrund
0 replies
9h32m

More likely crash looping of so many VMs overloading some system with insufficient back pressure, possibly combined with unfortunate cluster management scheduler behavior at this scale of crash looping (e.g. too eager to retry scheduling instances, maybe even on new hosts which causes more infrastructure load).
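The "too eager to retry" part can be made concrete. This is a minimal sketch of capped exponential backoff with full jitter, a common way to stop a fleet of crash-looping instances from hammering a scheduler in lockstep. It says nothing about how Azure's scheduler actually behaves; the function names, the 1-second base and the 300-second cap are all hypothetical.

```python
import random


def backoff_ceiling(attempt, base_seconds=1.0, cap_seconds=300.0):
    """Upper bound on the wait before retry number `attempt` (0-based).

    Doubles with every failed attempt and is clamped to `cap_seconds`.
    Both defaults are illustrative, not taken from any real scheduler.
    """
    return min(cap_seconds, base_seconds * (2 ** attempt))


def next_retry_delay(attempt, rng=random, base_seconds=1.0, cap_seconds=300.0):
    """Pick a random delay in [0, ceiling] ("full jitter").

    The randomness spreads retries out, so instances that crashed at the
    same moment do not all come back and ask to be rescheduled together.
    """
    return rng.uniform(0.0, backoff_ceiling(attempt, base_seconds, cap_seconds))


if __name__ == "__main__":
    for attempt in range(10):
        print(attempt, backoff_ceiling(attempt), round(next_retry_delay(attempt), 2))
```

Backoff only slows the callers down. Back pressure in the sense described above would also need the overloaded system to reject or queue work it cannot take on.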

politelemon
1 replies
11h15m

It's more coincidental than likely, since these are managed services that were down, while the crowdstrike issue is closer to company deployments.

But never say impossible, it's best to wait and see what the actual problem is rather than throwing shade so early.

politelemon
0 replies
6h37m

Update. Throw shade. They've confirmed it. I was wrong.

FearNotDaniel
6 replies
10h53m

A side issue, but since we’re on the subject of global tech being generally fscked: I’m currently on holiday in Italy and just discovered the entire archive.ph domain is blocked by the government, apparently due to kiddie porn. Shrug emoji…

amarcheschi
3 replies
10h28m

I'm in italy and I can access archive.ph

Perhaps just the ISP you're roaming with blocks the site?

mr_world
2 replies
9h17m

There was a law passed that allows ISPs to block sites that host copyrighted content illegally. It's not just Italy, also the Netherlands and many other countries. You can still access it with a vpn or tor. Tor browser on the phone works fine for mobile carriers that block the sites.

giucal
0 replies
8h51m

You don't even need a VPN. Just replace ISP's DNS servers with, say, Cloudflare's

    1.1.1.1
    1.0.0.1
or maybe OpenDNS's.

[Edit: fixed wrong fallback address.]

FearNotDaniel
0 replies
1h15m

Yes that seems to be the case - the blocking page is headed with "Ministero dell'Interno - Dipartimento della Pubblica Sicurezza - Direzione Centrale per la Polizia Scientifica e la Sicurezza Cibernetica - Servizio Polizia Postale e Sicurezza Cibernetica" along with official looking government logos but text underneath (in Italian and English) talks about a collaboration between govt agencies and ISPs. From mobile cell service it was not blocked.

The text does however mention it's a specific measure against child pornography, not re-hosting copyright content.

temporarely
1 replies
8h38m

what does archive.ph have to do with that? Isn't it just a way to get around paywall?

tkubacki
6 replies
10h59m

That’s why it’s important to demonopolize the desktop OS market. Microsoft should have been broken up a long time ago

arcxi
2 replies
10h13m

nobody is forcing these companies to depend on MS products and services

logicchains
1 replies
9h3m

Many in regulated industries actually are forced to use crapware like Crowdstrike, to tick a box on the security checkbox.

arcxi
0 replies
6h45m

if they'd used crowdstrike on linux they wouldn't be hit with BSOD bootloops now

YeahThisIsMe
1 replies
10h11m

How would that change anything about Windows market share?

switch007
0 replies
10h0m

You don't think the bundling of windows with pcs has anything to do with their market share?

BSDobelix
5 replies
11h6m

I love it! Just keep putting all your eggs in one basket, because at the end of the day, it's not your fault, but Azure's.

SteveSmith16384
2 replies
8h45m

"No-one ever got fired for choosing Microsoft".

undebuggable
0 replies
8h12m

Although many got locked out for choosing them.

BSDobelix
0 replies
5h47m

Well, the older version is about IBM, and it would probably even be true (today) if we were talking about mainframes, because they are one hell of a stable basket ;)

switch007
1 replies
10h1m

I've had large customers seriously see that as a big appeal, when we were selling them some IT. They loved being able to blame someone else!

tormeh
0 replies
9h9m

The corporate training always says you are responsible for your chosen cloud vendor's problems, but in reality everyone assumes they can successfully blame the vendor.

openopenopen
3 replies
11h13m

Does Microsoft use canary deployments? or was it deployed on everything?

pplante
2 replies
10h57m

Sounds like it's more of a problem caused by Crowdstrike. I'm sure azure fell over when millions of servers all freaked out simultaneously.

jaggederest
0 replies
9h31m

Good lesson in rollout and heterogenous software stacks.

eecc
0 replies
9h24m

I guess a canary rollout of the Crowdstrike patch would have contained the impact of this.
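For anyone who hasn't seen one, here is a rough sketch of what a canary (staged) rollout looks like in code. It is not how Crowdstrike or Microsoft ship updates; `staged_rollout`, the wave fractions, the `is_healthy` callback and the 1% failure threshold are all made up for illustration.

```python
def staged_rollout(hosts, is_healthy, stages=(0.01, 0.1, 0.5, 1.0),
                   max_failure_rate=0.01):
    """Push an update to `hosts` in growing waves.

    `stages` are cumulative fractions of the fleet. After each wave,
    `is_healthy(host)` is called for every host in that wave (standing in
    for "did the machine come back up and report in?"). If the failure
    rate of a wave exceeds `max_failure_rate`, the rollout stops.

    Returns (hosts_that_received_the_update, halted).
    """
    updated = []
    start = 0
    for fraction in stages:
        # Always include at least one new host, never run past the fleet.
        end = min(len(hosts), max(start + 1, int(len(hosts) * fraction)))
        wave = hosts[start:end]
        if not wave:
            continue
        updated.extend(wave)
        failures = sum(1 for host in wave if not is_healthy(host))
        if failures / len(wave) > max_failure_rate:
            return updated, True  # stop: the blast radius is this wave only
        start = end
    return updated, False


if __name__ == "__main__":
    fleet = [f"host-{i}" for i in range(1000)]
    # An update that bricks every machine is caught in the first 1% wave.
    touched, halted = staged_rollout(fleet, is_healthy=lambda host: False)
    print(len(touched), halted)
```

With a fleet of 1000 and the default stages, an update that breaks every host stops after the first wave of 10 machines instead of reaching all 1000.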

dools
3 replies
10h5m

I'm so old that I was like "Do that many people really play Crowdstrike?" then I realised that's Counter Strike, then I looked up Counter Strike and it came out 24 years ago.

yells at cloud

robertlagrant
2 replies
10h1m

And yes, they do! Counterstrike 2 is out now, to a mixed reception.

ClassyJacket
1 replies
9h13m

Which is actually Counter Strike 4

lesuorac
0 replies
8h44m

Heh, can't wait for Google to release Angular 3.

lifestyleguru
1 replies
8h24m

Obviously you've never worked in an enterprise purchasing department and participated in business meetings with Microsoft sales people.

benterix
0 replies
7h52m

I see no relationship between understanding that Azure has a poor security posture and participating in business meetings with salespeople.

totallywrong
0 replies
5h3m

Yeah, you're completely out of touch with the industry. As horrendous as it is, Windows on Azure (and the entire ecosystem it comes with) is an easy sell for the dinosaur IT leaders that haven't seen a Linux server in their life. You'd be surprised how common that is.

switch007
2 replies
11h17m

Better described as a worldwide IT outage https://www.bbc.com/news/live/cnk4jdwp49et

Sky News UK is off the air. Some UK train companies are having an IT outage. Berlin, Melbourne airports flights disrupted...

vergessenmir
1 replies
10h38m

Waitrose tills and card machines aren't working

switch007
0 replies
10h3m

OK now the outage is getting real. How will Tarquin get his cinnamon and gooseberry yoghurt?

kachinga123
2 replies
11h26m

No flights at Germany's Berlin BER airport as well

lifestyleguru
1 replies
8h51m

I bet BER staff is screaming at the passengers right now "you wanted this, it's your own fault".

kolme
0 replies
5h4m

Oh, you know them as well.

totallywrong
1 replies
4h58m

Interesting that I happened to transit 3 different Asian airports today and had zero issues. I haven't seen anything at all related to the outage over here.

angra_mainyu
0 replies
4h53m

Hmm, maybe they've migrated their systems? I know SK had a plan to migrate govt machines to Linux.

totallywrong
0 replies
4h52m

How was this update released at all? Something that affects literally every machine does not get caught in testing? From an 80B company no less.

totallywrong
0 replies
5h8m

Antivirus company causing exponentially more harm in hours than it's prevented during its entire existence.

surfingdino
0 replies
8h39m

So, anti-malware software turns out to be malware? How odd. SecOps out on anti-open source training, I presume?

steve1977
0 replies
8h57m

I wonder if these airlines were really affected by that Azure problem or if they were affected by the CrowdStrike issue and were just mixing things up.

robertlagrant
0 replies
9h7m

Don't buy tills that run Windows. You must be crazy.

kaptainscarlet
0 replies
9h44m

This is why I sometimes think on prem hosting is better.

javaunsafe2019
0 replies
9h28m

Berlin airport is down. I guess it’s related

davidmurdoch
0 replies
9h9m

Waiting in a queue at an airport in Palma, Mallorca, Spain right now and the check in staff are currently flipping through printed sheets of paper to check us all in. It's going to be a very long wait.

adamsiem
0 replies
8h55m

ALB airport down. Check in systems are in BSOD ‘Recovery Mode’.