CrowdStrike's impact on aviation

rdtsc
53 replies
21h56m

From the included link: https://www.techradar.com/pro/security/southwest-airlines-av...

To give you an idea of just how outdated this operating system is, Windows 3.1 was originally launched in 1992, and Microsoft ended support for it on December 31, 2001, except for the embedded version, which was officially retired in 2008.

I keep hearing the Windows 3.1 story repeated. I mean here it comes from TechRadar and even has the "Pro" in the name, they can't possibly make stuff up, right? But still don't quite believe it.

Can anyone working at Southwest confirm that their main scheduling system is running on Windows 3.1?

dsr_
25 replies
21h50m

Tech Radar quotes Tom's Hardware; Tom's Hardware quotes a tweet.

Not a tweet from Southwest, mind you. Not even a tweet from someone who says that they used to work for Southwest. Just... a tweet.

shombaboor
12 replies
21h39m

I just wish there was some type of identifiable credit / penalty system for writing accurately as a news source. And this would include quotes / retweets. Never been a better time to be wrong about everything.

JumpCrisscross
6 replies
21h38m

wish there was some type of identifiable credit / penalty system for writing accurately as a news source

Good starting point is if the news is free. A shocking fraction of people get their news from solely free sources.

kspacewalk2
2 replies
20h43m

What's "solely free"? Does the ad-driven model count as free? Why do you think an outlet that works for you will necessarily deliver better quality news than the one that works for advertisers? There are obvious bias downsides to both.

sxg
0 replies
20h26m

The ad-driven model does count as free, and it's far less likely to deliver better quality news than a subscription service users pay for. The core metric for ad-driven news sites is maximizing views—it doesn't matter how you get views as long as you get them. This means free sites are heavily incentivized to be the first to break a news story even if the details are wrong or sparse. Sure, they'll issue corrections and updates later, but only a small percentage of the initial viewers will ever see these, and there's essentially zero cost for having made the mistake.

The core metric for subscription news sites is minimizing churn. A mistake will cost a subscription site subscribers who have a massive lifetime value. These sites are heavily incentivized to report high quality, accurate news even if they're not the first to break the story.

JumpCrisscross
0 replies
18h29m

What's "solely free"? Does the ad-driven model count as free?

Yes, in this context.

Why do you think an outlet that works for you will necessarily deliver better quality news than the one that works for advertisers?

I can’t explain the mechanics precisely. But it’s pretty clear when I compare my subscription and non-subscription sources where the quality lies.

mewpmewp2
1 replies
21h21m

And why would someone put in effort for free?

cgriswald
0 replies
17h47m

This is a misunderstanding of the problem. Effort is made in both cases. In one case effort is made to find verifiable truth as a service. In the other effort is made to provide eyeballs to advertisers.

torginus
0 replies
8h23m

Yet paying for news is a very weak guarantee of not being fed propaganda/inaccurate reporting.

If we held food safety to the same standards as paid news sources are held, people would get salmonella once a week.

treflop
1 replies
20h33m

There just isn’t. You just have to read enough of one source to determine your own opinion.

Just like with anyone you meet: you are the judge if they are trustworthy, nice, mean, funny, etc.

That said, I think tech journalism is the bottom of the barrel. I just feel like they focus more on tech than journalism.

shombaboor
0 replies
16h28m

the cost of producing bs is too low, back in the day it would at least require time and money to print / distribute.

Analemma_
1 replies
21h11m

Your cure is worse than the disease. The second such a system existed, it would be gamed to hell and back, and nobody would believe it anyway since they'd all angrily insist that "you shouldn't have counted X" or "you should've counted Y more" and it would just turn into a war over who got to control the system and use it to deplatform their enemies.

andrewflnr
0 replies
20h33m

It doesn't have to, and indeed shouldn't, be a single system. We'd rather have a handful of independent news checker orgs, maybe some topic-specific ones. Funding remains an exercise for the reader.

mardifoufs
0 replies
19h41m

Community notes on twitter is the closest thing to what you're describing I've seen yet. It's been very helpful too imo

thereddaikon
7 replies
21h33m

A great example of why people don't trust journalists anymore. They don't even perform a basic amount of fact checking before publishing.

ThrowawayTestr
2 replies
15h50m

People don't pay for news anymore so we get what we pay for.

shiroiushi
1 replies
14h1m

People never paid for news really. If you're thinking of the days when you had to pay 25 cents for a newspaper at the convenience store, that didn't come even close to the cost of running a newspaper in those days. Your quarter only covered (maybe) the cost of the paper and printing it. These days, we don't need paper, and running a web service is probably cheaper per-reader than physical paper.

Newspapers got the bulk of their funding from advertising back then, just as they do now.

ThrowawayTestr
0 replies
53m

The important thing is you were able to justify not paying for stuff.

torginus
0 replies
8h20m

Also, there is the effect of a lie oft repeated becoming the truth - the times I've seen small outlets writing nonsense, and having it picked up by progressively bigger papers citing the smaller ones as credible sources is too much to count.

Generally there is a chain of trust in news publishing that goes nowhere and there's nothing we can do about it, as more often than not, someone credible repeats the hearsay nonsense down the line, at which point they count as a primary source.

So much of news publishing I would describe as not even wrong.

jxy
0 replies
21h5m

I don't trust any kind of generalization like this, which only serves further disinformation and misinformation.

There are bad journalists (if they can be called journalists at all) and good journalists. At this point in history, our only hope lies with diligent reporters from reputable publishers.

colechristensen
0 replies
21h18m

Articles from the likes of Tech Radar or Tom's Hardware I would trust to a higher standard than a random tweet, but really I wouldn't label them as "real journalists".

I question the ethics and standards of the New York Times at least a little at this point so it's not like great journalism is common.

Bud
0 replies
21h20m

It's unfair to pretend that all journalists have the same level of professionalism (or lack thereof) with regard to sourcing.

They don't.

nostromo
1 replies
20h6m

I've started seeing this on Wikipedia.

Wikipedia sources an article from a semi-legit source. That semi-legit source either just says "sources" or points to something less-legit, like a Tweet.

You can bring new "facts" into existence by just laundering them from lower- and lower-quality sources.

Arrath
0 replies
17h14m

Source-laundering is a bit catchy, I have to say.

Terr_
1 replies
21h18m

It's kind of depressing to think that we have had this world-spanning system of knowledge and "hyperlinks" for decades now, individual pieces that should've enabled an easy chain of attribution/citation...

Y_Y
0 replies
20h20m

And encourage the reader to move away from your site‽ No self respecting PHB could condone such a thing.

JumpCrisscross
20 replies
21h52m

keep hearing the Windows 3.1 story repeated

It’s wrong [1] and serves as a litmus test for whether an outlet independently verifies its claims.

(“The systems [Southwest] developed internally, SkySolver and Crew Web Access, look ‘historic like they were designed on Windows 95’.” That got mangled into they run 3.1.)

[1] https://www.osnews.com/story/140301/no-southwest-airlines-is...

xp84
11 replies
21h18m

Wow, that’s even more frustrating considering it’s conflating an unfashionable UI (which I’d argue is a good thing, since all modern UI trends are towards slick, minimalism-worshiping messes which hide everything from users) and old, provably-flawed technological foundations (like a 16-bit system without things like filesystem access control or memory protection).

I knew this story was false immediately though because no company ever even in 1993 had production server systems which ran a desktop OS like Win 3.1. It just wasn’t up to the task. They would have used NT if anything.

btown
8 replies
20h46m

http://www3.alpa.org/LinkClick.aspx?fileticket=IO7kd%2Bfm2Do... shows the system as of 2020. To the parent’s point, it’s actually quite a reasonable UX, with colored outputs, filter banks, and just enough abbreviations and whitespace to balance density with intuitiveness.

But that doesn’t mean this is the only modern design system that meets those requirements. And conflating all modern UI with consumer design trends is an equally frustratingly broad statement.

qingcharles
4 replies
20h16m

OK, this is definitely unfashionable looking if your main exposure to apps is the latest doodah on your phone that was literally updated yesterday.

A very standard-looking legacy Win32 app. Which, admittedly, would probably have looked very similar had it been on Windows 3, but it is probably running on LTSC Windows 10 or something in reality.

numpad0
3 replies
11h55m

Doesn't look Microsoft at all to me, just colored to mimic XP. Java on some Unix?

goodcanadian
1 replies
8h53m

Perhaps you just aren't old enough? It looks very Windows 95 to me.

stoltzmann
0 replies
4h0m

Age has nothing to do with it, the interface just doesn't look like Windows 95.

The button shapes, minimize/close window buttons, the titlebar are all looking wrong for Windows 95.

It looks significantly more like Swing, but then the buttons don't match that either.

mjevans
0 replies
3h22m

Page 7 (as labeled) of the slides. The tabs and checkboxes layout have a distinctly Win 9x era look/feel. I do agree that it's missing an obvious menu, and the theme for the window decorations reminds me of win 3.1, but that was probably an option for software of that era just as it is in this if someone pushes hard enough.

veggieroll
0 replies
20h21m

Link worked for me but took a long time to load. It just seems like their server is overloaded.

quotemstr
0 replies
20h28m

Broken link

Shorel
0 replies
12h20m

It looks like every single piece of hospital or car rental software I have managed to peek at.

It's not old-fashioned, it is _timeless_ B)

cjbprime
1 replies
20h50m

Windows 95 is an "unfashionable" OS which has not received any security updates since 2001.

andrewxdiamond
0 replies
20h44m

Yes and the fact that my software’s UI looks like Windows 95 makes it vulnerable to all the same security vulnerabilities.

/s

The systems don’t run on W95, they look like W95

zitterbewegung
2 replies
20h28m

I know this is a hot take, but companies have to figure out whether modernizing a UI is worth the cost of retraining everyone on the new one. Many people were involved in its creation and maintenance, and due to its age the UI may contain a large amount of glue code that can't be separated out unless you build an API around the other software. That is especially true if a change to the system is coming that would make moving off the old one meaningless. Southwest is also making changes to their operations, so they are probably in maintenance mode for the software, especially when the outage of their current software was done since they will have to not have anyone choose any seat at this time. [1]

[1] https://www.cnn.com/2024/07/25/investing/southwest-airlines-...

stavros
1 replies
19h40m

I don't know, I like the classic Windows UI. I don't think modern UIs are an improvement on that.

suzzer99
0 replies
19h14m

No no no. We must now have floating headers that don't give any indication they belong to the columns below them, much less that you can click them to sort the columns. 95% of possible actions must only appear when hovered over. Buttons should not look like buttons, nor should they provide any feedback that they've actually been clicked. Etc.

jjwiseman
1 replies
21h42m

Thanks, I updated the post.

kragen
0 replies
21h8m

i miss the lemonodor blog

xarope
0 replies
14h52m

to be fair, some of the java-era software with their default toolkits do look very windows 3.1/95'ish (all that blue and teal)

spookie
0 replies
21h3m

Being blasted by media for running your own software, incredible. As others have commented, just a single tweet was enough to propagate this story. Quite concerning how easy it is to fake reality nowadays.

madeofpalk
0 replies
20h58m

This is the same as the "Olympic cardboard beds are anti-sex" fake story that persisted. Anyone who publishes it demonstrates they don't actually research.

umvi
0 replies
21h20m

Can anyone working at Southwest confirm that their main scheduling system is running on Windows 3.1?

I can't confirm that, but I can certainly confirm lots of hospital equipment is still running Windows XP and lots of hospital personnel browse the internet with Internet Explorer.

technick
0 replies
12h46m

I worked for SITA ( https://en.wikipedia.org/wiki/SITA_(business_services_compan... ) back in the late 2000s. They had a massive X.25 serial network connecting airlines across the globe. Some of its customers were still running Windows 3.11 in the data center on old AT systems. We would buy old computers on Craigslist and eBay to keep hardware around for when it failed. I wouldn't be surprised if those systems are still in use today.

ponector
0 replies
20h41m

This story is another example of how hallucinations from LLMs can successfully replace many "news" portals.

brianpan
0 replies
20h14m

The San Francisco subway runs off of 5-inch floppy disks.

https://sfstandard.com/2023/02/02/sfs-market-street-subway-r...

That article links to an (only slightly older) article about British Airways loading navigation updates every month off of the fancy new 3.5-inch floppy disks.

bustling-noose
40 replies
14h34m

The outage highlighted a different kind of digital divide. On one side, gmail, Facebook, and Twitter kept running, letting us post photos of blue screens located on the other side: the Windows machines responsible for actually doing things in the world like making appointments, opening accounts, and dispatching police.

At this point using Windows for these tasks seems like using legacy software because training people to use an iPad or a web browser seems too complicated, or because no one wants to move their age-old systems to a more modern web-based system because of costs. Native apps work great, but I think the world is moving to the cloud and that means web-based everything should be the norm. Yes, AWS or Azure outages can still happen, but those can be fixed by spinning up a VM in a different cloud.

This is also why software jobs aren’t going anywhere for a while. Many systems need to be changed to more modern and robust clouds. It might take decades for this transformation across the globe.

amluto
25 replies
11h50m

Your “modern and robust cloud” is my “why on Earth doesn’t this thing work offline”.

The world is absolutely full of things that have worked for decades to centuries without the Internet, are eventually more or less consistent (remember carbon paper credit card machines?), and did an amazing job of keeping the world running despite wars, network partitions (the “network” would basically always be partitioned), mistakes, entire branches offline, etc.

Sure, a lot of things are easier when centralized, and “the cloud” is incredibly powerful. But it’s not necessarily more robust. Also, depending on any sort of cloud means you’re also depending on the network, and networks are far from infallible. There’s a reason that a lot of stored-value transit systems still track balances on the card and will let people in even if a fare gate cannot connect to a cloud service.

And CrowdStrike took out plenty of cloud instances, and recovering them can be worse than recovering physical hardware, as the “robust cloud” has an absolutely terrible ability to do anything outside the happy path of booting an instance normally.

dailykoder
24 replies
9h55m

Okay this sounds all very reasonable, but how do you know when your washing machine is finished, when it's not connected to the cloud and you won't get notified in your app? It sure is not an easy thing and the cloud helps very much here

nihzm
7 replies
9h4m

I hope this is sarcasm, but if it isn't washing machine cycles have a fixed duration so a timer on your phone is more than enough, no cloud necessary.

4ad
6 replies
9h2m

I wish washing machines had a fixed cycle duration. When I start the cycle my washing machine tells me the same duration, always, but in actuality it takes different amounts of time every time. Madness. I've been told this is a feature.

throw0101b
1 replies
6h31m

When I start the cycle my washing machine tells me the same duration, always, but in actuality it takes different amounts of time every time.

If it says (e.g.) 43 minutes, but sometimes it takes 40 and sometimes 49 or 53, set your timer for 60 minutes and get on with life. Your laundry sitting for 17 or 7 minutes isn't the end of the world. If your timer goes off and it's still not done, set it for another 20 and do something else.

Of all the things to fill your head with worry and annoyance with, laundry is near the bottom of the list for me.

4ad
0 replies
6h18m

Except when you live in a building with communal washing machines and where you need to book time for laundry, as it is common in many European cities.

krige
1 replies
7h19m

My washing machine is kind enough to both indicate time to end in minutes, but also allows me to delay start so that the cycle is finished in [x] hours. It's not even that modern.

broeng
0 replies
5h6m

My modern dishwasher is also very kind, and displays the time to end in minutes throughout the wash. Counting down from an hour. But I don't know what kind of upbringing it had, for some reason, the sneaky bastard always adds another 25 minutes, when there is supposedly only 10 minutes left.

I guess dishwasher years are like dog years. At least it definitely behaves like a teenager at 2 years old, finishing when it wants to finish. Estimates be damned.

scrlk
0 replies
8h45m

Do you always load your machine up to the same level? A low load will trigger a shorter cycle time to save energy and water.

mschuster91
0 replies
7h15m

Madness. I've been told this is a feature.

It actually is. Fixed length cycles haven't been a thing for many years now - modern washing machines adjust the washing cycle length by the weight of the laundry and its behavior during spin-drying, both its vibration behavior aka weight distribution (that can have multiple adjustment cycles to achieve reasonably even distribution) and how much water it loses - when no more water comes out during spinning, it will cut the cycle short to save energy.

julian_t
3 replies
8h58m

When the noise from the white box stops, then I know. And if I'm not at home to hear it, I'm not quite sure why I'd need to know.

mschuster91
2 replies
7h17m

Well, for people in an apartment it doesn't matter all that much, but if your laundry washer or dryer is in the basement, you don't necessarily hear it if you're out in the garden.

dailykoder
0 replies
5h10m

Sure, it might be a "nice to have" thing. But the machines usually show how long they'll take. And even if it's a newer one with sensors that make the whole process vary in time, I'd still be like "Oh, okay it'll take about 3 hours, so I'll be back at 6pm". It doesn't really matter if the clothes chill out for about an hour, especially since the newer machines don't stink that fast. And on top of that, I don't think that it has to go over the internet if you needed some sorta notification. Local would be sufficient.

If I buy something new like this and have a few choices, I intentionally pick the one with as few smart features as possible.

dTP90pN
0 replies
7h0m

What happened to the good old tin can telephone down the side of the house to the washing room?

geoduck14
2 replies
5h56m

I think you are joking, but I'll reply with a serious answer.

Where I went to college, our dorms had (free) shared washing machines. This was "pre cloud", but wifi was throughout. One student rigged up a hall-effect sensor and attached it to each power cable. It could detect if the washers and driers were on. It sent this info to a specific website that the students could monitor to see if there were any available washers or driers.

mmikeff
1 replies
5h46m

Wasn't the first webcam setup to show whether a coffee pot was full?

red-iron-pine
0 replies
4h8m

Also the reason we got Hyper Text Coffee Pot Control Protocol (HTCPCP) in RFC 2324

belter
2 replies
6h8m

First I thought you were joking, then got hit by the disbelief of realizing you were not...

dailykoder
0 replies
6h2m

Nah, you're good. I was joking.

arminiusreturns
0 replies
6h1m

Wait. They aren't being sarcastic?

In all seriousness, I think never has there been a better time to educate people on the fundamental philosophy of computing freedom, and I usually start with Eben Moglen and RMS's talks with people.

I don't know how much of this is generational, or how much of this is corporate sell out, or maybe even sockpuppetry for consensus cracking and other psyop techniques, but relearning the lessons of early computing (such as being able to do things offline, locally, as a core part of a functioning decentralized system), seems highly in order.

baq
1 replies
8h39m

My home assistant does approximately this without the cloud, but it isn't magic: cloud is just 'someone else's servers' and I just host it on my own raspberry pi.

quectophoton
0 replies
8h14m

At this point I'm tempted to start using "the ground" as the opposite of "the cloud".

I'm already mentally replacing "cloud" with "clown" anyway, to the point I have to stop myself from accidentally saying "clown computing" out loud.

throw0101b
0 replies
6h37m

Okay this sounds all very reasonable, but how do you know when your washing machine is finished

1. Check back in an hour (like my (grand)mother did—and she managed to do laundry without Wifi).

2. Or: have a washer that beeps.

3. Or: set a countdown kitchen timer (or a timer on my phone) that will beep if my washer does not have a beeper.

There are complicated situations in life: doing laundry is not one of them.

rozenmd
0 replies
9h41m

I can't tell if this comment is sarcastic but maybe washing doesn't need to be hyperoptimised down to the instant the machine finished

ds_opseeker
0 replies
5h30m

I'm really hoping your comment is sarcastic.

If it is serious, you could always set a timer.

__alexs
0 replies
7h2m

BTLE exists and is good and cheap.

ta1243
4 replies
10h10m

This could have been fixed by having a minimal baseline of machines not running the same software

Resilience comes from diversity, in computing and in biology. Whether that's having critical workloads on multiple cloud providers or having one user interface on Windows on network A (Arista) with CrowdStrike and one on a Mac on network B (Cisco) with SentinelOne.

Sometimes perhaps you can't eliminate a single point of failure, but you can sure reduce them to a minimum.

Or you can choose to increase next year's bottom line and thus your bonus by not having a robust DR plan or system. You can also skip boring things like RAID and backups.

The trick for a CxO is to ensure that when failure happens, it's massive and widespread. Then it's not your fault. The CxOs in a given industry won't be fired because their DR plans didn't work because they believed Gartner and all their CxO chums in competitors did the same thing.

Nobody got fired for choosing IBM/Microsoft/Cisco/Crowdstrike/Azure, even if it's worse than the alternatives. People do get fired for bucking the trend even when it's measurably more reliable.

nolist_policy
2 replies
9h57m

Diversity increases your attack surface however. You rather want redundancy and easy deployment or rollback of your clients and servers

ta1243
1 replies
7h2m

Diversity means a successful attack will take out part of your operation.

Monoculture means a successful attack will take out all of your operation.

nolist_policy
0 replies
6h18m

That is not a good model.

Cyber attacks rarely take down stuff directly. Rather, attackers will first establish a bridgehead into your organization, then inspect the network and gather data for further (phishing) attacks.

Diversity only means more opportunities to install bridgeheads.

incorrecthorse
0 replies
9h54m

The update affected less than 1% of all Windows machines. [1] Although maybe the biggest software failure in history, far from the biggest possible one. The level of cloud connectivity in the world could basically break the world if we didn't have diversity.

[1] https://blogs.microsoft.com/blog/2024/07/20/helping-our-cust...

alibarber
2 replies
10h32m

I'm not sure I follow, I doubt the web vs native implementation of an application makes much difference when the terminal used to access it is unavailable. A cloud based web-app is not much help if no one has a working computer and browser.

I'm not sure we're quite at the stage where a check-in agent using their personal un-managed devices to handle passenger data via a web-app is a great idea.

nolist_policy
1 replies
9h51m

It does make a difference, because now you can give end-users iPads or Chromebooks which don't need all this "security" BS.

rob74
0 replies
9h30m

They might not need them, but I'd be surprised if at least some companies don't install security BS on them anyway (just like they do on Linux machines), because of compliance reasons. It can't hurt, can it? (at least that was what most IT departments thought before CrowdStrike)

mike_hearn
1 replies
10h14m

> training people to use an iPad or a web browser seems too complicated

iPads aren't designed to be turned into kiosks or airport departure displays and web browsers aren't operating systems (except maybe ChromeOS). So this advice boils down to don't run Windows, but CrowdStrike has caused outages of Linux as well.

nolist_policy
0 replies
9h48m

By the way, ChromeOS is a perfect fit for digital signage and kiosks. It's officially supported.

OvbiousError
1 replies
11h15m

Try making a graph in excel online and then come back to tell us everything needs to move to the cloud asap.

vel0city
0 replies
3h33m

Ok, just did. It went just about as smoothly as the desktop client. What's the hold up again?

dzonga
0 replies
7h57m

let's not throw the baby out with the bathwater.

native desktop apps are absolutely necessary for most professional / serious work and native desktop apps need offline support too.

with cloud - your risk factor goes up massively.

the risk here is that most of these companies are reliant on windows and of course on the snake-oil salesmen of antivirus tools.

if you have a proper native desktop app, that runs in a sandboxed environment then you simply wouldn't need crowdstrike and the likes.

unikernels / bsd jails are things that have been well known and will easily mitigate "security" issues.

even windows these days has sandbox mode.

but incentives rule the world.

BiteCode_dev
0 replies
5h31m

Counterpoints:

- Latency

- Security

- Legal obligations

- Offline work

- Managing the different sources of locking.

- Avoiding a single point of failure (I get the irony).

feyman_r
39 replies
22h18m

> Why were other airlines able to get back to normal so much faster than Delta?

I read somewhere that their crew tracking software was hit hard and took time to recover. Will look for source on that.

(Edited) source: https://news.delta.com/update-delta-customers-ceo-ed-bastian

“… and in particular one of our crew tracking-related tools was affected and unable to effectively process the unprecedented number of changes triggered by the system shutdown…”

Onavo
23 replies
22h17m

Because they used Windows 3.1

ZeWaka
12 replies
22h6m

In the article it says Southwest used 3.1, not Delta (though, that's apparently incorrect according to other posters).

Someone1234
11 replies
22h2m

And Southwest had two crew-management outages in 2022[0], so let's not sing their praises for escaping the CrowdStrike disruption. Southwest has been widely criticized for under-investment in technology; Delta, on the other hand, purchased one of the best security products on the market and that backfired.

[0] https://en.wikipedia.org/wiki/2022_Southwest_Airlines_schedu...

chgs
9 replies
21h23m

Delta put all their eggs in one basket and had no DR capability

Someone1234
8 replies
20h50m

What basis do you have for saying that? It is likely their DR was running on a mirror of their production systems, and was similarly impacted by the Crowdstrike outage. So they fell back to Windows Servers similarly stuck in a boot-loop.

Keep in mind there was no way to opt out or delay CS Channel updates.

chgs
3 replies
19h55m

If your DR system is susceptible to the same faults as your main system it’s not a DR system.

It would be like claiming raid1 is a backup.

TheDong
1 replies
19h29m

Or it would be like claiming my backup isn’t a backup because both systems run openssh, so a remote code execution vuln there could take down both systems.

Any DR system will have to accept some risks, and those don’t necessarily invalidate it in general, just make it insufficient for some scenarios.

Conversely, if they ran the main system on windows with crowdstrike and the DR one on poorly configured linux with no security software, they probably would have needed more sysadmins, had more trouble maintaining software for both, and been vulnerable to risk from both linux and windows bugs, so I feel like they made the right tradeoff in general.

I’m sure you, who can deride this DR system, have devised your own system such that it is resilient to a meteor destroying the earth.

shagie
0 replies
18h1m

I’m sure you, who can deride this DR system, have devised your own system such that it is resilient to a meteor destroying the earth.

That reminds me of one of Corey Quinn's comfortable AWS truths.

https://x.com/QuinnyPig/status/1173371749808783360

If your DR plan assumes us-east-1 dies unrecoverably, what you're really planning for is 100 square miles of Northern Virginia no longer existing. Good luck with that ad farm in a nuclear wasteland, buddy!

freeopinion
1 replies
19h48m

Keep in mind there was no way to opt out or delay CS Channel updates.

Do CS updates somehow work over airgaps? You know, the kind that production systems have to prevent any access to or from external networks? Well... some production systems anyway.

nradov
0 replies
18h26m

What's your point? An air gapped disaster recovery system would be useless. An airline operations application has to connect to a bunch of other external systems to be of any use.

amluto
1 replies
17h44m

One idea: build a DR system and turn it off. Ideally it would be cloneable, but even without that ability, one could test it every few months to make sure it boots adequately quickly and then turn it back off. The attack surface of a bunch of computers or instances that are powered down is pretty low.

compiler-guy
0 replies
15h18m

Better yet, alternate between them every month or two.

shiroiushi
0 replies
18h28m

Delta on the other hand purchased one of the best security products on the market and that backfired.

It looks like it wasn't a good security product after all...

shagie
9 replies
22h1m

I chased through this chain the other day...

https://www.tomshardware.com/software/windows/windows-31-sav...

https://www.forbes.com/sites/tedreed/2024/07/20/meltdown-wha...

A story on the website govtech.com on Friday asked the question, “Why isn’t Southwest affected by the CrowdStrike/Microsoft outage?

“That’s because major portions of the airline’s computer systems are still using Windows 3.1, a 32-year-old version of Microsoft’s computer operating software,” the website said. “It’s so old that the CrowdStrike issue doesn’t affect it so Southwest is still operating as normal. It’s typically not a good idea to wait so long to update, but in this one instance Southwest has done itself a favor.”

The govtech.com article is https://www.govtech.com/question-of-the-day/why-isnt-southwe...

which linked to https://www.digitaltrends.com/computing/southwest-cloudstrik...

which linked to an earlier Forbes article - https://www.forbes.com/sites/hershshefrin/2022/12/31/can-sou...

The December 2022 scheduling fiasco was the result of skimping on information technology. I am old enough to remember when Microsoft introduced a new operating system called Windows 95, to replace its predecessor operating system Windows 3.1. The 95 in Windows 95 refers to the year of its introduction: 1995. By some accounts, major portions of Southwest’s scheduling system for pilots and flight attendants is built on the Windows 95 platform. That platform is now more than 25 years old.
JumpCrisscross
8 replies
21h53m

Southwest does not run Windows 3.1:

“That’s it. That’s where all these stories can trace their origin to. These few paragraphs do not say that Southwest is still using ancient Windows versions; it just states that the systems they developed internally, SkySolver and Crew Web Access, look ‘historic like they were designed on Windows 95’.”

https://www.osnews.com/story/140301/no-southwest-airlines-is...

shagie
7 replies
21h49m

The other day, I saw a screen capture from Tom's Hardware and so chased the series of links and quotes to try to find the earliest one that had reporting on it that was the source. That was the chain that I found.

I am not claiming that they run Windows 3.1 or Windows 95 ... but rather "this is where that story was sourced from" because everyone kept linking to somewhere else. The relevant XKCD is https://xkcd.com/978/

Modified3019
6 replies
21h41m

Funny enough, this cycle is close to what the Russian disinformation machine does deliberately to spread bullshit.

starspangled
5 replies
17h7m

Is that actually true, or just something that's repeated until people believe it?

starspangled
0 replies
15h31m

Yes, is there some evidence beyond the claims of "intelligence officials"?

red-iron-pine
0 replies
7m

Russian approaches are well known and documented. None of this is new, and wasn't even really that new in 2016, it's just become better known.

Essentially modern versions of Soviet-style disinformation campaigns, but augmented with new technology (social media), and without the ideological hindrances of a Communist government (e.g. sell hard to both Right and Left).

RAND Corp calls it "the Russian Firehose" model: https://www.rand.org/pubs/perspectives/PE198.html

Similar approaches are also used by NK, Indian, Chinese, and other national-tier disinfo campaigns. This contrasts with models used by the West, which are often less about creating a disinformation clusterfuck, and more of a "watch our Disney / BBC / Scandinavian TV & movies and their implied messages about freedom and human rights and shit".

dave4420
0 replies
7h0m

I see what you did there.

smileysteve
8 replies
21h51m

Re Delta

It's not so much a severity as "hard"; but with the hub and spoke model that Delta uses, and scheduling being down (at all on Friday), combined with FAA hour limits, it becomes exponentially difficult to reschedule flights.

Put more plainly, on Friday, your scheduling software is down for 4 hours in the morning, so you "borrow" any replacements you need for employees that are late or sick. This ruins the availability for the next flights, at which time you hope the system is up again; but if it's not, you borrow from the evening flights. Combine this with the fact that each flight that was late/cancelled while you were hoping to fill it now affects the hours available for the employees that were available. Finally, as you've cascaded this, you head into a weekend trying to catalog how many hours each crew member did or did not log, and you're not sure how to get them back in time.
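
As a toy illustration of the cascade described above (not Delta's actual system, and not real FAA rules): the sketch below assumes a made-up 8-hour duty limit and invented flight durations, and greedily assigns the first crew with enough hours left. `Crew`, `staff_flights`, `fresh_crews`, and every number here are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Crew:
    name: str
    hours_left: float  # hypothetical remaining legal duty hours, not a real FAA figure


def staff_flights(flights, crews):
    """Assign each flight to the first crew with enough hours; return cancelled flight ids."""
    cancelled = []
    for flight_id, duty_hours in flights:
        crew = next((c for c in crews if c.hours_left >= duty_hours), None)
        if crew is None:
            cancelled.append(flight_id)  # nobody legal is left to fly it
        else:
            crew.hours_left -= duty_hours
    return cancelled


def fresh_crews():
    """Two crews, each starting the day with the assumed 8-hour budget."""
    return [Crew("A", 8.0), Crew("B", 8.0)]


# Normal day: four 4-hour duty periods fit into two 8-hour crews.
normal_day = [("M1", 4), ("M2", 4), ("E1", 4), ("E2", 4)]
# Outage day: morning delays stretch each morning duty period to 6 hours.
outage_day = [("M1", 6), ("M2", 6), ("E1", 4), ("E2", 4)]

print(staff_flights(normal_day, fresh_crews()))
print(staff_flights(outage_day, fresh_crews()))
```

In this toy model the morning delays alone are enough to cancel both evening flights, even though no crew member was lost and the evening schedule itself never changed.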

sidewndr46
4 replies
20h21m

wouldn't this imply either an upper bound on down time (airline simply folds as it never catches up) or an upper bound on the duration of the impact?

Someone1234
1 replies
15h13m

They did a "reset." Cancel enough flights to reduce load, then manually recalibrate the crew tracking software to figure out where everyone is and their hours. Then start operations again.

rconti
0 replies
13h19m

It's like stopping your in-place manual software recovery efforts and restoring from backup. You KNOW it's going to take a massive amount of time, but at least you know how much time it's expected to take, and what the expected result is, rather than "2 more hours... 2 more hours.... 2 more hours.." for a week.

toast0
0 replies
17h32m

Worst case, with good weather, you can stop service for a few days: day 1 mandatory rest; day 2 fly crews to where they need to be to start service; day 3 mandatory rest; day 4 return to service. Then start rebooking passengers and picking up the pieces. Carriers with long haul international may need longer, and maybe you need more rest days to ensure everyone is ready for their normal shift, but that's a reasonable napkin estimate.

Otoh, Delta seemed to have recovered after about a week, and canceled about 1,000 out of about 4,000 flights for several days. It's way better to fly 75% of the daily flights than not. There's less wiggle room in a summer schedule for weather, but there's still some wiggle room.

brendoelfrendo
0 replies
16h6m

There probably is an upper bound on down time, by which point the business has suffered some irreparable harm. It might not result in the business simply folding, but might result in significant expense or legal complications, long-term reputational damage, etc. In business continuity speak, that's the "maximum tolerable downtime," and while I don't know how Delta defines it for the impacted systems... I imagine they're not happy with how long they were down.

inferiorhuman
2 replies
21h27m

Except for Southwest the other legacy airlines (United, American) also use a hub and spoke model. So does jetBlue.

crazytony
1 replies
20h7m

Funny you should mention WN. Delta's meltdown is the exact same scenario as Southwest. Crew scheduling is messed up, they don't have a way of tracking where employees are, if the employee is legal, etc and so the operation grinds to a halt

rconti
0 replies
13h20m

To clarify, Southwest's meltdown last year, which was all about the difficulties of crew scheduling and the knock-on effects of same.

crazytony
4 replies
19h50m

One other compounding problem is that Delta's headquarters and main traffic patterns are on the east coast. Crowdstrike affected all the airlines at roughly the same time. This gave them roughly one to two fewer hours to respond before they hit their morning peak flights.

As someone else pointed out, they probably weren't ready by the time they needed their systems for the morning rush so they went to their business continuity strategy (manual). This has a throughput and recovery time penalty and obviously it compounds the longer they are in that mode.

I think what we're finding with the Southwest meltdown and now the Delta meltdown is that the big airlines just don't have the manpower or scheduling slack to accommodate going into business continuity. I do think this should be investigated. Hopefully financial penalties incentivize action but time will tell.

katbyte
1 replies
14h7m

They prioritized stock buy backs instead of investing in a robust IT operation

shiroiushi
0 replies
14h5m

As well they should!

Which one profits the CEO more? Stock buy-backs or robust IT? Robust IT is only good for the company in the long term; however, with stock buy-backs or other skimping on IT, if disaster like this happens, the CEO just takes his golden parachute and leaves, but if no disaster happens, he gets a huge bonus to buy another private yacht.

throwaway2037
0 replies
17h33m

    > big airlines just don't have the manpower or scheduling slack to accommodate going into business continuity
Do small airlines have it? And, how much more are you willing to pay in ticket prices to have this ability?

WalterBright
0 replies
13h23m

Hopefully financial penalties incentivize action

Delta already took a huge financial hit for this.

reaperducer
0 replies
21h44m

>> Why were other airlines able to get back to normal so much faster than Delta?

I read somewhere that their crew tracking software was hit hard and took time to recover. Will look for source on that.

I heard on the radio (maybe NPR, not sure) it wasn't about the computers, it was about Delta's response.

According to the report, the other airlines delayed flights, while Delta cancelled them outright. That left Delta with more people and planes in the wrong places, making it harder to recover.

bedobi
32 replies
22h12m

Apparently Southwest Airlines’ ingenious strategy of never upgrading from Windows 3.1 allowed it to remain unscathed.

this is pretty damning both ways

on the one hand, it's insane, unfathomable and inconceivable that anyone can run anything critical on windows 3.1 (!!!)

on the other hand, it's equally insane, unfathomable and inconceivable that those who do are actually better off - 30 years of "progress" is actually just bs? what are we as an industry "even doing here"???? is computing actually a solved problem and we're really just mostly reinventing the wheel and enshittifying perfectly already working systems?

jakub_g
5 replies
22h1m

You don't want to know what OS Sabre (backend for 30% of world's airlines) is using on their mainframes.

bedobi
2 replies
21h29m

actually, I do :) is it DOS? some IBM mainframe OS? do tell

Wytwwww
1 replies
21h2m

Was DOS actually ever used on mainframes/servers on a significant scale? (genuine question, not saying it wasn't)

wrs
0 replies
20h16m

A mainframe OS called DOS was in fact quite popular, but it’s not the same thing as the DOS that was in PCs. (There were others, too, like Apple ][ DOS. As soon as your computer gets the capability of attaching a disk drive, somebody has to write a Disk Operating System.)

kayodelycaon
0 replies
20h57m

I do actually. My last job had a mainframe team maintaining (and adding to) an AS/400 application. They still had punchcard programs.

They had json apis. Each one had some variation on parsing http from a raw tcp connection with IBM RPG. I had to do some unspeakable things to a ruby library so I could control the order of the headers.
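
For anyone wondering what "controlling the order of the headers" looks like in practice, here is a minimal sketch in Python (the original was a Ruby library, which isn't shown here). It assumes the server only accepts headers in one fixed order, so the request is serialized by hand from an ordered list of pairs instead of a dict-like header object. `build_request`, the path, and the host name are hypothetical.

```python
def build_request(method, path, headers, body=b""):
    """Serialize an HTTP/1.1 request, emitting headers in exactly the order given."""
    lines = [f"{method} {path} HTTP/1.1"]
    # headers is a list of (name, value) pairs, so the caller decides the order
    lines += [f"{name}: {value}" for name, value in headers]
    head = "\r\n".join(lines) + "\r\n\r\n"  # blank line ends the header section
    return head.encode("ascii") + body


payload = b"{}"
request_bytes = build_request(
    "POST",
    "/orders",  # hypothetical endpoint
    [
        ("Host", "example.invalid"),
        ("Content-Type", "application/json"),
        ("Content-Length", str(len(payload))),
    ],
    payload,
)
print(request_bytes.decode("ascii"))
```

The resulting bytes could then be written to a plain TCP socket. Going through a normal HTTP client is usually the better choice, but many clients reorder or normalize headers, which is presumably why the library had to be bent out of shape.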

fourteenfour
0 replies
20h1m

Looks like it was IBM System/360 mainframes but they've recently migrated to google hosted services.

shagie
4 replies
22h3m

Thirty years of progress is still progress for managing the additional complexity that modern software needs.

A lot of software doesn't need that additional complexity. Having thirty year old software, if properly sequestered (since there are security holes large enough to fly a 737 through), means that this is software that has been working for three decades. It has issues (as the mess they had previously showed), but Southwest appears to be able to manage that to some degree without needing to incur the additional complexity of managing a modern software stack for application software that doesn't need it.

The ability to play minesweeper on critical computing equipment without impacting it isn't necessarily a desirable feature. Having the computer boot in five seconds and run the desired application is.

And there are a number of ways to handle that ... running old operating systems is one of the ways. Space Force S02E07 is not a desirable situation https://youtu.be/xDLvUqhwHZc . You could also have a Kubernetes cluster with multiple replicas and load balancing and all of that additional complexity that takes more people to be able to manage without any real gains in what the application itself is doing.

Wytwwww
3 replies
21h34m

Having the computer boot in five seconds

There are certainly more modern options that allow that and it's highly doubtful that specifically is particularly relevant for Southwest.

Not that there is any evidence that they're actually using 3.1 for anything?

that this is software that has been working for three decades

Or it's so buggy or designed (or more likely updated) so poorly that everyone is afraid to touch it. e.g. I doubt there are many (even any?) practical reasons for airlines to use GDS besides the cost and complexity involved in designing an entirely new system and somehow forcing all other airlines to switch to it?

shagie
1 replies
21h1m

Not that there is any evidence that they're actually using 3.1 for anything?

Windows 3.1? No. I'd even say there's no evidence that windows 95 is being used but rather that they've got what appears to be some old software with older design.

https://www.dallasnews.com/business/airlines/2022/12/30/what...

2. The crew scheduling system is the main culprit.

Southwest uses internally built and maintained systems called SkySolver and Crew Web Access for pilots and flight attendants. They can sign on to those systems to pick flights and then make changes when flights are canceled or delayed or when there is an illness.

“Southwest has generated systems internally themselves instead of using more standard programs that others have used,” Montgomery said. “Some systems even look historic like they were designed on Windows 95.”

Screen shots of this are in http://www3.alpa.org/LinkClick.aspx?fileticket=IO7kd%2Bfm2Do...

Unfortunately, I don't know the nuances of Microsoft Windows UI well enough to be able to pick out which OS version is running the software in those screen shots.

---

Or it's so buggy or designed (or more likely updated) so poorly that everyone is afraid to touch it.

That is a very common occurrence (I'm dealing with that now ... a .jar file that hasn't been rebuilt in 15 years). The big rewrite is something that comes with one part excitement (I can do it right this time!) and dread (oh my, that's how much code that I need to retest?!).

I was involved in the tail end of a 3 year project at one company with some software that replaced previously running DOS (and yes, it was DOS - they had a C and assembly guru employed whose job it was to remove / optimize code in the binary to get it to fit into 640k) to a Java Web Start (which was a neat technology) and the millions of lines of software that monstrosity had and needed to be debugged and fixed.

While they're in a better spot now (can use modern hardware), and it's something that they can build in house (a major part of the reason to do it was to drop the external contractor who didn't like maintaining the C code) ... but that also came with the added complexity of the software that they licensed and the maintenance and deployment of that software. Before they could put the software on a floppy and have it shipped to each location ... now it's a bit bigger and more complex of a deployment (that was built with duct tape and chewing gum one night to do diff deployments of specific class files rather than trying to push the entirety down the pipe for each location).

My rambling point is that we are moving forward with complexity - and that allows us to manage more complex situations, but it comes at the cost of managing that additional complexity of the infrastructure and software it needs and that cost is ongoing and not always taken into account.

jwagenet
0 replies
20h7m

Those screenshots look more like WinXP to me with the rounded and shaded button elements. It’s the boring grey and buttons people presumably associate with the 90s.

spookie
0 replies
20h36m

Most times it's less about a system being poorly designed and more about it being able to solve very hard problems which most existing employees today haven't even heard of. Institutional knowledge plays big time on these decisions.

josephg
3 replies
21h49m

is computing actually a solved problem and we're really just mostly reinventing the wheel and enshittifying perfectly already working systems?

On a whim I tried playing Solitaire on windows the other day. You know, that game that’s shipped with windows since forever. Well, it’s horrible now. When I tried firing it up, it first spent several minutes downloading software updates. Then it loaded in some horrible “casual games bundle” app which felt laggy like a web app - complete with Xbox cloud sync for my progress, and daily achievements and other junk.

The game used to run flawlessly on my old 486. My computer now is orders of magnitude more powerful - but solitaire feels laggy. I bet the entirety of windows XP is smaller than the “update” it performed to install solitaire.

I have a personal theory that there’s always something that gets the attention of the best engineers. Decades ago it was human interface guidelines and UI toolkits. Today it’s LLMs and AAA game engines. Most of the rest of the software in the world is worked on by the B team. And they don’t blink an eye at the idea of rewriting solitaire for windows on top of electron. If JavaScript is all their team knows, so be it. Heaven forbid we have to learn how to properly build software for windows.

shiroiushi
1 replies
13h55m

Heaven forbid we have to learn how to properly build software for windows

They already are properly building software for Windows, and your Solitaire app is a good example of it. It's much, much better than it used to be: it's laggy and slow, and downloads a bunch of extra crap, which has the potential of getting you to spend more money. In short, changing from the old and simple Solitaire to this bloated mess makes Microsoft more profitable, and customers like you keep using Windows regardless, so why shouldn't they do this?

Meanwhile, on my Linux box, my simpler games haven't changed substantially in years, if not decades (perhaps updates for the newer GUI toolkits being used), and still work great. But people prefer to stick with Windows and then complain about things being too bloated.

josephg
0 replies
8h12m

which has the potential of getting you to spend more money.

I don't think any of the extra stuff is monetised. I don't know; I didn't look closely at it.

customers like you keep using Windows regardless ... But people prefer to stick with Windows and then complain about things being too bloated.

"Stick with windows"? I run windows, macos, linux and freebsd at home. I want all of them to be better. It pains me to see any of the great operating systems of our time fall slowly toward mediocrity.

Of course I complain about it. Windows should be better than this.

mwcampbell
0 replies
5h31m

FWIW, that newer Windows Solitaire app isn't actually using Electron now; it's a UWP app using XAML. I guess that just goes to show that the problem isn't Electron specifically.

jandrese
2 replies
22h3m

In the long run the hardware that can still run Windows 3.1 will become harder and harder to find and they'll be forced to upgrade, but for now they're enjoying the benefits of "if it ain't broke don't fix it". Plus, there were literally millions of systems made that can run Windows 3.1 so it will be many many years before the hardware is too hard to find.

We're talking about a problem on the scale of 4,000 flights per day. Assuming you avoid O(n^2) complexity computations, that's the sort of thing even 90s computers could handle easily.
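
To put rough numbers on that scale, here is a back-of-the-envelope sketch. It only counts steps for n = 4,000; it says nothing about what a real scheduling algorithm does or how long each step takes on any particular machine, and the variable names are just for illustration.

```python
from math import comb, log2

flights_per_day = 4000  # figure from the comment above

ordered_pairs = flights_per_day ** 2                # every flight against every flight
unordered_pairs = comb(flights_per_day, 2)          # each distinct pair counted once
n_log_n = flights_per_day * log2(flights_per_day)   # e.g. a sort-based pass

print(f"n^2        = {ordered_pairs:,}")
print(f"n choose 2 = {unordered_pairs:,}")
print(f"n log2 n   ~ {n_log_n:,.0f}")
```

A quadratic pass over the day's flights is in the tens of millions of steps, while anything closer to n log n stays in the tens of thousands.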

burningChrome
0 replies
21h31m

> Plus, there were literally millions of systems made that can run Windows 3.1 so it will be many many years before the hardware is too hard to find.

Two of my first contractor roles I had as a developer really opened my eyes to a lot of this.

We were building an inventory management system for a large company that built farm equipment. We started building it and once we got to the browser and mobile requirements, one of the VPs spoke up and asked if it would run on IE6 since they had not one, but THREE of their inventory legacy systems that still ran on Win98. This was in 2014, a full 6 years after many companies had stopped supporting it. And another 4 years since websites stopped supporting it.

The other one was for a very large, regional construction company. Same thing, we were building a web app for them and in one of the conference calls, one of the VPs was asking how this would run on Windows 95 for the same reason. They had several legacy ERP systems that were running on Win95 and had specific requirements for stuff to run on that OS.

As a developer who was used to working with somewhat current tech - it was a real eye opener. It was crazy to think how many massive companies just didn't have the constitution to upgrade their stuff, and then by not doing so, had now dug themselves into an even deeper hole.

Once I started hearing stories about the details of this scenario, it made perfect sense to me since I had seen it multiple times. And not from little companies who didn't have the money or resources to upgrade, but massive Fortune 500 companies who just neglected their stuff until it was too late.

arsome
0 replies
21h59m

You can basically run Windows 3.1 in dosbox on a potato now, so the hardware really isn't even a problem. If any of this was actually true...

alliao
2 replies
21h56m

productivity wise 70's and 80's peaked with those thin terminals with every action carried out by keyboard... workers tapped at light speed due to muscle memory, didn't look fancy, but it got the job done. GUI is sexy but like short videos ultimately did nothing for the user

Wytwwww
0 replies
21h26m

GUI is sexy but like short videos ultimately did nothing for the user

That's a stretch.

I guess it increased the productivity (measured in amount of "work" done, not necessarily something useful) expectations for most workers which effectively did nothing for them because they still need to work as much even if modern software allows them to accomplish much more in the same amount of time. So it might make sense in that regard.

workers tapped at light speed due to muscle memory

That's great if we're mainly talking about robotic tasks that can be mostly automated to only require a fraction of those clicks but weren't for some unclear reasons.

JumpCrisscross
0 replies
21h45m

productivity wise 70's and 80's peaked with those thin terminals with every action carried out by keyboard

Computers only started showing up in nationwide productivity figures in the 80s to 90s.

AnimalMuppet
2 replies
21h41m

Well, what actual new features does Windows 11 give you compared to Windows 3.1?

It will support a huge number of new chips, new peripherals, more memory, and so on.

If I'm running Southwest's crew scheduling software, how much of that do I care about? Do I care that it will now support the latest Bluetooth? Do I care that it now has the same UI as tablets? Do I care that it has better ads to display on the start menu? No, no, and no.

The only thing might be more memory. (I mean, the UI might not look like it belonged in the Stone Age, so that's something, I guess...)

There hasn't been a real fundamental improvement in the functionality of OSes since Windows 3.1. It's all been device support (including new classes of devices), new CPU support, and new UI styles. (The security improvements in Windows were a legitimately big deal, but those were fixing what was broken, not adding new functionality.)

And I'm sure that, having said this, someone is going to point out something really important that I forgot...

Wytwwww
1 replies
21h14m

If I'm running Southwest's crew scheduling software, how much of that do I care about?

Not a lot if you can't/don't want to upgrade or replace that software to make sure it runs on modern OSes. I'm sure that for the most part that software is causing various unnecessary issues and decreasing potential productivity at least to some extent. Just look at GDS, they'd love to get rid of that, but that would require a coordinated effort and extensive collaboration between all major airlines which is somewhat tricky.

There hasn't been a real fundamental improvement in the functionality of OSes since Windows 3.1.

Multitasking? A massive amount of other important features that matter if you want to build new software or significantly improve what you're using now.

Also you seem to be downplaying security a bit too much? Those devices would need to be carefully isolated from everything else (not that as I understand there is any evidence that Southwest Airlines is actually using 3.1?).

but those were fixing what was broken, not adding new functionality

It's like saying that every new feature introduced in any type of software that wasn't entirely novel was actually fixing stuff that was broken and wasn't "new". I guess the same would apply to the claim that GUI/desktop wasn't something new but was just "fixing" (inherently "broken") command line interfaces?

Being able to design significantly, objectively better software (based on how much it could increase productivity) is, I guess, not strictly tied to the OS, at least in some cases. But it certainly makes it a lot cheaper/easier.

AnimalMuppet
0 replies
21h1m

Multitasking. Yeah, I'll give you that. That is actually a huge step up.

I'm presuming that Southwest's internal software backends are not internet exposed, which is why I'm downplaying security.

numpad0
0 replies
22h1m

It is BS. Continuous updates for security notion, especially so. That said, the barrier to entry for programmers did come down significantly.

m3kw9
0 replies
22h4m

It’s like a calculator, if you need to do basic math, you can use an old calc.

aftbit
0 replies
22h2m

Well all you'd really need to do to avoid this outage is not run auto-updating proprietary kernel modules in the early anti-malware environment. Bare Windows 11 would have been fine - the problem was Crowdstrike.

ToucanLoucan
0 replies
21h49m

Almost every industrial embedded system I've ever used runs Windows XP at the absolute newest, and it is not uncommon in the slightest to see stuff as old as 95/3.1. These are computers that operate machinery that costs 6 or 7 figures. If it ain't broke, don't fix it.

Don't get me wrong, my Macbook is an absolute beast for all of my work tasks, and my gaming PC is an utter joy to use for my recreation time, but at the end of the day, for a ton of applications, a computer doesn't need to do shit beyond sending a lot of signals out of a parallel/RS232 port to control systems to operate... I mean Christ, anything. CNC mills, building lighting/security systems, packing machines, or to do things like issue tickets to people parking in a ramp. Like... a lot of this stuff just does not benefit at all from a modern software stack. Stick a crappy PC inside instead, load it up with the same image it had before which includes firewall rules that shut down every last port and connection apart from whatever needs to manage it, and you're done.

Don't fix what ain't broke.

TonyTrapp
0 replies
22h9m

From my memory, this wildly circulating Windows 3.1 quote is inaccurate. The software they were running was compared to running something like Windows 3.1, but it wasn't actually running on Windows 3.1, as far as I understand.

Edit: https://kotaku.com/southwest-airlines-windows-3-1-blue-scree...

Swizec
0 replies
22h8m

is computing actually a solved problem and we're really just mostly reinventing the wheel and enshittifying perfectly already working systems?

80% of the work is json bureaucracy

The other 80% is adapting to new requirements

And if you’re lucky maybe 0.1% of the time you get to build something new.

Fear not, a lot of this stuff was perfectly solved with pen and paper long before us computer nerds came to play in the big boy sandbox

skrebbel
14 replies
21h39m

I love that “CrowdStrike” is now a synonym for “global outage”. Not some cute hihi name like “heartbleed”, just the name of the company that did the screwup. Seems fair.

jraph
8 replies
21h15m

Not sure it's fair, but I am certainly waiting for it to become a verb or a noun.

    crowdstrike. n.
     1. A set of major disruptions caused by an update that was not tested enough, pushed to many devices across the globe.
     2. The name of such an update.
     3. (by extension) a joke so bad it causes major disruptions.

     For instance:
       - Congrats for your crowdstrike! Now my weekend is ruined as I'll be the one who'll be asked to fix this mess.

    crowdstrike. v. (simple past crowdstruck or crowdstriked¹, past participle crowdstricken, or crowdstruck, or (obsolete, regionalism) crowdstroke²)
     1. Action of pushing an update to many devices that causes a global outage or major disruptions in various sectors.

     For instance:
       - We've been crowdstruck. Again.

    crowdstrike. adj.
     1. Qualifies an update that, when pushed to many devices across the world, causes major disruptions across the globe.
     2. Qualifies such a (set of) event(s).

    For instance:
       - We are sorry for the crowdstrike event we caused. We gently remind our kind customers and their end users that per our ToS, we will issue no refund, and that no liability can be held against us. Customers who don't try to contact us in the following month will get a discount for their next contract renewal. You will hear us speak before the Congress, who nicely invited us for some comedy in the hope it will appease you all. Make sure you like the related videos on the various online platforms. We wish you a nice end of the week and nice, relaxing summer holidays.
¹ people have differing but strong opinions on which simple past form is correct, mainly due to regional differences. Some avoid saying crowdstrike and say crowdhit instead.

² some people have tried to push crowdstricken, which first caught on in some areas or particular contexts. The idea that this form likens the qualified subject to the bearer of some sickness has eventually seduced a critical mass of people after some initial push back. Please also see the usage notes for strike for other, rarer, alternative forms [*].

[*] https://en.wiktionary.org/wiki/strike#Usage%20notes

(Thanks to the contributors in this thread)

skrebbel
3 replies
21h5m

“The intern crowdstruck half the customers”

jraph
2 replies
21h1m

Exactly, by the way I added the irregular inflections and fixed the example for the verb. Thanks for your contribution.

LeifCarrotson
1 replies
20h52m

I disagree, I think that the simple past should be "crowdstruck" but the participle should be "crowdstricken", as might apply to someone afflicted by an illness:

"The update wasn't tested, so the servers are all crowdstricken."

jraph
0 replies
20h48m

Thanks, I added the documentation for this form, and added a second usage note. I initially wanted to tease you by documenting that people with bad taste tried to push for this form, but I really like this illness idea.

quectophoton
1 replies
8h3m

crowdstruck

    Said, "Yeah, it's all right
    We're doing fine"
    Yeah, it's all right
    We're doing fine, so fine

    Crowdstruck
    Yeah, yeah, yeah, crowdstruck
    Crowdstruck (crowdstruck)
    Whoa, baby, baby (crowdstruck)
    You've been crowdstruck
(AC/DC's Thunderstruck, but replacing "thunderstruck" with "crowdstruck")

arrakeen
1 replies
20h45m

since nothing will happen to them except a slap on the wrist, and all our employers will continue to force this crapware on our machines, i think we should make a point to start using their name as a pejorative (similar to the 'santorum' neologism). and when they inevitably try to rebrand, use that term too

chasd00
0 replies
16h16m

since nothing will happen to them except a slap on the wrist

I've already bought some of their stock, i'm pretty sure it's bottomed. I bet i make 30% a year from now. This always happens: some "ohnoes!" event cuts a stock price off at the knees, but then everyone forgets and in a year or so it's back to where it was before the event.

stana
1 replies
20h32m

Rebranding project coming up at CrowdStrike?

jraph
0 replies
20h19m

That would be a shame, the name is so fitting, more than ever!

They struck a very big crowd real bad.

aragonite
1 replies
16h1m

Does anyone know (or have any guesses as to) why the founder(s) named it "CrowdStrike"? What was (or might have been) the idea behind the name? I'm guessing it's not patterned after "crowdfunding" "crowdsourcing" "crowdlending", etc.

latentsea
0 replies
12h1m

It's part of a trend where companies name themselves after a self-describing disaster they're going to cause. Oceangate also did this.

New investing strategy is to look for companies whose name also fits this pattern but who have not yet caused the disaster and short the stock.

Hemospectrum
0 replies
15h49m

The cute name was Blue Friday, but it doesn't seem to have caught on.

ks1723
13 replies
21h59m

I found it quite interesting that CrowdStrike actually excludes a bunch of services explicitly. They also basically say: don’t use it if it needs to be reliable. I don’t know if this is standard for software, but for me this was quite surprising.

From crowdstrike terms and services [1]: […] THERE IS NO WARRANTY THAT THE OFFERINGS OR CROWDSTRIKE TOOLS WILL BE ERROR FREE, OR THAT THEY WILL OPERATE WITHOUT INTERRUPTION OR WILL FULFILL ANY OF CUSTOMER’S PARTICULAR PURPOSES OR NEEDS. THE OFFERINGS AND CROWDSTRIKE TOOLS ARE NOT FAULT-TOLERANT AND ARE NOT DESIGNED OR INTENDED FOR USE IN ANY HAZARDOUS ENVIRONMENT REQUIRING FAIL-SAFE PERFORMANCE OR OPERATION. NEITHER THE OFFERINGS NOR CROWDSTRIKE TOOLS ARE FOR USE IN THE OPERATION OF AIRCRAFT NAVIGATION, NUCLEAR FACILITIES, COMMUNICATION SYSTEMS, WEAPONS SYSTEMS, DIRECT OR INDIRECT LIFE-SUPPORT SYSTEMS, AIR TRAFFIC CONTROL, OR ANY APPLICATION OR INSTALLATION WHERE FAILURE COULD RESULT IN DEATH, SEVERE PHYSICAL INJURY, OR PROPERTY DAMAGE. Customer agrees that it is Customer’s responsibility to ensure safe use of an Offering and the CrowdStrike Tools in such applications and installations. CROWDSTRIKE DOES NOT WARRANT ANY THIRD PARTY PRODUCTS OR SERVICES.

[1] section 8.6 of https://www.crowdstrike.com/terms-conditions/

objclxt
9 replies
21h58m

I don’t know if this is standard for software

This is pretty standard. There is almost identical language in the Windows and macOS EULAs, for example.

SoftTalker
7 replies
20h49m

So how does it get installed on all the endpoints in 911 dispatch centers?

nemonemo
2 replies
20h23m

What is the alternative? Have you considered a possibility that those could be the best out there for 911 despite their imperfections?

SoftTalker
1 replies
15h4m

The data entry endpoints in a 911 dispatch center should not be running a general purpose consumer OS. They should be single purpose machines much closer to a dumb VT100 terminal than a personal computer. Maybe something like a stripped down hardened Chromebook. No internet connection. No personal email, web, or other use allowed or even possible. A product like crowdstrike should not be needed because it should not be possible to run anything but the dispatching software on those machines.

EvanAnderson
0 replies
12h56m

That's what computer aided dispatch (CAD, in the industry) software was 30 years ago (my PSAP had an AS/400). The market has rejected it. Also, see my other comment re: FBI CJIS policy.

In the PSAP I support we have three dedicated PCs at each workstation to run the CAD, phones, and radio. Each of those has a dedicated VLAN, separate physical servers and storage, separate Active Directory forest for CAD (no AD for radios or phones-- standalone PCs), and default-deny ACLs for inbound and outbound traffic on the hosts and at the borders.

A fourth dedicated PC (VLAN, ACLs, physical servers, AD environment) does email, web browsing, etc. (All of it is shackled together with a nice KVM that supports a single keyboard and mouse controlling up to 5 PCs.)

Not every PSAP does this and I think that's insane. The law and fire agencies we interface with absolutely do put a single PC on a desk (or in a cruiser) and use it for everything (and we filter and monitor the traffic coming in from them over our VPN heavily and block access at the first sign of anomalous traffic). Often their budgets don't support the notion of using dedicated computers for task-oriented work. The marketers have pushed general purpose devices for this kind of application.

In the last 5 years all three "hardened" systems we use (all companies acquired by Motorola) have started requiring Internet access for various APIs they use, and for integration with third-party vendors (mapping, public information databases, and task instructions for telecommunications). I think it's ridiculous, but I don't get to decide the direction of the product roadmaps or what the business stakeholders want from a feature perspective.

Motorola (who makes the CAD software used by some of the largest US municipalities) is pushing for hosted CAD and integrating hosted features into on-prem systems. (Of course, they have a managed security product offering that they want to sell along side it.)
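
A minimal sketch of the "default-deny ACL" idea from the setup described above: traffic is dropped unless an explicit rule allows it. Everything here is hypothetical and for illustration only (the `ALLOW_RULES` table, the `is_allowed` helper, the addresses and the port); it is not the configuration of any real PSAP or vendor product.

```python
from ipaddress import ip_address, ip_network

# Hypothetical allow-list: (source network, destination network, destination port).
# Example: CAD workstations may reach the CAD servers on one port, nothing else.
ALLOW_RULES = [
    (ip_network("10.10.1.0/24"), ip_network("10.10.2.0/24"), 5432),
]


def is_allowed(src, dst, port):
    """Return True only if an explicit rule matches; otherwise deny."""
    src_ip, dst_ip = ip_address(src), ip_address(dst)
    for src_net, dst_net, allowed_port in ALLOW_RULES:
        if src_ip in src_net and dst_ip in dst_net and port == allowed_port:
            return True
    # No rule matched: default deny.
    return False


# A workstation talking to its own server is allowed...
print(is_allowed("10.10.1.15", "10.10.2.20", 5432))
# ...but the same workstation reaching out to the Internet is not.
print(is_allowed("10.10.1.15", "8.8.8.8", 443))
```

The point of the design is that a new dependency (say, a vendor API that suddenly needs Internet access) fails closed until someone deliberately adds a rule for it.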

EvanAnderson
2 replies
19h53m

Because FBI CJIS requirements, adopted by state law enforcement bodies, require it. I support a Public Safety Answering Point (PSAP, aka a 911 call center) and I push back on as many of the inane requirements as I can with compensating controls.

Example: As of right now I am still required to expire passwords every 90 days. My state is considering the current guidance from NIST but FBI CJIS policy still mandates the expirations.

tgv
0 replies
12h21m

I don't know what CJIS requirements entail precisely, but at a first glance, they seem reasonable. But it's weird that people then think they can comply by installing a product with a disclaimer against their intended use. It's just a token acknowledgment: "Yeah, we've read it, but we don't really care."

If that's also the interpretation of the courts, then each company would be individually liable, at least towards the government.

hypeatei
0 replies
4h57m

Holy shit I cannot stand the password expiration requirements. Like you said, NIST literally recommends against it but so many regulations require it. So aggravating.

wrs
0 replies
20h20m

Because no endpoint protection software exists that doesn’t have the same disclaimer clause. So you install this one and accept the lack of vendor liability.

(If such a thing did exist, it would cost a lot more!)

ale42
0 replies
21h40m

Same for datasheets of most electronic components. The manufacturers don't want the responsibility to avoid possible multi-million lawsuits.

sidewndr46
0 replies
20h19m

My experience has been that better legal counsel has the relevant terms struck before the deal is signed. In this case it would have been the terms around aircraft and aviation.

jojobas
0 replies
15h10m

There often are limits to how much you can disclaim in your T&C. If under the same terms you cause damages deliberately you'll be held liable, and obvious gross negligence can be a factor as well.

There are often 3 opinions between any 2 lawyers so we have a chance to learn the outcome many months and millions of dollars later.

awad
0 replies
16h52m

Usually the largest of companies will have their own customized T&Cs governed in their Master Services Agreement (MSA) which are often very modified versions of these publicly available ones

azinman2
13 replies
20h30m

What this also tells me is there are a lot of computers connected to the internet that probably shouldn’t be.

0cf8612b2e1e
5 replies
19h25m

I suspect it is incredibly challenging to keep a non trivial number of computers available, but somewhat airgapped. Too many ways to unintentionally bridge the networks without extreme diligence. Which is slightly incompatible with something like airlines which employ enormous numbers of people.

azinman2
4 replies
19h23m

These are not small time operations (most of them). They are multi billion dollar companies with complex technical needs. This is doable and the bread and butter of good networking engineers.

Having not done this just cost them billions. Now imagine the US at war or another nation state that just wants to cause havoc.

nradov
2 replies
18h3m

That's ridiculous. The cost of building air gapped operations systems, including the resulting loss of productivity and efficiency, would be worse than just accepting an occasional outage. People who lack technical competence and operational experience always tend to overreact to software failures.

The major airlines may not be "small time operations" but none of them are even in the top 100 US corporations by market cap. They simply don't have the level of IT resources or competence that we see at major tech companies.

oceanplexian
1 replies
17h12m

It doesn’t need to be strictly air gapped, but there’s nothing technologically demanding about a computer used to check people in on a flight, it could be done over a 9600 baud modem and a thin client.

macintux
0 replies
16h25m

Nothing technologically demanding about global logistics software, being run by an industry with 0% profit margins (give or take, airlines don’t make much money), being asked to cripple itself 24x7 to avoid one weird outage every several years?

unethical_ban
0 replies
18h45m

Crowdstrike is a trusted application on every computer in these people's farms. It is not uncommon to have specific rules for these packages to be downloaded directly from the Internet.

You're suggesting either that Crowdstrike itself will get used as a vessel for an attack, or that banks and airlines have firewall rules open for enemies.

filleokus
1 replies
17h57m

Hmm. I think this is a pretty shallow take.

My experience from the airline industry is that the vast majority of systems classified as flight safety critical are not connected to the internet, or large networks at all. Which is good.

But unless we want to drastically change how airlines operate, the rest need to be online.

Today, you can purchase a ticket (or rebook an existing one) on your phone really close to the departure time. When that happens, a gazillion interconnected systems, across legal entity borders, need to cooperate to take you (and your luggage) to the destination.

To put all of this in a large non-internet network seems pretty pointless.

If we wanna go down that route, the only real "security improvement" I can think of is to dismantle the digital systems and go back to paper. Like Ryanair did during this incident. Handwritten boarding passes, verified against print-outs of passenger manifests.

tgv
0 replies
12h19m

the vast majority ... are not connected to the internet

But those couldn't have gone down due to Crowdstrike.

shagie
0 replies
17h46m

The related question is "How do you run your business out of downtown Fort Worth (American Airlines) and get your updates to 350 airports in 60 countries?"

Saying "run your own network" isn't exactly practical. Even imagining the very small airlines (that partner with big ones for the last leg) that only service a handful of rural airports, this doesn't seem practical.

The days of point to point updates over modems are overish with the amount of data that needs to be consistently transmitted and available.

I can imagine a modem at each airport and a phone bank of about 700 modems that are each getting or sending updates. The long distance calls to distant countries for that data could get expensive. Woe to the power outage in Texas that takes down the phone bank for a day or two or three or four.

Alternatively, there's a system that was developed to do this and it works pretty well most of the time. Combine this with having redundant systems there that are geographically separated. It isn't turnkey, but it's probably better than other options that would involve home grown solutions.

rainsford
0 replies
18h35m

It's an interesting thought experiment to consider everything that would have to go into running operations for a business like one of the largest airlines in the world using a non-Internet connected network. Among other things, you probably lose the ability for your employees who aren't physically in the office (which is kind of a lot of them if you're running an airline) to interact with your operations network. If you're an airline trying to schedule employees and share information with them while they're in hotel rooms, that's probably a deal breaker.

That's really only a secondary problem though, because disconnecting a network from the Internet isn't a replacement for security software or software updates, so you wouldn't even avoid the root cause of the issue here. I'm not saying CrowdStrike is essential software for Internet connected computers either, but if your business thinks it is, you should probably be running it on your "airgapped" computers too. And you should definitely be installing updates, so you can still fall victim to a bad update regardless of which software you run. At best you perhaps increase the likelihood of hearing about a problem with an update before you deploy it on disconnected computers, but you can get a similar effect by delayed deployment of updates even on Internet connected networks.

hypeatei
0 replies
19h6m

Exactly, most of these systems probably wouldn't require EDR software if networking was done correctly in the first place.

PenguinCoder
0 replies
19h7m

Air-gapped networks have gotten more scarce in the day and age of cloud computing. Except for certain DoD and cleared spaces, I've even seen PLC networks internet connected....

Ekaros
0 replies
8h51m

So Web 2.0 was a mistake I take?

The problem once again is humans. Humans need to interact with systems: either receive information from them, give it to them, or use the systems to process it. And for efficiency, nowadays that generally happens online. It could be offline, but that would be slow. Or it could be on segregated networks, but that would get really expensive. Imagine having your own fiber line for your instant messaging and email? With a different terminal...

In the end most of these affected computers are on the Internet for very good reasons. And this model really is working the vast majority of the time, generating a lot of efficiency.

nostromo
12 replies
20h3m

It's insane to me that CrowdStrike's stock is still up 66% year-over-year.

With all of the angry customers, lots of incoming lawsuits, and the fact that their "protection" is provably more costly than no protection at all now - I can't imagine why investors aren't dumping it like mad.

johndhi
5 replies
19h56m

My guesses:

- no one really cancels their security vendors since security budgets don't shrink

- they have a big moat so their customers won't be able to leave them

kd913
3 replies
19h12m

I am confused why they are around to begin with.

Companies already trust Microsoft, they buy Windows, Office, Azure.

Why would they bother with a 3rd party here when the low effort low risk solution is to pick the tool made by the OS vendor. I.e. windows defender

It should be a nobody gets fired for picking IBM situation. How did this random place get so much credibility that people trust them over the manufacturer?

kccqzy
1 replies
16h17m

Because they provide far more protection than Windows defender. You can write your own custom never-before-seen malware, and CrowdStrike will detect it purely based on behavioral signals. Windows Defender is still largely an antivirus solution.

Peanuts99
0 replies
8h52m

Microsoft's E5 offerings are a direct competitor to CrowdStrike's threat response products, which are a lot more than just Windows Defender on endpoints. I'd imagine many of CrowdStrike's customers will be looking to move this to MS's tools instead as a result of this.

htrp
0 replies
16h18m

crowdstrike has oracle enterprise sales model. have you ever been to one of their events?

gavindean90
0 replies
14h16m

You don’t drop them until budget renewals, at least not for this. Solarwinds comes to mind as a company with a similar kind of thing.

chasd00
2 replies
16h9m

i replied upthread: i think the stock price has bottomed from this event. It's way too hard to switch vendors like this at an enterprise scale. What's going to happen is the CrowdStrike account reps are going to get yelled at and abused, some discounts are going to be offered for annual renewal, then two years from now all will be forgiven/forgotten. In a year or so the stock price will recover and trend to more or less where it was before this event. I've already bought as much stock as I could.

AceyMan
1 replies
15h54m

It's way too hard to switch vendors like this at an enterprise scale.

Huh? Once you get the control plane/backend of a new AV vendor configured, you just uninstall AppA and deploy AppB on your nodes.

It's not like Crowdstrike is deeply integrated with other systems: it's an agent.

SandwichTeeth
0 replies
3h12m

This is massively underselling the kind of change management processes and potential challenges of scale a deployment like this would require at large enterprises. It's never as simple as "deploy app to nodes". Approvals, maintenance windows, deployment in waves (ironic I know, given the nature of the outage in the first place). Most places I've worked would require deployment to many sets of lower environment machines of different functions first, then allow time to "bake" and ensure no issues crop up after things have settled. You would NEVER just yeet out a new agent to critical production systems without extensive change management, testing, and validation. I've deployed different AV products multiple times throughout my career (including Crowdstrike). It was never simple, and almost always took months to complete.
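
A rough sketch of what "deployment in waves" with a bake period can look like in code, assuming a made-up `Wave` structure and a caller-supplied `is_healthy` check. None of these names come from CrowdStrike or any real deployment tool; the host names and bake times are invented, and in a real process the bake time would actually elapse (and approvals would be collected) before the health check runs.

```python
from dataclasses import dataclass


@dataclass
class Wave:
    name: str          # e.g. "lab", "staging", "production"
    hosts: list        # host names in this wave
    bake_hours: int    # how long to watch the wave before moving on


def roll_out(waves, is_healthy):
    """Deploy wave by wave and halt at the first wave with an unhealthy host.

    Returns (hosts that received the agent, name of the halting wave or None).
    """
    deployed = []
    for wave in waves:
        deployed.extend(wave.hosts)
        # Placeholder for the bake period: here we only run the health check.
        if not all(is_healthy(host) for host in wave.hosts):
            return deployed, wave.name
    return deployed, None


waves = [
    Wave("lab", ["lab-1", "lab-2"], bake_hours=24),
    Wave("staging", ["stage-1", "stage-2"], bake_hours=72),
    Wave("production", ["prod-1", "prod-2", "prod-3"], bake_hours=168),
]

# Pretend the new agent breaks one staging machine.
deployed, halted_at = roll_out(waves, is_healthy=lambda host: host != "stage-2")
print(deployed)   # lab and staging hosts only
print(halted_at)  # staging
```

In this toy run the bad agent never reaches a production host, which is the whole argument for waves: the blast radius is limited to whatever was deployed before the first failed check.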

parmenidean
0 replies
19h57m

Good news then! You can short it and make a ton of money if you're confident this share price increase is a mistake.

nottommo
0 replies
18h57m

CrowdStrike makes it easy to pass your security audit. That's where the value is.

MattGaiser
0 replies
19h54m

1. Compliance. No protection at all isn’t a contractual option in many cases.

2. Companies react slowly. When has a vendor paid a high price for failure? Boeing can kill people and fail time after time and still sell planes.

Catastrophe always changes less than anticipated.

knappe
12 replies
21h36m

For everyone flabbergasted by Southwest running ~Windows 3.1~ old software, I have bad news about the telecom industry. I worked at Ericsson at an R&D branch and one of the projects in the works was to move one of the main pieces of routing equipment that handled millions of telephony operations a day away from an ancient version of Windows.

A lot of code lives on much longer than you think. The general attitude we took was that most of the code we were writing would be running for at least 30 years. And that was the attitude at an R&D branch, arguably a side of that industry where we were working on the new tech.

Edit: Win 3.1 or something else, the point still stands. There is a lot of old software running out there that will continue to run our core services. Legacy software doesn't just mean v1 versus v2, it can mean v1 versus v41.

fckgw
7 replies
21h23m

Southwest does not use Windows 3.1. Why does no one read the article?

Southwest wasn't affected because they don't use Crowdstrike. That's it.

philipwhiuk
3 replies
21h17m

Yes and that article is wrong.

knappe
2 replies
21h8m

Win 3.1 or not, the point still stands. There is a lot of software out there that has been running for a really long time and will continue to do so.

Relatedly, it is nice to provide a source for your claims. I did see this [0] which would have been an appropriate thing to link

[0] https://kotaku.com/southwest-airlines-windows-3-1-blue-scree...

otterley
1 replies
20h59m

Dude, take the L. But for that false story being recited, you wouldn't have made that point in the first place.

For everyone flabbergasted by Southwest running ~Windows 3.1~ old software

You said it; own it. You could have said "For everyone who was tricked by the joke that Southwest runs ~Windows 3.1~ old software" but didn't.

knappe
0 replies
20h35m

That is fair. I wasn't aware of the kerfuffle around whether Southwest was using 3.1 or not. I took the source and linked source at face value. It would have been nice to have had someone do more than "nope" and instead link to a reputable source. This is how you counter disinformation.

kayodelycaon
1 replies
21h16m

No one runs servers with Windows 3.1. They would have used Windows NT.

The really damning bit is Windows 3.1 did not have preemptive multitasking. It barely had networking. You couldn’t run a server with it if you wanted to.

SoftTalker
0 replies
20h50m

They would have used Windows NT.

Or, at the time, OS/2

wrboyce
1 replies
21h29m

I find this more surprising, even if the Southwest & Win3.1 claims were true, I would expect most Ericsson systems to be Erlang based and thus happily chugging along on a (perhaps ancient) Linux box.

llm_nerd
0 replies
21h30m

For everyone flabbergasted by Southwest running Windows 3.1

Southwest isn't running Windows 3.1, though. That's some rather lame, but predictable, truth-through-repeated-assertion thing on social media.

Not everyone uses CrowdStrike, and in this case SW was the lucky one that didn't.

ketchupdebugger
0 replies
21h28m

Good thing a lot of our banking still runs on mainframes, will never be taken out by crowdstrike

hypeatei
9 replies
19h3m

has hired a law firm and will seek compensation from Microsoft and CrowdStrike

Going after Microsoft seems like a misguided move here. What does Microsoft have to do with a third party driver installed by your own IT department?

bruce511
6 replies
14h41m

I suspect the lawsuit is created by lawyers, not techies.

Equally reporting on this whole issue seems to be by journalists, not techies. It's been framed (a lot) as a Windows issue not a Crowd Strike issue.

(8.5 million machines were affected, out of 1.4+ billion windows machines [1])

I have one affected customer (10k machines) who assumed I'd suffered like he did, and was surprised when I said we weren't affected. The reporting was consistently that it was a Windows issue, caused by an MS update.

Even this article leans into this narrative...

"as the New York Times put it, “It is more apt to ask what was not affected.” The answer is Linux, Macs, and phones."

Let me add "not to mention 99.4% of windows".

So the journalists don't know what happened, or who was affected, and felt "some computers have a problem" was a weak headline. The lawyers get that narrative and run with it.

And yes, it's easy to squint and claim the "OS should cope with this", but there are realistically limits on what an OS can do once you install a kernel-level driver on the machine. Should we go after Intel for making the chips?

[1] https://www.pcworld.com/article/608447/microsoft-delighted-b...
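
For anyone who wants to check the 99.4% figure, here is the arithmetic as a quick sketch. It only uses the two numbers quoted above and treats "1.4+ billion" as exactly 1.4 billion, which is an assumption; a larger installed base would push the unaffected share slightly higher.

```python
# Figures as quoted in the comment: 8.5 million affected machines
# out of roughly 1.4 billion Windows machines (assumed to be exactly 1.4e9).
affected = 8.5e6
installed_base = 1.4e9

affected_share = affected / installed_base
unaffected_share = 1 - affected_share

print(f"affected:   {affected_share:.2%}")    # about 0.61%
print(f"unaffected: {unaffected_share:.2%}")  # about 99.39%, i.e. roughly 99.4%
```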

vel0city
2 replies
4h24m

Windows is unsuitable precisely because it can be brought down by third party updates

If I run bullshit on a Linux or MacOS box it can also be unstable and brought to its knees. Or is that poster really trying to argue there's no way you can get a Linux box to lock up?

fsflover
1 replies
3h13m

Key quote from my link:

Third party vendors are forced into writing unsafe kernel drivers because Microsoft does not provide sufficient user mode APIs.

AFAIK it's different on Linux, and the reliability is higher. Is this not the case?

vel0city
0 replies
3h1m

https://forums.rockylinux.org/t/crowdstrike-freezing-rockyli...

https://access.redhat.com/solutions/7068083

https://lists.debian.org/debian-kernel/2024/04/msg00202.html

You can install buggy kernel modules in Linux as well. I can't even count how many times an apt upgrade/yum update made my system unbootable when using nvidia GPU drivers.

And besides, if you're really wanting that AV system to deeply know about everything the operating system is doing and hook into tons of syscalls, you pretty much can't be running exclusively in usermode. If someone compromised the root of the system you then can't trust the info the kernel is giving your usermode application. eBPF isn't usermode.

And in the end, the poster literally said Windows is unsuitable because third party updates can kill it. That was the key takeaway from their post. Well, third party updates can kill Linux, it can kill MacOS, it can kill darn near everything.

bruce511
0 replies
3h40m

It's an argument. I'm not sure it's a _good_ argument, but hey it's an argument.

realusername
0 replies
11h27m

Reports in the mainstream media were absolutely insane. All the language used points to some kind of unlucky event similar to a bad weather pattern. I knew the general IT knowledge isn't very high but I didn't expect newspapers to report on it like a tornado or an earthquake...

travoc
0 replies
17h45m

It was too tempting to include damages from accidentally enabling New New Outlook.

L-four
0 replies
8h5m

You want Microsoft named in the case so CrowdStrike can't deflect by claiming that it's Microsoft's fault.

roshankhan28
7 replies
10h32m

i really don't understand: how can my social media have better backup and infrastructure as compared to an OS which is being used worldwide?

rr808
0 replies
7h41m

Meta is one of the most valuable companies in the world with the most resources to buy the best of everything. At 1,280 Billion dollars of market cap it is 30x bigger than American, Delta and United put together. It made $39 Billion last year compared to $7.8 Billion for all US airlines together. Of course it has better systems.

pjc50
0 replies
9h31m

Windows has always been terrible for reliability. Adding a "security" system which is invasive and always-updated makes the reliability worse.

owl57
0 replies
5h57m

I wouldn't overestimate FAANG's immunity to crash-the-world config updates. Facebook had everything including engineers' access to the datacenter down for hours in 2021:

https://news.ycombinator.com/item?id=28750894

> infrastructure as compared to an OS

By the way, I don't think quality of Microsoft's infrastructure is relevant here.

hiddencost
0 replies
9h9m

Because one is in a data center that can be controlled, and the other is deployed to user owned hardware that cannot.

goodcanadian
0 replies
8h45m

Because IT is your social media's business. They know IT inside and out. They understand what can go wrong and how to mitigate it. The business for airlines (for example) is to fly planes. They are pretty damn good at it. IT, however, is just a tool to them that they buy elsewhere. They don't understand it in the same way as social media. They rely on outside contractors to do it right: outside contractors who get the job based on being the cheapest or convincing the buyers their service is "industry best practice."

Nextgrid
0 replies
4h35m

Compare the salaries, working conditions and prestige offered by tech jobs at social media companies vs some large legacy company like a bank or airline.

In the former, you are paid well and have some sort of prestige and political capital. In the latter, you are underpaid and your prestige/political capital is often equivalent to the janitor's.

Ekaros
0 replies
9h5m

Because they don't do mass rollouts on the servers. Then again, those companies could fail too if they had a single point of failure with automatic mass deployments...

This could happen for anything that supports this type of automatic mass deployment. Just in this case that thing was popular enough and happened on one of the most popular platforms.

ssivark
4 replies
22h13m

Apparently Southwest Airlines’ ingenious strategy of never upgrading from Windows 3.1 allowed it to remain unscathed.

OMFG, does this mean we need to be prepared for a (juicy) “IT failure” that brings down Southwest at some point?

tracerbulletx
0 replies
21h30m

This isn't true, and that should have been obvious to technical people. It's so sad that we have a tech media that doesn't give a damn about making things up.

toast0
0 replies
21h52m

Southwest experienced this kind of scheduling issue in 2021 [1], and again in 2022 [2]. Honestly, if they're running win 3.1 or win 95 as suggested, I think that puts them in a better place tech wise than keeping up with the Joneses on the upgrade treadmill --- although they should consider updating to windows 3.11, because they have a workgroup :P and the microsoft hearts network is pretty cool; but they have historically done poorly on scheduling after a significant disruption. An article from last year [3] says they updated their crew assignment software as well as increased staffing in colder airports and in general and got more deicing equipment. We won't really be able to tell if it works, until they experience another disruption.

[1] https://www.cnbc.com/2021/10/12/southwest-airlines-reduces-c...

[2] https://www.npr.org/2022/12/26/1145536902/southwest-flight-c...

[3] https://www.npr.org/2023/11/09/1211064462/southwest-airlines...

ryanmcbride
0 replies
22h1m

You don't even have to wait it's been happening

jujube3
4 replies
22h10m

Sounds like we saved a lot of tons of CO2.

aflag
1 replies
21h57m

Hard to say, it could actually have increased emissions. When the timing of things doesn't align correctly, it's common for resource usage to increase. E.g. people travelling less optimal routes, extra commutes to and from the airport, people having to physically travel to datacentres in order to fix things. Even just rebooting the machines without need will use more CPU.

mbreese
0 replies
21h29m

Or planes taking less efficient routes or flying at faster speeds to “make up time”.

systemtest
0 replies
19h54m

In my country, companies are required by law to keep track of their CO2 emissions.

In the case of CrowdStrike, they would have been able to deduct this event from their emissions for decades.

more_corn
0 replies
22h4m

Always look on the bright side!

jijji
4 replies
15h27m

basically any airline using linux is not on that list

bruce511
1 replies
14h33m

It's more accurate to say "any airline not using Crowd Strike is not on that list."

Blaming Windows for this outage is like blaming Linux for Apache bugs. The two systems are distinct.

It just so happens that Crowd Strike was very successful at selling to large corporates. That includes some airlines.

99.4% of Windows machines were unaffected. Including those of airlines using Windows, but not Crowd Strike.

jijji
0 replies
13h45m

ahhh yes you are correct

namdnay
0 replies
9h10m

every airline in the world "uses linux", the core reservation and distribution systems were migrated from TPF to Linux over the past 20 years

misja111
0 replies
11h38m

Can you name one?

firtoz
4 replies
22h13m

Is there a similar global analysis?

jjwiseman
1 replies
21h37m

Maybe I'll do a Part 2: The World.

opdahl
0 replies
15h44m

As a non-American that would be very interesting.

jijji
0 replies
15h24m

love to see the airlines using linux and what kind of problems, if any, they experienced that day

fullspectrumdev
0 replies
22h5m

I’d love to have some solid numbers of “global cancellations due to” - I heard a bunch of varying figures so far.

aftbit
4 replies
22h0m

One interesting feature of this outage was that "PROD" was generally fine, on account of mostly running on Linux and/or ancient proprietary software, while "CORP" was generally wrecked, on account of mostly running Windows. In other words, the bank systems responsible for moving money mostly worked, while the systems responsible for allowing humans to interact with them (to issue approvals, change configuration, or other ops things) often did not.

brazzy
1 replies
21h57m

In the original thread there were some reports of people having their Linux systems taken down by CrowdStrike as well. At separate times, of course, and I suppose the greater heterogeneity of Linux distros prevents events of this magnitude. But that would be little consolation when it takes down your systems.

foobarchu
0 replies
20h18m

Those should be considered coincidence until proven otherwise. Crowdstrike is intended to bring down systems when it believes there was an intrusion, after all.

7thaccount
1 replies
21h57m

Same thing for a lot of industries actually. PROD runs on Linux and probably has some delay to prevent this. Corp gets hosed.

LeifCarrotson
0 replies
20h58m

Yep, here in manufacturing, production/OT PLCs run on Wind River VxWorks from Rockwell, Siemens, and others. The HMI (human-machine interface, basically a touchscreen used to display status and enter setpoints and other data) and SCADA/ERP systems run on Windows. Sometimes this is an industrial fanless PC with e.g. Ignition (Java+Python) software; other times it's a Rockwell PanelView, which actually still runs Windows CE 6.0.

This gets to be a problem when IT wants to get their hooks into OT networks. The PLC is meant to be left alone, and will happily send its Ethernet packet to that servo drive or digital IO card every 10ms for literal decades. There is no reason to update its firmware ever, just don't expose it to the Internet. But corporate wants everything on the Internet.

The PLC will reliably run its sequence when you close the contacts on the physical "Cycle Start" pushbutton. But if corporate is down, you can't know what part number you're supposed to make or how many of them, or get a serial number from and report test results to the traceability database.

Zigurd
4 replies
21h56m

I would like to know if a solid, up to date, well-rehearsed disaster recovery plan saved anyone's butt, or if we're all just raw dogging our machines whether IT is paying for backup and recovery or not?

ta1243
0 replies
21h53m

Our systems worked fine. We expect things to fail, including software like SentinelOne, CrowdStrike, etc., and have DR systems which can keep us limping along. We have DR systems which will work should other things happen, say the Thames Barrier fails (i.e. no Docklands).

Unfortunately some of our outsourced suppliers didn't have such attitudes.

paulddraper
0 replies
3h25m

I've never seen it.

Obviously some selection bias there, but I'd love to hear some success stories.

berniedurfee
0 replies
1h39m

They certainly have DR infrastructure primed and ready to go… with CrowdStrike pre-installed on every DR server.

camillomiller
3 replies
11h16m

Berlin Brandenburg got hit hard. As a disgruntled BER user, I am NOT surprised they had one of the worst repercussions.

nicbou
1 replies
11h12m

German IT is often hit hard by such things. Unless of course they're still running on paper.

At least they immediately mentioned it on their website, as a banner right at the top. The immigration office's appointment system has been down for over a month, and it took them 3 weeks to just acknowledge it.

camillomiller
0 replies
1h23m

Thanks for your website btw. My partner’s currently renewing her work visa (she’s from Australia) and it’s insane how bad the situation is. To the point where it feels like it should just be illegal for Berlin to operate services like this. Appalling.

ta1243
0 replies
4h32m

I'm still shocked that Brandenburg is actually open!

pimlottc
2 replies
13h44m

One thing I don't understand from these graphs - why was there a relative uptick in takeoffs starting a short time /before/ the CrowdStrike update was pushed? It's in the overall graph, as well as the graphs for United, American, and especially Delta. I can't think of any reason for this, maybe it's just random noise, or maybe there was something unusual about the previous week at the same time?

mike_hearn
0 replies
10h6m

It was widely reported to be the busiest travel day for quite a long time, which compounded problems.

account42
0 replies
9h58m

Yeah, shouldn't have been too hard to add a couple more weeks so you at least get an idea about variance.

skrebbel
0 replies
21h40m

It’s also not true

mjevans
1 replies
21h58m

Outsourcing a core business competency, and surely also cutting the contracts to the bone to pocket the savings, embrittled Delta, and I seriously hope the compensation to customers costs more than any savings or profits they made in the interim. It MUST be painful enough that they do not repeat this mistake.

The article quotes https://www.reddit.com/r/delta/comments/1edtfbh/why_did_delt... (with improper attribution)

topgun966Platinum wrote on Reddit """ These "experts" are completely wrong. The core issue was Delta did NOT have a proper DR plan ready and did NOT have a proper IT business continuity plan ready. UA, AA, and F9 recovered so fast because they had plans on stand-by and engaged them immediately. After the SWA IT problem, UA and AA put in robust DR plans staged everywhere from the server farms, to cloud solutions, to end-user stations at airports. They had plans on how to recover systems. DL outsources a lot of their IT. UA and AA engaged those plans quickly. They did not hold back paying OT for staff. UA and AA have just as much reliance on Windows as Delta. AA was recovered by end of day Friday and resumed normal operations Saturday. UA was about 12 hours behind them having it resolved by Saturday morning resuming normal schedules Saturday afternoon. The ONUS is 100% on DL C+ level in their IT decisions. The problem is that the lower level IT staff is going to get the brunt of the blame and the consequences. """

tiahura
0 replies
3h24m

That’s why I think the suit against CrowdStrike and MS is mostly a dud. First you have to get around the waiver (much harder for a business than a consumer) and then you have to deal with comparative fault, i.e. Delta’s disaster recovery system sucked.

otterley
0 replies
20h55m

It blows my mind how many people actually believed the claim -- clearly in the obvious-joke category -- that SWA is running their mission-critical flight systems on Windows 3.1. (Yes, Southwest runs a lot of old tech in their stack, but that claim is patently hyperbolic.)

People need to stop believing everything they read on the Internet and have a little bit of skepticism.

miohtama
0 replies
3h49m

How to avoid getting rekt

Southwest wasn’t affected because they don’t use CrowdStrike

bandyaboot
0 replies
18h30m

Anyone know why Minneapolis-St Paul began experiencing cancellations much earlier than other US airports?