Looking at the Downdetector home page [1], it looks like many more services are having outages, not just the ones owned by Meta, including:
- YouTube
- Google Play
- T-Mobile
- X (Twitter)
- Discord
- TikTok
- Pokemon Go
- Snapchat
It looks like they all have the same failure point.
Or people are using Facebook Auth for them. I don't really trust Down Detector, which despite the claims is really People Whinging on Twitter Detector.
I'm confused. Isn't listening for spikes in complaints about outages a great way to detect them? I know for a fact some service companies monitor social media channels for this purpose (among others). I'd be surprised if that wasn't more or less standard practice.
I've checked Down Detector for ISP outages in my area many times now. It's always confirmed them before my ISP did.
When there's a major ISP outage, people report problems with all the major sites. When Facebook's down, people report problems with any site that has "Login with Facebook" as an option.
It's almost never actually an outage impacting all of FAANG at once.
> When Facebook's down, people report problems with any site that has "Login with Facebook" as an option.
If users log into your site with Facebook, then the login functionality of your site effectively is down when "Login with Facebook" is down.
From the user's perspective, your subcontractors, including authentication subcontractors, are a problem for you to deal with and never show them. From your perspective, you could have architected your site in a way that logging in doesn't "go down" when Facebook login is down.
If the user chooses "Login with Facebook" over other authentication options available, and they don't want to use other options, educating them with a good error message might help. Or you could remove the Facebook login option, if you (totally reasonably) don't want Facebook's failures to reflect poorly on you.
There are plenty of sites where "Login with Facebook" is a convenience but hardly the only way to log in. Reddit, for example, has "Login with Google" and "Login with Apple"; it would be highly misleading to claim "Reddit is down" if Google's OAuth flow was having an outage.
Nothing in the API or OAuth flow would make that doable in an automatic fashion with this outage. It'd have to be something you put up manually as a banner after hearing of the outage.
I don't particularly care; we're talking about why DownDetector isn't necessarily ideal for assessing. It can be a useful signal, in some scenarios, but I've seen plenty of spurious signals come from it.
> Nothing in the API or OAuth flow would make that doable in an automatic fashion with this outage. It'd have to be something you put up manually as a banner after hearing of the outage.
That is fair: if I choose to architect my site such that a user-critical feature goes down when a 3rd party service goes down, it behooves me to monitor the 3rd party service and do whatever necessary to properly inform users what's going on.
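A minimal sketch of that "monitor the 3rd party and tell users what's going on" idea. Everything here is hypothetical: `LoginOption`, `build_login_page`, and the `provider_health` mapping are names made up for illustration, and how you actually probe a provider's health is left to your own monitoring.

```python
from dataclasses import dataclass


@dataclass
class LoginOption:
    name: str            # label shown on the login page
    third_party: bool    # True if the option depends on an external provider


def build_login_page(options, provider_health):
    """Return (available option names, banner text or None).

    provider_health maps an option name to True/False. It is a hypothetical
    input that your own monitoring would have to fill in; a provider missing
    from the mapping is treated as unhealthy.
    """
    available, degraded = [], []
    for opt in options:
        # First-party options do not depend on any external health probe.
        if not opt.third_party or provider_health.get(opt.name, False):
            available.append(opt.name)
        else:
            degraded.append(opt.name)

    banner = None
    if degraded:
        banner = (
            "Sign-in via " + ", ".join(degraded)
            + " is currently unavailable because of a problem at that provider."
        )
        if available:
            banner += " You can still sign in with: " + ", ".join(available) + "."
    return available, banner


options = [
    LoginOption("Email and password", third_party=False),
    LoginOption("Facebook", third_party=True),
    LoginOption("Google", third_party=True),
]
available, banner = build_login_page(options, {"Facebook": False, "Google": True})
```

The point of the design is only that the degraded option is explained rather than silently broken, so the user knows the site itself is up and which alternatives still work.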
I edited my post unfortunately after you replied, but another option is removing the parts of your site that rely on 3rd parties, if you don't want the failures of those 3rd parties to reflect poorly on you (which they reasonably would).
>we're talking about why DownDetector isn't necessarily ideal for assessing. It can be a useful signal, in some scenarios, but I've seen plenty of spurious signals come from it.
Indeed, and if a bunch of users say that a feature of your site is down, even if it's a result of a 3rd party failure: chances are, that part of your site is down, and it's partially your fault for relying on a 3rd party for that feature. The users correctly don't care what the root cause is; they expect you either to mitigate it or not to let a feature they rely upon be unreliable.
All that's fine, but totally misses the point.
Take a look at https://downdetector.com/status/aws-amazon-web-services/ ; scroll down to the comments.
"SSH and Dbconnect stopped on all of my EC2 instances. Anyone else?"
"I can't add a payment method"
The chart shows a big spike this morning, but there was no AWS outage, nor does Amazon use Facebook login.
Again, DownDetector can be a useful "is something unusual happening right now" signal, but it'd be a mistake to take its attribution at face value.
Ignore the comments on DownDetector for a moment and check out that huge spike in reports recently. Clearly something wrong happened with AWS's user experience. That's something AWS needs to resolve, in the eyes of their users.
>The chart shows a big spike this morning, but there was no AWS outage
Are you sure? If hundreds of users simultaneously reported there was some sort of outage, particularly a huge spike like we saw, chances are there was an outage.
>Again, DownDetector can be a useful "is something unusual happening right now" signal
Exactly! Specifically, "is something unusual happening right now with my site, in the eyes of my users?" Every site owner should know when that condition is true. What you think about your site "up-ness" isn't as important as what your users think about your site "up-ness". What you attribute your downtime to, isn't as important as what your users attribute your downtime to (you.)
But that's not the case. It's a false positive.
Pick a DownDetector service and open the page every day for a few days. You'll see it most of the time just reflects people waking up in the US timezones.
Exactly. If you click through down detector when things are _up_ you'll see people still complaining that $site is down. Could be a local power outage or even a flaky connection in their own home.
Down Detector is one of many signal sources and should have a "credibility" score associated with it that's proportional to the number of people complaining that something's down.
do you really think there are masses of people who can’t tell the difference between a single sign on service being down and individual sites being down and reporting it to downdetector?
Even if there were, doesn’t the outage graph give you exactly the information you’re asking be curated?
I very much don’t like the idea of downdetector using some special sauce to hide or limit outage reporting.
Have you never seen The Website Is Down? https://www.youtube.com/watch?v=uRGljemfwUE
The answer is: way more people than a software developer might think. Ask anyone in IT, or go to anywhere bugs are reported and read a handful.
Yes, absolutely. 100%.
https://downdetector.com/status/aws-amazon-web-services/
Was there an AWS outage this morning? The graph sure looks like it, but there wasn't.
I can guarantee you with 100% confidence from experience that the call centers for AT&T, T-Mobile, Comcast, etc. are all blowing up right now because of users who assume that if the Instagram app isn’t loading it means the “wifi” is broken. Also keep in mind “wifi” doesn’t mean 802.11, it means “anything related to the internet” up to and including 4g/5g and Ethernet.
Yes? That's how all top-level reporting is going to work. It's not going to tell you which part of your service is inaccessible. It's just telling you that people can't access it. You obviously have to do additional investigation to figure out why people are having trouble.
Scroll up the thread a bit; https://news.ycombinator.com/item?id=39605354
Even here on HN, where people should know better, people take its incorrect attribution as useful info. TikTok isn't down. X isn't down. Google isn't down.
That's kinda the point though isn't it? DownDetector is showing an early indication of a major outage in both of your examples. The issue may not be caused by the indicated service, but it's still a useful information source especially when we can correlate reports on there with what we are seeing in our internal monitoring.
A big spike on DownDetector is an indication of something going on.
Its attribution of what/who is often incorrect. You'll see "maybe it's more than Big Site X!" comments come up on every HN thread like this citing DownDetector; it's almost never the case, and folks on HN should know better.
The OP means there is a lot of collateral noise from people who just aren't tech savvy. E.g. “oh no, I can't access Facebook, my internet must be down. Let me log in to Down Detector to file a complaint against my ISP”
The problem is the source of the reports and display of the reporting.
I'd trust Down Detector a lot more if it was filled with Hacker News community -- people who are able to understand that there's "DNS" and "Routing".. and that your phone can have internet access at home while your home PC does not.
I personally hate Down Detector's graphing because it can make it 'look' like there's an issue when there isn't really... Facebook with 500,000 reports looked as down as Google with 1,000 reports... For equally sized / used entities, I would not trust that "Google" is down with 1,000 reports. I had a coworker ask me what was going on with the internet because "everything is down.. Facebook, google, gmail, microsoft!" (when seeing the Down Detector home page)
DD should normalize the graphs against the service history in some way. A service shouldn't spike because it had 30 reports / hour for a day, then suddenly has 100... when it has a history of being out with 100,000+ reports. The 100 reports are probably mis-reporting, but you can't tell until you dig into each service, one by one, with separate page loads.
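A rough sketch of what "normalize against the service history" could mean. This is one possible scheme, not anything Down Detector is known to do; `normalized_severity` and the sample numbers are made up for illustration.

```python
from statistics import median


def normalized_severity(current, history):
    """Scale the current reports/hour against the service's own history.

    Returns (ratio_to_baseline, fraction_of_peak):
    - ratio_to_baseline: current rate divided by the median historical rate
    - fraction_of_peak: current rate divided by the worst rate ever seen
    """
    baseline = max(median(history), 1)   # avoid dividing by zero on quiet services
    peak = max(max(history), 1)
    return current / baseline, current / peak


# Hypothetical hourly report counts: a quiet stretch at 30/hour plus one past
# outage that reached 100,000 reports/hour.
history = [30] * 23 + [100_000]
ratio, fraction = normalized_severity(100, history)
```

With these made-up numbers, 100 reports/hour is a bit over 3x the usual baseline but only 0.1% of the worst outage on record, which is the distinction a raw spike on an auto-scaled graph hides.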
The problem is that people use FB login for other sites, and if FB login is down, many users report a problem with that other site, not with FB.
Yeah, it's wild that it's now treated as an authoritative source, especially by some news organizations.
It's as good as asking a neighbor what happened with a loud noise down the street. Sometimes you'll get something good, sometimes it'll be completely wrong.
Asking my neighbors if they know what some loud noise was or about some local disturbance has been extremely reliable in my experience. The one time someone gave me an explanation about something which wasn't mostly right they qualified it with something like "So-and-so said it might be such-and-such but I don't know if it's true".
You must have an exceptional neighborhood. Everywhere I lived, here's a handy map of "actual cause" :: "what the neighbors said it was"
- Car exhaust :: gunshot
- Appliance delivery truck liftgate :: gunshot
- Transformer explosion :: gunshot
- Garbage truck :: gunshot
- 787 at 25000ft :: complete ruining of peace and quiet
- Any police activity :: probably someone robbed a bank
For the record, my city has (statistically indistinguishable from 0) homicides and bank robberies and, by American standards (I know, I know) no particular issues with gun crime.
I can imagine it being different in a city. I'm in a fairly quiet suburban area.
One time I heard a loud boom. A few hours later I saw a neighbor outside and asked if he'd heard it and if he knew what it was. He told me a house a few neighborhoods over had exploded. I was a bit skeptical of it but he turned out to be right.
Downdetector is nice because it answers my question of "is anyone else having issues with this?" When it takes AWS an hour to even acknowledge "increased error rates", and tells me that everything is a-ok in the meantime, I want another perspective.
Twitter's search used to be my go-to for this - a search for "AWS down" would typically be very illuminating - but it's tough to get it to genuinely spit out the most recent tweets with a keyword these days.
The New York Times just posted a news flash ... citing Down Detector :-P
Gmail is also on the list. You can't use FB auth to login to Gmail, can you?
Gmail was also experiencing issues: https://www.google.com/appsstatus/dashboard/incidents/shD5Vv...
YouTube was definitely doing something weird that doesn't seem likely connected to Facebook.
A couple hours ago after watching a video I went to my home page, which usually shows recommendations based on what I've recently watched plus a few videos labeled as sponsored that have nothing to do with any of my interests.
Instead everything on the home page was either a sponsored video, or a movie that was free to view with ads, or something from one of their music products.
I tried from an incognito window to see if it had something to do with being logged in. Normally going incognito loses the history-based recommendations but at least recommends user uploaded content. But now it has just like my logged in home page. No user content. Just ads and videos from Google's movie and music services.
Refreshing gave an error that said something went wrong. I then logged in on that page and again got something went wrong. Another refresh got a page with some user content. Another refresh was the ads and Google stuff page.
A little later it seemed to clear up and now my home page is back to normal.
i trust Down Detector more than the (majority of) companies who are silent during outages
hell, i'm surprised Down Detector hasnt been outright sued due to the graphs being an actual honest representation of availability that shitty companies cannot hide
People are using Facebook Auth for YouTube?
Seems likely. TikTok and YouTube are currently working for me, while Meta platforms aren't.
Their outage heatmap is also basically a population density map too. https://xkcd.com/1138/
HN seems to be struggling too, but that could just be everyone here to talk about the outages.
That's the standard HN experience, this site runs on a single core I believe.
Does it really or is this a joke?
Edit: I found the following, I wonder if it's still the case.
https://news.ycombinator.com/item?id=16076041
It's real. Single core performance improves all the time. People overestimate how much power it takes to handle lots of queries per second on a well-tuned system and well-written software in 2024.
https://hn.algolia.com/?dateRange=all&page=0&prefix=true&que...
I see the "sorry, we are receiving too many requests, try again in a few minutes" error several times a day on here. I don't think that HN is reliably able to handle the amount of users it currently has.
I have been using HN daily since I was a teenager. I've seen that message maybe 10 times outside of serious issues in last 15 years. It's strange to me that it happens so frequently for you.
That's difficult to believe to be honest. I get it several times a week.
I get something like that when I try to comment and then upvote too quickly.
I believe that's by design if you send an action request very quickly after a previous one. It's very easy to replicate. Open a post. Then click the upvote button and very quickly click the favorite button too. That will trigger it. I think it's used to rate limit.
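For illustration only, here is what a per-user action cooldown of that kind might look like. This is not HN's actual code; the class name, the 1-second interval, and the injectable clock are all assumptions.

```python
import time


class ActionCooldown:
    """Reject an action that arrives too soon after the same user's last one."""

    def __init__(self, min_interval_s=1.0, clock=time.monotonic):
        self.min_interval_s = min_interval_s
        self.clock = clock            # injectable so the behaviour can be tested
        self.last_action = {}         # user id -> time of last accepted action

    def allow(self, user_id):
        now = self.clock()
        last = self.last_action.get(user_id)
        if last is not None and now - last < self.min_interval_s:
            return False              # too fast: caller would show a "try again" page
        self.last_action[user_id] = now
        return True


# Simulated clock: upvote at t=0.0, favorite at t=0.2, another action at t=1.5.
ticks = iter([0.0, 0.2, 1.5])
limiter = ActionCooldown(min_interval_s=1.0, clock=lambda: next(ticks))
results = [limiter.allow("alice") for _ in range(3)]
```

In this simulation the second action (0.2s after the first) is rejected and the third is accepted. Plain page loads would simply never call `allow`, which would match the observation that only actions trigger the message.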
dang, linked in one of the ancestor comments. But I still suspect you are correct.
I just tested it by quickly upvoting your comment and then favoriting it and the error was:
Note that this only seems to happen for actions. Doesn't seem to be the case if I am just loading a page quickly.
Hmm. I get a different message. Something like "We are having trouble handling your request. Sorry!"
I saw it just a few minutes ago, but I don't remember the exact wording...
It's a feature
Note that's the application server process being single threaded, but the server machine is 4 core, so nginx cache etc use other cores
A worthwhile distinction!
I looked up the CPU mentioned in the link from your other comment. It looks like HN handles enormous traffic on about 2x the power of the last Celeron chip ever made.
https://www.cpubenchmark.net/compare/2383vs5793/Intel-Xeon-E...
I wasn't overestimating anything, but with how easy it is to write concurrent software today, why limit your site to a single core?
Maybe it's a lisp thing. Who knows what mysteries lurk here
Doesn't he mean single socket by single core?
These are good for actual business needs, but bad for resume-driven development.
See also: https://news.ycombinator.com/item?id=28478379
You don't run a massively profitable VC company by just throwing money away at a second core.
Happens whenever Sama sneezes too
It's honestly despicable how billion dollar companies at the forefront of making AI have such insanely brittle networking.
"blah blah blah I don't like the company so all their engineers must be really stupid..."
"The metaverse is the future of this company! We've hired the very best brains in the world to make these 1998 era PS1 polygon avatars!"
whole company goes offline
The Metaverse engineers and AI researchers don't actually run the infrastructure, but by all means, keep shitposting. It's so clever.
Yes, but there's been a trend recently to cut costs on all engineering except AI. Might be related
You keep defending developers while I'm pointing the finger at corporate. But good victim complex.
Yep. I've also noted that the people making such claims never seem to cite their own work as an example of how to implement something at Facebook or YouTube scale that is less "brittle".
Armchair quarterbacking isn't just a U.S. football phenomenon.
Dealing in absolutes is a result of misaligned expectations.
"The Force SHOULD have saved my mother's life!" - Anakin Skywalker
iirc these all use GCP which would make sense for them all to be disrupted at the same time. I wouldn't have thought Meta was GCP reliant though?
all of them are using oauth, likely auth provider issue?
https://www.cloudflarestatus.com is reporting an issue with SSO login. So seems like you might be onto something..
Definitely not us.
And I assumed it would be DNS again. Would referral traffic cause issues like this?
There’s no way fb uses gcp
Down Detector is so unreliable. People that can't call an AT&T phone via Verizon will think (and report) that Verizon is down, when it's really AT&T. People can try logging in using Facebook's one-click login and not be able to get in, so they think Tiktok is down. It's not all that useful. I hate when journalists cite it.
It's only slightly better than "my mom claims". My mom would ask if I had the internet at my house. Yup. all of it. in a rack in my bedroom closet. She'd also report the "internet is down" when a single website was having issues. To me, that is down detector carrying on the legacy of moms everywhere.
A single report on there is useless. A sudden flood of reports is a good sign that something interesting is happening.
Eh, mostly it's people misunderstanding what it represents.
If I can't login to tiktok because FB is down, then tiktok is effectively down for me. When it comes to technology most people don't care about the trip, they care about the destination.
So yea, tiktok isn't "down" but for a lot of people it might as well be, hence coupling your infrastructure/auth on other providers has side effects like this you must take into account.
It has false positives and noise for sure, but it's also very sensitive and shows issues very quickly.
I wouldn't trust it as a single source, but in a case like this where our internal monitoring shows a spike of issues with the Google APIs and we can see a huge spike in reported issues for Google on Downdetector starting at the same time, it's useful to confirm that the issues have an external source.
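A small sketch of that kind of cross-check: find where each series first jumps above its own early baseline and see whether the two onsets line up. The function names, the 3x threshold, and the sample data are all made up for illustration.

```python
def spike_onset(series, factor=3.0, baseline_points=5):
    """Return the index of the first point exceeding factor x the early baseline.

    series is a list of counts at fixed intervals; the baseline is the mean of
    the first baseline_points samples. Returns None if nothing spikes.
    """
    baseline = sum(series[:baseline_points]) / baseline_points
    threshold = factor * max(baseline, 1.0)
    for i, value in enumerate(series[baseline_points:], start=baseline_points):
        if value > threshold:
            return i
    return None


def onsets_coincide(internal, external, tolerance=1):
    """True if both series spike and their onsets are within tolerance samples."""
    a, b = spike_onset(internal), spike_onset(external)
    return a is not None and b is not None and abs(a - b) <= tolerance


# Made-up per-minute samples: internal API error counts and external report counts.
internal_errors = [2, 3, 2, 2, 3, 2, 40, 55, 60]
external_reports = [10, 12, 9, 11, 10, 11, 14, 90, 150]
same_event = onsets_coincide(internal_errors, external_reports)
```

Here the internal errors spike at sample 6 and the external reports at sample 7, so the two are treated as the same event. Agreement like this says the timing matches; it still doesn't say which party is at fault.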
S**'s gettin' real: https://downdetector.com/status/pokemon-go/
Scrolling down their list, why is DoorDash the only one that didn’t have a spike this morning?
Everything is going down, except Steam. You know where to find me.
Some piece of core infrastructure went down, because everyone got a spike at the same time. Surprisingly, DoorDash and Steam were up.
What's your fave conspiracy theory? Massive cyberattack for Super Tuesday? Powers-that-be mandated takedown? Mossad sleeper agents activated? Covid-brain struck that one engineer attending to that one wire that kept everything going?
Houthis destroying underwater cables in the Red Sea.
i knew my packets took a wrong turn at Albuquerque
I guess the swamp got drained, so there's no more flow through the tubes.
There is no consistent scale on that graph, so any local maxima of reports received would look similar to any other.
Exactly this. FB topped out around 520,000 reports. Google topped out around 1,400. That's a massive difference in scale.
Both are above their baselines, but I bet some is just mis-reports, or increases in awareness due to more people checking in.
Meta seems to be the only one really affected from what I can tell.
We saw a big spike in latency and failures on the Google OAuth apis starting at the same time (15:21 UTC)
Additional fun factor: today is Super Tuesday - primary elections in a lot of US states.
This outage will result in absolutely no ridiculous conspiracy theories.
If your election integrity relies on Facebook, YouTube, or even DNS to be up... there are bigger issues.
Actually, if all of them including Xitter went down, maybe things would get better? All the sunlight photons might get sucked in by too many eyeballs though, and there could be grass trampling.
I agree. While I don't think it likely that Facebook or YouTube would enter into it, I'd pretty much bet that DNS being down would cause problems.
And yes, there are bigger issues with that. Much.
A lot of sites have features that say “log in with your X or Y account.” They connect to each other somehow. I never studied that protocol. I wonder if authentication failures across services could be tied to it.
For process of elimination, do all of these services do multi-platform logins? Or do some not connect to anyone else?
that actually infuriates me more than cookie banners. the one from Googs is the worst offender.
Most of those seem OK for me now, and DD agrees. This seems to have been a temporary blip for all of them, possibly some kind of service switchover/fallback "not entirely unrelated" to the Meta outage?
Edit: actually a more attractive theory, given the very short timelines and near simultaneity of all those failures, is that downdetector itself had a failure, possibly a Meta-dependence, that they noticed and corrected quickly.
GCP seems fine, and no issues logging into Google Cloud Console.
Fb is down but YT is still up for me.
Facebook audience is in the billions, so you will see 100k false positives when a big site like that goes down.
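Back-of-the-envelope version of that claim, with numbers that are pure assumptions (2 billion affected users and 50 misattributed reports per million users are invented for the arithmetic, not measured):

```python
# Hypothetical inputs, chosen only to show the order of magnitude.
affected_users = 2_000_000_000        # assumed audience hit by the outage
misreports_per_million = 50           # assumed users per million who blame another service

# Integer arithmetic keeps the result exact.
expected_false_reports = affected_users * misreports_per_million // 1_000_000
```

Under those assumptions a misattribution rate of 0.005% is already enough to produce 100,000 reports filed against services that are working fine.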
Services are coming back to life.
Interesting to see that all static content was still working during the outage (at least for Instagram). It was still possible to swipe through all reels (I assume the list was cached).
What could have this sort of blast radius? BGP?
Man the postmortem on this is gonna be fun.
“Yeah so it turns out when Facebook and Instagram go down so does Google”
I do not envy the SREs at either company. I'm pretty sure all those other ones use Facebook or Google as their OAuth provider which is why they are all being reported as down.
Thanks for sharing this, from what I have read it looks like an issue beyond just Facebook.
Google was acting up for me as well, so that could be
Cloudflare or AWS ?
Downdetector has successfully detected 150 of the last 20 outages.
Its mention should honestly be banned from this site.