GitHub was down

sebmellen
49 replies
19h26m

I've never seen an outage this big. Even the homepage doesn't load. We've had recurrent issues with Actions not running, but this seems a lot bigger.

The status page says all is well, though: https://www.githubstatus.com/. Hilarious.

ergocoder
16 replies
19h22m

I wonder why the status page doesn't just ping github.com for a 200. That seems easy to do.
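
A minimal sketch of the kind of external check being suggested here (a plain curl probe, not whatever GitHub actually runs internally):

```
# Probe the homepage and flag anything other than an HTTP 200.
code=$(curl -s -o /dev/null -w '%{http_code}' --max-time 10 https://github.com/)
if [ "$code" != "200" ]; then
  echo "github.com returned $code - consider flipping the status page" >&2
fi
```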

tinyhitman
8 replies
19h21m

delaying SLA

sebmellen
7 replies
19h21m

This is at least a multi-million dollar payout (if they admit to it).

All GitHub Pages say

We're having a really bad day.

The Unicorns have taken over. We're doing our best to get them under control and get GitHub back up and running.

cbates
5 replies
19h7m

Seems slightly unprofessional for a massive company like GitHub/Microsoft.

xp84
2 replies
18h53m

I disagree. This hurts no one, and not everything needs to be sanitized and painted over with bland corporatespeak.

majewsky
0 replies
9h34m

I don't think they were asking for corporate speak. But at least I would find a plain technical error message like "cannot contact file server" much more respectable than something like "unicorns are hugging our servers uwu".

COMMENT___
0 replies
1m

This “ironic” and “humorous” style of errors and UI captions is the actual new corporate speak. I’d prefer dumb error messages rather than some shit someone over the ocean thinks is smart and humorous. And it’s not funny at all when it’s a global outage impacting my business and my $$$.

colimbarna
1 replies
18h40m

It's closer to the truth than you usually get. They're having a bad day, it's completely true. It's the start of my day, but I guess this is the middle of the night for them. There's no such thing as unicorns, but that just highlights the metaphorical nature of the remaining claim: getting unicorns under control means solving their problems. Normally "professional" corporate speak means avoiding saying anything whose meaning is plain on its face and disconfirmable, while avoiding the implication that the company is run and operated by humans. This message is a better model. (Obviously they came up with the message in advance, which just goes to show that someone in the company is well-rounded enough to know that if it is displayed, they're having a bad day.)

wrs
0 replies
17h20m

GitHub is (was?) a Rails application, so it was probably originally running behind Unicorn [0], if it isn’t still. So the unicorns are (were) real.

[0] https://en.wikipedia.org/wiki/Unicorn_(web_server)

ljahier
0 replies
18h33m

At the moment, all GitHub services seem to be restored, yet the GitHub status page indicates that the problem is still ongoing. I don't think it's related to the SLA, but rather to the monitoring, which is not live. There are a few minutes of delay.

fragmede
5 replies
19h4m

From where? They don't only have one load balancer, so you'd still have the problem of the page showing green when it's not loading for some folks?

ergocoder
4 replies
18h52m

At GitHub's scale, why wouldn't they put a ping monitor on every continent, at least?

Then, you would show the status based on the continent.

fragmede
2 replies
14h22m

Where on the continent? GitHub is undoubtedly doing blackbox testing internally and has multiple such monitors, but that's not going to capture every customer's route to them, leading to the same problem: customers experience GitHub being down despite monitoring saying it's mostly up. Thus the impasse. Even doing whitebox testing, where you know the internals and can thus place sensors intelligently, even just for ingress, you're still at the mercy of the Internet.

If a sensor that's basically in the same datacenter says you're up, but the route into the datacenter is down, then what? Multiply this by the complexity of the whole site, and monitoring it all with 100% fidelity is impossible. Not that it's not worth trying; there's a team at GitHub that works on monitoring. But beyond the motivation of keeping the SLA up, as a customer, unless you notice it's down, is it really down? In a globally distributed system, downtime, except for catastrophic downtime like this, is hard to define on a whole-site basis for all customers.

laserlight
0 replies
12h56m

100% fidelity is impossible

I don't think anybody asked for 100% fidelity. We are talking about a complete outage that affected at least North America and Europe. If the status page shows green in such a case, its fidelity is around 50%. People expect better from GitHub.

ergocoder
0 replies
10h57m

monitoring it all with 100% fidelity is impossible

This is impossible regardless of how godlike the design is... Nobody is asking for 100% fidelity.

intelVISA
0 replies
15h20m

That would be self-defeating given that it's a Rails app.

bigiain
0 replies
17h32m

To be fair, I really couldn't care less if the homepage is loading or not.

So long as I can fetch/commit to my repos, pretty much everything else is of secondary, tertiary, or no real importance to me.

(At work, I do indeed have systems running that monitor 200 statuses from client project homepages, almost all of which show better than 99.999% uptimes. And are practically useless. Most of them also monitor "canary" API requests, which I strive to keep at 99.99% but don't always manage to achieve 99.9%, which is the very best and most expensive SLA we'll commit to.)
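
For illustration, a canary check in the spirit described above; the endpoint and latency threshold are made up:

```
# Hypothetical canary: call a cheap API endpoint, alert on non-200 or slow responses.
ENDPOINT="https://api.example.com/health/canary"
out=$(curl -s -o /dev/null -w '%{http_code} %{time_total}' --max-time 5 "$ENDPOINT")
code=${out%% *}   # HTTP status
secs=${out##* }   # total request time in seconds
if [ "$code" != "200" ]; then
  echo "canary failed: status=$code" >&2
elif awk -v t="$secs" 'BEGIN { exit (t <= 2.0) }'; then
  echo "canary slow: ${secs}s" >&2
fi
```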

purkka
10 replies
19h15m

I have to wonder how a company at the scale of GitHub can be so bad at keeping track of their status.

Now 4 out of 10 services are marked as "Incident", yet most of the others are also completely dead.

xuancanh
4 replies
19h2m

It's because of the way most companies build their status dashboards. There are usually at least 2 dashboards, one internal dashboard and one external dashboard. The internal dashboard is the actual monitoring dashboard, where it will be hooked up with other monitoring data sources. The external status dashboard is just for customer communication. Only after the outage/degradation is confirmed internally, then the external dashboard will be updated to avoid flaky monitors and alerts. It will also affect SLAs so it needs multiple levels of approval to change the status, that's why there are some delays.

ParetoOptimal
3 replies
18h49m

The external status dashboard is just for customer communication. Only after the outage/degradation is confirmed internally, then the external dashboard will be updated to avoid flaky monitors and alerts. It will also affect SLAs so it needs multiple levels of approval to change the status, that's why there are some delays.

This defeats the purpose of a status dashboard and makes it effectively useless in practice most of the time from a consumer's point of view.

consteval
1 replies
4h51m

From a business perspective, I think given the choice to lie a little bit or be brutally honest with your customers, lying a bit is almost always the correct choice.

ParetoOptimal
0 replies
2h13m

My ideal would be regulations making it necessary that downtime metrics be reported as a "suspected reliability issue" with at most somewhere between a 10m and 30m delay.

If your reliability metrics have lots of false positives, that's on you and you'll have to write down some reason why those false positives exist every time.

Then that company could decide for itself whether to update manually with "not a reliability issue because X".

This lets consumers avoid being gaslighted and businesses don't technically have to call it downtime.

insane_dreamer
0 replies
4h28m

Liability is their primary concern

x86a
4 replies
19h11m

This is intentional. It's mostly a matter of discussing how to communicate it publicly and when to flip the switch to start the SLA timer. Also coordinating incident response during a huge outage is always challenging.

thiagocsf
3 replies
16h1m

That it may be, but there's no excuse.

Declare an incident first, investigate later.

Cheating SLAs by delaying the incident is a good way to erode trust within and without.

antimemetics
2 replies
13h35m

Declare an incident first, investigate later.

If that were the best way to deal with it, why is literally no one doing it this way, and what does that tell you?

adgjlsfhk1
0 replies
13h18m

Because it involves admitting that you messed up, which companies are often disincentivized to do.

ErikBjare
0 replies
10h15m

False positives?

karmakaze
7 replies
19h20m

I get the angry unicorn page "No server is currently available to service your request. Sorry about that. Please try refreshing and contact us if the problem persists. Contact Support — GitHub Status — @githubstatus" with that last link going to https://x.com/githubstatus showing "GitHub Status Oct 22, 2018 Everything operating normally."

TwiztidK
3 replies
19h10m

The era of Twitter/X status pages needs to come to an end given how unusable it is if you aren't logged in.

blt
2 replies
13h15m

Making logins required to view twitter was the ultimate bed shitting move. The whole point of twitter was to be a broadcast medium. Tweets were viewable without following or logging in. There is a huge vacuum in that space now.

raxxorraxor
1 replies
11h13m

For most (social media) platforms really. Management believes it would force users to sign up, but in reality the platform just becomes less relevant because of that limitation. Not even talking about search crawlers.

An all around stupid decision. That said, if management is that shitty, the platform probably won't be attractive for long anyway.

Facebook/Instagram were successful despite that to a degree, but this decision probably still did a lot of damage to their relevancy and user numbers.

disgruntledphd2
0 replies
10h53m

Facebook/Instagram were successful despite that to a degree, but this decision probably still did a lot of damage to their relevancy and user numbers.

FB/IG/Whatsapp have half of humanity logging into their services once per month, so I'm not sure how much better they could be doing if they didn't have a login wall.

Meanwhile, Twitter (with no login wall) never broke 500mn. Like, personally I totally take your point about status updates but I'd have used my Twitter account a lot more if I'd needed to log in to see the content.

kalkin
2 replies
19h13m

I think this is because logged-out Twitter now shows top Tweets of all time from a user, rather than most recent Tweets.

Good reason why companies shouldn't be using Twitter/X for status updates anymore!

xp84
1 replies
18h54m

Thank you! I was wondering why all I could see was useless content there!

temp0826
1 replies
18h28m

Used to work ops at AWS. I don't know if it's still the case but it required VERY HIGH management approval to actually flip any lights on their "status page" (likely it was referenced in some way for SLAs and refunding customers).

smsm42
0 replies
18h17m

That is an excellent illustration of Goodhart's law. We're going to have this awesome status page, but since updating it would let the clients notice the system is down, we're going to put up a lot of barriers to putting the actual status on that page.

Also probably a class action suit lurking somewhere in there eventually.

saul-paterson
1 replies
11h58m

FWIW, our self-hosted Gitea instance has not had a single second of unplanned downtime in the five years we've been running it. And there wasn't much _planned_ downtime either, because it's really easy to upgrade (pull a new image and recreate the container, which takes the instance out for maybe 15 seconds late at night), and full backups are handled live thanks to zfs.

Migration to a new host takes another 15 seconds thanks to both zfs and containers.

I don't know how many GitHub downtime reports I've seen during that time, we're probably into high dozens by now.
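
For anyone curious, a rough sketch of the upgrade-plus-snapshot workflow described above; the dataset, service, and compose names are assumptions, not the poster's actual setup:

```
# Hypothetical names throughout: snapshot the data first, then bump the image.
zfs snapshot tank/gitea@pre-upgrade-$(date +%Y%m%d)  # near-instant, live backup point
docker compose pull gitea                            # fetch the new Gitea image
docker compose up -d gitea                           # recreate the container (seconds of downtime)
```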

chrisallenlane
0 replies
6h24m

I've been running Gitea on my homelab for a few months now. It's fantastic. It's like a snapshot of a point in time when GitHub was actually good, before it got enshittified by all of the social and AI nonsense.

I've been moving most of my projects off of GitHub and into Gitea, and will continue to do so.

rvz
1 replies
19h4m

Looks like we have a full-house outage at GitHub with everything down. Much worse than the recent so-called Twitter / X speed-bump that was screeched at and quickly forgotten.

I don't think GitHub has recovered from the monthly incidents that keep occurring. Quite frankly, the expectation that something will go down at GitHub every month shows how unreliable the service is, and this has been the case for years.

I guess this 4-year-old prediction post about self-hosting and not going all in on GitHub really aged well after all [0]

[0] https://news.ycombinator.com/item?id=22868406

dataspun
0 replies
18h51m

statute of limitations for HN comment predictions is 3 years.

RIMR
1 replies
19h18m

Wow, the status page only just now started reporting issues, and it still doesn't seem to communicate the scale of the issue.

People use this page for guidance. I guess now we know how much it can be trusted.

ikiris
0 replies
19h16m

It’s used to ease their comms, not as a real-time status board pointing at their monitoring.

manquer
0 replies
19h18m

Status page updates with "degraded availability". lol

kinduff
0 replies
19h19m

They are flipping the switches now, status page just changed.

TacticalCoder
0 replies
6h20m

I've never seen an outage this big.

I remember a time when systems would boast about their "five nines" uptime. It was before anything "cloud" appeared.

Lanedo
0 replies
18h59m

Twitter now has:

We are experiencing interruptions in multiple public GitHub services. We suspect the impact is due to a database infrastructure related change that we are working on rolling back.

https://x.com/githubstatus/status/1823864449494569023

weystrom
13 replies
19h8m

A reminder of how centralized and dependent the whole industry has become on GH, which is ironic, considering that git itself is designed to be decentralized.

Good opportunity to think about mirroring your repos somewhere else like Gitea or Gitlab.

nox101
6 replies
18h58m

They're already mirrored on my hard drive. That's how git works.

cropcirclbureau
4 replies
18h50m

GitHub is more than a remote host for git repositories. It's become one of the major CDNs for software distribution. GitHub Pages hosts a majority of the static sites that developers use. You won't be able to use Cargo, Nix, Scoop and other package managers right now because their registries have a critical dependency hosted on GitHub.

This is not to mention all the projects that rely on GitHub for project management, devops, community and support desks.

GitHub is also very international; I doubt even isolated netizens like those in China are shielded from this outage. I imagine very, very few software shops are unscathed by this. The whole affair is very on brand for 21st century software, which is to say pitiful.

Joe_Cool
2 replies
18h41m

We installed a private GitLab instance on our own servers exactly out of fear that Github might suddenly alter the deal or just cease operations. Pretty happy with our decision so far.

colimbarna
1 replies
16h14m

Do you mean you switched to self-managed GitLab, or you have a self-managed GitLab that you keep around as a backup plan?

Joe_Cool
0 replies
2h33m

Actually both. Our internal closed source projects are only in our GitLab. The open-source stuff is both on GitHub and our GitLab. Since our GitLab instance isn't public we only use the issue tracker on GitHub for public stuff.

Another bonus is that we don't pay Microsoft.

keybored
0 replies
13h12m

- Champion a hard to use VCS which to its credit is distributed

- Make everyone dependent on all the centralized features of your software to use Git[1][2]

- Now you have a de facto centralized, hard to use VCS with thousands of SO questions like “my code won’t commit to the GitHub”

- Every time you go down a hundreds-of-comments post is posted on HN

How to get bought for a ton of cash by a tech mega corporation.

[1] Of course an exaggeration. Everyone can use it in a distributed way or mirror. The problem occurs when you’re on a team and everyone else doesn’t know how to.

[2] I’m pretty sure that even the contributors to the Git project rely on the GitHub CI since they can’t run all tests locally.

Zambyte
0 replies
18h50m

The key difference is being able to mirror communication channels. While you can continue to work fine with your local repo, the only way to share those changes is via another forge, or by sending patches through some other channel. Having another forge to distribute code is generally the better option.

readline_prompt
1 replies
18h58m

These things could happen anywhere though. GitLab also `rm -rf`ed before, remember?

jtriangle
0 replies
18h47m

The odds of all services running rm -rf / at the same time are pretty small, to be honest. The point is to have your work in multiple places, such that you're not reliant on a single service.

turnsout
0 replies
18h47m

I'm taking this opportunity to randomly shout out Gitea! I've self-hosted Gitea for 5 or 6 years, and it has been bulletproof.

layer8
0 replies
18h48m

It's not the whole industry, just some imprudent sections of the industry.

autom4ton
0 replies
18h58m

great point

__turbobrew__
0 replies
17h22m

I’m kind of surprised that gitlab doesn’t have a larger market share given that you can run it air gapped on-prem without too much fuss.

twp
13 replies
19h26m

https://www.githubstatus.com/ reports no problems, but it's clearly down for a lot of people (including me).

tabbott
9 replies
19h25m

It is kinda amazing how consistently status pages show everything fine during a total outage. It's not that hard to connect a status page to end-to-end monitoring statistics...

cortesoft
3 replies
19h23m

There is always going to be SOME delay between the outage and the status page, although 5 minutes is probably enough time where it should be updated

thund
2 replies
19h20m

After several minutes the status page is still showing that all is fine.

For a service like GH, anything more than 30 secs is unacceptable.

x86a
1 replies
18h51m

That is very unrealistic. Infrastructure monitoring at that scale won't even be collecting metrics at that interval.

And simple HTTP monitoring would be too flappy for a public status page.

aeonik
0 replies
6h53m

What monitoring tools are you using? I know a ton that can do 30 seconds or less at scale. In fact, I'm pretty sure all the big players can do that.

frabjoused
1 replies
19h24m

It's simply too soon for the status page to report the anomaly, is my guess. It's been down for 4 minutes.

thih9
0 replies
19h19m

4 minutes is a long time for something that could have been an automated check.

For the record, the status page eventually got updated - around 7 minutes after this submission was created.

owyn
0 replies
16h58m

Once in the past I did actually have an incident where the site went down so hard that the tool that we used to update the status page didn't work. We did move it to a totally external and independent service after that. The first service we used was more flaky than our actual site was, so it kept showing the site down when it wasn't. So then we moved to another one, etc. Job security. :)

blinded
0 replies
19h21m

From my experience this requires a few steps happen first:

- an incident be declared internally to github

- support / incident team submits a new status page entry (with details on service(s) impact(ed))

- incident is worked on internally

- incident fixed

- page updated

- retro posted

Even AWS now seems to have some automation for their various services per region. But it doesn't automatically show issues, because the problem could be at the level of a single customer or a subset of customers, say those in region foo in AZ bar, on service version zed vs zed - 1. So they chose not to display issues for subsets.

I do agree it would be nice to have logins for the status page and then get detailed metrics based on customerid or userid. Someone start a company to compete with statuspage.

beefsack
0 replies
12h23m

They say you shouldn't host status pages on the same infrastructure they're monitoring, but in a way that would make them much more accurate and responsive during outages!

kredd
1 replies
19h24m

It went down literally 3 minutes ago (I was in the middle of writing a PR comment), let's see if their cron job kicks in and reports the issue.

thund
0 replies
19h19m

it's starting to show now, about 10 minutes after the issue started

agosz
0 replies
19h18m

It's showing a few incidents now. Some things are still green though that don't seem to be working.

bitbasher
9 replies
19h24m

The timing is pretty uncanny. I just deployed a github page and had a DNS issue because I configured it wrong. I hit "check again" and github went down.

Hope I don't appear in the incident report.

red_Seashell_32
3 replies
19h6m

Wait. You use github pages for something or actually work on it?

bitbasher
2 replies
19h1m

I use it for something.

I had a github page that was public, but it was made private and the DNS config was removed. Fast forward to today. I made the private repo public again and forced a deploy of the page without making a new commit. It said the DNS config was incomplete, so I tweaked it and hit "check again" and github went down.

Probably unrelated, but the timing was spooky.

paledot
0 replies
16h0m

Your domain isn't `null.example.com` or something, is it?

dang
0 replies
15h31m

Sorry for the offtopicness - would you mind emailing hn@ycombinator.com so I can check in with you about a couple things regarding https://news.ycombinator.com/item?id=41221186?

zombot
0 replies
9h33m

So it was you who crashed GitHub?

theovermage
0 replies
18h53m

Bad bitbasher bad! :catbonk:

sunrunner
0 replies
18h50m

Perhaps this is a repeat of the Fastly incident with a customer's Varnish cache configuration causing an issue in their systems (I think this is a rough summary, I don't remember the details).

So, you're both responsible and not responsible at the same time :)

Hope I don't appear in the incident report.

Appearing in an incident report with your HN username could be pretty funny...

RIMR
0 replies
19h10m

This will all clear up when it finishes checking your DNS configuration I bet.

OutOfHere
0 replies
18h54m

Fwiw, GitHub Pages is down too. The hosted Pages sites are down.

xelamonster
8 replies
19h21m

Love that HN is a better status page for dev services than most companies can manage to provide. Knew I'd find it here but on the front page within 3 minutes is impressive.

rvz
2 replies
19h9m

I guess when GitHub goes down, it is somehow strangely tolerated, as it has been for years even after the acquisition, and it goes down more often than Twitter. When the latter encounters a speed-bump, just like the 'interview' with Trump, it's global news because a Mr Elon Musk owns it.

Both seem to be doing too much all at once. But really it is worse with GitHub if this is what Microsoft stewardship means: incidents every single week and every month, guaranteed, for years.

Anyways. #hugops for the GitHub team.

mrala
1 replies
18h45m

What makes you say it’s “somehow strangely tolerated” when GitHub goes down?

What’s the point of bringing up twitter? It is strange to seek victimhood for a petulant billionaire. Of course, it is worse with GitHub because GitHub actually provides useful functionality.

rvz
0 replies
17h48m

What makes you say it’s “somehow strangely tolerated” when GitHub goes down?

The same folks complaining about something at GitHub going down are the same people that stay and are willing to tolerate the regular incidents and chaos on the site.

It is the fact that not only have GitHub incidents been happening for years, it has gotten worse: there is now an incident every month.

Of course, it is worse with GitHub because GitHub actually provides useful functionality.

That isn’t an excuse for tolerating regular downtime for a site with over 100m+ active users, especially with it running under Microsoft stewardship who should know better.

Any other site with that many users and with a horrendous record of downtime like Github would be rightfully branded as unreliable. No excuses.

tmvnty
1 replies
19h15m

HN needs to publish its secrets on how it rarely goes down!

peterlk
0 replies
18h56m

Based on what I've read in the past, I believe the secret is simplicity. Simplicity scales

RIMR
1 replies
19h12m

Except that the weird HN algo just saw 187 upvotes in 15 minutes, and dropped this thread to the second page...

leeoniya
0 replies
19h11m

it knows we're all in a voting ring called Github Users

kinduff
0 replies
19h16m

Reminds me of a repository I once found when searching for Prometheus exporters.

It did this but with Twitter: it would monitor the latest tweets for a custom word combo and raise a server alert when found. I found it hilarious. Will post the source once GitHub is back up.

dlahoda
6 replies
19h20m

I see more and more people using GitHub less in favor of other git solutions. I'm afraid to think what to do when GitHub is down for hours (need to learn mailing lists?).

Another reason is that MS may be entering a phase where it asks you to pay to use GitHub even just for reads (rate limiting).

curtis3389
3 replies
19h9m

I recently looked into using Git in a decentralized way. It's actually pretty easy!

When you would usually create a PR, you use `git format-patch` to create a patch file and send that to whoever is going to merge it.

They create a branch and use `git am` to apply the patch to it, review the changes, and merge it to main.

It is nice that git supports multiple remotes, though. It feels good to know that `git push` might not work for my project right now, but I know `git push srht` will get the code off of my laptop.
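
Roughly, the workflow described above looks like this; the patch file name, branch names, and the srht remote URL are placeholders:

```
# Contributor: turn commits not yet on origin/main into mailable patch files.
git format-patch origin/main          # writes 0001-*.patch files

# Maintainer: apply the patches onto a review branch, then merge.
git checkout -b review-incoming
git am 0001-some-change.patch         # applies the patch as a normal commit
git checkout main
git merge review-incoming

# Multiple remotes, so a push can go wherever happens to be up.
git remote add srht git@git.sr.ht:~user/project
git push srht main
```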

outworlder
1 replies
18h32m

I used to work at a company with very draconian policies. Whenever I needed to update some code on a public GitHub repository, I would just push to a remote that was a flash drive. Plug it into my machine at home, pull from that remote, push to origin.

I also had to setup a bidirectional mirror back when bandwidth to some countries was restricted. We would push and pull as normal, and a job would keep our mainline in sync.

It is sad that most organizations forget that git is distributed by nature. We often get requests to setup VPNs and all sorts of craziness, when a simple push to a bare mirror would suffice. You don't even need anything running, other than SSH.
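
A minimal sketch of the bare-mirror trick described here; the mount point and remote name are hypothetical:

```
# One-time setup: a bare repository on the flash drive (or any SSH-reachable box).
git init --bare /media/usb/project.git
git remote add usb /media/usb/project.git

# At the locked-down machine: push to the drive instead of the blocked origin.
git push usb main

# At home: pull from the drive, then push on to the public remote.
git pull usb main
git push origin main
```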

__float
0 replies
17h21m

Draconian policies...but not security ones? Why were USB drives not blocked?

Gormo
0 replies
5h58m

I recently looked into using Git in a decentralized way. It's actually pretty easy!

Well, that's how it was designed to work! The whole point of Git is that it's a distributed version control system, and doesn't need to rely on a centralized source of truth.

mhh__
0 replies
18h20m

emailing patches is fairly easy.

The real reason not to use github anyway though is that it's terrible (the basic "github model" for doing code review was basically made up on the back of a napkin IMO)

katzinsky
0 replies
15h51m

Git without github is pretty much the same as with it. It's just PRs that are different.

damiankennedy
5 replies
19h19m

Investigating - We are investigating reports of degraded availability for Actions, Pages and Pull Requests Aug 14, 2024 - 23:11 UTC

damiankennedy
4 replies
19h16m

Update - Pages is experiencing degraded availability. We are continuing to investigate. Aug 14, 2024 - 23:12 UTC

damiankennedy
3 replies
19h16m

Update - Copilot is experiencing degraded availability. We are continuing to investigate. Aug 14, 2024 - 23:13 UTC

damiankennedy
2 replies
19h15m

Update - We are investigating reports of issues with GitHub.com and GitHub API. We will continue to keep users updated on progress towards mitigation. Aug 14, 2024 - 23:16 UTC

EDIT: The reply link is no longer available.

Update - Packages is experiencing degraded availability. We are continuing to investigate. Aug 14, 2024 - 23:18 UTC

damiankennedy
1 replies
19h10m

The reply link is now available?

Update - Issues is experiencing degraded availability. We are continuing to investigate. Aug 14, 2024 - 23:19 UTC

Update - Git Operations is experiencing degraded availability. We are continuing to investigate. Aug 14, 2024 - 23:19 UTC

EDIT: The reply link is no longer available again.

damiankennedy
0 replies
19h4m

The reply link is now available again?

Everything is red now. Nearly lunch time in New Zealand.

suyash
4 replies
19h20m

RIP to all those who host their websites on GitHub pages :(

bogwog
1 replies
18h56m

Anybody who publishes an app on the Google Play store and hosts their privacy policy on Github pages may have their app taken down because Google's bots won't be able to verify it exists.

That happened to me a while back with an app listing that was almost 10 years old because the server I was hosting the policy on went down. Ironically, I switched it to Github pages so it wouldn't happen again.

testergrave
0 replies
18h45m

I have my client's app policy on GitHub. I have to check if anything happened to it. The websites are working fine

jacooper
0 replies
18h56m

First time in a while that Pages goes down together with GitHub itself; it's usually separate from the main site.

iNate2000
0 replies
18h33m

I checked, and my pages-based site was down, but it is back up now.

bamboozled
2 replies
18h47m

How would customer credentials being leaked be part of an outage of this size?

kgrax01
0 replies
18h46m

If it's enough of a security issue, they could have pulled the site while it's fixed/cleaned.

fragmede
0 replies
17h7m

Because there are worse things than being down; if the front page got hacked and is spewing gore or CSAM or PII or creds, for example.

2YwaZHXV
4 replies
19h7m

Given that it seems like the entire thing is busted, can anyone explain how the unicorn page is being served?

esrauch
3 replies
18h52m

They probably have a reverse proxy in front of all their http endpoints and that is still up and able to show the unicorn if the backends aren't responsive.

The static content on the error page might also be on the Akamai or Cloudflare side.

2YwaZHXV
2 replies
18h32m

makes sense, thanks.

the images on the page are all just base64 encoded right into the html

bill_lumbergh
1 replies
15h31m

https://github.blog/news-insights/the-library/unicorn/

  Unicorn has a slightly different architecture.
  Instead of the nginx => haproxy => mongrel cluster setup
  you end up with something like: nginx => shared socket => unicorn worker pools
  
  When the Unicorn master starts, it loads our app into memory. As soon as it’s ready to serve requests it forks 16 workers. Those workers then select() on the socket, only serving requests they’re capable of handling. In this way the kernel handles the load balancing for us.

2YwaZHXV
0 replies
3h3m

amazing, thanks!

zachdoescode
2 replies
19h0m

I swear even my VSCode intellisense is broken now... Rip to a real one.

aarkay
1 replies
18h54m

Yep, very strange. You can disconnect from WiFi to get it to work. VSCode probably keeps pinging GitHub/Microsoft before every operation.

aarkay
0 replies
18h51m

You can also disable telemetry and that seems to work too. Settings-> search for telemetry and select "off" from the telemetry dropdown.

rhabarba
2 replies
19h12m

I love how the same people who try to drag me towards using Git are the only people who seem to have serious problems working on their code when a website goes down.

waveBidder
1 replies
19h6m

Git is not the same thing as github. It's designed to be decentralized, even if it isn't getting used that way atm

rhabarba
0 replies
19h2m

I am quite familiar with the basic functionality of Git. However, I am always amused by how it works in practice.

martins_irbe
0 replies
19h22m

would be better if it's down too :D

Ygg2
0 replies
19h19m

It's yellow/red now.

j3s
2 replies
19h22m

for everyone complaining about the status page - status pages are normally operated by hand by design, and will rarely reflect things in real-time.

give the poor github ops folks a second to get things moving.

manquer
1 replies
19h5m

Most status page products integrate to monitoring tools like Datadog[1], large teams like github would have it automated.

You ideally do not want to be making a decision on whether to update a status page or not during the first few minutes of an incident, bean counters inevitably tend to get involved to delay/not declare downtime if there is a manual process.

It is more likely that the threshold is kept a bit higher than a couple of minutes to reduce the false positive rate, not because of manual updates.

[1] https://www.atlassian.com/software/statuspage/integrations

xyzzy_plugh
0 replies
18h48m

Nah, _most_ status pages are hand updated to avoid false positives, and to avoid alerting customers when they otherwise would not have noticed. Very, very few organizations go out of their way to _tell_ customers they failed to meet their SLA proactively. GitHub's SLA remedy clause even stipulates that the customer is responsible for tracking availability, which GitHub will then work to confirm.

flkiwi
2 replies
19h16m

Me: I think I'll update nixos*

Nix: barfs voluminous errors I've never seen before

Me: whaaaat the farrrrk

* nixos updates are pulled from a github repo

dlahoda
0 replies
19h7m

Yeah, we need more caches and backup git links (including local clones).

Also, there have been IPFS attempts, but they're not finished.

arianvanp
0 replies
8h28m

They're pulled from our CDN by default. Only if you use experimental flakes is GitHub in the loop. And even if GitHub isn't down, you can't pull nixpkgs more than twice per hour without running into rate limits and getting your IP banned. Don't rely on GitHub for critical infrastructure.

https://github.com/NixOS/nix/issues/6975

willchen
1 replies
19h5m

I'm wondering why this isn't on the front page? It has a lot of points in 23 minutes.

mvdtnz
0 replies
18h51m

HN has a strange philosophy built into its ranking algorithm that an item with a large number of comments early on should be de-ranked because the conversation is likely to be of poor quality.

waveBidder
1 replies
19h23m

I wonder how much it would've taken to keep github running without Microsoft buying them and running them into the ground like this.

kagevf
0 replies
19h20m

I wonder if MS long term plans to have both GH and Azure DevOps (the source code management part) ...

startages
1 replies
19h12m

It's 00:16, just about to go to bed, I ran `git push` and it's not working. Check Github, says it's down, I think it's only me, maybe I'm blocked, Github can't be down. Come here to check and it's down for everyone, such a relief.

testergrave
0 replies
18h44m

Really it was a relief. Same case for me. Now I have no energy to push my code. Tomorrow maybe

pietroppeter
0 replies
10h56m

I should have looked for this before posting the same comment. Upvoted :)

jrop
1 replies
18h54m

A coworker and I just had to use `git format-patch` and `git am` to exchange work. Git is super cool!

remram
0 replies
18h46m

`git bundle` is another option for this (I'm not trying to imply it's preferable)
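
For reference, a sketch of the `git bundle` variant; file and branch names are placeholders:

```
# Sender: pack a branch into a single file that can travel over email, chat, or a USB stick.
git bundle create project.bundle main

# Receiver: verify the bundle and fetch from it as if it were a remote.
git bundle verify project.bundle
git fetch project.bundle main:incoming-main
```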

josvdwest
1 replies
19h9m

Would this explain why "npm install next-sanity" doesn't work properly, or am I hitting a user error?

count_countules
0 replies
18h57m

could be if the package is hosted on github

exfil
1 replies
19h5m

I am happy that my project is pushed also to codeberg.

lucb1e
0 replies
18h39m

And has a website so anyone could just ask me if something went wrong on github's side and I can send them a complete copy. Decentralised version control is nice!

dghlsakjg
1 replies
19h7m

https://www.githubstatus.com/

This is a pretty good place to check. The lag is pretty minimal traditionally.

At the time of posting everything is broken.

croemer
0 replies
19h2m

Not sure what you mean by minimal lag. The status page showed all green for at least 10min while everyone got unicorns.

cropcirclbureau
1 replies
19h10m

There goes Pages, there goes the CDN for release artifacts, there goes any package manager hosting repositories on GitHub. Is this outage just contained to github or is it an Azure outage?

cwilby
0 replies
18h46m

It looks to be contained to just GitHub, azure service page shows no outages at this time.

Khaine
1 replies
19h20m

Down in Australia as well

ralusek
0 replies
19h5m

Down under?

Dibby053
1 replies
19h16m

Feels bad to have one's job interrupted. Looking on the bright side this is the excuse I needed to check out Radicle...

dataspun
0 replies
18h58m

checked out radicle: doesn't do windows

zacym
0 replies
18h50m

Aug 14, 2024 - 23:29 UTC Update - We are experiencing interruptions in multiple public GitHub services. We suspect the impact is due to a database infrastructure related change that we are working on rolling back.

wdb
0 replies
18h52m

Feels like Github is down more often than Gitlab

wavemode
0 replies
18h49m

I sure wish this had happened before I logged off from work for the day...

"Why isn't this project done yet?"

"Didn't you hear? GitHub is down!"

and I get to go out for a long lunch

vivgui
0 replies
19h18m

Down in the Dominican Republic as well, was just trying to commit and end my day

upbeat_general
0 replies
19h23m

Down for me as well. Thought my SSH agent was broken.

unicorner
0 replies
18h55m

I've never seen such a serious outage before. Even GitHub Pages hosted sites aren't accessible.

uknownuser
0 replies
19h9m

Yep

timetraveller26
0 replies
19h5m

It's all down according to https://www.githubstatus.com/

  Update - Issues is experiencing degraded availability. We are continuing to investigate.
  Aug 14, 2024 - 23:19 UTC
  Update - Git Operations is experiencing degraded availability. We are continuing to investigate.
  Aug 14, 2024 - 23:19 UTC
  Update - Packages is experiencing degraded availability. We are continuing to investigate.
  Aug 14, 2024 - 23:18 UTC
  Update - Copilot is experiencing degraded availability. We are continuing to investigate.
  Aug 14, 2024 - 23:13 UTC
  Update - Pages is experiencing degraded availability. We are continuing to investigate.
  Aug 14, 2024 - 23:12 UTC

theteleporter
0 replies
19h7m

Wtf, thought it was me alone!

that_other_one
0 replies
19h0m

Welp, that’s as good a time as any to call it a day!

Good luck to the devs and dev-analogues involved in getting the ship righted.

th3w3bmast3r
0 replies
19h5m

This is my first time seeing the angry unicorn! Hopefully it’ll be gone soon :(

sweca
0 replies
19h19m

This will have a fun post mortem

stogot
0 replies
19h12m

Wonder if they’ve had worse uptime after moving to Azure

shishirraven
0 replies
19h10m

Yes, github is not working right now

shcheklein
0 replies
18h53m

Seems to be up again. I also wonder what it was.

shashankkoppar
0 replies
18h58m

We are experiencing interruptions in multiple public GitHub services. We suspect the impact is due to a database infrastructure related change that we are working on rolling back. Hope it is back up soon.

sergiogjr
0 replies
18h57m

Yep, angry unicorn. If the Copilot debacle wasn't reason enough to make people migrate or diversify their code repo efforts with, let's say, GitLab, this should be.

rvz
0 replies
19h19m

And so go all your packages, private repositories, pages, AI intern copilot bot and GitHub Actions, and soon your AI models once you host them there: all unavailable and going down with GitHub.

Time to consider self-hosting like the old days instead of this weekly chaos at GitHub.

runnr_az
0 replies
19h25m

down in phx az

robotdragonfire
0 replies
18h53m

It crashed the second I opened a GitHub repo for an old plugin for Blockbench.

robertclaus
0 replies
19h7m

Sure looks like it!

readline_prompt
0 replies
18h52m

and... we're back, at least in Japan region

rafamvc
0 replies
19h24m

down for me

quincepie
0 replies
19h4m

Yet again, this shows how useless the GitHub status page is.

qianli_cs
0 replies
18h59m

"We suspect the impact is due to a database infrastructure related change that we are working on rolling back."

peterlk
0 replies
19h2m

Can't wait for the writeup! So many services down at once... Something very interesting must have happened

packetlost
0 replies
4h53m

Wonder how much of this is to blame on copilot generated code lol

omoikane
0 replies
18h49m

Maybe it's fixed already? Works for me.

neuronexmachina
0 replies
18h58m

Latest update at 23:29 UTC says: "We are experiencing interruptions in multiple public GitHub services. We suspect the impact is due to a database infrastructure related change that we are working on rolling back."

neoyagami
0 replies
14h41m

And I was in the middle of committing a hotfix D:, I had to push the image directly to the registry D:

mhh__
0 replies
18h22m

This reminds me that for some reason I am logged into my gaming machine's windows store with my GitHub account thanks to the bizarre way that microsoft do auth.

maximilianroos
0 replies
19h16m

Even GitHub-hosted Pages are down — https://prql-lang.org/ is also a unicorn

martins_irbe
0 replies
19h23m

just wanted to do one last final commit :D good timing

low_tech_punk
0 replies
18h59m

All the AI-native developers are twiddling their thumbs because Copilot is out of office.

livefish
0 replies
19h0m

Imagine everyone having their site hosted on gh pages. Now imagine artifacts for system update on github. Hello NixOS filesystem!

kimboox
0 replies
19h1m

Time to go out and see people

kgrax01
0 replies
18h49m

Back online for me

keyle
0 replies
19h9m

Yes it went down about 5 mins ago, I got the angry unicorn. Since then the status page is increasingly red.

Seeing it all kind of went sideways at the same time, my money is on the typical load balancer config rollout snafu.

      "As part of a routine configuration deploym..." [splat]

java-man
0 replies
19h23m

The status page should have a button "Report Outage".

inmanturbo
0 replies
18h51m

They manually ran a patch query on the distributed production database and forgot to use a transaction

gregors
0 replies
18h32m

curious if their layoffs last year had the intended impact

gray_-_wolf
0 replies
19h22m

The macho unicorn is kinda dope though.

goranmoomin
0 replies
18h51m

Seems like sites based on GH Pages were down, but are back up (i.e. the Rust blog).

goranmoomin
0 replies
18h49m

It feels so wrong that there are so many blogs and websites that are based on GH Pages and they all died at once…

Seems like they’re back up though. Or at least the Rust blog is back up.

ghostbk24
0 replies
19h2m

On X, @githubstatus seems to be getting regular updates / automated messages around impact.

frabjoused
0 replies
18h50m

Seems to be back online.

erksa
0 replies
18h51m

The mobile app on iOS is a 503 with

```

Received a 503 error. Data returned as a String was: <!DOCTYPE html> <!- -

Hello future GitHubber! I bet you're here to remove those nasty inline styles, DRY up these templates and make 'em nice and re-usable, right?

Please, don't. https://github.co...

```

That's where it's cut off on my screen.

Curious what the link is :)

I like to think, someone did.

elashri
0 replies
19h23m

I think it may be global, but at least it is down in US-East (for sure).

dylanz
0 replies
18h59m

Who's the Bozo Doofus maintainer? https://yhbt.net/unicorn/LATEST. I love that we can still see Unicorn in action. I rarely had problems with it back in the day.

dataspun
0 replies
19h14m

down! unicorn!

fatal: unable to access: 502

cli, web, and iOS app :-/

dataspun
0 replies
19h16m

Thank goodness for HN status reports.

darth_aardvark
0 replies
19h26m

Sure is.

darkangelstorm
0 replies
6h8m

Services that explicitly needed the API were also down, and it wasn't pretty. For example: Minecraft Mod packs that rely on SerializationIsBad all went kerplunk! I'm sure a lot of people were scratching their heads yesterday wondering why they couldn't do anything for a time.

What made me laugh though was when the "X is functioning normally" messages were immediately followed by "X is degraded, continuing to monitor", then right back to "normal" again, all in the same 30-second timespan... made me giggle.

croemer
0 replies
18h59m

Cause seems to be database related per most recent update (23:29 UTC):

We are experiencing interruptions in multiple public GitHub services. We suspect the impact is due to a database infrastructure related change that we are working on rolling back. Aug 14, 2024 - 23:29 UTC

c23gooey
0 replies
19h5m

Status page showing a complete outage now

bijant
0 replies
19h18m

time to go to bed then. Wasn't getting any useful work done any more anyways...

big-green-man
0 replies
19h24m

In the last 5 minutes too, wow.

bangaladore
0 replies
19h14m

And other services like copilot...

[error] [auth] Response content-type is text/html; charset=utf-8 (status=503)

aag01
0 replies
18h52m

back up

PoignardAzur
0 replies
13h26m

This is weird. I've been using Github all night (in France) and didn't notice anything was wrong. Was the outage in North America?

ICHx
0 replies
18h50m

Back to life

GDTang
0 replies
19h1m

totally down -.- Cannot access from Hong Kong OMG...

Flop7331
0 replies
19h22m

So that's why I haven't heard back on my applications

EADDRINUSE
0 replies
19h19m

Things seems to be ack'ed: ``` Investigating - We are investigating reports of degraded availability for Actions, Pages and Pull Requests Aug 14, 2024 - 23:11 UTC ```

EADDRINUSE
0 replies
18h39m

GH Ops team be like

Senior: Ah, found it! Let's just roll back one revision on the db.

Newguy: Let me fix this! `kubectl rollout undo ... --to-revision=1`

Newguy: OK, started rollback to revision one!

Senior: Uh-oh..

Duende1
0 replies
18h53m

Down in Vancouver, Canada

BenjiWiebe
0 replies
19h11m

What'll it be this time? DNS or BGP?

BeefySwain
0 replies
19h1m

Interesting that CoPilot is down as well. I would have assumed it was really only part of GitHub as a branding/marketing thing.

1vuio0pswjnm7
0 replies
18h31m

Github mirrors?

1attice
0 replies
18h53m

...and we're back