Someone else posted about PDX02 going down entirely[0], so it sounds like this is the root cause, especially given the latest status update:
> Cloudflare is assessing a loss of power impacting data centres while simultaneously failing over services.
> [0]: Looks like they lost utility power, switched to generator, and then the generator failed (the scope of the generator failure isn't clear yet). Some utility power is back, so recovery is in progress for some portion of the site.
[0]: https://puck.nether.net/pipermail/outages/2023-November/0149...
When I worked there (3+ years ago), if PDX were out then "the brain" was out... things like DDoS protection were already being handled within each PoP (so that will be just fine, even for L3 and L7 floods, even for new and novel attacks), but nearly everything else was computed in PDX and then shipped to each PoP as configuration data.
The lifecycle is: PoPs generate/gather data > send to PDX > compute in PDX > ship updates / data to PoPs.
If you take out PDX, that data stops being refreshed, and so much runs on fresh data that things quickly start going stale.
I doubt everything has changed since then, so this is unlikely to be just "API down" and more likely that a lot of things are now in a degraded state because they're running on stale information (no updates from PDX)... this includes things like load balancing, tiered caching (Argo Smart Routing), Warp / Zero Trust, etc.
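To make that failure mode concrete, here's a minimal sketch (my own illustration, not Cloudflare's actual code; all names and thresholds are made up) of a PoP that keeps serving from the last config push it received from the core and simply flags itself as degraded once that config ages out:

    import time
    from dataclasses import dataclass, field

    MAX_CONFIG_AGE_SECS = 300  # hypothetical threshold for "degraded"

    @dataclass
    class PopConfig:
        # Last-known-good configuration shipped from the core (PDX in this story).
        version: int = 0
        received_at: float = field(default_factory=time.time)
        rules: dict = field(default_factory=dict)

    class Pop:
        def __init__(self):
            self.config = PopConfig()

        def apply_push(self, version: int, rules: dict) -> None:
            # Called whenever the core ships a fresh config; resets the staleness clock.
            self.config = PopConfig(version=version, received_at=time.time(), rules=rules)

        def status(self) -> str:
            # With the core down, nothing fails outright; the PoP keeps serving
            # the old config and slides into "degraded" as that config ages.
            age = time.time() - self.config.received_at
            if age > MAX_CONFIG_AGE_SECS:
                return f"degraded: config v{self.config.version} is {age:.0f}s stale"
            return f"ok: config v{self.config.version} ({age:.0f}s old)"

    if __name__ == "__main__":
        pop = Pop()
        pop.apply_push(1, {"lb_origin": "origin-a"})
        print(pop.status())  # ok while pushes keep arriving
        # ...core goes away: no more pushes, the same config just keeps ageing...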
Even if it were only "API down", bear in mind that a lot of the automation customers have in place blocks attacks by calling the API... "API down" is a hell of a window of opportunity for attackers.
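A lot of that automation is roughly shaped like this (a sketch against the v4 IP Access Rules endpoint as I remember it; verify the exact path and fields against the docs, and the env var names are just placeholders): if the API call can't land, neither does the block.

    import os
    import requests

    API = "https://api.cloudflare.com/client/v4"
    ZONE_ID = os.environ["CF_ZONE_ID"]    # placeholder env vars for this sketch
    TOKEN = os.environ["CF_API_TOKEN"]

    def block_ip(ip: str, note: str = "blocked by automation") -> bool:
        """Ask Cloudflare to block an attacking IP via an IP Access Rule."""
        try:
            resp = requests.post(
                f"{API}/zones/{ZONE_ID}/firewall/access_rules/rules",
                headers={"Authorization": f"Bearer {TOKEN}"},
                json={
                    "mode": "block",
                    "configuration": {"target": "ip", "value": ip},
                    "notes": note,
                },
                timeout=10,
            )
            return resp.ok and resp.json().get("success", False)
        except requests.RequestException:
            # "API down" lands here: the attack keeps flowing, and the only
            # options are retrying later or blocking somewhere else (origin
            # firewall, upstream provider, etc.).
            return False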
Note that just before I left they'd been investing in standing up AMS (I think), but they had never successfully tested a significant failover, and the majority of services that needed fresh state did not know how to fail over.
PS: :scream: most of the observability was also based in PDX, so hugs to all the teams and SREs currently running blind.