
Cellular outage in U.S. hits AT&T, T-Mobile and Verizon users

jader201
86 replies
1d1h

Something I'm not seeing discussion on:

What is/were the cascading effects of this, particularly for drivers?

Many people in buildings were unaffected, as they could fall back to wifi. But I imagine this had a pretty broad impact on drivers.

Just a few things I can think of:

- Packages delayed (UPS, FedEx, Amazon, truck drivers, etc.) for drivers that relied on their phone's mapping apps to get them to their deliveries

- Uber/Lyft/taxi/etc. drivers not able to get directions to their pickups/dropoffs

- Traffic worsened because drivers weren't able to optimize their routes, or even get directions to their destination

Maybe larger companies have their own infra for this, or have redundancy in place (e.g. their own GPS devices)?

I'm curious to hear thoughts on whether these (and others) were impacted, or if there are ways they're able to get around this.

Also, unrelated to drivers, I can imagine there is/was a higher risk of not getting treated for emergencies due to not being able to make calls (I'm not sure whether/how emergency calling was impacted).

standardUser
33 replies
1d1h

Worth noting that GPS does not rely on cell service.

Scoundreller
10 replies
1d1h

The routing does though.

I have Google offline maps downloaded for areas I end up in just in this case. Gotta do traffic rerouting the old fashioned way though.

Or have an old-school GPS map thingy in your glovebox.

(Also have kiwix and a whole archive of Wikipedia on my phone).

I wonder if Meshtastic communicator sales took off during this. How’s LoRa traffic these days?

freedomben
1 replies
1d1h

Yes, much of what we think of as Google Maps relies on API calls made to the backend. Plus this assumes that you downloaded the offline maps ahead of time, which in my anecdotal experience is not something that most people really consider. GMaps does (or did at one time at least) have a neat feature of auto-downloading your home area map, but the one time I needed it, it didn't work.

Scoundreller
0 replies
1d1h

which in my anecdotal experience is not something that most people really consider

Thankfully I’m in Canada where it’s not impossible to end up in the sticks with no service.

Chewing through your handful of gigabytes/month of data wasn’t hard. Only in the past year or so have double digit gigabyte/month data plans become cost-effective.

And our roaming prices are extortionate, so for jaunts over the border (or internationally), I’ll sometimes go “naked”.

a_gnostic
1 replies
1d

Carriers have mapping independent of networks. Drivers keep personal GPS too. You would lose traffic and road conditions, I guess, but nothing proper trip planning wouldn't cover.

Scoundreller
0 replies
1d

Drivers keep personal GPS too.

Do they? I know there are a lot of old units out there but I figure people would have tossed them.

At least I’ve found Waze has been pretty good at starting off on wifi and loading the map of the whole journey, so after coverage was lost there was some resilience for stops/detours.

treflop
0 replies
22h13m

I am consistently in areas with zero cellular service and I’m reasonably sure Google Maps will route offline. At least, I’ve never switched to another mapping app because I couldn’t route — it’s usually because Google Maps in more primitive areas is kind of detail-less.

But even if it doesn’t, there are a ton of offline map apps that use OpenStreetMap data.

steelframe
0 replies
20h34m

Or have an old-school GPS map thingy in your glovebox

You can also install Organic Maps on your phone.

ianburrell
0 replies
1d1h

Google Maps does offline routing. It doesn't do traffic-aware routing, but having the route update at all is better than nothing.

ezfe
0 replies
23h22m

Apple Maps has offline navigation with historical traffic included

bombcar
0 replies
1d1h

The "Here" app or whatever it is called did offline maps and offline routing decently enough. It wasn't perfect, but it worked for "here to there", even if it didn't find the best possible route.

01HNNWZ0MV43FF
0 replies
23h9m

I've been using this on Android for a couple years and love it: https://en.wikipedia.org/wiki/Organic_Maps

You click a few buttons to download OSM tiles and then it does routing. The latest OSM even has a decent amount of stores, restaurants, etc., listed.

offmycloud
7 replies
1d

GPS on most cell phones uses a data connection to download current satellite data in order to decrease the time from cold start to GPS lock. Without cell or WiFi, GPS can take 5-15 minutes to "search the sky" and download satellite data over the satellites' low-bitrate broadcast channel under poor signal conditions.

https://en.wikipedia.org/wiki/Assisted_GNSS

Edit: You can think of it as a CDN for the GPS almanac.
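
To put a rough number on why unassisted acquisition is so slow: the legacy GPS navigation message is broadcast at only 50 bit/s, and (as I understand the L1 C/A format) the full almanac is paged across 25 frames of 1500 bits each. A quick back-of-envelope:

    bit_rate_bps = 50                # legacy GPS nav message broadcast rate
    frame_bits = 1500                # one frame = 5 subframes x 300 bits
    frames_for_almanac = 25          # almanac is paged across 25 frames

    frame_seconds = frame_bits / bit_rate_bps              # 30 s per frame
    almanac_seconds = frames_for_almanac * frame_seconds   # 750 s
    print(almanac_seconds / 60)      # 12.5 minutes, the "12m30s" worst case cited downthread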

dfadsadsf
3 replies
21h25m

That’s just not true for modern phones. I use an iPhone on hikes without a cellular connection and GPS lock is instantaneous. The Organic Maps app is great for hiking.

AlotOfReading
1 replies
21h7m

You're talking about something very different called a hot start. The GP is discussing time to fix in a cold start scenario. You'd only see this on a phone that had been powered off for months, or "teleported" hundreds of miles away. In this scenario the receiver has to download the new time, new ephemeris data, and a new almanac (up to 12m30s in the worst case) before it can fix. Depending on the receiver, there may also be a delay of several minutes before it enters cold start mode.

If the receiver has recently (last few days) gotten a fix and hasn't moved too much from that fix, it'll be in at least warm start mode. It still needs to download ephemeris data, but this usually takes 30ish seconds to fix.

If the receiver has seen a fix very recently (last few hours) or a recent network connection, it can fix from hot start like you saw, which only takes a few seconds and may not even be observably slow depending on how the system is implemented. Phones go to great lengths to minimize the apparent latency.
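
A minimal sketch of that hot/warm/cold classification, using the rough numbers from this comment (these are rules of thumb, not any particular receiver's actual thresholds):

    def start_mode(hours_since_fix, km_moved, has_network_assistance):
        # Recent fix or A-GNSS assistance data: time/position/ephemeris are still usable.
        if has_network_assistance or hours_since_fix <= 4:
            return "hot start: a few seconds"
        # Fix within the last few days and the receiver hasn't moved far: re-fetch ephemeris only.
        if hours_since_fix <= 72 and km_moved <= 100:
            return "warm start: ~30 s to re-download ephemeris"
        # Otherwise everything (time, ephemeris, almanac) comes over the air: minutes.
        return "cold start: minutes, up to ~12m30s worst case"

    print(start_mode(hours_since_fix=1, km_moved=5, has_network_assistance=False))  # hot start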

Tempest1981
0 replies
11h44m

You reminded me of my first GPS, which connected to a laptop via RS-232:

https://www.bevhoward.com/TripMate.htm (Not me)

Back then, just getting a GPS fix at all was exciting. Then driving around with it propped on the dashboard or rear window.

rezonant
0 replies
20h56m

The sibling response here covers all of the points I would say. Scott Manley has a nice video covering the history of GPS and how it works, well worth a watch https://www.youtube.com/watch?v=qJ7ZAUjsycY

It's not as simple as you think.

sobriquet9
2 replies
23h3m

From cold start. Most starts are not cold. The phone knows where it is, approximately, what time it is (within a second or so, from built-in RTC), and orbital parameters of the satellites overhead (maybe without the latest corrections).

My Garmin watch gets a GPS lock in way less than 5 minutes without any cellular connection.

rany_
1 replies
5h40m

Actually, your Garmin probably gets A-GPS data uploaded to it via the app.

I think that because Huami/Amazfit/Xiaomi smartwatches already do that. We know this from reverse-engineering efforts in Gadgetbridge, but support for Garmin is still new and so there isn't as much info about it; either way, it probably works the same way.

scarface_74
0 replies
2h58m

My first GPS Garmin watch used for running back in 2011 didn’t have an app and didn’t have any cellular signal. I put it on my wrist and started running. I don’t remember it taking more than a minute to get a signal.

fhdkweig
6 replies
1d1h

Agreed. In the Google Maps app, there is a feature called "offline maps" which allows a user to select a rectangle on the map and download all the street info inside it. A whole US state can fit in less than a few hundred megabytes. I have the whole city I live in downloaded so I can go on walks without needing to use my data plan.

jader201
3 replies
23h35m

That's assuming you have it on and updated before you hit the road.

I think it's off by default, and I'm guessing most people haven't thought to turn it on, or are even aware of it.

patmorgan23
2 replies
23h27m

I'm pretty sure maps caches the data around you if you've used it somewhat recently. It saves Google bandwidth too.

jader201
1 replies
23h9m

I’m not so sure.

Anecdotally, I’ve made it to a remote destination using Maps, then hopped back in the car an hour later (with no signal), and it couldn’t load anything. This seems to happen quite often.

Scoundreller
0 replies
21h57m

Maps used to expire after 30 days (no idea why), and the auto-updating while on wifi wasn't great unless you were in the app forcing it to update. Nowadays they last 365 days.

a_gnostic
1 replies
1d

Not as useful as back when google maps required a 5GB download IIRC

herbst
0 replies
11h57m

Downloaded a whole island yesterday. Was 40 MB. That's a lot better now; it has fewer resolutions packed in, though.

LeifCarrotson
5 replies
1d1h

Google Maps and now Apple Maps (as of ~6 months ago) have offline maps, but not by default. If you enable and download them for your area of interest you can use a subset of the normal app.

I make sure to have this around my usual area and anytime I travel to an area with poor coverage, plus my Garmin watch has offline maps and GPS everywhere, but this is not typical.

OsmAnd usage is even less common.

tnel77
3 replies
1d1h

Offline maps are a life saver in areas with bad coverage. One of the first things I setup for a new phone or when I’m headed somewhere new on vacation.

freedomben
2 replies
1d1h

This is one of the most interesting differences I often notice between users who rarely leave the city and those who routinely leave. Offline functionality often seems unnecessary at best and absurd at worst to the former group, while the more rural/remote the person, the more they value offline functionality. For the most extreme example, talk to the average person who lives outside of Anchorage or Fairbanks in Alaska, and they only really care what the app can do when it's offline, as that is its assumed status when on the go (disclaimer: I moved out of Alaska a little over 5 years ago so things might have changed somewhat).

tnel77
0 replies
1d

I grew up in a rural area and lived in Colorado for a while. Going home or venturing into the mountains often resulted in bad service so it just became second nature. Good observation!

ghaff
0 replies
1d1h

Yeah, if I'm going to travel internationally or if I'm somewhere I know I'll have spotty cell service, I'll download maps. I should probably be better about doing it in local areas where I "assume" things will be fine.

maxerickson
0 replies
1d1h

Lots of people dislike the design choices in OsmAnd, so it's worth mentioning that there are lots of apps that use OSM data and provide offline maps and routing.

rpd9803
0 replies
22h56m

Worth noting that without cell service, GPS can reliably give you time, lat, long and elevation. So if you had no actual map downloaded beforehand, or an old or out-of-date map, you'd just get a pretty accurate dot on an inaccurate map, or just raw coordinates.

panarky
15 replies
1d1h

Also can't log in to sites that require SMS 2FA.

danesparza
10 replies
1d1h

Well ... let's be honest: SMS 2FA shouldn't be a thing.

TOTP or stronger, please.

SpaethCo
6 replies
23h35m

Whether it's TOTP or SMS, it's just another text password you're entering, and it's fully phishable.

TOTP just "feels" more secure.

jjav
2 replies
22h38m

No, TOTP is far more secure because it has no dependence on a third party who can mess up in many ways (denial of service, as in this case, by being unavailable; impersonation, by allowing SIM swaps or intercepting messages directly).

You fully control how to store the TOTP seed and how you compute the value, so it is far more secure.

Yes, it can be phished if you fall for that, but it removes several attack vectors.
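
To make that concrete, the second factor is just HMAC over a locally stored seed and the current time, with no carrier anywhere in the loop. A minimal RFC 6238-style sketch using only the Python standard library (the base32 seed is a throwaway example, not a real credential):

    import base64, hashlib, hmac, struct, time

    def totp(base32_seed: str, period: int = 30, digits: int = 6) -> str:
        key = base64.b32decode(base32_seed.upper())
        counter = int(time.time()) // period             # 30-second time step
        msg = struct.pack(">Q", counter)                 # 8-byte big-endian counter
        digest = hmac.new(key, msg, hashlib.sha1).digest()
        offset = digest[-1] & 0x0F                       # dynamic truncation (RFC 4226)
        code = struct.unpack(">I", digest[offset:offset + 4])[0] & 0x7FFFFFFF
        return str(code % 10 ** digits).zfill(digits)

    print(totp("JBSWY3DPEHPK3PXP"))  # same seed + same clock -> same 6-digit code everywhere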

SpaethCo
0 replies
20h28m

Yes, it can be phished if you fall for that, but it removes several attack vectors.

How was the first factor (the password) compromised?

Assuming the user is using site-unique passwords, in 99% of cases where an attacker obtains a functional password they can get at least one TOTP code or the seed in the same manner. (ie, if I can steal your password DB, odds are pretty good for me stealing your TOTP seed DB as well.)

The outcome of a single successful authentication is a longer-lived session cookie. Once an attacker has that they can reset your creds (usually just requiring re-entering the password) and the account is theirs.

IMO, the only second factors that matter are those that mutually authenticate, like passkeys / FIDO keys.

Scoundreller
0 replies
21h54m

You fully control how to store the TOTP seed

Sorta. The seed still needs to be issued to you in some way.

throwway120385
1 replies
23h16m

TOTP is more secure in that you can't be simjacked by someone impersonating you in the cell phone store.

SpaethCo
0 replies
20h35m

That's assuming your attacker already has your password, or the service allows SMS password reset (thus negating the second factor; essentially SMS becomes the only factor).

ezfe
0 replies
23h19m

SMS 2FA is a code that you're entering from a phone number. The "risk" is that your phone number can be ported without your permission, and then someone else can get the code.

TOTP is more secure because it isn't tied to a phone number. You're right that it's still phishable but that's not the point.

In both cases, the primary benefit to the general population is to have a rotating credential that, if one website is hacked, is useless on another website.

zackkitzmiller
0 replies
5h24m

SMS 2FA and TOTP aren't mutually exclusive, are they?

TOTP -> Time-based One-Time Password. SMS -> Delivery mechanism.

You can deliver TOTP over SMS.

Obviously, SMS shouldn't be used, but I was under the impression that the delivery mechanism and the code-generation algorithm are completely disparate concepts.

steelframe
0 replies
20h27m

TOTP or stronger, please

One of the biggest weaknesses with TOTP apps I've tried using is that you have to remember to transfer them to a new phone before you get rid of your old phone. I once got locked out of a domain registrar because I set up TOTP on an old phone many years back. That was long gone by the time I wanted to do something with my domain.

TOTP is fine, but always give me recovery codes I can print out and keep with my other important documents. Too many services don't do that.

Johnny555
0 replies
21h6m

Isn't that what passkeys are supposed to be? Better and stronger than passwords with TOTP?

giancarlostoro
2 replies
1d1h

This really shouldn't be the only way to verify it's you if it's going to prompt you every single time.

panarky
1 replies
15h24m

Tell that to $8 trillion Schwab and $12 trillion Fidelity.

giancarlostoro
0 replies
6h57m

If you are their direct customer you should just keep emailing support as much as possible.

cmg
0 replies
23h21m

My service came back around 1:30PM in Connecticut. Data and calls are working fine. I requested a 2FA code at 2:30 from a service that only offers SMS. An hour and a half later, I still haven't gotten it.

sandworm101
7 replies
23h45m

> Traffic worsened because drivers weren't able to optimize their routes

I'm not sure that is a thing. The vast majority of drivers are on familiar routes and are not navigating via electronic means.

Better question: How are the autonomous cars doing? Are they parked by the side of the road, unable to navigate without cell coverage?

taurath
4 replies
23h41m

It only takes a few people missing an exit and swerving to create a bunch of traffic. So many people are used to not navigating manually anymore I can’t imagine it doesn’t have a big effect.

sandworm101
3 replies
23h21m

I have a 20-mile commute. I used my phone on day one at the new job, then never again since. It just isn't worth the effort for a road I've driven literally hundreds of times before. Do people also use google maps to get them from their front door to their garage? From the grocery store to wherever they parked their cars?

jader201
0 replies
23h4m

It depends on your daily commute.

If I need to drive 20 minutes with most of it on the expressway, and they’re prone to accidents and there are multiple viable routes, I’m 100% going to load it up on Maps every trip, if it will save me being delayed 10-60 minutes every few weeks.

But if I’m going mostly backroads, probably not worth it, since you can more easily go around accidents, and they’re less common.

But again, I’m guessing more city expressway commuters use navigation daily than you think.

dgacmu
0 replies
22h5m

I do. Mostly from curiosity about which way Google will suggest I go; sometimes because the traffic or road-closure awareness is useful. Though it's often the case that I know about the road closures before it does -- but sometimes it surprises me in a pleasant way.

calfuris
0 replies
19h41m

I use it every day. There are two roughly equivalent paths that I could take, so I use it for information on traffic conditions, and then I leave it running on the off chance that it might route me around a slowdown that wasn't present at the start of my commute.

jader201
0 replies
23h38m

The vast majority of drivers are on familiar routes and are not navigating via electronic means.

I've been rerouted due to an accident many times, and I've seen the detours get backed up because of people taking more optimal routes (without traffic being redirected via other means).

I'd be curious to see more data on it, but I would speculate it's less than the "vast majority".

Better question: How are the autonomous cars doing? Are they parked by the side of the road unable to navigate without cell coverage.

Yeah, that falls under my point about Uber/Lyft/taxis. I would speculate there is broader impact from those vs. autonomous cars (that are probably still relatively uncommon).

bobthepanda
0 replies
19h50m

UPS is known for optimizing route planning for time and fuel efficiency, particularly biasing towards three right turns instead of a left.

steelframe
5 replies
20h37m

Traffic worsened because drivers weren't able to optimize their routes

This might explain a huge random traffic jam I hit in the middle of my town this morning.

I had no idea any kind of an outage was happening because I've intentionally scaled back my dependence on my phone. I always used to automatically pull up Google Maps to navigate no matter how short the trip. At some point I realized I was losing my ability to travel without being completely dependent on some company tracking my location and telling me what to do, so as part of my phone de-Googlification I switched to Organic Maps. And even then I try to navigate on my own without any GPS assistance as often as possible. I feel like navigating is a skill you can actually lose if you don't practice doing it.

After running an errand across town this morning, I decided to try getting back home via the biggest arterial through the city that I know about, and I immediately hit a huge westbound backup stretching at least a mile. It was a total standstill. I peeked ahead trying to see if there was some kind of accident or something and didn't see anything. Everyone was just sitting in this traffic jam, and I couldn't for the life of me figure out why.

I immediately flipped a u-turn and went 3/4 of a mile north to another westbound road I knew about. That one was completely clear of any traffic at all, and I was able to drive the speed limit all the way back.

The most-used navigation apps I know of suggest alternate routes when there's congestion, so why were all those people just sitting there in that jam while a parallel road less than a mile away was clear? Maybe it was this cascading effect of too many people conditioned into being told what to do by their phones while their phones couldn't tell them to take the other route.

bobthepanda
3 replies
19h51m

There is also a bit of fear that if you go to an alternate route it may also be congested, and it will all have been for nothing.

Radio is pretty good with traffic news, but how many people would even think of local radio?

pests
1 replies
17h31m

I think many many people would check local radio, not everyone lives on their phone or in a tech bubble.

vel0city
0 replies
14h16m

Many people don't have a way of listening to local radio outside of their car. Loads of people don't even have a way to watch OTA TV.

steelframe
0 replies
19h47m

if you go to an alternate route it may also be congested, and it will all have been for nothing

Yeah, I get that there can also be a bit of a sunk cost thing along with regret minimization going on too. I think game theory suggests that you should switch routes the instant you hit significant congestion though, because P(congestion on the current route)=1 as soon as you hit it.
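
A toy expected-value version of that argument, with made-up numbers and assuming the parallel route's congestion is independent of the jam you're already in:

    t_clear, t_jammed = 15, 45        # minutes on either route (hypothetical)
    p_alt_jammed = 0.3                # guess at the alternate's congestion odds

    stay = t_jammed                   # you already know this route is jammed
    switch = p_alt_jammed * t_jammed + (1 - p_alt_jammed) * t_clear
    print(stay, switch)               # 45 vs 24.0: switching wins unless p_alt_jammed is near 1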

justsomehnguy
0 replies
16h32m

Maybe it was this cascading effect of too many people conditioned into being told what to do by their phones while their phones couldn't tell them to take the other route

There are a lot of people who couldn't navigate to a neighboring street without direct directions even if their life depended on it.

Add to that the fact that most people don't have the slightest idea where they are, where the cardinal directions are, or what they need to do to get from point A to point B.

dktalks
5 replies
1d

If you use Google Maps, it will automatically prompt you to download a map of the area if there is known poor coverage. It also has automatic (?) local maps.

Scoundreller
4 replies
1d

One beef of mine with Google’s offline maps is that they’re only driving maps, and not walking/transit/cycling maps. Obviously you can kinda figure out walking paths anyway, but since I’m sometimes travelling without roaming access, it’s unfortunate.

giantg2
2 replies
22h31m

I imagine it would be hard to do transit maps if you weren't connected to get the schedule.

Scoundreller
1 replies
22h24m

They already get them somehow while "online". Offline with beginning & end times and a rough idea of frequency should be good enough for local use.

Offline road maps are subject to construction/seasonal/holiday route closures/deviations too, and so is transit.

giantg2
0 replies
22h2m

Road closure tends to be much more rare. Transit is much more variable in the US.

steelframe
0 replies
20h34m

Have you tried Organic Maps for walking or cycling routes?

Scoundreller
5 replies
1d

Traffic worsened because drivers weren't able to optimize their routes, or even get directions to their destination

During Canada’s Rogers outage in 2022:

In Toronto there was some dependency on Rogers. One quarter of all traffic signals relied on their cellular network for signal timing changes. The Rogers GSM network was also used to remotely monitor fire alarms and sprinklers in municipal buildings. Public parking payments and public bike services were also unavailable.

https://en.m.wikipedia.org/wiki/2022_Rogers_Communications_o...

As it was summer, I recall some park programming for kids had to be cancelled because the employees were required to have a phone capable of calling 9-1-1 (but sounds like that at least still worked here)

bostonwalker
4 replies
22h24m

This is putting it mildly. The Interac network went down and no one could use their debit cards nationwide.

Scoundreller
2 replies
22h3m

I'd put that on Interac single-homing itself without redundancy.

Their ops are critical enough you'd expect better from them.

Not the kind of shortcut Canadian banking takes for core stuff.

tharkun__
1 replies
20h50m

The funny thing there is that Interac "did" have redundancy, i.e. another network provider to fall back onto.

Unfortunately they failed to notice that this was a reseller for Rogers lines.

Scoundreller
0 replies
19h26m

that explains a lot

twisteriffic
0 replies
21h53m

Not all of it. My credit union's Interac services still worked.

devgonewild
3 replies
19h16m

In Australia we recently had a telecom outage with Optus; there was an untold amount of damage:

- card payments at shops/cafes were out
- rural towns were completely cut off (a few in particular are only serviced by Optus)
- emergency services were unavailable; for example, a snake wrangler was unable to receive his call-outs
- hospital infrastructure came to a halt

And I'm only going off of examples I have heard. These outages are very damaging.

luxuryballs
2 replies
18h52m

cashiers always look at me dumbstruck when I tell them about the mechanical offline credit card machines we had when I was a cashier back in 2004

james-skemp
1 replies
18h6m

If we're thinking of the same thing, I have fond memories of those at mall department stores in the 80s and 90s.

I think Sears around '04-06 (?) was the last time I saw one of those used. I think I bought a dehumidifier or air purifier.

When they started rolling out credit and debit cards without the raised numbers I thought fondly of those and how they were definitely done for now.

luxuryballs
0 replies
13m

Oh yeah I didn’t even think about the raised numbers thing! Cash is still king.

GravityLab
1 replies
21h31m

We need to maintain paper-based systems of information storage and retrieval. People should be familiar with a physical map. If we are too dependent on the technology, that is a risk.

Johnny555
0 replies
21h9m

Just keeping the paper isn't a solution, people need to know how to use the back up, and use it regularly. When I delivered pizzas, we had a big paper map of the city that we used to consult for deliveries, drivers quickly learned where nearly all of the streets in the city were and how to get there. For most deliveries, drivers just knew where to go, for the rare times I didn't, I either remembered the main street near my delivery or wrote down some notes on the box. Someone marked new streets on the map, as well as the names of major apartment complexes.

Just having that map on the wall isn't going to do any good since without regular use, no one's going to be able to use it effectively. And it's doubtful that people can be forced into using it.

wepple
0 replies
21h18m

Maybe larger companies have their own infra for this, or have redundancy in place (e.g. their own GPS devices)?

Modern trucks have cell modems tied to a private APN that are used for updating vehicle firmware & doing telematics. They also typically have a route to the internet that provides a WiFi hotspot in the cab.

Depending where the fault was in the telco stack, that APN may have still been functional

Not saying this was a significant resolution, but at least a possibility.

silisili
0 replies
21h10m

What about those mobile card readers like you see in small businesses and food trucks and such? I've never owned one, but assumed they ran over cellular.

dheera
0 replies
22h26m

What is/were the cascading effects of this, particularly for drivers?

I wish for an economic system in which all causes could be backpropagated to the source and the source be held responsible.

If for example I lost 2 hours of my time today because I had to fight with Comcast, Comcast should be charged for 2 hours worth of my hourly salary.

If I lost a job offer because of bad interview performance because of heating issues because of bad maintenance on the part of the landlord, the landlord should be charged for the difference in time until I get my next job offer or the difference in salary until the next job offer.

If I had to fight health insurance for 5 hours on the phone due to incorrect bill and that caused me additional stress that caused my condition to worsen, health insurance should be held liable for the delta effects of that stress.

In this case the cellular operators in question would be held liable for the lost incomes of those drivers plus the lost incomes of passengers who lost money because they couldn't get to their destinations on time or missed flights and had to rebook them.

I know this level of backpropagation is hard to implement in the real world but it would be awesome if the entire world were one big PyTorch model and liabilities could be calculated by evaluating gradients.
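
For what it's worth, a whimsical toy of that idea (entirely made-up numbers, just to show autograd doing the "backpropagation of liability"):

    import torch

    outage_hours = torch.tensor(2.0, requires_grad=True)
    hourly_wage = 60.0
    rebooking_fee = 150.0

    # One driver's cost: lost wages, plus a rebooked flight if the delay gets long enough.
    cost = outage_hours * hourly_wage + rebooking_fee * torch.sigmoid(outage_hours - 1.5)
    cost.backward()
    print(cost.item(), outage_hours.grad.item())  # total "liability" and its per-hour sensitivity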

caddemon
0 replies
6h53m

Minor in the grand scheme of things, but I thought interesting (or at least unexpected) - a bunch of rides at Disney World went down because they rely on AT&T push to talk for communication between staff, which is required for safety reasons to run some attractions.

Animats
31 replies
1d

This is a serious architectural flaw.

In the entire history of electromechanical switching in the Bell System, no central office was ever out of service for more than 30 minutes for any reason other than a natural disaster, or, on one single occasion, a major fire in NYC.

The AT&T Long Lines system in the 1960s and 1970s had ten regional centers, all independent and heavily interconnected. There was a control center, in Bedminster, NJ, but it just monitored and sent out routing updates every 15 minutes or so. All switches could revert to default routing if needed, which meant that some calls would not get through under heavy load. Most calls would still work.

jrochkind1
14 replies
19h24m

I've been thinking about how pretty much no US infrastructure today is as reliable as it was in the last half of the 20th century. Just my imagination, or true? And what does it mean?

topkai22
7 replies
18h36m

Airliners crash far less, and road fatalities kept going down till 2010, so taking the broad view of where infrastructure is, I'd say at least those got better.

Anecdotally, I remember more electricity interruptions and plumbing issues when I was a kid, but that could be location dependent and I couldn’t quickly find good numbers going back that far.

Edit: While the phone network didn’t necessarily go down, I frequently got “all circuits busy” when I was a kid. I don’t remember the last time that happened.

bronco21016
2 replies
17h53m

I wish we had metrics for utility companies. In my midwestern experience, things have gotten worse. I don’t remember any outages as a kid in the 90s that were over 24 hours aside from the major blackout in the early 2000s. As an adult I’ve experienced several outages greater than 24 hours in both summer and winter months. It’d be nice to be able to measure this.

Aloisius
1 replies
17h27m

I would caution comparing today against one's childhood memories.

Children have few responsibilities and are shielded by their caretakers. They simply do not notice much of the things that happen.

vel0city
0 replies
14h19m

Right? As a kid we might not have questioned that sudden trip to Grandma's for the day.

georgeplusplus
1 replies
6h54m

Not to come off pedantic, but improvements in airplane and car design have nothing to do with infrastructure.

I think the average person views infrastructure improvements as improvements in the roads, airports, or air traffic control.

Reubachi
0 replies
4h52m

They are part and parcel.

In the USA, for example, road design AND vehicle design are directly linked and beholden to NHTSA regulations and policies.

Infrastructure (i.e. state route roads) is ever trending towards wider lanes and gentler shoulders. This is precisely due to vehicle industry requirements requiring more vehicle safety features (and thus width and length), height, and shoulder-level clearance at windows.

All these infrastructure and endpoint changes are driven "organically" by the USA trend towards SUVs, but mainly driven by insurance requirements. Insurance and gov't "make out" on safer roads/vehicles due to (perceivably) fewer accidents and less road maintenance.

I can't speak to airplanes, but I imagine the fact that far, far, far more people are able to fly today than even 25 years ago should show that the infrastructure has drastically improved.

watersb
0 replies
18h12m

And how often one of our five TV stations would be "Experiencing Technical Difficulties... Please Stand By".

scarface_74
0 replies
3h7m

And tangentially related, it was much easier for anyone to eavesdrop on your conversations.

When it rained, I could pick up my phone and hear conversations from my neighbor on my landline and talk to them without calling.

Not to mention if you were in the same house, you could surreptitiously hear conversations by just picking up the phone, or by getting a device from Radio Shack that didn't have a microphone that you could plug into another phone outlet.

With analog cellular, you could also buy a receiver from Radio Shack and hack it to pick up the unencrypted signals from cell phones.

lor_louis
3 replies
19h16m

Redundancy was needed because individual nodes/machines were more prone to failure. As machines got more and more reliable, having highly redundant infrastructure was seen as an extra cost.

kibwen
0 replies
18h12m

Efficiency or reliability: pick one.

dymk
0 replies
17h48m

Do you work at a telecom or are you just guessing?

Animats
0 replies
17h8m

Yes. Electromechanical switching systems were substantially more reliable than their components. How this was done should be understood by anybody designing high-reliability systems today.

"A History of Science and Engineering in the Bell System - Switching Technology 1925-1975" is a readable reference. The Internet Archive has it.[1]

More hardcore: "No. 5 Crossbar"[2]

The Connections Museum in Seattle still has a #5 Crossbar working.[3] Long distance used toll switches, "#4 Crossbar", and there were 202 of them.

#4 and #5 Crossbar machines are collections of stateless microservices, implemented from electromechanical components. The terminology used in the old books is completely different, but that's what they are. Each service always has at least two servers. The parts that do have state are distributed. The crossbar switches that make actual connections have state, but are dumb - they are told what to do by "markers", which are stateless but can read the state of the crossbars and of other components. Failure of a single crossbar unit could take down a hundred lines at most. Other than the crossbars to external lines, everything had alternate routes. Everything has fault detection, with lights and alarm bells.

Error rates were fairly high. In the previous "step by step" system, a good central office misdirected about 1% of calls. With bad maintenance (and those things were high maintenance) that could get much worse. Crossbar was better, maybe 0.1% misdirected calls.

Routing tables in crossbar were mostly static ROMs of one kind or another. Routing consisted of trying a predetermined set of routes, in order. Clunky, but reliable.

Modern systems need a backdown to that mode.
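
As a software analogy for that "try a predetermined set of routes, in order" behavior (a sketch with made-up route names, not how any real switch is configured):

    def place_call(destination, route_table, trunk_has_capacity):
        # route_table is static and ordered, like the crossbar routing ROMs.
        for route in route_table.get(destination, []):
            if trunk_has_capacity(route):
                return route          # commit to the first usable route
        return None                   # fast busy: no route available under load

    routes = {"212": ["direct-NYC", "via-regional-center", "via-sectional-center"]}
    print(place_call("212", routes, trunk_has_capacity=lambda r: r != "direct-NYC"))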

[1] https://archive.org/details/historyofenginee0000unse_q0d8

[2] https://archive.org/details/bellsystem-no-5-crossbar-blr

[3] http://www.telcomhistory.org/connections-museum-seattle-exhi...

onthecanposting
1 replies
16h59m

Highway construction standards are much higher than they were 30-50 years ago, but it's a mixed bag. Administrative costs are significantly higher. Survey has dramatically improved with GPS. Highway engineering has not improved since about the 90s. Automated machine guidance has significantly improved the potential accuracy of grading operations in the last decade.

georgeplusplus
0 replies
6h51m

>>Highway construction standards are much higher than they were 30-50 years ago,

The roads in many US cities aren't built to those standards and are grandfathered from them. New York City highways are horrible.

lxe
5 replies
23h42m

The complexity and scale of modern systems are on another order of magnitude.

bluepizza
1 replies
23h36m

This is no justification. They should have another scale of resilience.

lxe
0 replies
23h20m

Agreed. Not a justification, but an explanation/excuse as to why systems are less reliable. You're right on the mark -- when reliability doesn't scale with complexity, you get this.

Scoundreller
1 replies
23h35m

And centralized. Data is cheap (though they won’t admit that) while big iron cellular core stuff is expensive.

Funny when they billed extra for long distance calls even though all calls were routed through one place for a huge geographic area. Calling your neighbour could be a round trip of hundreds of miles over mobile.

Animats
0 replies
21h54m

And centralized.

Yes. Too much of routing is centralized. Since phone numbers are no longer locative (the area code and exchange number don't map to physical equipment) all calls require a lookup. It's not that big a table by modern standards. Tens of gigabytes. All switches should have a database slave of each telco's phone number routing list, to allow most local calls if external database connectivity is lost. It may be behind, and some roaming phones won't work. But most would get through.
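
A sketch of that fallback, with hypothetical names: consult the authoritative database, then a possibly stale local replica, and only then a default route:

    def route_number(number, central_lookup, local_replica, default_route):
        try:
            return central_lookup(number)      # normal path: authoritative number database
        except ConnectionError:
            pass                               # external database connectivity lost
        if number in local_replica:
            return local_replica[number]       # maybe behind, but good enough for most local calls
        return default_route                   # best effort; some roaming phones won't work

    def central_lookup(number):
        raise ConnectionError("number-portability database unreachable")

    print(route_number("+14155550100", central_lookup,
                       {"+14155550100": "switch-SF-07"}, default_route="interconnect-trunk"))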

Diederich
4 replies
21h40m

In this context, what's the physical scale of a 'central office', as far as regional dimensions? Thanks!

thedaly
2 replies
20h17m

I suspect that when AT&T built all the COs (1950-70s) they were constrained by both max number of lines and physical distance.

You’ll see numerous COs in a big city, but they are also pretty widely dispersed throughout the suburbs and rural areas.

jsjohnst
0 replies
17h57m

I worked for Southwestern Bell in the 90s pre-SBC (aka just before remote terminals and dslams became common). COs handled mid-tens of thousands of lines in big cities, for smaller more rural areas they generally covered a single town or less often there were a few in a county where the full county was under 50k people.

In towns, we generally tried to keep loop lengths under 30k feet, but in rural areas that simply wasn’t possible. You’d often find remnants of party line systems in those areas and definitely load coils out the wazzoo. It was “fun” unwinding all that crap to install ISDN circuits and later DSL.

I remember the old hats at the time laughing about VDSL saying “leave it to the nerds to dream up some unrealistic shit where the loop length can be at most 2k feet, where does that exist!?” not realizing a few years later RTs and DSLAMs would mean a significant portion of city and suburban customers would be on loops that short.

deelowe
0 replies
18h35m

I used to work at BellSouth in outside plant engineering in the early 2000s. That's exactly what it was. Of course, by then any expansion was done via remote terminals and COs were becoming very antiquated.

teeray
0 replies
21h3m

Well, one is a skyscraper filled with equipment near the Brooklyn Bridge IIRC.

spacebacon
1 replies
16h50m

I’m surprised no one noticed the 502 when attempting to enable WiFi calling. Azure was the source of the 502. Cloud architecture problem.

ok123456
0 replies
21h36m

There was the Mother's Day outage of 1990. That was caused by someone swapping a break statement for a continue statement in some C code that handled the routing, and there was a cascading effect.

Then again, that only affected long-distance service.

danlugo92
0 replies
17h34m

True words :checkmark

kevin_nisbet
30 replies
1d4h

To everyone trying to speculate on the root cause, I haven't seen enough information in any of the comments to really draw any conclusions. Having worked on several nationwide cellular issues in Canada when I worked in telecom, we saw nationwide impacts based on any number of causes.

- A new route injected in the network caused the routing engines on a type of cellular-specific equipment to crash nationwide. This took down internet access only from cell devices nationwide. But most people didn't notice because it happened in the 2AM maintenance window and was fortunately discovered and reversed before business hours, once it was figured out why the routing engine was in a crash loop.

- A tech plugged in some new routers, and the existing core routers crashed and rebooted. While the newsworthy impact was just a regional outage for something like 20 minutes, we discovered bugs and side effects from the Pacific to the Atlantic coast over the next 12 hours. So when you say you're impacted at location x, that data point could mean everyone is down in the area, many people are having issues, or only one or two people have issues spilled over from some other region. This is why seeing that it does or doesn't work in location x is of limited value, as almost every outage I've investigated could result in some people still having service for various reasons. The question is, in a particular area, is it 100% impact, 50% impact, or 0.001% impact.

- A messaging relay ran into its configured rate limit. Retries in the protocol increased the messaging rate, so we effectively had a congestion collapse in that particular protocol (a toy sketch of that feedback loop is below). Because this was a congestion issue on passing state around, there were nationwide impacts, but you still had x% chance of completing your message flows and getting service.
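
That feedback loop is easy to see in a toy model: once offered load crosses the relay's rate limit, retries inflate the load instead of draining it (purely illustrative numbers, not anyone's real traffic):

    def offered_load(new_msgs_per_s, capacity_per_s, retries, rounds=5):
        load = new_msgs_per_s
        for _ in range(rounds):
            dropped = max(0.0, load - capacity_per_s)   # the rate limiter discards the excess
            load = new_msgs_per_s + dropped * retries   # drops come back as retries next round
        return load

    print(offered_load(new_msgs_per_s=900, capacity_per_s=1000, retries=2))   # stays at 900
    print(offered_load(new_msgs_per_s=1100, capacity_per_s=1000, retries=2))  # runs away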

And then there was the famous Rogers outage where I don't remember them admitting to the full root cause. It's speculated that they did an upgrade or change on their routing network, which also had the side effect of booting all the technicians from the network. Then recovery was difficult because the issue took out the nationwide network and broke the ability for employees to coordinate (you know, because they use the same network as all the customers who also can't get service). All the CRTC filings I reviewed had all the useful information redacted though, so there isn't much we can learn from it.

So it's fun to speculate, but here's hoping at the end of the day AT&T is more transparent than we are in Canada, so the rest of the industry can learn from the experience.

flippy_flops
15 replies
1d3h

The speculation is fascinating. For most people, their guess is a reflection of themselves. Is there a term for that? This is a gross generalization, but I've seen...

- Science people guessing solar flares
- My "right-wing friend" guessed international hackers
- I, myself, guessed it was a botched software release
- Someone in this post commented their military friend says get gas

And yet, like everyone else, I genuinely feel that I'm probably right

BuyMyBitcoins
4 replies
1d2h

My speculation is: “Higher-ups kept demanding that technicians ‘do more with less’ in order to deliver on quarterly metrics and now we’re finally seeing the cumulative result of employees being stretched thin, underpaid, and overworked.”

You are welcome to infer as to why I’m thinking this way!

spazx
0 replies
1d1h

This is my bet; and mayyybe some external bad actors taking advantage of the situation on top of that.

nonethewiser
0 replies
1d1h

Obviously you’re a self loathing executive.

gnuser
0 replies
1d1h

The ops team can run the whole company and better without the C-Suite is my impression of modern day SV. Agile stickers on waterfall gates…

bregma
0 replies
1d1h

So how is the job search going?

Scoundreller
3 replies
1d3h

Someone in this post commented their military friend says get gas

The Rogers outage in Canada took out the nationwide debit card payment network because that infra depended on Rogers. Credit cards still worked, but depends on your station’s access to make the transaction. And no shortage of shops running their POS “in the cloud” and needing to close if they lose internet access. I actually did have to lend cash to a colleague to buy gas to get home during that Rogers outage.

All it takes is for one pipeline valve to depend on a cellular connection for billing to get the whole line shutdown.

And ugh, we hope for a botched software upgrade too, but a corp cyberattack is so much harder to recover from so can’t be discounted from the realm of possibilities. I know that’s where my mind went with Rogers given how thorough their outage was.

Was kinda unimaginable for a total outage to happen with no org comms ready to go in the pipeline. Your plans are supposed to have those comms ready for a bad update that you’ve been planning for weeks. It’s a cyberattack where you may stay silent. But I know Rogers isn’t going to admit fault until they find someone else to blame.

charcircuit
2 replies
1d2h

PoS devices are usually networked. If you don't validate transactions in realtime you would later validate in batch, but that has more risk than validating at the time of transaction.

Scoundreller
1 replies
1d2h

If you don't validate transactions in realtime you would later validate in batch

yeah, a lot of orgs just don't enable that (or don't have a process to enable it as required, and have difficulty pushing out a notice to do so if the network is down!).

Also can only do offline credit card transactions. Can't with our Interac (Canadian-only) debit network. Unsure about Visa/Mastercard debit transactions.

rescbr
0 replies
1d

Unsure about Visa/Mastercard debit transactions.

AIUI, the debit card itself enforces online confirmation, even if the transaction goes through the credit card rail.

r721
0 replies
14h20m

It looks like you were right:

A temporary network disruption that affected AT&T customers in the U.S. Thursday was caused by a software update, the company said.

AT&T told ABC News in a statement that the outage was not a cyberattack but caused by "the application and execution of an incorrect process used as we were expanding our network."

https://abcnews.go.com/US/att-outage-impacting-us-customers-...

https://news.ycombinator.com/item?id=39477187

nonethewiser
0 replies
1d1h

I literally caught myself thinking about a cyberattack merely because it's sort of exciting (albeit terrible). And then realizing that despite its prominence in my mind, it's probably not the most likely cause (although certainly plausible still). And furthermore, that my mind gravitates to that without any real information suggesting it over other explanations. More about fearing the worst instead of what you want, I think.

mlyle
0 replies
1d2h

And yet, like everyone else, I genuinely feel that I'm probably right

This is the thing with black swan events. The more pedestrian explanations are almost always true, but then there's a tiny fraction of the time where you're much, much better off having taken a bit of an alarmist view.

akira2501
0 replies
20h15m

I genuinely feel that I'm probably right

We are wired that way for a reason. Until you personally see conflicting evidence you have to make an assumption or you would spend your life paralyzed or ignorant.

Biology rewards action more than accuracy.

ShamelessC
0 replies
1d3h

Is there a term for that?

Projecting, biased.

Scoundreller
10 replies
1d3h

Rogers, of course, blamed their vendor (Ericsson I believe it was). Rogers can do no wrong!

Of course, was fun to see yet another huge org have no back-out/failure plan for their potential enterprise-breaking changes. No/limited IT 101 stuff here.

The only positive thing we learned was that the big 3 (really 2) telcos thought it would be a good idea to give each other emergency backup SIMs for the other network to key employees in case their network went down. They did that in 2015, but better late than never.

Fun that Rogers used the same core for wireless and wired connections, so many of us were in total blackout, even if we used a 3rd party internet provider that ran over Rogers. Like, everything including their website was down, corp circuits, everything with non-existent comms from Rogers.

Thankfully my org was multi-homed and switched over its circuits at 6am so on-site mostly continued without issue.

Also fun where the towers remained just powered on enough for phones to stick to them but not be able to do anything, so 9-1-1 calls would just fail instead of failing over to other networks. Seems like a deficiency in the GSM spec (or Rogers SIM programming?) that I don’t think was actioned on.

https://en.m.wikipedia.org/wiki/2022_Rogers_Communications_o...

kevin_nisbet
7 replies
1d3h

Also fun where the towers remained just powered on enough for phones to stick to them but not be able to do anything, so 9-1-1 calls would just fail, instead of failing-over to other networks. Seems like a deficiency in the GSM spec (or Rogers SIM programming?) that I don’t think was actioned on.

Actually, I think this is going to change after the Rogers outage, it's just slowly happening behind the scenes so it's not getting much attention these days. The government has mandated a lot of industry response to failover between providers... we'll see where they land after all the lobbying happens. I do think implementations are changing a bit around this, mostly in the phones so that they give up and go into a network scan if the emergency call is failing.

I worked mostly on core network stuff, so I was a layer removed from the towers, but if they hadn't lost management access they would've been able to tell the towers to stop advertising the network and 911 service. I do understand the question, from a vendor implementation perspective, of how automatic this should be though... because automation in this regard does have some of its own risks and could complicate some types of outages or inadvertently trigger and confuse recovery of problems.

I'm with you though there should be an automatic mechanism to fail over to other network operators, I just haven't thought through all the risks with it and I hope the industry is taking their time to think through the implications.

Scoundreller
4 replies
1d3h

I do think implementations are changing a bit around this, mostly in the phones so that they give up and go into a network scan if the emergency call is failing

It seems like this is a global problem, since all Rogers-subscribed devices in a Rogers reception area couldn’t make 9-1-1 calls. But could be a SIM coding issue and not afflict other providers elsewhere.

I just always imagined the GSM spec was so resilient that you could always make a 9-1-1 call if a working network was available but this outage proved that wrong. Surprising to learn in 2022.

Of course it’s Canada, so I agree with them that the thought of letting users fail over to a partner for everything would thrash the partner’s networks. Even though Canadian subscriber plans are laughably low in monthly data and population density is low (per the telecoms’ usual excuse for our high prices), it turns out the telecoms still underbuilt their networks to have less capacity than what other networks internationally built out to support plans available on the international market (e.g. close to truly unlimited data/free long distance calls).

kevin_nisbet
2 replies
1d2h

I just always imagined the GSM spec was so resilient that you could always make a 9-1-1 call if a working network was available but this outage proved that wrong.

As I recall it is slightly more nuanced than this and was particular to the failure mode, and has a couple of different things aligning to create the failure mode.

If your phone is just blank, with no SIM card: to make an emergency call, it has to just start scanning all the supported frequencies. This is very slow: tune the radio, wait for the scheduled information block that describes the network on the radio protocol, see if it has the emergency services bit enabled; if not, tune to the next frequency and try again. I used to remember all the timers, but almost a decade later I can't remember all the network timers for the information blocks.

The SIM card interaction is this: say you're at home and you boot up your phone with 100% clean state. You don't want to wait for this scan to complete, so the SIM card gives the phone hints about which frequencies the carrier uses, i.e. start on frequency x to find the network. But if you roam internationally, it can take a lot longer to find a partner network, and there are some other techs around steering to preferred partners, but I don't know that those come into play here. I don't know, but would be surprised if there is a SIM option to try and pin emergency calls to a network; I think it's more likely the interaction is this hint on where to start the scan.

The way the Rogers network failed, it appears to me it caused the towers to stay in a state where they advertised in their radio block that the network was there, and the 911 bit was enabled so the network could be used for emergency calls. This is where I don't really have the details since they haven't been public about it: how much of their network was still available internally. Maybe the cell towers could all see each other, that network layer was OK, and the signalling equipment was all talking to each other as well. That's the part I don't really know and have to speculate on, as well as the tower side, since I was a core person. So because the towers had enough service to never wilt themselves, they kept advertising the network, along with the 911 support. But then when you try to activate an emergency call, somewhere in the signalling path, as you get from tower to signalling system, to the VoIP equipment, to the circuits to the emergency center, the outage knocked something out. Oh, and for all these pieces of 911 equipment, there are two of everything for redundancy... two network paths, two pieces of equipment, etc.

And because they lost admin access to their management network, no one could go in manually and tell the towers to wilt themselves either.

If the towers had just stopped advertising 911 services, the phone would fall back into the network search mode I described for when you have no SIM card. It just starts scanning the frequencies until it sees an information block for a network it can talk to that advertises emergency support, and does an emergency attach to that network, which the carriers will all accept (an unauthenticated attach for the sole purpose of contacting an emergency center).

So my suspicion is that because carriers are so used to having two of everything, and all emergency calls are marked for priority handling at all layers of the equipment (they get high-priority bits on all the network packets and priority CPU scheduling in all the equipment), this particular failure mode, where there was a fault somewhere down the line and they lost control of the towers so they couldn't tell them to stop advertising 911 services, all sort of played together to create the outcome.
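
A rough sketch of that scan-and-attach flow, written the way you'd want it to behave (keep scanning after a failed attach); the 2022 behaviour was effectively stopping at the first network that advertised 911 support. Structure and names here are hypothetical, not the actual 3GPP procedure:

    def place_emergency_call(frequencies, read_sib, try_emergency_attach, sim_hint=None):
        # Start with the SIM's hinted frequencies, then fall back to a full band scan.
        hinted = sim_hint or []
        scan_order = hinted + [f for f in frequencies if f not in hinted]
        for freq in scan_order:
            sib = read_sib(freq)                          # wait for the broadcast info block
            if sib is None or not sib.get("emergency_calls_supported"):
                continue                                  # nothing usable here, keep scanning
            if try_emergency_attach(freq):                # unauthenticated emergency attach
                return freq
            # A phone that keeps scanning here avoids the trap of a tower that
            # advertises 911 support while the signalling path behind it is broken.
        return None

    towers = {1900: {"emergency_calls_supported": True},  # advertises 911, but attach fails
              2100: {"emergency_calls_supported": True}}  # healthy competitor network
    print(place_emergency_call(list(towers), towers.get,
                               try_emergency_attach=lambda f: f == 2100, sim_hint=[1900]))  # -> 2100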

mjevans
1 replies
1d1h

Multi-faceted failure mode.

0) At the network terminal level (mobile phone): at least for emergency calls if a given network fails to connect, fail over and try other networks. Even if the preferred networks claim to provide service.

1) At the network level: failure thresholds should be present. If those thresholds are crossed enter a fail-safe state. This should include entering a soft offline / overloaded response state.

2) Where possible, critical data paths should cross-route. Infra command and control and emergency calls in this case. Though if Rogers' issue was expired certs or something, the plans for handling that get complicated.

Scoundreller
0 replies
1d

it’s that “0” level that surprised me the most here.

Days later, Rogers said you might be able to pull out/disable your SIM card to call 9-1-1, but then it depends: if Rogers is the strongest network, you might end up in the same predicament anyway.

toast0
0 replies
1d2h

I just always imagined the GSM spec was so resilient that you could always make a 9-1-1 call if a working network was available but this outage proved that wrong. Surprising to learn in 2022.

The "X is broken but claims it isn't, which stops failover" pattern is strong all over networking. It's not unusual to see it in telco root cause analysis.

Fatnino
1 replies
1d2h

Say there is an outage at the 911 call center. Now you try to call, don't get through, and your phone writes off that tower. Who were you planning to call after 911? Too bad, should have placed that call first.

vlovich123
0 replies
1d2h

Your phone would try other towers from other providers. If 911 is experiencing an outage that’s a separate issue that needs to be mitigated at a different layer. Even still, 100% uptime is difficult and expensive.

MichaelZuo
1 replies
1d3h

Fun that Rogers used the same core for wireless and wired connections, so many of us were in total blackout, even if we used a 3rd party internet provider that ran over Rogers.

If it ran over Rogers circuits then why wouldn't it go down too? Isn't that the case everywhere?

Scoundreller
0 replies
1d3h

I just know that a part of Rogers’ response was to separate their cores between wireless and wireline so that the risk of both going down simultaneously would be reduced.

The 3rd party providers aren’t white-label resellers, but there’s obviously some overlapping susceptibilities to going down when Rogers breaks something. Depends what they break, and in this case, it took them down too.

giancarlostoro
0 replies
1d1h

some people still having service for various reasons.

I assume roaming is one of the top reasons, no?

anitil
0 replies
16h54m

All the CRTC filings I reviewed had all the useful information redacted

Is this common in the industry?

Y3Jlbmd1dGEK
0 replies
23h12m

I don't think it has anything to do with routing by looking at the comments on down detector. Many people report they are in the same household, and one person out of six (in the household) experiences the problem while all are on ATT. It sounds more like an upgrade that went through halfway, or, considering the time it happened, maybe a rollback that went only half through.

oceliker
26 replies
1d8h

This is so odd. I have two phones with AT&T currently sitting right next to each other. One has service, the other is in SOS mode.

bombcar
10 replies
1d7h

Does it still show bars in SOS mode? Or is SOS just “I dunno can’t see no cell towers but maybe it’d work?”

I wonder if the MVNOs that piggyback on AT&T are showing down also. If not, it’s some AT&T service authorization system that exploded.

rixthefox
6 replies
1d5h

On the latest iPhones, SOS mode is the emergency fallback to satellite service. It's really meant to be used in situations where you're well outside of any sort of service area but you have a clear, unobstructed view of the sky.

Your iPhone will instruct you on where to point and help you track the emergency satellite; your request is relayed to a center staffed by live humans who will take your emergency request and pass it to the proper people.

More specific info here: https://support.apple.com/en-us/104992

SirMaster
3 replies
1d5h

SOS means it has cell service and you can call 911.

If there is no cell service then it's SOS with a little picture of a satellite next to it.

inferiorhuman
2 replies
1d4h

Except SFFD is reporting that (some) AT&T customers are unable to call 911.

organsnyder
0 replies
1d2h

SOS mode typically means that your phone is connected to a carrier other than one you have a contract with.

Scoundreller
0 replies
1d1h

I wonder if some devices bungle their failover. The exact failure mode state of AT&Ts network might cause some devices to hang onto AT&T’s RF.

lxgr
0 replies
1d5h

I’m pretty sure that “SOS only” can also mean the phone seeing networks it can’t register with but which it could make an emergency call on if required. This predates satellite SOS.

bombcar
0 replies
1d5h

Won't that only activate if it can't see or communicate with any towers at all?

sandinmyjoints
0 replies
1d6h

Mine says SOS Only, shows no bars.

ok123456
0 replies
1d4h

Emergency only means 911 calls through whatever provider is available.

dcan
0 replies
1d6h

SOS mode means it can see towers of other providers you aren’t authenticated with, but no signal to authenticated cell towers

nathanyz
5 replies
1d6h

Some of our staff are reporting similar where their partner's phone has service and their's doesn't. Both on same AT&T family plan.

So the radio bands may play into it although I would think with latest iPhones, they can use any of the bands from AT&T although I could be wrong.

PietdeVries
1 replies
1d4h

Can you check if things improve if you turn off 5G and move to 4G/LTE instead?

nathanyz
0 replies
1d3h

Good thought, but switching to LTE only didn't work. Same result of ending up in SOS only. Cellular over wifi works perfectly fine though. Wish we could count on better post mortems from the phone companies, but I'm not holding my breath for it.

vel0city
0 replies
1d6h

I'm wondering if it could be some kind of auth timeout. I've heard from a few people of one person's phone going out, then a bit later the other person's phone finally failing too.

alephnerd
0 replies
1d6h

I'm on the latest iPhone and it's SOS for me

aclindsa
0 replies
1d6h

Yep: my partner's iPhone has service while my Pixel doesn't, both on same plan.

BuckYeah
2 replies
1d7h

Different bands could be affected differently if it is solar radiation related. Same exact model of phone?

sandinmyjoints
0 replies
1d6h

Different models, in my case.

oceliker
0 replies
1d6h

Same year, different size (13 and 13 Pro Plus)

jeffwask
1 replies
1d4h

There are some reports that phones with e-sims are less likely to be impacted versus phones with hardware sims.

harambae
0 replies
1d

I have eSIM only (iPhone 15) and was impacted the same as physical SIM users on AT&T (Boston area).

I suppose I can't speak to likelihood with a sample size of one.

SkyPuncher
1 replies
1d5h

My wife and I were riding in the car next to each other. Took mine about 5 to 10 minutes longer to jump to SOS mode.

pixl97
0 replies
1d5h

I wonder how SIM registration works? For example if it's like a token with an expiry. If some set of registration servers on the network couldn't renew then I could see behavior like this.
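
Purely as a toy to illustrate the guess (I have no idea whether registration actually works this way): if attach were backed by a token with an expiry and the renewal backend died, phones would drop off one by one as their individual tokens lapsed, which would look exactly like the staggered failures people are describing.

    # Pure toy simulation, all numbers invented: staggered failures from
    # per-device token expiry after a hypothetical renewal backend dies.
    import random

    random.seed(1)
    phones = [{"id": i, "token_expires_at": random.uniform(0, 240)} for i in range(6)]

    for minute in range(0, 300, 60):   # minutes since the renewal backend died
        offline = sorted(p["id"] for p in phones if p["token_expires_at"] <= minute)
        print(f"t+{minute:>3} min after backend failure: offline phones {offline}")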

sandinmyjoints
0 replies
1d7h

Same here. Strange. Would love to know the reason!

bdcravens
0 replies
1d4h

Ditto, mine and my wife's. In my case, the working one is slightly newer (15 Pro Max vs 14 Pro)

TheAdamist
17 replies
1d8h

This reminds me of the recent discussion on status pages. https://news.ycombinator.com/item?id=39099980

They need to be accurate. At&t status claims everything is fine.

My wireless service is down. Down detector has tens of thousands of reports, so clearly everything is not fine.

bombcar
12 replies
1d7h

Status pages are basically useless if they’re public facing.

Either they automatically update based on automatic tests (like some of the Internet backbone health tests) or they’re manually updated.

If they’re automatic, they’re almost always internal and not public. If they’re manual, they’re almost always delayed and not updated until after the outage is posted to HN anyway.
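
The automatic flavor doesn't have to be fancy; the hard part is being willing to publish the raw result. Minimal sketch, with placeholder endpoints that are obviously not anyone's real health URLs:

    # Sketch of the "automatic" flavor: an external black-box prober whose raw
    # results get published as-is. Endpoints here are placeholders only.
    import time
    import urllib.request

    ENDPOINTS = {
        "web portal": "https://example.com/health",
        "api":        "https://api.example.com/health",
    }

    def probe(url, timeout=5):
        try:
            with urllib.request.urlopen(url, timeout=timeout) as resp:
                return 200 <= resp.status < 300
        except Exception:
            return False

    def current_status():
        return {name: ("up" if probe(url) else "DOWN") for name, url in ENDPOINTS.items()}

    if __name__ == "__main__":
        while True:
            print(time.strftime("%H:%M:%S"), current_status())
            time.sleep(60)   # publish this dict verbatim; no human in the loop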

op00to
9 replies
1d7h

Which is better? How do you know whether an issue is individual to a customer or a quick blip that will resolve in a few seconds?

menacingly
5 replies
1d5h

you can't operate at any scale at all without mechanisms in place to know perfectly well whether an issue is impacting a single customer or if your world is on fire

bombcar
3 replies
1d4h

You'd like to think so, but a surprisingly large number of "large scale" things operate on "everything is fine" until too many people complain about the fire.

pixl97
2 replies
1d4h

Caches make problems fun too.

Quite often you see automated tests that check how well your cache / in-memory data are working. But when some other customer that isn't in the hot path tries to access it, their request times out. I've seen a lot of automated checking systems fail at exactly this.
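
One way around that trap is to make the synthetic check deliberately miss the cache as well as hit it. Rough sketch, every name invented:

    # Sketch of a canary that exercises both the cached hot path and a forced
    # cache miss, so "cache is fast" can't mask "origin is down".
    import time
    import uuid

    def timed_ok(fetch, key, timeout_s):
        start = time.monotonic()
        try:
            fetch(key)
        except Exception:
            return False
        return (time.monotonic() - start) <= timeout_s

    def canary(fetch):
        return {
            # hot path: a key that is almost certainly sitting in cache
            "cached_path": timed_ok(fetch, "popular-key", timeout_s=0.1),
            # cold path: a key nobody has asked for, forcing a trip to the origin
            "uncached_path": timed_ok(fetch, f"canary-{uuid.uuid4()}", timeout_s=2.0),
        }

    if __name__ == "__main__":
        cache = {"popular-key": "hit"}
        def fetch(key):                      # stand-in for the real client
            return cache.get(key, "origin")  # pretend the origin always answers
        print(canary(fetch))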

zitterbewegung
1 replies
1d3h

The phrase “the hardest parts of computer science are caching and naming things” comes to mind.

r2_pilot
0 replies
1d1h

I see 2 things here but you're off by one.

op00to
0 replies
1d5h

Yes, but those mechanisms take time to determine this.

bombcar
2 replies
1d6h

I prefer fully automated tests publicly revealed because the main thing I want to know (as a customer) is should I keep trying to fix my end or give up because GitHub exploded again.

It’s most annoying when you have something like what happened recently: known maintenance work on my upstream home fiber connection that was resulting in service degradation (not complete loss, but my fiber line was down to DSL or dialup speeds). The chat lady could see that my area was affected, but the issue lookup system couldn’t.

If the issue lookup had told me there was an issue I’d’ve gone on my merry way.

I even checked a few more times until it was resolved; the issue never appeared in the issue lookup system.

op00to
1 replies
1d5h

should I keep trying to fix my end or give up because GitHub exploded again

Making this decision easy is a fight I fight for my customers every day. :)

bombcar
0 replies
1d4h

This was much much much easier when websites used to explode with tracebacks and other detailed error messages, now you just get a "whoopsie doopsie we did a fuckywucky" and you can't really tell what's going on.

ryathal
1 replies
1d6h

The other problem with status pages is depending on what happened it may not be possible to update the status page anyway. You really need a third party to have a useful status page.

TheAdamist
0 replies
1d1h

Which is pretty much what down detector has evolved into. And it looks like they have an enterprise offering to alert companies to their own issues.

teeray
1 replies
1d2h

They need to be accurate

It would be nice if the FTC mandated this. It is exhausting when the status page is taken over by the marketing department (the infamous green check with the little "i").

jimmaswell
0 replies
1d

the infamous green check with the little "i"

I'm not familiar, what are some examples?

spicybbq
0 replies
1d3h

Currently there is a banner on the AT&T outage page with this message:

Service Alert: Some of our customers are experiencing wireless service interruptions this morning. We are working urgently to restore service to them. We will provide updates as they are available.

https://www.att.com/outages/

op00to
0 replies
1d7h

Which status page?

nathanyz
12 replies
1d3h

Latest AT&T Statement: “Our network teams took immediate action and so far three-quarters of our network has been restored,” the company said. “We are working as quickly as possible to restore service to remaining customers.”

Still down for me though.

Scoundreller
6 replies
1d2h

3/4 might just mean the internal facing side, which is still progress, but doesn’t mean any improvements for end-users.

wizerdrobe
5 replies
1d1h

Anecdotally, I woke up to no signal / “SOS” mode on my iPhone this morning at around 0600 and had service restored around 0830 in South Carolina. However, a coworker in Memphis confirmed he was still out of service at 1000 so it’s regional restoration.

Scoundreller
3 replies
1d1h

I always wonder, instead of a regional restoration, whether they would “disable” segments of SIMs/accounts randomly to avoid a lightning strike (it’s not a DDoS…) on their network as they turn things back on. Depends on what the recovery method is, but it could be problematic to turn everything back on at once.
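
No idea what the recovery method actually is here, but the generic version of this idea is just re-admitting subscribers in hashed buckets spread over a ramp window. Illustrative sketch only; every number and name is made up:

    # Illustrative only: deterministic, jittered re-admission of subscribers
    # so the post-outage registration storm is spread over a ramp window.
    import hashlib

    RAMP_MINUTES = 60   # made-up ramp window
    BUCKETS = 100

    def admit_offset_minutes(subscriber_id):
        """Per-subscriber delay within the ramp window, stable across retries."""
        h = int(hashlib.sha256(subscriber_id.encode()).hexdigest(), 16)
        return (h % BUCKETS) / BUCKETS * RAMP_MINUTES

    def should_admit(subscriber_id, minutes_since_restore):
        return minutes_since_restore >= admit_offset_minutes(subscriber_id)

    subs = [f"sim-{i}" for i in range(10_000)]
    admitted = sum(should_admit(s, minutes_since_restore=10) for s in subs)
    print(f"{admitted}/{len(subs)} subscribers re-admitted 10 minutes in")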

zackkitzmiller
0 replies
5h19m

I've never heard lightning strike, always thundering herd.

wizerdrobe
0 replies
18h43m

I actually spoke with my wife after the initial comment, she was reconnected at a later time than I! So it’s not regional but some other mechanism.

fragmede
0 replies
1d

I've heard that called the thundering herd problem.

chasd00
0 replies
1d

my wife's phone was SOS when she woke up at about 6AM central and finally became operational around 1:30PM central.

IAmGraydon
2 replies
1d3h

Still down for me too.

TheCaptain67
1 replies
1d3h

down all morning in ATL but back up at 1PM EST

themaninthedark
0 replies
1d1h

Back up in Cartersville, north of ATL @ 13:12. Oddly enough, my text messages say they went through at 12:43, but my response to someone's message, sent once my phone had everything roll in at once, is timestamped 13:12.

partiallypro
0 replies
1d3h

Also still down for me here around Nashville.

DHPersonal
0 replies
1d2h

Just came back up for me in Oklahoma City, OK.

BuckYeah
12 replies
1d7h

NOAA is reporting R3 activity. Solar flare outage seems likely

relbeek2
3 replies
1d6h

A solar flare targeted at AT&T infra only? Unlikely.

Could be a hack. Could be a single point of failure. Could be a config change that borked the system.

Could be other things that make more sense; we will have to wait for more info.

IAmGraydon
2 replies
1d4h

I know the other carriers are saying they aren't affected, but one look at DownDetector shows that nearly every carrier was affected, and all at the same time.

miah_
0 replies
1d3h

A user who has a working cell phone could try to contact a user who is on a network experiencing issues. Because the call fails, that user may decide their own cell network is also faulty. Downdetector only works on user reports. It's basically useless for actual measurement because people are bad at troubleshooting.

joecool1029
0 replies
1d3h

That site counts mentions over social media, if you said 'T-Mobile/Verizon isn't having an outage' it'd still show up as outage activity on it. Plus people report issues calling AT&T customers.

alephnerd
1 replies
1d7h

Then why isn't Starlink down

BuckYeah
0 replies
1d7h

Does starlink use the same bands and same equipment? I doubt it

swagmoney1606
0 replies
1d4h

I don't think this is likely

sumtechguy
0 replies
1d7h

getting BOFH vibes here...

Waterluvian
0 replies
1d7h

Then why is this a U.S. only problem?

Edit: I’m getting 1 bar when I usually get 4 in southern Ontario. But I see no broad reports of issues.

LinuxBender
0 replies
1d7h

There was an earth directed X class, but not a big one. If that could affect SS7 I would expect it to also take out a chunk of the internet but I am not seeing that. [1] There are many cellular networks having issues but some could be and likely are resellers of others. [2] Probably more likely SS7 related as marcus0x62 mentioned. Another potentially SS7 related possibility? [3]

[1] - https://www.thousandeyes.com/outages/

[2] - https://downdetector.com/

[3] - https://www.webwire.com/ViewPressRel.asp?aId=318230

nu11ptr
9 replies
1d4h

The myATT app doesn't even show my wireless account anymore. My entire family account doesn't even show up as a service. Seems like a hack or an internal issue that deleted accounts? Can others confirm whether they see their accounts?

partiallypro
3 replies
1d3h

From what I read just a bit ago, basically there is a problem with the database of SIM numbers. So, SIMs all just dislodged from the network because they lost their network authorization. That would lead one to believe it was a botched software push. I imagine online accounts get this information somehow which could explain the portal being broken. I have a prepaid hotspot and it works fine, but none of the "family plan" month to month contract phones work. I also wonder if there's a physical SIM vs eSIM situation that could explain "newer" models working.

Kon-Peki
1 replies
1d2h

In our house this morning, the two phones with physical SIMs worked fine and the two phones with eSIMs were SOS mode only.

I could log into my AT&T account just fine and all phones showed up correctly.

(I’m submitting this from an AT&T 5G connection, no WiFi nearby)

chasd00
0 replies
1d

aligns with my experience too. My wife's newer phone was sos but my older one was fine. Both ATT.

kjellsbells
0 replies
1d1h

That would be consistent with the symptoms. Big telco networks are hierarchical with most functions pushed to regional data centers with a very small number of services in a redundant pair or trio of central data centers. Subscriber database (HSS, UDR) would be one such function.

The cause of a failure of the HSS could be manifold, ranging from router failures to software bugs to cyber attack (databases of 100M+ users being a juicy target).
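
That hierarchy is also why one database problem can look like a nationwide radio outage: every attach has to be authorized against the central subscriber store. Much-simplified sketch (HSS/UDR are the real terms from above; everything else here is invented for illustration):

    # Much-simplified sketch: the towers are healthy, but no attach can be
    # authorized without the central subscriber database, so every SIM ends
    # up in "SOS only" at roughly the same time.
    class HSS:
        def __init__(self, subscribers, available=True):
            self.subscribers = subscribers
            self.available = available

        def authorize(self, imsi):
            if not self.available:
                raise ConnectionError("subscriber database unreachable")
            return imsi in self.subscribers

    def attach(tower_ok, hss, imsi):
        if not tower_ok:
            return "no signal"
        try:
            return "registered" if hss.authorize(imsi) else "rejected"
        except ConnectionError:
            return "SOS only (tower up, but attach can't be authorized)"

    hss = HSS(subscribers={"310410-0000000001"}, available=False)  # central DB down
    print(attach(tower_ok=True, hss=hss, imsi="310410-0000000001"))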

One slightly scary observation from NANOG was that FirstNet, the network that ATT built for first responders, was down. That would be ugly if true and I'd expect the FCC to be very interested in getting to the bottom of it.

mh-
1 replies
1d4h

Many of their APIs appear to be intermittently returning 502s, leading to strange behavior in their web/mobile apps.

radicaldreamer
0 replies
1d1h

That's just people trying to figure out if their service has been disconnected rather than it being a network outage

zeven7
0 replies
1d4h

I can see my account. The first thing I did when I saw my phone wasn't working was log in and pay my bill, thinking maybe I had missed one or something.

deckar01
0 replies
1d3h

The account management / status tools were slow and flaky on the best days. I wouldn’t rule out a little extra traffic knocking them out. Correlation is not necessarily causation.

bmitc
0 replies
1d4h

Their outage status page is also completely broken. Doesn't show anything.

c-linkage
8 replies
1d8h

I'll take a chance here and say this was a hack, possibly at the equipment-level. One major carrier having an outage? Possible. But three? On different networks?

Even if its not a hack I'd love to see the root cause on this one! Communications is critical infrastructure, so I'm gonna guess the government will demand a full report.

marcus0x62
1 replies
1d7h

There is some shared infrastructure in the PSTN that all network operators use. This smells more like an SS7 outage to me than a hack, but we'll have to wait to find out.

zikduruqe
0 replies
1d6h

Backhoe fade?

Back in my circuit-switched days, we lost half of our US long distance routes because a farmer in Wyoming dug up an unmarked fiber link. Hence, backhoe fade.

throwawaaarrgh
0 replies
1d7h

"A Verizon spokesman reached by CNBC said there was no issue with the Verizon network and their customers are only impacted if they try to reach out to the carrier experiencing the problem."

Sounds like only one had an outage

tallanvor
0 replies
1d6h

Sounds like it's really only AT&T. Most likely somebody screwed up a configuration change and took out a lot of capacity.

probably_satan
0 replies
1d4h

Probably Huawei related

jeffwask
0 replies
1d4h

I'm 50% hack and 50% BGP but my gut says cyberattack.

input_sh
0 replies
1d7h

When you lose your own network, your phone connects to a different one to be able to make emergency calls. One carrier knocking down other carriers is entirely plausible.

beAbU
0 replies
1d7h

Or a borked update to some piece of hardware that's used by all 3. Hanlon's razor and all that.

DebtDeflation
8 replies
1d6h

Verizon and T-Mo both issued statements that they have no outages and the issue is just their customers being unable to call AT&T customers. Looks like most of the AT&T network in the US is down though.

peteradio
4 replies
1d5h

My wife has google-fi and her coworker has verizon. Both of them say they can't make calls.

whynotminot
0 replies
1d4h

Any chance they’re trying to call an AT&T customer?

nonethewiser
0 replies
1d4h

and google-fi uses T-Mobile I believe

inferiorhuman
0 replies
1d4h

I've been tethering with T-Mobile as my primary internet connection and that's been working just fine. Voice also works for me with both TMo and Google Fi.

cddotdotslash
0 replies
1d4h

Anecdotal, but I have Google Fi and was on a ~1 hour call this AM during the height of the outage and had zero issues.

Tarball10
1 replies
21h50m

A theory for the reported Verizon/T-Mobile issues is that when AT&T went offline, all of those phones went into SOS mode and tried to register on the remaining available networks (Verizon and T-Mobile) to allow 911 calls to be made. The surge in devices registering at once may have overloaded some parts of those networks.

justsomehnguy
0 replies
2h54m

Uhm, no.

You don't need to register to allow 911 calls. You 'register' (it's not a regular registration) at the moment you are placing the actual 911/112 call. At least that's how it was in 2G/3G networks; I doubt it changed.

There is always some amount of terminals without SIM or without working SIM, there is no need for them to bang every available network in the vicinity just in case there could be an emergency call.

ethbr1
0 replies
1d3h

Data point on ATT (via MVNO) in Atlanta: was connected until ~11:00 EST, then booted off and haven't reconnected.

midwestfounder
6 replies
1d6h

I wonder if this is a cyber incident. Curious if any telecom folks know what the most likely explanation for an event like this would be, and what telltale signs/symptoms might first indicate this was caused by something nefarious.

relbeek2
3 replies
1d2h

However, the US Cybersecurity and Infrastructure Security Agency is “working closely with AT&T to understand the cause of the outage and its impacts, and stand[s] ready to offer any assistance needed,” Eric Goldstein, the agency’s executive assistant director for cybersecurity, said in a statement to CNN.[1]

[1] - [https://www.cnn.com/2024/02/22/tech/att-cell-service-outage/...

This isn't telling of anything, right? Wouldn't CISA be involved with anything that impacts Public Infrastructure at this level?

relbeek2
0 replies
1d2h

from the same article above, it seems like it's a critical part of this.

“Everybody’s incentives are aligned,” the former official said. “The FCC is going to want to know what caused it so that lessons can be learned. And if they find malfeasance or bad actions or, just poor quality of oversight of the network, they have the latitude to act.”

If AT&T gets to decide if they are at fault, they will, of course, never be at fault. So a third-party investigation makes a lot of sense.

I would also suspect that the FCC would not be as well versed in determining if there was a hack or even who did it, which is why I feel like CISA would need to get involved in the investigation.

red-iron-pine
0 replies
1d1h

by itself, not telling of anything per se.

like, you could commit a dumb BGP config and break lots of stuff. have done that in the past, actually...

but any time a national-tier ISP has a national-level outage, that warrants a look from multiple orgs. and given the number of threat actors like china, NK, iran, and russia, who are making, and have made, aggressive efforts in this space -- and have strong reasons to do so now -- it's not crazy for the US fed'gov to want to know a little more, and offer to help. but again, entirely possible it's unrelated.

overstay8930
0 replies
1d1h

This is normal for high profile outages, even if you are small you can still engage with the CISA if you think there's foul play.

unforeseen9991
1 replies
1d5h

Given the gross incompetence these companies operate with, it's hard to tell the difference.

pylua
0 replies
1d5h

Unfortunately, unlike cyber security, there are no off the shelf products that are being sold to help companies with general incompetence.

indigodaddy
6 replies
1d5h

Is the outage always reflected in the number of bars? I have AT&T yearly prepaid and currently 3 bars, which is about normal for my current area.

nonethewiser
5 replies
1d5h

Tried to use it? Other comments suggest it's down nationwide but you are saying you're up?

saltyollpheist
4 replies
1d4h

Having bars doesn't necessarily mean calls can be made or received. Mine keeps fluctuating between 0 and 5 bars, but whenever I try to make a call, it claims I'm not registered with my current carrier, which uses AT&T as its backbone.

nonethewiser
1 replies
1d4h

yeah thats why I asked if he used it

saltyollpheist
0 replies
1d4h

NGL I replied to the wrong person. It's been a long morning. My apologies and keep on keeping on.

13of40
1 replies
1d4h

That's something I've noticed in general with 5G - basically five bars means you've got a good connection to the tower, but you could have bad service due to a bottleneck upstream of that. (At least for data - for voice that might be more negligible.)

selectodude
0 replies
1d2h

That’s been an issue for a lot longer than that, at least in dense metro areas. Even back in the 3G days I’d have 5 bars and no data due to tower congestion.

chevman
6 replies
1d2h

Outage Over

Status: Restored
Service: AT&T Global Smart Messaging Suite
Event description: FINAL, Service Degradation
Impacted Services: MMS MT
Start time: 02-21-2024 22:00 Eastern, 21:00 Central, 19:00 Pacific
End time: 02-22-2024 11:00 Eastern, 10:00 Central, 08:00 Pacific
Downtime: 780 minutes

Dear Customer, We are writing to inform you that Global Smart Messaging Suite is now available. The MMS MT service has been restored and our team is currently monitoring. Thank you.

Kind Regards,
The AT&T SMS Service Administrator
AT&T Business Solutions

nonethewiser
4 replies
1d1h

I think a communication like this should include that they are investigating the root cause (assuming they aren’t completely sure), that they will share it, and where they will share it.

Maybe I'm reading too much into it, but it bothers me that that's not in the communication.

freedomben
1 replies
1d1h

They most certainly are investigating the root cause, and probably there's a witch hunt developing, but as far as customers go I would expect AT&T's attitude to be "none of your business." I've worked with many of these types of companies before, and outside of the occasional cool CS rep, their cultures are lots of information hoarders and responsibility dodgers. Taking responsibility for a problem is a good way to ensure you never get promoted.

keanebean86
0 replies
1d

"A recently departed employee had a core router's power going through a wall switch. This was done to facilitate quick reboots. A cleaning contractor turned off the switch thinking it was a light. It took us several hours to determine the situation and restore power"

dv_dt
1 replies
1d

I think the telecom issue playbook is significantly different than the SaaS playbook. Not sure if that’s just cultural or if there are other drivers - maybe paying customer telecom interfaces are simpler and more closed than typical SaaS?

aksss
0 replies
23h54m

IME, telecom as an industry is highly focused on the RCA, ICA, and uptime, and has had that embedded culturally for decades. Sharing the information publicly doesn't have much value, in the balance, unless there are a string of incidents where an acute perception problem needs to be addressed. This would more likely result in a marketing and advertising strategy rather than the sharing of technical RCA details. Additionally, one must consider that not all RCA details are fit for public disclosure. _You_ may be interested in deets, but John Q. Public is not interested beyond "Is it fixed yet?". If you want insider perspective, work in/with the industry. It's fascinating stuff.

wolverine876
0 replies
22h25m

Does that include cellular voice calls?

roamerz
5 replies
1d4h

I wonder if this extends to the Public Safety part of AT&T - FirstNet.

pgrote
2 replies
1d4h

Yes, it does. Family member in the medical field is experiencing issues.

roamerz
1 replies
1d4h

Dang that's a bad look for them. Hopefully they can provide a thoughtful and honest postmortem.

hoppyhoppy2
0 replies
23h46m

medical field

postmortem

Let's hope that everyone lived so a postmortem won't be necessary :)

EvanAnderson
1 replies
1d3h

It does. I was the on call support technician for a public safety answering point (PSAP-- aka a 911 center) this morning. At 04:14 Eastern today I received a call that the law enforcement and fire mobile units on FirstNet all disconnected from the VPN.

The AT&T land lines (CAMA trunks provided by the ILEC Frontier) that handle the 911 service did not fail. Only mobile service failed.

roamerz
0 replies
1d2h

Yeah, I have had to make that call myself many times in the past, never with FirstNet but definitely with AT&T and Verizon. It would be awesome if the carriers could put out a reliable announcement to the affected accounts that there is an outage; it would definitely simplify the pre-support-call triage at 1AM.

mstudio
5 replies
1d5h

My ATT Phone is in SOS mode. However, ATT's outages status reports:

  > All clear! No outages to report.
  > We didn’t find any outages in your area. Still having issues?
https://www.att.com/outages/

Scoundreller
1 replies
1d4h

At least they’re in SOS mode. When Rogers in Canada had a total blackout (cellular, home internet, MPLS, corp circuits, their radio stations , everything), phones showed zero bars, but the towers were still powered on and doing some minimum level of handshake so phones didn’t go into SOS mode.

If you tried to make a 9-1-1 call, it would just fail. It wouldn’t fail over to another network because the towers were still powered up but unable to do anything, and Rogers couldn’t power them down because their internal stuff was all down.

Like a day later they said you could remove your SIM card to do a 9-1-1 call. Thanks guys.

Of course, no real info from the provider during the outage. Turns out they did an enterprise-risking upgrade on a Friday morning and nobody at the org seemed to have a “what if this fails” plan. The CTO was on vacation; roaming phones were dark too, and he thought it was just an issue for him.

https://en.m.wikipedia.org/wiki/2022_Rogers_Communications_o...

ezfe
0 replies
23h16m

Some people earlier on this morning said they couldn't make 911 calls. I wonder if it was the same issue and perhaps AT&T cut the towers completely pending a fix. Purely speculation.

inferiorhuman
0 replies
1d3h

I'm seeing a variety of outages listed there as of 08:30 Pacific, mostly landline. There are a couple wireless outages shown in Sonoma (and listed as impacting Sonoma and Ventura counties). The initial cause is shown as "maintenance activity".

https://imgur.com/a/oXZpEX9

derbOac
0 replies
20h49m

Our land-based internet with a different [large] ISP went out about 18 hours before this stuff with wireless started, and it is still going on. We've been getting similar contradictory messaging the whole time, and they seem confused about what's causing it. We got a message a couple of hours ago that it had been resolved, and then 30 minutes later another saying it was not.

It could be entirely coincidental and unrelated to the stuff with other networks, but the timing was odd and I have never ever seen anything like this outage from them. I can think of one time it was out for around 2 hours in the last 5 years, and it was with a very specific infrastructural upgrade they knew about.

dangus
0 replies
1d4h

I’m seeing an outage reported on this page.

efitz
5 replies
1d6h

Reporting status externally is hard. I worked at a cloud provider and while we had very detailed metrics about how our systems were functioning, it was difficult to distill that information in a way that customers would understand if they were being impacted or not, and what the impact would be. Just reporting raw numbers wouldn’t give customers the context to understand what was actually going on.

How do you actually report, for example, that .003% of your customers are having a really bad day but the rest are just fine?

mistrial9
1 replies
1d4h

Because you define thresholds for event classification. The difference between 10,024 customers having a failure and 10,100 customers having a failure is not the question, right? When many hundreds of thousands of customers are failing at once, is that really very difficult to determine?

Secondly, there are financial, management, and information security pressures to NOT REPORT reality to the public. This happens VERY OFTEN in real business. In fact, that is why legal enforcement actions and real consequences are crucial versus Big Business.

efitz
0 replies
1d3h

In a cloud, massive outages are the rare events, and technically easy to report.

Small outages happen all the time, and are difficult to report accurately.

I think AWS has pivoted to trying to report status in each individual customer's support portal, so that they can give a dashboard that reports the state of the cloud from that customer's perspective. That way a rack down that only affects a few customers is only reported to those customers, and the dashboard doesn't have to always be red for everyone (or green for everyone, even those affected).
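
The per-customer view isn't mechanically complicated once incidents carry a list of affected resources; the hard part is the bookkeeping. Toy sketch with invented identifiers:

    # Toy sketch of per-customer scoping: incidents list affected resources,
    # and a customer's dashboard only shows incidents touching resources they
    # actually use.
    incidents = [
        {"id": "INC-1", "summary": "rack 42 power loss", "affected": {"rack-42"}},
    ]
    customer_resources = {
        "alice": {"rack-42", "rack-7"},
        "bob":   {"rack-7"},
    }

    def dashboard(customer):
        hits = [i for i in incidents
                if i["affected"] & customer_resources.get(customer, set())]
        if not hits:
            return "All systems normal (for the resources you use)"
        return "; ".join(f'{i["id"]}: {i["summary"]}' for i in hits)

    print("alice:", dashboard("alice"))   # sees the rack incident
    print("bob:  ", dashboard("bob"))     # green, and that's accurate for him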

bradleyjg
1 replies
1d4h

If it can’t be done then don’t do it. A dot that’s always green, no matter what, is worse than nothing.

RajT88
0 replies
18h27m

Agree. As a customer, you need to be able to correlate your impact to anything which might be going on.

Because those cloud provider internal tools are often wrong about who's impacted and who's not.

12345hn6789
0 replies
1d6h

Aren't these exactly what p99 illustrates? Of course, you'd have to have your use cases logged well enough to be able to aggregate the 99th percentile accurately for each customer flow.
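
For example, with made-up numbers, here's why the percentile has to be per customer flow rather than fleet-wide, or the 0.003% vanish into the aggregate:

    # Made-up numbers, just to show the aggregation difference: a fleet-wide
    # p99 can look fine while one customer's p99 is awful, because that
    # customer is well under 1% of total traffic.
    from collections import defaultdict

    def p99(values):
        s = sorted(values)
        return s[min(len(s) - 1, int(0.99 * len(s)))]

    requests = [("bulk", 50)] * 9990 + [("unlucky", 5000)] * 10   # (customer, ms)

    print("fleet-wide p99:", p99([ms for _, ms in requests]), "ms")  # 50 ms

    by_customer = defaultdict(list)
    for cust, ms in requests:
        by_customer[cust].append(ms)
    for cust, vals in by_customer.items():
        print(f"{cust:>8} p99: {p99(vals)} ms")   # unlucky's p99 is 5000 ms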

bdcravens
5 replies
1d5h

My wife and I are same account (AT&T, Texas) and hers is down but mine isn't. Only difference is my phone is slightly newer (15 Pro Max vs 14 Pro)

inemesitaffia
1 replies
1d3h

Did I see you on Reddit? As in both are down now?

bdcravens
0 replies
1d1h

No that wasn't me.

thechao
0 replies
1d4h

My 12 year old has been lording it over the rest of the family, for the same reason.

chasd00
0 replies
23h58m

In the same boat (AT&T/Texas) but reversed: my wife's newer iPhone was SOS but my older one was not. Hers came back online about 1:30PM central after being down since she woke up around 6AM central. AFAIK both my kids' phones were working; I'm sure I would have heard howls of anguish if not. Both of their phones are maybe 4-5 generations older iPhones.

breischl
0 replies
1d4h

Maybe yours is set up for WiFi calling?

hamandcheese
4 replies
1d3h

Late yesterday, I tried to place a 911 call while driving on the 101 north toward San Francisco.

Before I could complete my report, the call dropped, and for the next 10 or so minutes I had no service, only "SOS".

I'm on Verizon, and the timing doesn't match up with this headline, but now I'm suspicious.

organsnyder
2 replies
1d2h

Sounds like a standard dropped call due to a dead zone.

gnicholas
1 replies
1d

VZW doesn't have dead zones on 101 in the Bay Area, in my experience.

organsnyder
0 replies
7h9m

Could be issues with a single tower or something.

xyst
0 replies
1d2h

Sounds typical for Verizon tbh

acefaceZ
4 replies
1d4h

What is the big deal? Just use your HAM radio.

zingababba
0 replies
23h9m

I agree unironically

xyst
0 replies
1d2h

I’m more of a smoke signal guy.

engineer_22
0 replies
23h52m

That reminds me, I passed the General and Tech but didn't follow through on my application...

chomp5977
0 replies
21h26m

Funny you say this lol I did exactly that

softwaredoug
3 replies
1d6h

I'm just curious how we as a society manage the uptime of such a critical service? Do we have laws or enforcement in place to regulate / enforce how well basic cellular service (at least the emergency tier) must work in the US?

Just imagining the dropped emergency calls today, etc?

(Genuine question)

testfrequency
2 replies
1d6h

I don’t have anything positive to add, except noting that FirstNet is also down. I’m sure there’s a lot of chaos for emergency responders right now :/

EvanAnderson
0 replies
21h15m

FirstNet absolutely went down in my locality (western Ohio, near Dayton). I was on shift for after hours support of a public safety answering point (AKA a 911 center) and I received a call at 04:14 Eastern stating that all the law enforcement/fire mobile units on FirstNet disconnected from the VPN and that no FirstNet cell phones could make/receive calls or send/receive text messages.

partiallypro
3 replies
1d7h

I haven't had service since around 3am this morning; my internet (fixed wireless) was also down around that time but has since come back on. I had just assumed it was local tower maintenance until I woke up to see it's still out and it's affecting millions. I was surprised to see it barely mentioned/getting engagement here given the scale.

flyinghamster
1 replies
1d4h

Saw my Comcast was down also (SW Chicago suburbs), about 3AM CST, but it was back before 4AM. I figured it was a neighborhood outage when rebooting the cable modem did nothing, and then it came back without any fanfare.

My T-Mobile phone hasn't had any problems, knock on wood.

n0ot
0 replies
1d3h

I'm also in the southwest suburbs, and have Comcast. My internet went out for about ten minutes at around 9:20 CST. I disabled WIFI on my iPhone only to discover I was still offline, and had no signal. Not sure whether the Comcast outage was related, but it came back up very quickly, whereas I still have no cell service (AT&T). My wife does have service, even though we both have iPhone 15 Pros.

IAmGraydon
0 replies
1d4h

According to downdetector, 3AM was when this all started.

mulippy
3 replies
1d3h

This is affecting roughly 0.02% of the US population; is it really that bad?

bitmasher9
1 replies
1d3h

What percentage of the population would need to be impacted before a tech failure is relevant on a technical news site?

k_roy
0 replies
1d2h

Seven. Exactly 7%

syntaxing
0 replies
1d3h

Assuming you’re using the 74K number reported by Downdetector, that’s a self-reporting system. The real number of people affected is probably orders of magnitude beyond that.

joshstrange
3 replies
1d6h

Seeing "SOS" only on iPhone currently. I got worried something had gone wrong with auto-bill pay since I only noticed after I was driving.

It's interesting how naked I feel without access to the internet. I reach for it way more often than I would have ever guessed, something you only notice when it's not there. Last March my area saw large wind storms that knocked out power for almost a week (I'm not in a rural area). I can work around the loss of power but the cell tower(s) that service my area could not handle the load and/or the signal in my house was weak and I was unable to load anything. Not having internet was way worse than not having power and I ended up driving a few hours away to my parent's house instead of staying home.

jhickok
0 replies
1d

I started driving across the US at 3am, didn't notice for the first few minutes until I tried pulling up the address in Apple Maps. Sure was strange following interstate signs for ~10 hours!

flerchin
0 replies
1d5h

Yes I've felt the same way. I feel like we have an instinctual need for social connection that we've filled with internet. Luckily, we do still have meat-space friends and family.

bombcar
0 replies
1d5h

My earliest computers were amazingly capable and powerful devices, I could do anything I could think of and spend hours and hours on them.

Now my computer is insanely more powerful but without an Internet connection it feels dead and useless.

wingspar
2 replies
1d4h

I’m on Cricket, which is a prepaid cell plan ‘subsidiary’ of AT&T, and calls/internet work fine.

So not a network outage, but account related?

jtbayly
0 replies
1d4h

That would make sense, except that wifi calling is still working.

jrochkind1
0 replies
1d3h

My Cricket phone is not fine. Android phone, giving me regularly repeated alert messages "No voice service. Temporarily turned off by your carrier."

I think your experience is just that it's not 100% outages, some people still have service.

teleforce
1 replies
19h38m

I think this is the main reason developers should always design their software with a local-first (aka offline-first) approach, in spite of the Internet and the cloud [1].

Regarding cellular-based authentication, it is perfectly doable to be securely authenticated even without any connectivity, and there is a solution to this based on a combination of MFA and OTP [2].

[1] Martin Kleppmann talk on local-first (LoFi):

https://news.ycombinator.com/item?id=39444519

[2] A lightweight and secure online/offline cross-domain authentication scheme for VANET systems in Industrial IoT:

https://peerj.com/articles/cs-714/
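
I can't speak for the specific scheme in [2], but the basic "authenticate with no connectivity" building block is familiar from TOTP: both sides share a secret and derive the same short-lived code from the clock, so verification needs no network at all. Minimal RFC 6238-style sketch:

    # Minimal TOTP sketch: both sides hold a shared secret and derive a
    # 6-digit code from the current 30-second window -- no connectivity needed.
    # This illustrates the general idea, not the scheme from [2].
    import hashlib, hmac, struct, time

    def totp(secret, t=None, step=30, digits=6):
        counter = int((time.time() if t is None else t) // step)
        digest = hmac.new(secret, struct.pack(">Q", counter), hashlib.sha1).digest()
        offset = digest[-1] & 0x0F
        code = (struct.unpack(">I", digest[offset:offset + 4])[0] & 0x7FFFFFFF) % (10 ** digits)
        return str(code).zfill(digits)

    def verify(secret, code, window=1):
        """Accept the current step plus/minus `window` steps of clock drift."""
        now = time.time()
        return any(hmac.compare_digest(totp(secret, now + i * 30), code)
                   for i in range(-window, window + 1))

    shared_secret = b"provisioned-out-of-band"       # hypothetical enrollment step
    print(verify(shared_secret, totp(shared_secret)))   # True, fully offline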

mucle6
0 replies
18h41m

I think you're right, and I'd like to amend the "should always" to "after product market fit". I'm always trying to stop myself from prematurely optimizing

talldatethrow
1 replies
1d1h

Did anyone have a phone fail during this outage?

My Pixel 4XL was working at 2am as I placed it on the charger (even though it had about 80%). Noticed zero bars. Shrugged and went to sleep.

Woke up at 9am and the phone is totally non operational.

Yes, I know sometimes phones just break, but rarely when they're not being moved, not wet, not dropped, and not having a charging issue.

What a coincidence.

stevewodil
0 replies
14h19m

Android has historically done poorly in areas with no signal with regards to battery life. Did it come back after charging it?

stuart73547373
1 replies
1d3h

probably unrelated but a nurse friend mentioned that a healthcare system got hacked last night (didn't have further details)

organsnyder
0 replies
1d2h

Is that nurse friend senior leadership in their organization? Otherwise, it's extremely unlikely that they would have any knowledge of an event like this so soon after it occurred. And the people who do have this knowledge (should) know not to talk about it until after an investigation has been conducted.

Many systems issues are mistakenly thought by non-technical users to be "hacking".

nkotov
1 replies
1d7h

SOS Only on AT&T down here in Charlotte

nathanyz
0 replies
1d7h

Same in Florida

kraig911
1 replies
1d4h

My military friends said to buy gas today... Another friend said solar flares. I can't tell if it's a foreign actor or nature.

throwawaaarrgh
0 replies
1d3h

Occam's razor says it's none of those things. The simplest explanation is somebody just fat-fingered a change, which happened to coincide with a lot of other things to result in a big outage. Happens all the time.

flerchin
1 replies
1d5h

I wonder how much an outage like this would end up costing them. Maybe it would even be net positive as some of their operating costs like data transfer would be down.

neom
0 replies
1d5h

A similar-scale outage in Canada ended up costing the company around $3.80 per subscriber ($170MM CAD). Coincidentally, I read that's about the same amount that was calculated as the impact to the Canadian economy ($175MM).

https://www.catchpoint.com/blog/rogers-outage

betaby
1 replies
1d4h

Are there any technical details?

fuzzfactor
0 replies
1d2h

AT&T mobile 5G hotspot lost internet a few hours ago.

Resets futile.

Shows "Mobile Broadband Disconnected"

However, it can still ping Google, with an excellent under-10ms response.

a_band
1 replies
1d4h

AT&T seems down in Eastern TN too. All phones in SOS mode.

niblettc
0 replies
1d4h

Hey neighbor! SOS mode here in north AL too.

8b16380d
1 replies
1d2h

Back up for me (Raleigh, NC)

macshome
0 replies
1d2h

It just came back for me as well in Winston-Salem, NC

xyst
0 replies
1d2h

ATT used to be the premium cellular provider in the US. Guess they got too big and neglected their ops side.

ww520
0 replies
1d1h

Does this affect voice only? Or affect both voice and cellular data?

w10-1
0 replies
23h31m

Likely unrelated: I can report that last week my phone number was somehow assigned to two devices for ~3 days.

I use a carrier that leases from a major (Verizon). There were clearly disconnects between the systems that even tier-3 support expected to automatically resolve overnight (i.e., be eventually consistent). Still, after 2 such overnight waiting periods failed, I got someone to spec a fix and see it through over the course of 3 hours. They were clearly surprised that both their system and the major system were not reporting correctly, and the solution came only after making changes while ignoring status.

I got a sickening feeling that enterprise software layers mated with eventually-consistent web persistence across two outsourcing organizations to produce a situation where truly no one understood what was happening -- or could even figure it out. This resolved only when some youngish person had the guts to just make changes that should work -- so we're back to the days of heroes.

vaxman
0 replies
1d3h

https://www.usatoday.com/story/news/politics/2024/01/31/fbi-...

Note: This site does not like to entertain contentious topics and has rate limited me over my prior post about factory and foundry installed malware in Chinese manufactured equipment because I did not respond to a suspicious demand for proof, instead opting for a sarcastic reply. I can live without being associated with Garry Tan's company, YCombinator. (I thought to post this link while posting to another site, not because I hangout here.)

tonetegeatinst
0 replies
1d

Wondering if we ever get technical write-ups from these types of outages?

Is this only affecting non-priority accounts?

tomcam
0 replies
1d8h

I barely noticed because my AT&T service is so bad on a normal day (not being facetious).

tmaly
0 replies
1d4h

I wonder if this is a bad software update to the cell towers?

theturtle32
0 replies
1d2h

Here in Long Beach, CA, my husband and I both had our AT&T phones go into SOS mode at about 1:20am local time, and it resolved for both of us about an hour later, and is still resolved.

thekrendal
0 replies
1d5h

Checking in from the St. Louis, MO area, the outage is in effect here. Two lines on separate accounts are down.

Thankfully, WiFi calling seems to be functioning.

I can't wait to see a postmortem on this outage.

that_lurker
0 replies
1d5h

”with some of those impacted unable to reach 911 emergency services.”

How can one carrier going down affect your ability to make emergency calls?

testfrequency
0 replies
1d6h

Take this with a very fine grain of salt, but I read on a forum elsewhere that it was Cisco infrastructure related.

Do I believe that? No clue. I believe it more than people speculating the timing corresponds with the solar flare or nation state taking it out.

sivm
0 replies
1d5h

My wife has a Samsung Z fold 3 and she has service. I have an iPhone with eSIM in the same house and do not.

seatac76
0 replies
1d4h

Verizon is working just fine.

rglover
0 replies
1d5h

Middle Tennessee only getting SOS on iPhone (AT&T).

nwsm
0 replies
1d6h

I have no ATT cell service in Boston. I'm on an iPhone 14

msrenee
0 replies
1d4h

Eastern Nebraska here. Woke up to no connection at 4am. It was restored by 8 am. Now at 10, it's out again.

Edit: And back again a few minutes later.

mikeweiss
0 replies
17h13m

15 hours later and the title of this post still says Verizon and T-Mobile when it was only AT&T

midwestfounder
0 replies
1d6h

I wonder if this is a cyber incident. Curious if any telecom folks know what the most likely explanation for an event like this would be, and what telltale signs/symptoms might first indicate this was caused by something nefarious.

kylecazar
0 replies
1d6h

My brother sitting next to me with an AT&T Pixel 7 has no service.

Me with a Pixel 7 Pro (dev preview android) on the same plan: not affected.

Strange.

kkdeligate
0 replies
1d2h

I live in Houston, TX and have 4 lines on my account. 2 of them, including mine, don’t have data yet. My AT&T account doesn’t show our numbers or the status of the lines. I think the outage started about 2am while I was listening to a podcast. Calls and texts over WiFi work though.

kennethrc
0 replies
1d5h

Phoenix area, sending this from my 5G NRSA-enabled router on AT&T

jsjohnst
0 replies
1d5h

Have AT&T cell signal in Central Jersey, but calls fail and no useable internet data connectivity unless using wifi calling.

joshu
0 replies
23h53m

Someone I know installed an iOS update at the same time; when the phone came back up, it was dead. What a frustrating coincidence?

jakedata
0 replies
1d2h

Service resumed in the Boston area about 15 minutes ago. No muss no fuss, just magically resumed like nothing ever happened. What they share of the post-mortem should be interesting.

isaacdl
0 replies
1d5h

Downdetector's front page[0] seems to indicate issues with other carriers as well - Verizon and TMobile. There are are several other providers showing on the front page, but I think a lot of them are MVNOs on the big three networks.

[0] https://downdetector.com/

htalat
0 replies
1d5h

a family member in Pittsburgh, PA does not have service.

grungydan
0 replies
1d5h

Isn't it wonderful that we allow almost entirely unregulated monopolies that have no actual obligation to provide the service for which they take your money?

Telecoms, airlines, insurance scam..ahem companies...

Must be nice. I wonder if I could just stop showing up to my job and keep collecting my check. No? Why not? We allow the entities that write our laws and have politicians rubber stamp them to do it. Seems a mite unfair.

foobarian
0 replies
21h11m

AT&T, T-Mobile and Verizon users

... isn't that pretty much everybody? Not sure if I know of any other actual plant owners

flerchin
0 replies
1d5h

I wonder if an outage like this costs them any money at all. They probably save on data transfer and other operating costs.

doctator
0 replies
1d2h

back up in NY

dmane11
0 replies
1d5h

No service on my iPhone 15 pro att e-sim here in Chicago

defly
0 replies
1d1h

JFYI, in December the largest Ukrainian operator, Kyivstar, was down for days after a sophisticated Russian hacker attack.

cranberryturkey
0 replies
1d6h

AT&T down in California.

commandlinefan
0 replies
1d5h

What's weird is that I have two phones with AT&T service (one provided by work, one my own). I have service on the work phone, but not my personal phone.

cholmon
0 replies
1d7h

Another reason why SMS-based 2FA is a bad idea.

chasd00
0 replies
1d

my wife's phone wasn't working for half the day but mine was. Same provider and same location (I'm assuming being in the same house meant we were probably connecting to the same tower). Browsing around news sites, I think even solar flares are in the realm of possibility.

browningstreet
0 replies
1d5h

Reno — AT&T cell and fiber are working normally

bonyt
0 replies
1d5h

Working here, in a subway station in NYC.

bochoh
0 replies
1d5h

Checking in from the Northeast (north of Albany, NY) and can confirm no cell service at all this morning (ATT)

blantonl
0 replies
1d6h

I can confirm that San Antonio, Dallas, and Austin are all down. My dual SIM iPhone shows 0 bars for AT&T and Google Fi (T-Mobile) is fine. My wife's phone just shows SOS.

arprocter
0 replies
1d6h

NYC ATT Android SMS 'unable to send message'

Strangely sending to that phone from an iPhone looks like it sent, but nothing was received

Calling from Android to iPhone gives 'your phone is not registered', iPhone to Android plays a 'call could not be completed'

iMessage between iPhones seems okay

alphabettsy
0 replies
1d6h

Can confirm. AT&T cellular is completely dead for me. Other carriers are working though.

alberth
0 replies
1d2h

I'm on AT&T and impacted (no cell services since I woke up this morning).

What's also odd is I'm not even able to log into my ATT.com account.

When I try to log in, it states "want to pay your bill".

So my immediate concern was, did my credit card expire and my service was turned off.

I'm still hoping that's not the case, and it's just that I'm impacted by this outage (and not something else).

SirMaster
0 replies
1d5h

I'm trying to decide if I am surprised or not that something like this relies on a single central point of failure. (well I assume it's a single point of failure given how this appears to have happened nation wide all at once)

You would think that a nation wide cell service would be more distributed and that it wouldn't all go down at once like this.

But maybe there are some good enough reasons that there needs to be some central components or systems that make it all work.

Miner49er
0 replies
1d5h

Interestingly, I'm on an MVNO that uses AT&T and have no issues.

LorenDB
0 replies
1d7h

My Verizon hotspot slowed to a crawl yesterday. Now it makes sense.

JohnMakin
0 replies
1d3h

Was out since about midnight Pacific time until ~7am. I just used my AT&T network to respond to an MFA prompt and the device thought I was ~120 miles from where I currently am.

BuckYeah
0 replies
1d4h

Our corporate att phones are all down in multiple large campuses. About 10,000 phones total