GCP Incidents

Animats
41 replies
3d9h

As someone who's into virtual worlds, and a user of Second Life, it's impressive to see how well those systems stay up. There hasn't been a total outage of Second Life in 5-10 years. Once, Amazon's networking went down in a way that prevented new logins for a whole day, but existing logins remained. The 3D world, which has a lot of stuff going on even with no users around, continued to work. This is an extremely complex, one-of-a-kind system, and it just keeps cranking along. It's very distributed; one region (a 256x256m square) can crash and restart without taking down its neighbors. Users see the failed region as a square hole in the ground filled with water until the server restarts, which takes about two minutes. So outages are quite graceful. It's currently hosted on AWS, but it doesn't have to be.
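
The per-region isolation described above can be illustrated with a toy simulation. This is only a sketch under loud assumptions: real regions run as separate server processes, whereas here each region is an in-process object, and the names `Region`, `Grid`, `RegionCrash`, `fail_next` and `RESTART_TICKS` are all hypothetical. The point is just that a crash is contained where it happens and the neighbors keep ticking.

```python
from dataclasses import dataclass, field

RESTART_TICKS = 3  # hypothetical stand-in for the "about two minutes" restart


class RegionCrash(Exception):
    """Raised by a region's simulation step to model a server crash."""


@dataclass
class Region:
    x: int
    y: int
    ticks_run: int = 0        # how many simulation steps completed
    down_for: int = 0         # > 0 while the region is restarting
    fail_next: bool = False   # hook to force a crash on the next step

    def step(self) -> None:
        if self.fail_next:
            self.fail_next = False
            raise RegionCrash(f"region ({self.x}, {self.y}) crashed")
        self.ticks_run += 1


@dataclass
class Grid:
    width: int
    height: int
    regions: dict = field(default_factory=dict)

    def __post_init__(self) -> None:
        for x in range(self.width):
            for y in range(self.height):
                self.regions[(x, y)] = Region(x, y)

    def tick(self) -> None:
        # Each region is stepped inside its own try block, so one failure
        # never propagates to its neighbors.
        for region in self.regions.values():
            if region.down_for > 0:
                region.down_for -= 1  # still restarting, skip this tick
                continue
            try:
                region.step()
            except RegionCrash:
                region.down_for = RESTART_TICKS

    def render(self) -> str:
        # '~' marks a downed region (the water-filled hole), '#' a live one.
        rows = []
        for y in range(self.height):
            rows.append("".join(
                "~" if self.regions[(x, y)].down_for > 0 else "#"
                for x in range(self.width)))
        return "\n".join(rows)
```

Setting `fail_next = True` on one region of a 3x3 `Grid` and calling `tick()` leaves a single `~` in the rendered map while every other region's `ticks_run` still advances.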

What fails? The associated webcrap. The Marketplace, which is just a catalog and shopping cart. The forum system, which is outsourced to Invision, seems to fail several times a month. The messaging system, which is just a lightweight social network. The billing system. The outgoing payments system. Amazon's outgoing HTTPS proxy. All of those have failed several times in the last year. Even the JIRA system conked out once.

The quality of web software is underwhelming.

Hammershaft
30 replies
3d6h

As someone who has moved from native app dev to web dev, I just feel my productivity and satisfaction have plummeted. The stack (HTML, JS, CSS, browser functionality) that makes up the web is just not fit for the purpose of rich client applications.

The development of the thick, leaky abstractions that make up web frameworks has consumed millennia of man-hours, and yet the experience of developing & using the resulting web applications is still trash.

curtisblaine
17 replies
3d6h

To be fair, you probably should expect to be less proficient and satisfied embracing a stack that you weren't using before.

hnlmorg
14 replies
3d6h

I’m not the GP but have been building websites off and on since 1994.

Back in the 90s it was hacky, but the extent of those hacks was small enough that even hobbyists could memorise those edge cases. Whereas these days there are so many footguns and edge cases that even seasoned professionals find it impossible to memorise everything. The amount of trial and error it takes to build a modern site is immense. With native applications it is a lot easier to write unit tests to catch this stuff, but how do you unit test a CSS property alignment across 3 browser engines? The answer is almost always to hire QA experts because setting up automated tests has become so complicated and burdensome that it’s now a profession in its own right.

It doesn’t help that “web” is really a plethora of different technologies: CSS, HTML, JS, HTTP, TLS, and image formats like SVG, JPEG, PNG and GIF. And a few of those have entire subcategories of crap to wade through, like cross-origin headers, subtle incompatibilities in ECMAScript, support of different web extensions, different HTTP protocols, CSS incompatibilities, differing viewport sizes, TLS ciphers, etc.

And that’s just user-facing code. What about backend? SQL or NoSQL? Table indexes, web server solutions, load balancing, caching, CDNs, backend server-side code, how you cluster your web services, where you host it…etc.

Then you have to secure the damn thing because it’s open to the public.

And all this just to produce interactive documents!!

But the worst thing of all is once you’ve finally wrapped your beard around everything, it’s all out of date again because the latest batch of 20-somethings got so fed up learning this shit as well, that they’ve just gone and rewritten half of the stack from scratch again.

It’s all massively over engineered because we keep on trying to polish this turd rather than accepting we have outgrown the web spec and need something more tailored.

And it’s only getting worse. With more and more software switching to Electron and a growing trend of native applications embracing WASM for portable modules, we keep doubling down on this turd — acting like if we spend enough time polishing it then one day it might turn to gold.

So as someone who’s multi-disciplined, I too find web development the least satisfying of all the software development domains I’ve worked on. It’s downright frustrating at times.

imiric
13 replies
3d4h

> Back in the 90s it was hacky, but the extent of those hacks were small enough that even hobbyists could memorise those edge cases.

You're either misremembering, or have some thick rose-tinted glasses on.

The late 90s were times of the Wild West Web. Every browser had custom behaviour and rendered pages differently. JavaScript and CSS were still new, and it took years for the implementations to be standardized. Websites had "Best viewed in Netscape" or "Built for IE" badges. Java applets, ActiveX, Flash, and a bunch of other technologies surfaced to make early versions of web "apps" actually possible. Even as late as 2005, projects like jQuery were needed to make web development easier.

I'd say that it wasn't until the late aughts, well after "web 2.0", that web development didn't feel hacky. By then, though, the JS ecosystem exploded, and frontend frameworks took off.

So I get the sentiment that modern web development is difficult, bloated and buggy, but the good news is that the web is fully mature now, and there are plenty of projects that make it possible to build simple and robust web sites. Web developers are spoiled with choice, so a hard task is simply selecting the right technologies to use. Unfortunately, sometimes this is out of our hands, or we inherit decade-old codebases that make maintenance a nightmare. But it's never been easier to write plain HTML/CSS/JS that will work across browsers, and deploy it as a static site that works reliably at massive scale. If you need more features, then complexity can creep in, but it is possible to minimize failure points with good engineering.

from-nibly
7 replies
3d3h

Agreed. What has changed about the web is that expectations have increased. There is no way that people could make the websites users expect today with only technology from the 90s.

civilitty
5 replies
3d2h

The expectations of apps have skyrocketed too which is why everyone is using Electron to cope. If GUI development were still as easy as the drag and drop builders in Delphi 7 and VB6 with a bunch of static 800x600 monitors and double-click to edit component code-behind, we’d be shoehorning fewer apps into HTML and making more desktop apps.

I remember how big of a pain in the ass Windows distribution (alone) was in the 2000s with entire businesses like InstallShield built around it. Tack compatibility testing on top of that and a desktop app for a single OS became a massive effort. With the web you could just upload PHP and be off to the races. That slowly evolved into what we have today via jQuery and Electron and friends, but what won was distribution and ease of development, despite all the handwringing people do about frontend. The grass isn’t greener on the other side and hasn’t been for decades.

It’s not even remotely competitive anymore. My favorite example is GoldenLayout: it takes me less than a few hours to read the docs and implement a full-blown tabbed and split-screen interface using a third-party library and combine it with any framework I want like React, Svelte, or just vanilla JS. Each desktop framework has its own solution to the problem, usually in the form of commercial components, but even in the best case they take a lot more time to wire together.

hnlmorg
4 replies
3d2h

I was writing resizeable desktop applications in the 90s and it was easy then too. Differing screen resolutions are not a recent thing.

Electron is popular because:

1. web developers are cheaper and easier to hire than people with desktop software development experience

2. Electron is cross platform

Point 2 is a compelling reason. But it's not like other cross platform toolkits haven't existed. They've just been left to languish because of point 1.

If companies only hire web developers then of course everyone is going to invest time into writing Electron apps and banging on about the benefits of web applications. It becomes a self-fulfilling prophecy.

I say this with experience having watched the tides turn, first hand, as a software developer.

civilitty
3 replies
3d2h

I was there too with a Sony GDM-FW900 monitor with 2300x1400 resolution and my experience was very negative. Both as a user and as a developer (though I was just a kid getting started in ‘99). Any software that wasn’t made by a major company was a crapshoot and Delphi layout was brittle for my own apps. Anything that got deployed to university computers with their bulk purchased monitors was mostly fine except for the deployment process.

There are more web developers because it was a lot easier, even before npm existed and frontend frameworks went out of control. That didn’t happen in a vacuum - Electron was really late on the scene as far as frontend tech goes. It was originally made to support an IDE and not regular business apps and still won.

hnlmorg
2 replies
3d

> I was there too with a Sony GDM-FW900 monitor

I’m talking about the early to mid 90s of web development and you reply with a monitor released around 2000. That’s a completely different era.

> with 2300x1400 resolution and my experience was very negative.

That’s more an issue with OSs not doing scaling back then than it is a problem with desktop software.

You’d still have the same issue today if you run a super high resolution with zero scaling.

> There are more web developers because it was a lot easier

It has a lower barrier to entry but I wouldn’t call it easier.

> even before […] frontend frameworks went out of control.

So you’re basically agreeing with me then. The current state of affairs is out of control.

> Electron was really late on the scene as far as frontend tech goes. It was originally made to support an IDE and not regular business apps and still won.

Its origins don’t disprove my point. Lots of things start out as one thing and evolve into something else. That isn’t unique to the web or Electron.

civilitty
1 reply
2d23h

I thought we were talking about "resizeable desktop applications in the 90s"? The reason I bring up that monitor is that I had the misfortune of using dozens of desktop applications written in the 90s to control lab equipment and spent copious amounts of time manually tiling them in all kinds of positions and sizes on a high resolution monitor. Scaling was definitely not the issue thanks to great eyesight. I was lucky if they supported a non 4:3 aspect ratio. Anything that wasn't a graphical or CAD or other app that did its own rendering or was developed by a large dev team was a crapshoot.

Lots of absolutely positioned buttons clipped by a resize, toolbars that didn't hide their buttons behind a menu when the window was too small, uncollapsible panels with minimum widths that exceeded the main content, and so on. Most of them were about as resizable as a water balloon when you smash it on the ground.

hnlmorg
0 replies
2d17h

Lab equipment software has hardly been the pinnacle of desktop software. Even native desktop evangelists moan about the quality of some of that software. So I wouldn't use that as your benchmark given they're already known to be generally below par. In fact I'd go further than that and say lab software is notorious for having terrible UX.

I can assure you that plenty of good software did exist. I know this because I wrote some of it ;)

hnlmorg
0 replies
3d2h

> There is no way that people could make the websites users expect today with only technology from the 90s.

That's my point though. The web should never have been bastardised into making applications. We should have seen the desires of web2.0 and built an entirely new software stack purpose built for online applications.

It wasn't fit for applications then and it still isn't now. It's just we keep trying to convince ourselves it is because we're too locked into the web now to ever move past it.

hnlmorg
4 replies
3d2h

> You're either misremembering, or have some thick rose-tinted glasses on.

> The late 90s were times of the Wild West Web.

Or we are talking about different eras. I'm on about early 90s.

> Every browser had custom behaviour and rendered pages differently.

They did. But there was less of a spec to have to concern yourself with. The biggest day-to-day differences were around frames (which were a pain in the arse) and tables (which were easy to work around).

> JavaScript and CSS were still new, and it took years for the implementations to be standardized.

You didn't need JavaScript most of the time and CSS incompatibilities were easy to remember (I'm talking about cognitive overhead here).

> Even as late as 2005, projects like jQuery were needed to make web development easier.

That's in the region of 10 years after when I'm talking about. A completely different era. By that point the web had already turned into the shitshow it is now.

> I'd say that it wasn't until the late aughts, well after "web 2.0", that web development didn't feel hacky.

I couldn't disagree more. By that point people had Stockholm syndromed themselves into believing web technologies were actually good. But they weren't.

> So I get the sentiment that modern web development is difficult, bloated and buggy

It's not a sentiment. It's a fact.

> the good news is that the web is fully mature now,

No, it's not. We're just Stockholm syndromed into thinking it's mature. But half the technologies we use are constantly churning. That's not the definition of mature.

> Web developers are spoiled with choice, so a hard task is simply selecting the right technologies to use.

The hard part is finding something that will still be around in 5 years' time.

> or we inherit decade-old codebases that make maintenance a nightmare.

I've worked on plenty of decade-old codebases and I'd never rank web development in there precisely because of the aforementioned churn. Web tech goes out of date so quickly that it never gets to live past a decade...never mind multiple decades. It's simply not a mature platform to write software on despite what you claim.

> But it's never been easier to write plain HTML/CSS/JS that will work across browser

Who writes plain HTML and JS? There's so much bloat required to get anything to look modern that nobody writes plain web sites any longer (In fact I did for one of my open source projects and people hated it and rewrote it in Vue).

It was much easier to write plain HTML et al in the 90s. In fact that was exactly how web development back then was done.

> and deploy it as a static site that works reliably at massive scale

That's literally how sites were originally written. It's not a new invention ;) The web was intended to be a static collection of documents. What we've since done is try to turn it into an application framework. And that's what it sucks at.

> If you need more features, then complexity can creep in, but it is possible to minimize failure points with good engineering.

Sure, but again this isn't a new invention. This was the case right from the inception of the web. It's just gotten a hell of a lot harder to do good engineering for the web.

imiric
3 replies
2d21h

> Or we are talking about different eras. I'm on about early 90s.

I think you're misremembering then. There _was_ no web development to speak of in the early 90s. The web was largely a niche technology until the mid-90s. Mosaic released in January '93, Netscape in October '94, and IE in August '95. By the end of '93, there were a total of 130 websites[1], most of them from universities and research centers. By the end of '94, a whopping 2,278 websites. JavaScript first appeared in September '95 (Netscape), and CSS in August '96 (IE).

> You didn't need Javascript most of the time and CSS incompatibilities were easy to remember (I'm talking about cognitive overhead here)

Depending on what you're building, you still don't need JS most of the time today. The difference is that today all browser implementations are ECMAScript compliant, and the core functionality is much more capable than in the 90s, so you can get by with just sprinkling JS where and when you need it, without resorting to frameworks, build tools, libraries, and any of the complexities commonly associated with modern frontend web development. This is without a doubt, an objectively better state than what we had in the 90s.

Of course, actually relying on external dependencies would make your life easier, so the difficult task is picking the right technology to use from a sea of poorly built and maintained software. This is the drawback of a platform exploding in popularity, but it doesn't say anything about the web itself.

As for CSS, how can you honestly say incompatibilities were easy to remember? Netscape was initially pushing for its own competing styling format, JSSS[2], and it didn't officially support CSS until version 4.0 (June '97). Even then, not all CSS properties were supported[3]. So it wasn't even a matter of remembering incompatibilities; developers literally had to target specific browsers, and even specific versions of browsers. Vendor prefixes were required for pretty much everything, and are still used today, though thankfully, core CSS features are widely supported, and they're only needed for advanced features. There's no way that all of these incompatibilities were easier to deal with in the 90s.

> That's in the region of 10 years after when I'm talking about. A completely different era. By that point the web had already turned into the shitshow it is now.

jQuery appeared precisely as a response to the lackluster state of JS in browsers, and to make development easier by not worrying about browser incompatibilities. My point is that up until then, web development _was_ a shitshow.

> Its not a sentiment. It's a fact

Funny how I can disagree with a "fact" then...

> The hard part is finding something that will still be around in 5 years time.

It's really not, unless you're chasing the latest hype train. jQuery is 17, React is 10, Vue is 9, etc. And like I said, you don't strictly need any of it. If you write standards-compliant HTML/CSS/JS, it will serve you for decades to come with minimum maintenance. You've been able to do the same since arguably the late 2000s.

> Who writes plain HTML and JS?

Many people do.

> There's so much bloat required to get anything to look modern that nobody writes plain web sites any longer

That is factually not true.

> That's literally how sites were originally written. It's not a new invention

I'm not saying it is. My point is that you can still do that today.

[1]: https://en.wikipedia.org/wiki/List_of_websites_founded_befor...

[2]: https://en.wikipedia.org/wiki/JavaScript_Style_Sheets

[3]: https://en.wikipedia.org/wiki/CSS#Difficulty_with_adoption

hnlmorg
2 replies
2d17h

> I think you're misremembering then. There _was_ no web development to speak of in the early 90s. The web was largely a niche technology until the mid-90s. Mosaic released in January '93, Netscape in October '94, and IE in August '95. By the end of '93, there were a total of 130 websites[1], most of them from universities and research centers. By the end of '94, a whopping 2,278 websites. JavaScript first appeared in September '95 (Netscape), and CSS in August '96 (IE).

My first website went public in 1994. Before then I was writing stuff purely for a private intranet. So I'm definitely not misremembering.

By 1995 I had released an online RPG (it was very rudimentary but it worked).

By around 1997 (give or take, this was a hobby project so cannot remember the exact year) I had a full 3D web site available via VRML. Wasn't much of a success because most people didn't have 3D capable graphics cards back then. I think it was a couple of years before 3D accelerators became the norm.

1998 I was experimenting with streaming HTML chatrooms (that required a lot of hacks to get working because we are talking pre-AJAX here) and bots written in Perl.
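
The comment doesn't say how those pre-AJAX streaming chatrooms worked. One approach commonly used in that era was to keep the HTTP response open and append HTML fragments as messages arrived. A minimal Python sketch of that idea, assuming nothing about the original implementation: `stream_chat_page`, `app` and the fixed message list are hypothetical names and data used purely for illustration.

```python
from html import escape
from typing import Iterable, Iterator


def stream_chat_page(messages: Iterable[str]) -> Iterator[bytes]:
    """Yield an HTML page piece by piece, one fragment per chat message."""
    # Send the top of the page right away and deliberately leave <body> open.
    yield b"<html><body><h1>Chat</h1>\n"
    for text in messages:
        # Each message goes out as its own fragment; escape() keeps
        # user-supplied text from being interpreted as markup.
        yield f"<p>{escape(text)}</p>\n".encode("utf-8")
    # No closing </body></html>: in a live room the iterable would never end.


def app(environ, start_response):
    """WSGI entry point; the message source is a hypothetical fixed list."""
    start_response("200 OK", [("Content-Type", "text/html; charset=utf-8")])
    return stream_chat_page(["hello", "anyone <here>?"])
```

In a real room the `messages` iterable would block until the next message arrives, so the browser keeps rendering new paragraphs for as long as the connection stays open. For a quick local try, `app` could be served with `wsgiref.simple_server.make_server` from the standard library.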

For most of the 90s I was on the cutting edge of web technologies. So I remember the era well.

> This is without a doubt, an objectively better state than what we had in the 90s

Is it though? Better capabilities don't always equate to something being objectively better. Particularly if those capabilities are a complete clusterfuck to code for, as the current web standards are.

True elegance of an ecosystem isn't about raw capabilities, else we'd still be writing everything in assembly. It's about the ease with which you can accomplish a task. I'd argue that the current web isn't elegant in the slightest. A polished turd is still a turd.

> Of course, actually relying on external dependencies would make your life easier, so the difficult task is picking the right technology to use from a sea of poorly built and maintained software. This is the drawback of a platform exploding in popularity, but it doesn't say anything about the web itself.

The problem isn't the choice. The problem is that "the right technology to use" is more about what's in vogue at the moment than it is about what's mature.

When you look at other popular technologies, you still have choice but there are also mature stacks to choose from. The moment anything web related becomes "mature" (and I use this term loosely here) the next generation of developers invent something new.

> jQuery appeared precisely as a response to the lackluster state of JS in browsers, and to make development easier by not worrying about browser incompatibilities. My point is that up until then, web development _was_ a shitshow.

It was. And it's a bigger shitshow now. Glad you finally understand the point I'm making.

> Funny how I can disagree with a "fact" then...

That doesn't mean I'm wrong ;)

> It's really not, unless you're chasing the latest hype train. jQuery is 17, React is 10, Vue is 9, etc. And like I said, you don't strictly need any of it. If you write standards-compliant HTML/CSS/JS, it will serve you for decades to come with minimum maintenance. You've been able to do the same since arguably the late 2000s.

jQuery isn't recommended any more. React isn't popular any more. Vue is probably the only item there that has merit and that's still less than a decade old.

You talk about "decades" and cannot list a single framework that is still in widespread use and more than 10 years old.

> Many people do.

Many people also solder their own CPUs. But that doesn't mean anyone does it for stuff that actually matters.

> That is factually not true.

Yes it is. Simply saying it isn't doesn't disprove my point.

> I'm not saying it is. My point is that you can still do that today.

And you can still hand solder your own CPU today. But that doesn't mean anyone does that for professional sites.

The only reason people stick up for the current status quo is because they either don't know any better or have been Stockholm syndromed into living with it.

As someone who's written software in more than a dozen different languages for well over 3 decades, every time I come back to writing websites I always feel disappointed that this is what we've decided to standardise on. Your points that it's capable aren't wrong. But that doesn't mean it's not still a shitshow. Raw capability alone simply isn't good enough -- else we'd still be writing all of our software in assembly.

imiric
1 replies
2d13h

> My first website went public in 1994.

So yours was one of the first 2,278 websites? Congrats.

I don't see how any of your accomplishments are relevant, but thanks for sharing.

So your point is that the web when JavaScript and CSS were in their infancy, before web standards existed and were widely adopted, before AJAX and when you had to use "a lot of hacks" to implement streaming... that _that_ web was somehow easier to work with than the modern web? That sounds delusional.

VRML, along with Java applets, ActiveX, Flash, and a myriad other technologies around that time were decidedly not web-native (i.e. a W3C standard, implemented by all browsers). They only existed because the primitive state of the early web was incapable of delivering advanced interactive UIs, so there were competing proposals from all sides. Nowadays all of these technologies are dead, replaced by native web alternatives.

> Better capabilities doesn't always equate to something being objectively better. Particularly if those capabilities are a complete clusterfuck to code for, as the current web standards are.

Which particular standards are you referring to? Native HTML5/CSS3/ES2015+ are stable and well supported standards, and you've been able to target them for nearly a decade now. Their capabilities are obviously much greater compared to the early web, but this is what happens when platforms evolve. If you dislike using them, then I can't convince you otherwise, but I'm arguing against your point that the state of the web was somehow better in the 90s.

> The problem isn't the choice. The problem is that "the right technology to use" is more about what's in vogue at the moment than it is about that's mature.

That's a problem caused by the surrounding ecosystem, not the web. How is this different from VRML being replaced by X3D in 3 years? The good thing is that today you can safely rely on native web technologies without fearing that they'll disappear in a few years. (For the most part. Standards still evolve, but once they're widely adopted by browsers, backwards compatibility is kept for a long time. E.g. HTML4/CSS2/ES5 are still supported.)

If you're talking about frontend frameworks and libraries, again: they're not a standard part of the web, and you don't have to use them. If you do, it's on you to manage whatever complexity and difficulty they bring to your workflow.

> True elegance of an ecosystem isn't about raw capabilities, else we'd still be writing everything in assembly. Its about the ease of which it is to accomplish a task.

I fail to see how all the improvements of the past 20 years made things more difficult. The capabilities have evolved because user expectations have grown, and complexity arises from that. But if you were to build the same web sites you were building in the 90s with modern technologies, like your streaming HTML chatrooms site, you would find the experience vastly easier and more productive. This is an objective improvement.

> jQuery isn't recommended any more.

Because it's not needed anymore, because JS has evolved leaps and bounds since 2006, and implementations in all browsers are standardized. It's still the most popular JS library by far, and used by 77.3% of all websites[1].

> React isn't popular any more.

It's in the top 10 most popular JS libraries. And how come you're judging based on popularity anyhow? Above you were criticizing choosing technologies based on what's "in vogue at the moment" over "what's mature". React is a _mature_ UI library, and is a safe choice in 2023, unless you're chasing the latest hype train.

> You talk about "decades" and cannot list a single framework that is still in widespread use and more than 10 years old.

JavaScript frameworks as a concept are barely a decade old. React isn't a framework, it's a library. Like I said, jQuery is the most popular library and is 17 years old. Underscore (2009), Bootstrap (2011), Lodash (2012), and many more, are still in widespread use today.

But my point is that _today_ you don't strictly need any of them to build advanced interactive experiences. If you do want to, though, there are many to choose from that simplify development of modern UIs, without being a "clusterfuck" to work with IME. htmx, Lit and Tailwind are all lightweight, well maintained, and help with quickly iterating without resorting to full-blown frameworks. If you do want a framework, Svelte is now 7 years old, so quite mature, and is very pleasant to use.

> Yes it is. Simply saying it isn't doesn't disprove my point.

I thought the fact that you're reading and typing this on a forum built with simple HTML, CSS and minimal amounts of JS would make this self-evident. (The fact it uses a bespoke backend is irrelevant; this could just as well be served by a mainstream backend stack.)

But to save you a web search, here are other examples courtesy of ChatGPT[2].

> As someone who's written software in more than a dozen different languages for well over 3 decades, every time I come back to writing websites I always feel disappointed that this is what we've decided to standardise on.

Nice humblebrag again, but if you'd be willing to accept that the web has grown exponentially since the days you were building websites before JavaScript and CSS existed, that there are orders of magnitude more web developers and software now than back then, and that the core web technologies are more mature and stable than they've ever been, then you'd be able to see that the status quo is not so bad.

I have more issues with the modern state of centralized mega-corporations and advertising ruining the web than anything I can complain about the technology itself. But that's a separate topic.

[1]: https://w3techs.com/technologies/overview/javascript_library

[2]: https://chat.openai.com/share/6b15c659-e47c-4b64-aace-f8d6e9...

hnlmorg
0 replies
2d6h

> So your point is that the web when JavaScript and CSS were in their infancy, before web standards existed and were widely adopted, before AJAX and when you had to use "a lot of hacks" to implement streaming... that _that_ web was somehow easier to work with than the modern web? That sounds delusional.

My point was that the amount of hacks required these days has grown exponentially.

> VRML, along with Java applets, ActiveX, Flash, and a myriad other technologies around that time were decidedly not web-native

Of course they weren't. I never implied otherwise.

> Nowadays all of these technologies are dead, replaced by native web alternatives.

Indeed. Technologies that are exponentially harder to write the same code in. Hence my point: modern web tech is a shitshow.

> Which particular standards are you referring to? Native HTML5/CSS3/ES2015+ are stable and well supported standards, and you've been able to target them for nearly a decade now. Their capabilities are obviously much greater compared to the early web, but this is what happens when platforms evolve. If you dislike using them, then I can't convince you otherwise, but I'm arguing against your point that the state of the web was somehow better in the 90s.

You're fixated on that point and it's not what I said. I said it was easier to grok in the 90s and has just gotten worse over time. Which is a fact.

I also said the current web is an unfit clusterfuck that people are Stockholm syndromed into believing is good. Everything you've posted thus far reinforces that Stockholm syndrome point.

> React isn't popular any more.

> It's in the top 10 most popular JS libraries. And how come you're judging based on popularity anyhow? Above you were criticizing choosing technologies based on what's "in vogue at the moment" over "what's mature". React is a _mature_ UI library, and is a safe choice in 2023, unless you're chasing the latest hype train.

I haven't worked with a single engineer who hasn't bitched and moaned about React. And I've managed a lot of engineering teams over the years.

Vue is a different matter.

> JavaScript frameworks as a concept are barely a decade old. React isn't a framework, it's a library.

It's both. The term "framework" has a well-established meaning in software and React falls under that heading quite comfortably. What's happened, and why you're confused, is that kids have overloaded the term with "web framework" to mean something more specific. React on its own isn't a "web framework" in the trendy web sense but it's still 100% a "framework" in the stricter software development sense.

This is actually another great example of the lack of consistency in the web ecosystem.

That all said, React can certainly fall under the "web framework" umbrella when used in real world systems. Hence why Wikipedia lists it: https://en.wikipedia.org/wiki/Comparison_of_JavaScript-based...

But my point is that _today_ you don't strictly need any of them to build advanced interactive experiences.

You never had to. You're making another strawman argument because you're not only claiming I'm saying you need these frameworks (you don't) but also making it sound like this is something that's only come about because of the modern web (which isn't true).

I thought the fact that you're reading and typing this on a forum built with simple HTML, CSS and minimal amounts of JS would make this self-evident. (The fact it uses a bespoke backend is irrelevant; this could just as well be served by a mainstream backend stack.)

HN is far from your typical website. lol

Nice humblebrag again, but if you'd be willing to accept that the web has grown exponentially since the days you were building websites before JavaScript and CSS existed, that there are orders of magnitude more web developers and software now than back then, and that the core web technologies are more mature and stable than they've ever been, then you'd be able to see that the status quo is not so bad.

It's not a "humblebrag", it's an illustration that my opinion comes from years of experience using a multitude of different technologies. Honestly, I think you need to diversify your experience too because your comments fall firmly into the Stockholm syndrome bracket I described by the fact that you seem completely unwilling to accept that we could have all the same power of the current web but massively more simplified and elegant if we were to redesign things from the ground up. There are so many footguns that developers need to learn simply because of the way the web has evolved. And all you keep harping on about is that "it's powerful" -- sure. But so is assembly. Yet literally no-one advocates writing commercial desktop software in assembly.

The problem here is that trying to convince someone that the domain they earn their living from is a shitshow is always going to be met with opposition, because you have no impartiality. Whereas people like myself and the OP do. And that's why we make the comments we do when we say that the web is unsatisfying to develop against.

tormeh
1 replies
3d5h

Writing GUIs can be easy. It’s just hard on the web. Some complexity is unavoidable, but the web stack is so broken it makes things 5x harder than it needs to be.

Timon3
0 replies
3d2h

Do you have concrete examples that are much harder in the web than in native? When I started developing, I very much liked the web technologies for not being in my way - you have to learn their pitfalls and so on, but once you're up and running, you can do anything you want. Compared to that, every time I touch a native toolkit feels like an absolute nightmare the second you leave the trodden path. Even things like "changing the color of specific parts of a control" are often times either completely impossible without fully re-implementing it, or lead to bugs that make you wish you didn't try.

ForHackernews
9 replies
3d5h

I hear this over and over, but what is the alternative cross-platform stack for building rich local GUI apps? Qt?

BrutalCoding
5 replies
3d4h

I’d say Flutter, a GUI framework backed by Google and it’s open source.

I’ve recently ported over a popular project called “llama.cpp” to Dart (language behind Flutter) and I’ve recently made YT videos showing it running natively on macOS, Linux, Android, iOS, iPadOS and next up is Windows.

The official Ubuntu installer is made with Flutter too nowadays. But to be fair, last time I tried Qt was somewhere in 2018, it might be a good option too.

chrismorgan
3 replies
3d4h

But please don’t make web apps with Flutter. It has determinedly taken the pure-canvas route which makes it very unpleasant to use for a significant fraction of users, and all kinds of things just don’t work the right way and can’t. (I’ve written about the problems here on HN quite a few times, search and you’ll find them.)

m_sahaf
2 replies
3d3h

For web, Flutter has 2 renderers and offers 3 options:

- html: Uses HTML elements and CSS

- canvaskit: which, as you mentioned, uses canvas to own the full drawing process

- auto: defaults to canvaskit in desktop browsers but html in mobile browsers.

Source: https://docs.flutter.dev/platform-integration/web/renderers

chrismorgan
1 replies
3d1h

Curious, I haven’t heard of the HTML renderer. Is it any good?

m_sahaf
0 replies
3d

I don't know what metric to use of "good", but the throwaway app I've used it for worked flawlessly with both, so good enough for me ¯\_(ツ)_/¯

markdog12
0 replies
3d2h

Is the port open-source? I'd like to try it, if so.

vintagedave
1 replies
3d5h

Or Delphi.

speed_spread
0 replies
3d4h

These days I'd suggest Lazarus. Delphi is pricing itself out of the market.

wg0
0 replies
3d3h

I see Electron, Tauri/Capacitor and friends as a viable route.

Second choice in terms of DX is Flutter. There's nothing like that out there in terms of DX although there are tons of other issues notably around performance.

Third would be Qt but that's not a viable tool because of licensing.

The first two options have no such licensing issues.

kmlx
0 replies
3d5h

The stack (html, js, css, browser functionality) that makes up the web is just not fit for the purpose of rich client applications.

you get much better mileage from web pages than rich client apps. one thing i’ve learned and applied everywhere i worked is to let the web be the web, don’t force things that come from other platforms.

ben_w
0 replies
2d20h

As someone still in native, I got a similar vibe when switching from UIKit to SwiftUI. While this happened simultaneously with a lot of other changes (including using VIPER instead of MVC/MVVM), I'm also trying to use SwiftUI for one of my personal projects and find it disappointingly more difficult than the time I learned C++ by diving into someone else's badly written project.

Conversely, another side project is a JS game done in an old-school OO pattern and vanilla (no libraries no frameworks), and it's easy.

I want to like automagical solutions that the new frameworks keep touting, but everywhere I'm just seeing things that make it harder to work out what's going on. Half the stuff on the web should be a thin bit of pure HTML/CSS/image data with no JS at all, built server-side, which is where the interactions are done. Like, the HN comment form I'm in right now is:

<form class="itemform" action="/xedit" method="post">…<input type="submit" value="update">

etc. — it still works even when I disable JS.

phartenfeller
5 replies
3d7h

The quality of web software is underwhelming

I hate where "web scale" has brought us. Because some 0.01% of giants have tremendous scalability problems, every small project needs an overly complicated architecture that consists of layers of services. In the end, nobody understands the monster, and the complexity brings more issues than it solves. But still, this is somehow a standard today.

I have a lot of love for static HTML sites or simple backend/frontend solutions. The web is great, but current development trends are not.

havkom
4 replies
3d5h

Yep. If you keep off the “latest trends frameworks” and just keep it as vanilla and simple as possible web development can be productive, scalable and pleasant.

fallingknife
3 replies
3d4h

HN does that, and yet has outages and is so non performant that it can't handle when a post gets a couple thousand comments.

richardw
0 replies
2d20h

Good enough has crushed many pretenders over the decades. People would love to spin some JS microservice wizardry and lift the HN user base. Why it’s very hard if not impossible is worth multiple aha moments when considering “sustainable advantage”.

mrweasel
0 replies
3d3h

Not really, HN uses Postgresql, which should be fine, but it's also written in a pretty niche language. I think the argument is that an implementation in C# or Java, which is as boring as it gets in the web world, would fare much better.

JCharante
0 replies
3d2h

Well HN is probably in the top 1% of forums

chii
1 replies
3d4h

What fails? The associated webcrap.

my suspicion is that the threshold level of a developer capable of doing the type of development that is required of second life (a graphics client, a networked server etc) using their tech stack (presumably something C++) is higher than web.

The inherent crappiness of the web is not due to the tech. It's due to the low barrier of entry.

davedx
0 replies
3d4h

Hi, gamedev who moved to webdev here.

Think this example is probably more from budget than dev skills. It’s perfectly possible to build reliable billing webapps, and there are 100k experienced devs who could do it.

Second Life obviously prioritised their core biz and product, which is the right thing to do.

supriyo-biswas
0 replies
3d8h

Amazon's outgoing HTTPS proxy.

Is this ELB or Cloudfront we’re talking about?

dustingetz
0 replies
3d1h

conways law affects the most those production support systems that everybody needs but nobody wants to pay for. Any boss will put their best talent on the critical revenue driving systems (e.g. adtech which is bounded by backend scalability). UI hasn’t really found a modern economic model yet (post Windows era), there are not research orgs dedicated to UI like there are infrastructure, databases, OS, ML etc

esafak
13 replies
3d9h

When they say they are moving off Google Cloud services to bare metal, where do they plan to move?

bbarnett
4 replies
3d9h

My response to this, is that there are endless ways, and places to do this.

There are far more colos, people that will rent you a rack, and bandwidth, than VPS types. And you can rent servers too, instead of buying your own.

Colo is literally 10000x cheaper than many AWS deployments. I've seen million dollar bills drop to tens of thousands per year.

And of course, you can always deploy in house, in your own server room.

dilyevsky
2 replies
3d8h

I’ve done some modeling and for our high cpu, high egress, medium storage multi-million $ a year deploy it was 70-90% lower cost than cloud when you factor in amortized cost of boxes, remote hands, transit etc. Pretty substantial but not 10000x ;)

iot_devs
1 replies
3d8h

Was this comparing AWS on demand pricing or the 3 years plan?

dilyevsky
0 replies
3d8h

3 year plan. Basically for colos you also get it cheaper if you sign for 3y so that’s more apples to apples. Equipment “annual” cost was calculated over 5-7 year lifetime - I didn't go as far as calculating how much you could recover if you pawned it after 3 years…
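For anyone wanting to redo this kind of comparison, here is a rough Python sketch of the amortization arithmetic described above. Every number in it is a made-up placeholder (not the figures from the actual model), and the `ColoPlan` class and `savings_fraction` helper are hypothetical names used only for illustration.

```python
from dataclasses import dataclass


@dataclass
class ColoPlan:
    """Hypothetical yearly cost model for a colocation deployment."""
    hardware_capex: float           # up-front cost of the boxes
    hardware_lifetime_years: float  # amortization period (5-7 years above)
    annual_colo_fees: float         # rack space, power, transit
    annual_remote_hands: float      # paid on-site support

    def annual_cost(self) -> float:
        # Spread the hardware purchase evenly over its assumed lifetime.
        amortized_hardware = self.hardware_capex / self.hardware_lifetime_years
        return amortized_hardware + self.annual_colo_fees + self.annual_remote_hands


def savings_fraction(cloud_annual: float, colo_annual: float) -> float:
    """Fraction of the cloud bill saved by moving to colo."""
    return 1.0 - colo_annual / cloud_annual


# Placeholder inputs, chosen only to make the arithmetic easy to follow.
plan = ColoPlan(
    hardware_capex=1_200_000,
    hardware_lifetime_years=6,
    annual_colo_fees=250_000,
    annual_remote_hands=50_000,
)
cloud_annual = 3_000_000  # assumed committed-use cloud spend per year

print(f"colo annual cost: ${plan.annual_cost():,.0f}")
print(f"savings vs cloud: {savings_fraction(cloud_annual, plan.annual_cost()):.0%}")
```

The sketch ignores resale value of the hardware, staffing, and financing costs, all of which would move the result.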

bagels
0 replies
3d9h

Sure, colo can be cheaper. What kind of infrastructure was this? Or was this just bandwidth bills?

tehlike
1 replies
3d8h

My guess is hetzner.

philipswood
0 replies
3d

Hetzner is awesome, but last time I checked it's consumer-grade hardware.

supriyo-biswas
1 replies
3d8h

Many data centers provide colo/hardware renting facilities, such as Equinix, Coresite, Digital Realty etc. (Even AWS got started off those, though they mostly build their own data centers now.)

londons_explore
0 replies
2d10h

When small companies get big, there must sometimes be legacy compute jobs still left on the original infrastructure right? Ie. Jeff's original php script to trigger some automation that never got adopted by any team.

Do all the big tech companies still have a box somewhere full of legacy stuff that 'probably isn't important, but not worth turning off just in case it is'?

ghusto
1 replies
3d3h

A data centre or (less likely) their own office. This was the way things were done not that long ago ;)

asylteltine
0 replies
3d2h

And they are terrible for ops and security. People also used incandescent lights not that long ago

ur-whale
0 replies
3d7h

where do they plan to move?

Basement of their office?

We reached the same conclusion they did a while back and went back to good-old self-hosted.

Reliability has been as good as cloud and TCO is divided by a factor of 10.

dharmab
0 replies
2d14h

Any business can rent space in a colo pretty easily. The constraint is mostly hiring engineers with experience racking and stacking boxes, and willing to drive to the colo when on call.

vel0city
10 replies
3d2h

It's hilarious people are bashing GCP for having one compute instance go down and the author acknowledges it's a rare event. On AWS I've got instances getting forced stopped or even straight disappearing all the time. 99.95% durability vs 99.999% is way different.

If they had the same architecture on AWS it would go down all the time IME. AWS primitives are way less reliable than GCP, according to AWS' docs and my own experiences.

NineStarPoint
3 replies
2d15h

This is very different from my experience. In my years with AWS I’ve only had an instance get stopped once for a reason that was weird AWS background stuff that had nothing to do with my application. I don’t think I’ve ever had or even heard of an instance just disappearing.

vel0city
1 replies
2d15h

By "disappear" I mean the instance failed hard and couldn't be restarted. It's just gone. Usually related to the EBS volume dying.

But yeah, usually when they die they can just be relaunched. Still they die way more often on AWS than in GCP, and will just end up staying stopped. Until very recently they couldn't even migrate the instances when the underlying hardware had some maintenance, you had to stop and relaunch it on your own. FFS most decent hypervisors have had live migrations for decades and yet I still get notifications of "this instance will stop on x day..." emails. I should never see that. The cloud provider should keep the instance running forever. There's no excuse.

yolovoe
0 replies
1d18h

I don’t know why you’re getting downvotes. What you’re saying sounds true to me, and I work in the core of EC2.

I am guessing you’re using newer instance types if their reliability is still questionable. Or you have a huge fleet of instances so you see a steady rate of failures every year.

Our failure rate on the commonly used instance types is fairly low. We have several types of failures and in some bad failure cases, live migration isn’t possible and your instance won’t even be restarted.

AWS already asks people to expect failures and plan around this with multi AZ deployments.

If you want stability, sign an NDA with AWS and ask for fleet wide reliability metrics for various instance types. There’s a surprisingly huge variance.

berniedurfee
0 replies
1d23h

Same. 12+ years of using AWS and there’s been 1 instance of a server (RDS) going down due to something outside of our control.

Restoring a snapshot got us back running quickly. If we were multi-az, we probably wouldn’t have noticed.

deanCommie
1 replies
2d17h

EC2 [0] and GCP Compute [1] have the exact same SLAs, which is 99.99%, dipping below which gets you a 10% refund. Dipping below 95% gets you a 100% refund.

[0] https://aws.amazon.com/compute/sla/

[1] https://cloud.google.com/compute/sla

vel0city
0 replies
2d15h

By the links you shared instance level SLA on AWS is 99.5%. GCP instance level is 99.99%. That's not the same.

For each individual Amazon EC2 instance (“Single EC2 Instance”), AWS will use commercially reasonable efforts to make the Single EC2 Instance available with an Instance-Level Uptime Percentage of at least 99.5%

The underlying storage isn't the same as well, and that matters more. EBS is 99.95% durable. Even standard zonal PD's on GCP are >99.99%, balanced are >99.999%, SSDs are >99.9999%.

Even if it was 99.99% (it's not on AWS) what's the point of having your instance be 99.99% if the underlying disks might disappear? That's something I've seen happen multiple times on AWS, never once on GCP.
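To make the gap between those instance-level percentages concrete, here is a small Python sketch that converts an uptime percentage into allowed downtime. The 730-hour month is an assumption (8760 hours / 12), and `allowed_downtime_minutes` is a hypothetical helper, not anything from either provider's SLA.

```python
HOURS_PER_MONTH = 730  # assumption: average month, 8760 h / 12


def allowed_downtime_minutes(uptime_percent: float,
                             hours: float = HOURS_PER_MONTH) -> float:
    """Minutes of downtime per period that still meet the given uptime."""
    return (1 - uptime_percent / 100) * hours * 60


for uptime in (99.5, 99.99):
    minutes = allowed_downtime_minutes(uptime)
    print(f"{uptime}% uptime allows about {minutes:.1f} minutes down per month")
```

Under that assumption, 99.5% leaves room for roughly 219 minutes (over three and a half hours) of downtime a month, while 99.99% leaves under five minutes.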

belter
1 replies
2d6h

In general in Cloud and as somebody said, you should Architect assuming everything fails all the time.

vel0city
0 replies
2d2h

So why not have EC2s have a 50% SLA. Have them all force quit at some random interval between 2 hours and 200 hours, guaranteed. Have EBS volumes just corrupt your data every week. Why bother with SLAs at all when the solution is buy more redundant resources?

Or how about having actually reliable primitives?

I don't disagree, if you need extreme reliability build your infra to handle multi-az, even multi-region outages. But sometimes I'd rather just have an instance just stay online instead of having to pay for it three times over and still have it reasonably be expected to not corrupt itself. Hypervisor and storage technology could make that happen, as it's true on other clouds and has been true in the data center for decades.

I can have an instance on GCP with its block storage having 99.9999% durability. I can't do the same with gp3 on AWS, where the volume has a durability of 99.95%, without having to deal with the complexity of clustering and all its headaches and costs. Why is that an unreasonable ask?
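A quick Python sketch of what those two durability figures mean across a fleet. It assumes the percentages are annual, that volume failures are independent, and a hypothetical fleet of 1000 volumes; the function names are made up for illustration.

```python
def expected_annual_losses(durability_percent: float, volumes: int) -> float:
    """Expected number of volumes lost per year across the fleet."""
    annual_loss_probability = 1 - durability_percent / 100
    return volumes * annual_loss_probability


def probability_of_any_loss(durability_percent: float, volumes: int) -> float:
    """Chance that at least one volume is lost in a year (independent failures)."""
    survive_one = durability_percent / 100
    return 1 - survive_one ** volumes


FLEET_SIZE = 1000  # hypothetical fleet size

for durability in (99.95, 99.9999):
    losses = expected_annual_losses(durability, FLEET_SIZE)
    any_loss = probability_of_any_loss(durability, FLEET_SIZE)
    print(f"{durability}%: ~{losses:.3f} volumes lost/year, "
          f"{any_loss:.2%} chance of at least one loss")
```

With those assumptions, the 99.95% figure works out to about half a lost volume per year and roughly a 39% chance of losing at least one, versus about a 0.1% chance at 99.9999%.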

Wuzado
1 replies
3d

The article doesn't seem to mention AWS, really. I also feel like the primary issue is the lack of communication and support, even for a large corporate partner.

Seems like they're moving to bare-metal, which has an obvious benefit of being able to tell your on-call engineer to fix the issue or die trying.

vel0city
0 replies
2d23h

But in this case the answer from AWS would have been that's their SLA and you need to just be ready to handle an instance getting messed up from time to time, because it's guaranteed to happen.

davidgerard
9 replies
3d4h

HOW TO CHOOSE A CLOUD PROVIDER

* AWS: you will pay to have stuff work properly and you like having customer service

* Azure: you hate yourself, you're running Windows or both

* Google: you're cheap enough that basic functionality is an optional extra

* Oracle: lol

* Hetzner: cheap, good service, the finest pets in the world, no cattle

nijave
3 replies
3d3h

Azure

Or you have big enterprise customers that have a grudge against Amazon and Google and refuse to use anything else.

twisteriffic
2 replies
3d2h

Azure is great if you stick to the three golden oldies: DTU-based SQL, app service and service bus. Maybe table storage if you're feeling lucky. Anything else leads to pain and $$$, because it's likely that no one at Microsoft is being forced to use it.

jwnin
1 replies
3d1h

vCore SQL is solid and predictable. Azure's vm offering is also highly reliable, and they host more linux workloads than Windows now.

nijave
0 replies
1d4h

Azure's vm offering is also highly reliable

Not in my experience. Running just under 100 VMs, they'd randomly fail and restart about once a year. One month, something went terribly wrong with our k8s cluster and nodes were becoming unhealthy and being replaced every few hours to the tune of 500 replacements in a month for a 60 node cluster.

Premium SSD v2, which was released generally about a year ago, is fairly good. Premium SSD was pretty painfully slow.

diamondfist25
3 replies
3d3h

What about DO?

monlockandkey
2 replies
3d3h

You should always reach out to use Digital Ocean, Linode, Vultr as your starting point. AWS and the gang are mega mega expensive compared to what you pay for a vps. If you require services beyond compute, database and storage, then use the big names. Otherwise save yourself headache with complexity, unpredictable and absurd costs.

Please don't use AWS especially as a startup, you are going to kill yourself paying for compute, database and egress that is multiple times what you get from a vps. AWS is NOT cheap

davidgerard
1 replies
3d

my personal site is on Hetzner, fwiw. they are extremely good IME

monlockandkey
0 replies
2d20h

Yes Hetzner is an excellent choice

candiddevmike
0 replies
3d2h

* Hetzner: cheap, good service, the finest pets in the world, no cattle

You can absolutely do cattle with Hetzner. They support imaging and immutable infrastructure. They don't have a native auto scaling equivalent, but if you're using Kubernetes, they have a cluster autoscaler: https://github.com/kubernetes/autoscaler/blob/master/cluster...

Hetzner Cloud's biggest gap for me right now is secure VMs: they don't support encrypted disks, UEFI, secure boot, or TPMs (and certainly nothing like AMD SEV). You can get a fat bare metal Ryzen box through Robot and DIY though.

wg0
8 replies
3d7h

In 2022, we experienced continual networking blips from Google’s cloud products. After escalating to Google on multiple occasions, we got frustrated. So we built our own networking stack — a resilient eBPF/IPv6 Wireguard network that now powers all our deployments. Suddenly, no more networking issues.

My understanding is that the network is a VLAN programmed via switches for VMs so when you create VPC, you're creating a VLAN probably.

So how can an overlay (UDP/WireGuard) be more reliable if the underlying network isn't stable?

PS: Had even 1/10th of these issues happened on AWS with such a customer, their army of solution architects would be camping in conference rooms every other week reviewing architecture, taking support engineers on call and what not.

devsda
6 replies
3d6h

My guess is that whatever clever network optimizations that Google has are probably interfering with their traffic.

By building their own network stack, they are skipping them, and WireGuard might also be better equipped to deal with occasional faults as it is built on UDP, which is inherently unreliable.

Bluecobra
5 replies
3d2h

I have a direct cross connection to Google in a colocation facility (aka Dedicated Interconnect). One issue I found is that Google would randomly shuffle around their BGP routers, which would cause BGP to flap and briefly lose all connectivity. When I raised this issue with support their answer was that this is expected behavior and we need to purchase a redundant connection. Mind you this isn’t cheap, we’re talking around $2,000 per month for a 10G connection when you add up all the GCP/colo fees.

It’s pretty laughable that they can’t preserve TCP connections when they migrate their cloud routers around. I have had BGP uptimes on direct cross connects for over a year with other vendors on bare metal.

betaby
2 replies
3d

Google doesn't do any 'black magic' even though they do presentations and publish papers. Their edge infra is very boring, and yes, they shuffle edge a lot, and their edge routers are like everybody else's - off the shelf Juniper/Cisco without any TCP session preservation.

londons_explore
1 replies
2d10h

Migrating an in-use TCP session from one host to another is far from easy.

I did it for a project and the number of corner cases is insane - both in the TCP protocol (what if the window has holes in it? What if the connection is half closed?), but also in the OS's handling of the TCP state and interactions with userspace (will we correctly wake up a process poll()'ing a socket if we migrate the socket after a packet is received but before the kernel wakes the poll()'ing process?)

Bluecobra
0 replies
2d1h

I’m not saying it’s easy, but a company like Google should have no problem implementing this. If you ever used Vmotion it does a good job of migrating a live VM to another physical host. Also enterprise firewalls have no problem moving TCP/NAT state from an active to passive firewall.

bushbaba
1 replies
3d1h

It’s about scale. Google was built for an order of magnitude greater scale where such reliability of a single link would be cost prohibitive.

However in general if you need high uptime, you’ll need multiple peering links. AWS and azure also recommend the same.

lokar
0 replies
3d

It’s more that a single link/router/host/vm/switch/etc can never be reliable enough, so don’t waste time and money chasing that. Build your software to tolerate it. This approach is pervasive throughout all of Google’s systems.
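The arithmetic behind "buy a redundant link instead of chasing a perfect one" can be sketched in a few lines of Python. This assumes link failures are independent (which correlated outages would break) and uses a made-up 99.9% per-link availability; `combined_availability` is a hypothetical helper.

```python
def combined_availability(single_link: float, links: int) -> float:
    """Availability of N parallel links when any one of them is enough.

    Assumes failures are independent, so the whole set is down only
    when every link is down at the same time.
    """
    all_down = (1 - single_link) ** links
    return 1 - all_down


PER_LINK = 0.999  # hypothetical availability of one link

for n in (1, 2, 3):
    print(f"{n} link(s): {combined_availability(PER_LINK, n):.7%}")
```

Under those assumptions, going from one link to two moves availability from three nines to about six, which is why providers tend to answer single-link complaints with "add a second connection".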

readams
0 replies
3d6h

It's not creating a VLAN by programming switches. It's all done in an overlay network.

This is out of date but gives you the idea https://www.usenix.org/conference/nsdi18/presentation/dalton

363082a9-58a7
7 replies
3d9h

I've had an experience with GCP that involved a very enterprise-y feature breaking in a way that clearly showed the feature had never worked properly up until that point (aside from causing downtime when they tried to quietly fix it). On the call in which they were supposed to explain what happened, GCP reps proceeded to remind everyone that they were under NDA, because admitting to the above would've been a nightmare for regulated industries.

HenryBemis
6 replies
3d8h

I always wonder whether an NDA can prevent you speaking/whistleblowing to a Regulator, Police, DA, or some (truly) state authority.

I would like to assume, 'no, you can always report a crime'.

dragoncrab
4 replies
3d7h

Not in the US or EU. However, they can still sue you in the US and due to the broken legal system there, it will cost you decent money even if they are bound to lose from day 1.

wkat4242
3 replies
3d6h

In Holland too. Even if someone sues you maliciously, you still have to pay the state a fee to be heard. Otherwise the judgement will fall to the enemy party by default.

You can recuperate this cost from them when you win but you're still out of pocket for your time and the money until you manage to cash it from them which can be hard. And they can keep doing it. The system is very unfairly biased in favour of people with lots of money

gtirloni
2 replies
3d5h

This is pretty absurd to me.

I'm not a lawyer so don't quote me on this, but here in Brazil, it seems the losing side has to pay the winning side's lawyers and other expenses. It's not clear cut how much will be paid and there are different rates but the message seems to be clear: don't sue willy-nilly because there will be consequences, so make sure you have a case.

wkat4242
0 replies
3d4h

You can get it back from them, the judge would normally assign all costs to the losing party. However actually getting it can be difficult as the court doesn't bother to help with that.

iudqnolq
0 replies
3d4h

Imagine you have an 80% chance of winning but Google will spend millions of dollars defending against your lawsuit. Is suing them worth the risk of a judgement that will ruin your life in perpetuity?

That's why in the US the losing side only has to pay if they committed some kind of misconduct.

There's a fair argument for both systems though, I just want to point out the American system isn't clearly absurd.

Edit: To a certain extent you can control your own costs. You might find a lawyer willing to donate their time, or you might find a scrappy team willing to take on Google for less money. But Google will always hire a top firm with high billing.

supriyo-biswas
0 replies
3d8h

Company lawyers like flexing their muscles regardless of the actual legality of such agreements or clauses.

As an example, in India, non-competes are outright illegal since the Indian Contracts Act directly states that any clauses in a contract restraining a lawful profession or trade will be disregarded, and yet most companies out there will add a non-compete clause.

doubloon
6 replies
3d5h

"reasons why Oxide has a business #12390"

asylteltine
4 replies
3d2h

An oxide rack has a minimum cost of something like 600k not including all the infra you need to run a rack, maintenance, and then needing to upgrade

mst
2 replies
3d1h

Railway's bill was into the multiple millions per year at the very least so that doesn't necessarily rule it out.

asylteltine
1 replies
3d

That’s one misconception about leaving cloud: people think it’s a one-time cost compared to opex, but in reality you are just moving the spending. Devops for your now custom on prem workflows, toil due to inferior tooling compared to cloud, physical costs like electricity and cooling, space for the racks, physical security, high availability, etc. I mean, there’s so much downside.

mst
0 replies
3d

I said "doesn't necessarily rule it out" rather than a stronger claim advisedly.

You're entirely correct that there are an unfortunate number of people who hold that misconception, but (a) I'm not one of them (b) that wasn't my point.

mbStavola
0 replies
3d

In the post they say they pay Google "multiple millions" of dollars already. Depending on their needs, the TCO of Oxide racks may end up being less than what they pay GCP.

latchkey
0 replies
2d21h

That's just moving the goal posts around.

Kwpolska
6 replies
3d6h

You should've migrated many months ago, if a cloud provider forces you to build your own networking or registry, you shouldn't use that cloud provider.

supriyo-biswas
2 replies
3d5h

Well for folks building out cloud infrastructure, building your own networking stack and registry is a good way to achieve platform independence, without which you'll be left at a disadvantage and vulnerable to the whims of cloud providers who may or may not extend volume discounts, thus indirectly harming your ability to compete.

wmf
0 replies
3d

An over-overlay is almost never the right solution. If you want platform-independent networking you should use an API shim layer that configures the underlying VPCs the way you want.

chrisandchris
0 replies
3d3h

Doesn't the requirement of "building your own networking stack" outcompete with "just colocate"?

I would assume that the skills required for building your own network are just as high as for hosting your own hardware. But I don't have any data point, so just a wild guess.

politelemon
2 replies
3d6h

That was the first thing that struck me, the 'workarounds' beggar belief, but they seem to be casually dropped in (?).

If I were in a situation where my company was contemplating building our own registry/network stack, then the benefits of using a cloud provider are gone, and I would have considered moving to another provider... not saying "I can fix him". This feels like a sunk cost, if that is the right term.

mrj
1 replies
3d2h

I would bet that they were thinking about colocating and would need to have secured intra-service communication anyhow. In Google this is transparent but at a (probably yet-to-be-identified) facility it'd be up to them to provide.

justjake
0 replies
2d13h

(Blogpost Author) Yup. We've been thinking about colocation for a while, so we've just been building these up. Basically all that's left is to make our volume storage bulletproof. We'll do that as we're moving stateless workloads to bare metal early next year, and ideally be off GCP EOY latest

simo7
5 replies
3d7h

Interesting, I’m starting to think undocumented thresholds are quite common in GCP.

I experienced something similar with Cloud Run: inexplicable scaling events based on CPU utilization and concurrent requests (the two metrics that regulate scaling according to their docs).

After a lot of back and forth with their (premium) support it turns out there are additional criteria, something related to request duration, but of course nobody was able to explain in detail.

politelemon
3 replies
3d6h

Unannounced changes too, there was a Firefox outage in 2022 due to GCP:

https://hacks.mozilla.org/2022/02/retrospective-and-technica...

merb
2 replies
3d4h

sorry but the blame here was 100% on Mozilla. No matter which HTTP version, headers should always be treated as case-insensitive. Blaming anything on Google here is just stupid. The problem was NIH syndrome and ignoring the HTTP spec.
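For illustration, this is roughly what "treat header names as case-insensitive" looks like in Python. It is a minimal sketch and not Mozilla's code: the `CaseInsensitiveHeaders` class is hypothetical, and it keeps only the last value for a repeated header, which a real HTTP stack would handle more carefully.

```python
class CaseInsensitiveHeaders:
    """Minimal header map whose lookups ignore the case of the name."""

    def __init__(self, pairs):
        self._store = {}
        for name, value in pairs:
            # Normalize on the way in so later lookups can't miss on case.
            # Simplification: a repeated header overwrites the earlier value.
            self._store[name.lower()] = value

    def get(self, name, default=None):
        return self._store.get(name.lower(), default)

    def __contains__(self, name):
        return name.lower() in self._store


# A proxy speaking a newer HTTP version may hand over lowercase names.
headers = CaseInsensitiveHeaders([("content-length", "42"),
                                  ("Content-Type", "text/plain")])

print(headers.get("Content-Length"))  # found despite the different case
print(headers.get("content-type"))
print("X-Missing" in headers)
```

Code that instead compares against one exact spelling such as `"Content-Length"` works until something upstream starts sending the lowercase form.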

mst
1 replies
3d1h

Mozilla are entirely clear that this was their bug.

However, GCP changing the default under their infrastructure without prior warning was still unacceptable.

Operations work should (IMO must) be conducted with the expectation that any major change like that will expose existing bugs in deployed code.

(I've done enough ops work in my life that I'd love to say 'will potentially expose' but in practice there's always -something- that breaks and if I don't find it in the first 24h after a major change I'm going to spend the next two weeks waiting for the other shoe to drop)

merb
0 replies
2d9h

GCP does send mails when you've subscribed to them. GCP is not to blame if they used auto. Heck, if your load balancer sends you lowercase headers with a new HTTP version it should not result in a bug. GCP's change was fine. Their software had a bug that would've led to request smuggling.

klon
0 replies
3d2h

Yes, we have also experienced undocumented limits for Cloud Run. For us it was an obscure quota for max network packets per second per instance. Really infuriating and took 6 months to track down what it was. I think it has been documented here now: https://cloud.google.com/run/quotas#cloud_run_bandwidth_limi...

supermatt
4 replies
3d7h

No doubt all cloud providers have their problems.

For my day job, over the last 2 years we have discovered and reported multiple issues with Keyspaces, Amazon Aurora, and App Runner. In all cases these issues have resulted in performance degradation, and AWS support wasting our time sending us chasing our tails. After many weeks of escalation, we eventually ended up with project leads who confirmed the issues (some of which they were already aware of, yet the support teams had wasted our time anyway!) and (some of them) have since been resolved.

We are stuck with Keyspaces for the time being, but now refuse to use any non core services (EC2, EBS, S3). As soon as you venture away from those there be dragons.

wavemode
3 replies
3d2h

Oh, for goddamn sure. Half the services on AWS, probably, are very poorly designed or very poorly run (or both). CloudWatch stands out to me as one that is mind-bogglingly buggy and slow. To the point of basically being a "newbie trap" - when I see companies using it for all their logging, I assume it's due to inexperience with the many alternatives.

At least the compute services are reliable.

yibers
2 replies
3d

I actually use cloudwatch quite a lot. I didn't notice many bugs or slowness, but I assume I am missing something. Can you perhaps point to some specific issues you had with Cloudwatch?

wavemode
0 replies
2d23h

The user interface is unintuitive, text search is slow, querying is even slower, refining the metric graphs to a time period is really annoying, the graph controls are consistently buggy in my experience... those are just off the top of my head

rubiquity
0 replies
3d

$$$ and a strange API. The internal metrics service that Amazon has used for ages works much better for power users. CW is slowly becoming like it.

rurban
4 replies
3d5h

In our experience, Google isn’t the place for reliable cloud compute, and it’s sure as heck not the place for reliable customer support.

Always was, always will be. For them customers are always the last

asylteltine
3 replies
3d2h

GCP is the ugly stepchild of Google. They prioritize their own infra which unlike Aws doesn’t even run on gcp! It’s a joke. They don’t dogfood anything. All Google infra runs on separate systems (Borg) or dedicated deployment like their own spanner clusters. Google employees look down on gcp employees like second class citizens

mst
2 replies
3d1h

They prioritize their own infra which unlike Aws doesn’t even run on gcp!

I don't believe that to be the case, last I heard a year or two back the vast majority of it -does- run on a google GCP tenant account and what didn't was largely at least in the process of migration planning.

(my source here is "pillow talk with a senior GCP engineer" and I don't believe she had any reason to lie to me)

asylteltine
0 replies
3d

That’s not been my experience having done consulting for Google. They may have some stuff on gcp but they don’t dog food much like Aws does (literally running Amazon on top of vanilla dynamo and kinesis). Google has a lot of custom infra

anonacct37
0 replies
2d10h

Sorry, but that's not true.

I wish it was. But running anything that bridges google3 and GCP is a nightmare. Outside of acquisitions and OSSish stuff like chrome, it's really rare to see GCP used. Oddly enough their corp eng team does quite a bit with making GCP accessible to the rest of the company.

Source: former Google SRE who actually worked on one of those teams.

ghusto
4 replies
3d4h

I know AWS isn't cool or sexy, but shit works.

motoboi
3 replies
3d3h

It’s sad to see people rediscovering that GCP is not a serious product over and over again.

markbnj
1 replies
3d3h

What's your personal experience with it? We've been on the platform for almost eight years. Three clusters, hundreds of compute VMs, four or five public and private DNS zones, 10+ cloud sql dbs, about the same number of memorystore and firebase instances. We just don't see these issues as related in the OP, and when we do have problems support has been fast and helpful. Not to gainsay their experience, since it was obviously frustrating, but truthfully you can find similar stories about all compute providers.

motoboi
0 replies
2d16h

Not everyone gets cancer from cigarettes, you know? See the avalanche of horror stories. I had mine.

acdha
0 replies
3d1h

That’s what I was thinking. Adopting GCP after years on AWS was eye-opening: I’d previously had a good impression but it was a constant cycle of “oh, we don’t have that - build your own” and issues which have been open for years full of customers asking and GCP PMs stalling.

Then the price increases started, making it even harder to defend paying more for less.

ransom1538
2 replies
3d2h

"On December 1st, at 8:52am PST, a box dropped offline; inaccessible. And then, instead of automatically coming back after failover — it didn’t. Our primary on-call engineer was alerted for this and dug in. While digging in, another box fell offline and didn’t come back"

This makes no sense. A machine restarted and you had catastrophic failure? VMs reboot time to time. But if you design your setup to completely destroy itself in this scenario, I don't think you will like a move to AWS, or god forbid, your own colo.

wavemode
1 replies
3d2h

Read the article more carefully. The article (the text you quoted, even) clearly states that the machine didn't "restart". It crashed and didn't come back online.

And nowhere in the article do they state that this was a "catastrophic failure" - Railway itself didn't go down entirely. But Railway is a deployment company, so they are re-selling these compute resources to their customers to deploy applications. So when one of those VMs goes down and doesn't automatically failover, that's downtime for the specific customer who was running their service on that machine.

As they state:

During manual failover of these machines, there was a 10 minute per host downtime. However, as many people are running multi-service workloads, this downtime can be multiplied many times as boxes subsequently went offline.

For all of our users, we’re deeply sorry.

xyzzy_plugh
0 replies
3d

TFA is a bit too light on details. Boxes die, it's a fact of life. I don't really follow what "didn't come back online" is supposed to mean. Nodes aren't 100% durable. A lot depends on your particular configuration.

In any case, there's no world where all VM failures trigger automatic reboots. Expecting that to be the case just makes no sense. Automatically failing over should be handled at another layer, for which there are many possibilities.

Manually restoring nodes sounds like a "pets, not cattle" problem.

Long ago, we used to run into this on AWS all the time before we started automatically aging them out.

hermitcrab
2 replies
3d8h

We are a small software company (2 people) and we've also had plenty of issues with Google over the years. Mostly related to Google Adwords. For example:

https://successfulsoftware.net/2015/03/04/google-bans-hyperl...

https://successfulsoftware.net/2016/12/05/google-cpa-bidding...

https://successfulsoftware.net/2020/08/21/google-ads-can-cha...

https://successfulsoftware.net/2021/05/04/wtf-google-ads/

If Google have no interest in providing decent support to the author of the original article, who are paying megabucks to Google, what hope do small businesses like mine have?

biorach
1 replies
3d4h

Google have no interest in providing support

hermitcrab
0 replies
3d1h

No, they want to do things on a massive scale. Which means it is really difficult to talk to a human for support. And even if you manage it, it might be some badly trained subcontractor. But somehow there are always humans available to ring you up and tell you how you can spend more on Google Adwords.

tlogan
1 replies
3d1h

All these cloud service providers have bugs and issues.

But the problem with Google is that their support seems somehow disconnected from the real world. There is support, and they do respond to chats, calls, or emails. However, it often feels like I'm talking to someone who doesn't genuinely care about my concerns or doesn't understand what I’m talking about.

Good support is hard to come by and hard to implement. So I really don't know what is missing in Google's support that exists in AWS support. Maybe because AWS support staff are trained to first put themselves in the customer's shoes and understand the problem from my perspective.

M_bara
0 replies
2d22h

The mantra at aws is customer experience. Any time there’s an outage or impact, the first numbers to be stated are customer impact related. In fact, as an engineer you might be empowered enough to recommend a refund for a customer (even though the customer may be at fault) and the refund will go through.

Full disclosure: worked for aws devops for a couple of years

testernews
1 replies
2d15h

“We paid them multiple millions of dollars per year”

Never heard of railway but paying this many $$$ per year should give you a dedicated support rep. But google doesn’t do support for anything lol

londons_explore
0 replies
2d10h

Oh - they give you a support rep. Just the support rep is powerless to do anything.

tedd4u
1 replies
2d22h

It sounds like if you deploy on Railway they don't automatically handle a box dying (e.g. with K8s or other) -- "half the company was called in to go through runbooks." When they move to their own hardware, how will they handle that?

londons_explore
0 replies
2d10h

GCP is pretty reliable - for a smallish deployment you could probably go a couple of years before seeing a machine die.

So they probably never built in health checks and auto fail over.
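
For what it's worth, the basic shape of health checks plus automatic failover is not much code. The sketch below is purely illustrative: `Host`, `check_and_failover`, the `probe` callback and the three-strikes threshold are all hypothetical and have nothing to do with Railway's actual setup. A real version would also have to deal with state, volumes and split-brain.

```python
from dataclasses import dataclass, field


@dataclass
class Host:
    name: str
    healthy: bool = True
    failures: int = 0          # consecutive failed probes
    workloads: list = field(default_factory=list)


def check_and_failover(hosts, probe, max_failures=3):
    """Probe every host once; evacuate hosts that keep failing.

    `probe(host)` returns True when the host answers its health check.
    Returns a list of (workload, from_host, to_host) moves made this round.
    """
    moved = []
    for host in hosts:
        if probe(host):
            host.failures = 0
            host.healthy = True
            continue
        host.failures += 1
        if host.failures < max_failures:
            continue  # tolerate brief blips before acting
        host.healthy = False
        targets = [h for h in hosts if h is not host and h.healthy]
        while host.workloads and targets:
            # Place each workload on the least loaded healthy host.
            target = min(targets, key=lambda h: len(h.workloads))
            workload = host.workloads.pop()
            target.workloads.append(workload)
            moved.append((workload, host.name, target.name))
    return moved


hosts = [Host("a", workloads=["web", "db"]), Host("b")]
probe = lambda h: h.name != "a"  # pretend host "a" has gone dark
rounds = [check_and_failover(hosts, probe) for _ in range(3)]
print(rounds[-1])
```

Nothing moves during the first two rounds, and on the third consecutive failure both workloads are rescheduled onto host "b" without anyone opening a runbook.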

nomilk
1 replies
3d5h

In our experience, Google isn’t the place for reliable cloud compute

In the early days of cloud computing unreliability was understandable, but for Google to be frustrating its large customers in 2023 is a pretty bad look.

Curious to know if others have had similar experiences, or if the author was simply unlucky?

whirlwin
0 replies
3d5h

I don't know how it happened, but I used GKE for a side project, which was overkill for such a small project, and I could live with $100 /month, but the bill kept creeping up to $300 and later $400 with no apparent explanation or workload increase. I had no choice but to revert to something else. Ended up with good old Heroku with $20/month and never regretted it

kgeist
1 replies
3d5h

We have automated systems in place to detect and resolve this. We’re notified in Discord

Isn't Discord hosted on GCP, too? If it goes down, monitoring also goes down?

justjake
0 replies
2d13h

(Blogpost Author). We use Discord to notify. Our monitoring runs directly to PagerDuty for anything we actually need to action on.

StopHammoTime
1 replies
3d6h

I have a lot of interaction with Google Cloud Support, mostly around their managed services. I am genuinely not over-impressed with their service, considering with employers of similar size on AWS the support experience was always wonderful.

However, I will say if you are on Google Cloud and you have a positive interaction, make a big deal about someone helping you. Given the rarity it occurs, it’s not a big deal to really go out of your way to reward someone with some emphatic positive feedback. I’ve had four genuinely fantastic experiences and there’s always a message to a TAM that flows soon after. I hope more people like those I interacted with get rewarded and promoted.

latchkey
0 replies
2d21h

However, I will say if you are on Google Cloud and you have a positive interaction, make a big deal about someone helping you.

This. These sorts of discussions are like bike shedding over vi/emacs.

Only the complaints make it to the front page on HN. I've been using GCP off and on for projects for a decade now. Built multiple very successful businesses on it. Sure it hasn't been all perfect, but I'm an overall happy camper.

Having also used AWS heavily when I was on the team building the original hosted version of Cloud Foundry, I'd never go back to them again. It was endless drama.

strstr
0 replies
3d7h

Sounds like a genuinely frustrating experience.

Bit confused about why nested virt has anything to do with their problems given that they aren’t using virt inside the VMs. Softlocks are a generic indication of a lack of forward progress.

Same confusion with the MMIO instructions comment. If that’s about instruction emulation, not sure why it matters where it happens? It’s both slow and bound for userspace anyway. If it’s supposed to be fast it should basically never be exiting the guest, let alone be emulated.

Sounds like the author is a bit frustrated and (understandably) grasping at whatever straws they can for that most recent incident.

niuzeta
0 replies
3d

I wonder how many of these stories it would take before it starts affecting Google's bottom line. I've tinkered with GCP on small side projects, sure - but after exposure of these stories for over a decade in HN, I can never recommend GCP as a serious cloud alternative. I can't imagine I'm the only one in this boat.

lawgimenez
0 replies
3d5h

If you go to Google’s issue tracker, you will find a lot of issues that were ignored. For example, this issue [0] that caused our ANR rate to dip.

[0] https://issuetracker.google.com/issues/230950647

fidotron
0 replies
3d5h

Maybe it is me but this doesn’t exactly reflect well on anyone. Isn’t the value prop of railway not having to worry about things like this? It doesn’t matter what the problem is - you shouldn’t be passing such problems on to customers at all.

I have worked on a product that caused such a spike on Google App Engine that within 20 minutes of it going public Google were on the phone explaining their pagers all went off, and in that case resolved to temporarily bump the quota up for 48 hours while a mutual workaround was implemented. The state of Google Cloud today seems just another classic case of the trend of blaming the customer.

asylteltine
0 replies
3d2h

I work at a company that spends billions on AWS and we intentionally have minimal gcp deployments and ban compute there because of how unreliable gcp is and how awful (outsourced) their support is. Gcp has excellent products but garbage operations. Who is running that clown show? It could have easily been the #2 cloud if they knew what they were doing

annoyed_eng
0 replies
3d2h

Generally, I think over the last few years, GCP has lost its way.

There was a time several years ago where they were a meaningfully better option when looking at price / performance for compute / storage / bandwidth when compared to AWS. At the time, we did detailed performance testing and cost modeling to prove this for our workload (hundreds of compute engine instances etc).

Support back then was also excellent. One of our early tickets was an obscure networking issue. The request was quickly escalated then passed from engineers in different regions around the world until it was resolved. We were very impressed. It was a change on the GCP end that ended up being reverted. We quickly got to real engineers who competently worked the problem with us to resolution.

The sales team interactions were also better back then. We had a great sales rep who would quickly connect us with any internal resources we needed. The sales rep was a net positive and made our experience with GCP better.

Since then, AWS has certainly caught up and is every bit as good from a cost / performance standpoint. They remain years ahead on many managed services.

The GCP support experience has degraded significantly at this point. Most cases seem to go to outsourced providers who don’t seem able to see any data about the actual underlying GCP infrastructure. We too have detected networking issues that GCP does not acknowledge. The support folks we are dealing with don’t seem to have any greater visibility than we do. It’s pathetic and deeply frustrating. I’m sure it’s just as frustrating for them.

The sales experience is also significantly worse. Our current rep is a significant net negative.

We’ve made significant investments in GCP and we hate seeing this happen. While we would love to see things improve, we don’t see any signs of that actually happening. We are actively working to reduce our GCP spend.

A few years ago, I was a vocal GCP advocate. At this point, I’d have a hard time suggesting anyone build anything new on GCP.

Sytten
0 replies
3d5h

Wishing all the best to the railway team, they really are building something nice. Hopefully the move to bare metal will mean price reductions for customers. I am philosophically opposed to cloud providers charging per user on top of very expensive resources but it might just be me.