On one hand, thanks for being honest about how this bug came to be.
On the other hand, I don’t think it’s a good idea to advertise that the company introduced a major bug by copy-pasting ChatGPT code around, and that they then spent a week unable to even debug why it was failing.
I don’t know much about this startup, but this blog post had the opposite effect of all of the other high quality post-mortem posts I’ve read lately. Normally the goal of these posts is to demonstrate your team’s rigor and engineering skills, not reveal that ChatGPT is writing your code and your engineers can’t/won’t debug it for a very long time despite knowing that it’s costing them signups.
It read like no one really knew what they were doing. "We just let it generate the code and everything seemed to work" is certainly not a good way to market your company.
Eh I imagine they looked over the code as well, doing code review -- and at first glance, the code looks reasonable. I certainly wasn't able to catch the bug even though I tried to find it (and I was given a tiny collection of lines and the knowledge that there's a bug there!).
If anything, I think this says something about how dangerous ChatGPT and similar tools are: reading code is harder than writing code, and when you use ChatGPT, your role stops being that of a programmer and becomes that of a code reviewer. Worse, LLMs are excellent at producing plausible output (I mean that's literally all they do), which means the bugs will look like plausibly correct code as well.
I don't think this is indicative of people who don't know what they're doing. I think this is indicative of people using "AI" tools to help with programming at all.
I think using AI tools to write production code is probably indicative of people who don't really know what they are doing.
The best way not to have subtle bugs is to think deeply about your code, not subcontract it out -- whether that is to people far away who both cannot afford to think as deeply about your code and aren't as invested in it, or to an AI that is often right and doesn't know the difference between correct and incorrect.
It's just a profound abrogation of good development principles to behave this way. And where is the benefit in doing this repeatedly? You're just going to end up with a codebase nobody really owns on a cognitive level.
At least when you look at a StackOverflow answer you see the discussion around it from other real people offering critiques!
ETA in advance: and yes, I understand all the comparison points about using third party libraries, and all the left-pad stuff (don't get me started on NPM). But the point stands: the best way not to have bugs is to own your code. To my mind, anyone who is using ChatGPT in this way -- to write whole pieces of business logic, not just to get inspiration -- is failing at their one job. If it's to be yours, it has to come from the brain of someone who is yours too. This is an embarrassing and damaging admission and there is no way around it.
ETA (2): code review, as a practice, only works when you and the people who wrote the code have a shared understanding of the context and the goal of the code and are roughly equally invested in getting code through review. Because all the niche cases are illuminated by those discussions and avoided in advance. The less time you've spent on this preamble, the less effective the code review will be. It's a matter of trust and culture as much as it's a matter of comparing requirements with finished code.
You could say the same about the output of a compiler. No one owns that at a cognitive level. They own it at a higher level - the source code.
Same thing here. You own the output of the AI at a cognitive level, because you own the prompts that created it.
Notwithstanding the fact that compilers did not fall out of the sky and very much have people that own them at the cognitive level, I think this is still a different situation.
With a compiler you can expect a more or less one to one translation between source code and the operation of the resulting binary with some optimizations. When some compiler optimization causes undesired behavior, this too is a very difficult problem to solve.
Intentionally 10xing this type of problem by introducing a fuzzy translation between human language and source code then 1000xing it by repeating it all over the codebase just seems like a bad decision.
Right. I mean... I sometimes think that Webpack is a malign, inscrutable intelligence! :-)
But at least it's supposed to be deterministic. And there's a chance someone else will be able to explain the inner workings in a way I can repeatably test.
Except, for starters, that you're not using the LLM to replace a compiler.
You're using it to replace a teammate.
Yes, and when compilers fail, it's a very complex problem to solve, one that usually requires many hours from an experienced dev. Luckily,
(1) Compilers are reproducible (or at least repeatable), so you can share your problem with others, and they can help.
(2) For common languages, there are multiple compilers and multiple optimization options, which (and that's _very important_) produce identically-behaving programs - so you can try compiling the same program with different settings, and if the results differ, you know a compiler is bad.
(3) Compilers are very reliable, and bugs where the compiler succeeds but generates invalid code are even rarer - in many years of my career, I've only seen a handful of them.
Compare that to LLMs, which are non-reproducible, each giving a different answer (and that's by design), and which have a huge appear-to-succeed-but-produce-bad-output error rate, well above 1%. If you had a compiler that bad, you'd throw it away in disgust and write in assembly language.
But not now, quite obviously.
Colour me cynical, but I don't feel like pretending the future is here only to have to fix its blind incompetence.
I totally disagree with this. You might as well argue that we shouldn't use code-completion of any kind because you might accidentally pick the wrong dependency or import. Or perhaps we shouldn't use any third-party libraries at all because you can use them to write reasonable-looking but incorrect code? Heck, why even bother using a programming language at all since we don't "own" how it's interpreted or compiled? Ultimately I agree that using third-party tools saves time at the cost of potentially introducing some types of bugs. (Note that said tools may also help you avoid other types of bugs!) But it's clearly a tradeoff (and one where we've collectively disagreed with you the vast, vast majority of the time) and boiling that down to AI=bad misses the forest for the trees.
It is possible to use autocompletion correctly.
It is possible to use libraries correctly.
It is not possible to use AI correctly. It is only possible to correct its inevitable mistakes.
AI can provably write code without mistakes?
I'm a some-time Django developer and... I caught the bug instantly. Once I saw it was model/ORM code it was the first thing I looked for.
I say that not to brag because (a) default args is a known python footgun area already and (b) I'd hope most developers with any real Django or SQLAlchemy experience would have caught this pretty quick. I guess I'm just suggesting that maybe domain experience is actually worth something?
I missed it, because I was really confused what the models were doing: why is there an id and a subscription_id? Are the user_id fields related?
I've since moved on to primarily working with Java, so it's been a few years since working with Django on a daily basis and the default still jumped out to me immediately. Experience and domain knowledge is so important, especially when you need to evaluate ChatGPT's code for quality and correctness.
Also, where were their tests in the first place? Or am I expecting too much there?
This is an error that should probably have been caught just based upon the color of the text when it was typed/pasted into the source code. If the uuid() call was in quotes, it would have appeared as text. When you’re blindly using so much copy/pasted code (regardless of the source), it’s really easy to miss errors like this.
But our existing tools are already built to help us avoid this.
Back in the day, I used a tool from a group in Google called “error-prone”. It was great at catching things like this (and likely NPEs in Java). It would examine code before compiling to find common errors like this. I wish we had more “quick” check tools for more languages.
It's not in quotes. It's a function call.
The issue is that the function call happens once, when you define the class, rather than happening each time you instantiate the class.
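To make it concrete, here's a minimal sketch of that class of bug, with an invented SQLAlchemy model (not the post's actual code):

```python
import uuid
from sqlalchemy import Column, String
from sqlalchemy.orm import declarative_base

Base = declarative_base()

class Subscription(Base):
    __tablename__ = "subscription"

    # Buggy: the call runs once, when Python evaluates the class body at
    # import time, so the *same* UUID string becomes the default for every row.
    id = Column(String(36), primary_key=True, default=str(uuid.uuid4()))

    # Correct: pass a callable instead; SQLAlchemy invokes it per INSERT,
    # so each new row gets a fresh UUID.
    # id = Column(String(36), primary_key=True, default=lambda: str(uuid.uuid4()))
```

Once the same literal UUID is the default for every row, the second insert from any given process violates the primary key constraint -- the "duplicate key" symptom mentioned elsewhere in the thread.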
I don't know Python or SQLAlchemy that great, though I do have the benefit of it being cut down to a small amount of code and being told there was a bug there. That said, I didn't see the actual bug, but I did mentally flag that as something I ought to look up how it actually behaved. It's suspicious that some columns used `default` with what looks like Python code while others used `server_default` with what appears to be strings that look more like database engine code. If I was actually responsible for this, I'd want to dig into why there is that difference and where and when that code actually runs.
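From a quick look at the docs, the split appears to be roughly this (a sketch; the `now()` expression assumes PostgreSQL): `default` is applied by SQLAlchemy in Python at INSERT time, while `server_default` is baked into the DDL and evaluated by the database.

```python
from datetime import datetime
from sqlalchemy import Column, DateTime, text

# Client-side default: a Python value or callable, applied per INSERT.
created_at = Column(DateTime, default=datetime.utcnow)

# Server-side default: a SQL expression emitted into the CREATE TABLE DDL,
# evaluated by the database engine for each row.
updated_at = Column(DateTime, server_default=text("now()"))
```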
It's also the case that "code review" covers a lot of things, from quickly skimming the code and saying eh, it's probably fine, to deeply reading and ensuring that you fully understand the behavior in all possible cases of every line of code. The latter is much more effective, but probably not nearly as common as it ought to be.
This is why I typically only use LLMs in programming as a semi-intelligent doc delver with requests like, “give me an example of usage of X API with Y language on Z platform”.
Not only does it goof up less frequently on small focused snippets like this, it also requires me to pick the example apart and pay close enough attention to it that goofups don’t slip by as easily and it gets committed to memory more readily than with copypasting or LLM-backed autocomplete.
Except, have you met startup devs? This is by and large the "move fast then unbreak things" approach.
This is why working for startups gives me PTSD. I wouldn't recommend it to anyone.
The idea of inheriting a ChatGPT code base no one understands now makes it worse.
Just give it to GPT-5 for a refactor, easy!
It terrifies me that I have heard people say this unironically.
It’s the natural outcome of SV types denigrating the value of education.
Forget knowing anything, just come up with a nice pitch deck and let the LLM write the stack.
Not wholly surprised these people are YC backed. I’ve got the impression YC don’t place much weight on technical competence, assuming you can just hire the talent if you know how to sell.
Well, now replace “hire some talent” with “get a GPT subscription and YOLO”, and you get the foundation these companies of tomorrow are going to be built on.
Which hey, maybe that’s right and they know something I don’t.
For that matter, has OP even met the HN accepted wisdom? "No one knows what they're doing, everyone's faking it, it's fine if you are too" -- so don't take it as a red flag when your fumbling around keeps blowing up, because it surely must work that way everywhere else.
My early rant against this mentality: https://news.ycombinator.com/item?id=19214749
It's very humbling coming out of startup-land and working with big tech engineers and realizing their tooling runs circles around everybody else's and enables them to be much more precise with their work and scale, though it isn't without trade-offs.
Yeah but a lot of that is just the accrual of improvements that is possible with a lot of resources over a long period of time.
People working in "big tech" aren't fundamentally better at building reliable tools and systems; the time and resource constraints are entirely different.
And the stakes! This outage might have cost the OP $10k. A similar snafu at a larger company might have cost tens of millions or more.
The big tech tooling probably cost tens of millions of dollars to create, and probably had a couple $10k mistakes on the way to getting it written and running.
It’s possible to move fast the same way, but break fewer things than this. For example, in this case, they said that they introduced tests to mitigate this. I can assure you that introducing tests takes more time than a couple of minutes of Google searches to check what each line really does.
The idea of moving fast is to have extensive logs and alerts so you fix all errors fast as they appear, without "wasting time" on long, expensive tests in a phase where things change every day.
5 days to find out you have "duplicate key" errors in the db is the opposite of fast
This is getting more common. I have already had people try to tell me how something works from a chat gpt summary. This would have led to us taking a completely different direction… 5 minutes of reading the actual docs and I found out they were wrong.
Now at a new company I have caught several people copy-pasting GPT code that is just horrendous.
It seems like this is where the industry is headed. The only thing I have found GPT to be good at is solving interview questions, although it still uses phantom functions about 50% of the time. The future is bumming me out.
Like people who post "here's what ChatGPTx said" instead of their own answer. Quite literally, what is the point?
However, I don't think it's really bad for the technical industries long term. It probably does mean that some companies with loose internal quality control and enough shiftless employees pasting enough GPT spew without oversight will go to the wall because their software became unmaintainable and not useful, but this already happens. It's probably not hugely worse than the flood of bootcamp victims who wrote fizzbuzz in Python, get recruited by a credulous, cheap or desperate company and proceed to trash the place if not adequately supervised. If you can't detect bad work being committed, that's mostly on the company, not ChatGPT. Yes, it may make it harder, a bit, but it was oversight you should already have been prepared to have, doubly so if you can't trust employee output. It also probably implies strong QA, which is already a prerequisite of a solid product company.
Normal interest rates coming back will cut away companies that waste everyone's time and money by overloading themselves on endlessly compounding technical debt.
Is the idea here that normal (read: low?) interest rates will let companies spend more time getting things right?
No, the idea is that historically-normal interest rates around the 5-10% mark won't be conducive to free VC cash being sprayed around for start-ups to wank themselves silly over "piv-iterating" endlessly over spamming complete nonsense and using headcount and office shininess as a substitute for useful and robust products.
Yes, it makes the barrier higher even for good products and helps entrench incumbents, but short of a transnational revolution, the macroeconomic system is what it is and you can only choose to find the good things in it or give up entirely.
Yeah, I've seen this and I hate it. If I wanted to know what ChatGPT said I'd just ask it myself.
It's a Tower of Babel-like effect.
In my experience, hardly anyone in software does know what they're doing, for sufficiently rigorous values of "know what you're doing." We all read about other people's stupid mistakes, and think "haha, I would never have done that, because I know about XYZ!" And then we go off and happily make some equally stupid mistake because we don't know about ABC.
There’s a difference between not knowing what you are doing and making a mistake.
The "zen" of LLMs is that they do not see a real distinction between these two things, or either of these two things and success ;-)
An awful lot of mistakes are made because one didn't know something that would have enabled one to avoid it. Not knowing what you don't know is difficult to work around.
I dunno. I tend to annoy people when taking on jobs by telling people what I am concerned about and do not understand, and then sharing with them the extent to which I have managed to allay my own concerns through research.
I turn down a lot of jobs I don't feel confident with; maybe more than I should.
An LLM never will.
Everyone using C++20 compilers: side-glancing monkey.
The difference is that you can read the code that ChatGPT generates.
I don't really care about them marketing their company, but, Jesus, seriously, that's how software is going to be written now. TBH, I'm not really sure if it's that much different from how it was, but it sounds just... fabulous.
Then no one knows what they are doing. I really don't know any company that doesn't make what could be considered rookie mistakes by some armchair "developer" here on HN.
They spent 5 days. The bug type is pretty common and could easily have been introduced by a human developer. (It's in a similar class to the singleton default argument issue that many people complain about.) Meh, I don't mind the cautionary tale and don't think chatgpt was even relevant.
It's actually a tricky bug, because usual tests wouldn't catch it (db wiped for good isolation) and many ways of manual testing would restart the service (and reset the value) on any change and prevent you from seeing it. Ideally there would be a lint that catches this situation for you.
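Something like this hypothetical, very rough AST check is the kind of lint I mean -- it just flags Column(default=...) arguments that are the result of a call rather than a callable, so it would have false positives (e.g. factories that legitimately return a plain value), but it would have caught this one:

```python
# lint_column_defaults.py -- a hypothetical, very rough check: warn when a
# Column(...) call passes default= the *result* of a call (e.g. uuid.uuid4())
# instead of a callable reference.
import ast
import sys

def callee_name(func: ast.expr) -> str:
    # Handles both Column(...) and sqlalchemy.Column(...)
    if isinstance(func, ast.Name):
        return func.id
    if isinstance(func, ast.Attribute):
        return func.attr
    return ""

def check(path: str) -> None:
    tree = ast.parse(open(path).read(), filename=path)
    for node in ast.walk(tree):
        if isinstance(node, ast.Call) and callee_name(node.func) == "Column":
            for kw in node.keywords:
                if kw.arg == "default" and isinstance(kw.value, ast.Call):
                    print(f"{path}:{node.lineno}: default= receives the result "
                          f"of a call; did you mean to pass the callable itself?")

if __name__ == "__main__":
    for p in sys.argv[1:]:
        check(p)
```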
TBH, while I definitely could see this being an easy bug to write, something is definitely wrong if it took 5 days to identify the root cause of this bug.
That is, I'm struggling to understand how a dive into the logs wouldn't show that all of these inserts were failing with duplicate key constraint violations. At that point at least I'd think you'd be able to narrow down the bug to a problem with key generation, at which point you're 90% of the way there.
I also don't agree that "usual tests wouldn't catch it (db wiped for good isolation)". I'd think that you'd have at least one test case that inserted multiple users within that single test.
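Even a single test along these lines (a hypothetical pytest sketch with an invented model, not their actual code) would have failed with an IntegrityError under the buggy default:

```python
import uuid
from sqlalchemy import Column, Integer, String, create_engine
from sqlalchemy.orm import Session, declarative_base

Base = declarative_base()

class Subscription(Base):
    __tablename__ = "subscription"
    # Correct form; the buggy version would be default=str(uuid.uuid4()),
    # which is evaluated once at import and shared by every row.
    id = Column(String(36), primary_key=True, default=lambda: str(uuid.uuid4()))
    user_id = Column(Integer, nullable=False)

def test_two_subscriptions_in_one_process():
    engine = create_engine("sqlite://")
    Base.metadata.create_all(engine)
    with Session(engine) as session:
        session.add(Subscription(user_id=1))
        session.commit()
        # With the buggy default, this second insert reuses the same PK and
        # the commit raises IntegrityError -- exactly the production symptom.
        session.add(Subscription(user_id=2))
        session.commit()
        assert session.query(Subscription).count() == 2
```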
The bug was in multiple subscriptions not just users. And I can't think of one non-contrived reason to do it. Even when testing the visibility/access of subscriptions between users you need 2 users, but only one subscription.
create a subscription for a test user. delete it. Make sure you can create another subscription for the same user.
create subscriptions with and without overlapping effective windows
Those seem like very basic tests that would have highlighted the underlying issue
Hindsight is 20/20. It’s always easy to think of tests that would have caught the issue. Can you think of tests that will catch the next issue though?
Sure, hindsight is 20/20, but a bunch of these comments are replying to the assertion "And I can't think of one non-contrived reason to do it" (have a single test case with multiple subscriptions). That's the assertion I think is totally weird - I can think of tons of non-contrived reasons to have multiple subscriptions in a single test case.
I wouldn't pillory someone if they left out a test case like this, but neither would I assert that a test case like this is for some reason unthinkable or some outlandish edge case.
Go on then - so far the examples I've seen don't make sense in the context of stripe.
I'd argue that a suite of tests that exercise all reasonably likely scenarios is table stakes. And would have caught this particular bug.
I'm not talking about 100% branch coverage, but 100% coverage of all happy paths and all unhappy paths that a user might reasonably bump into.
OK maybe not 100% of the scenarios the entire system expresses, but pretty darn close for the business critical flows (signups, orders, checkouts, whatever).
Have a look at the Stripe API. You don't delete subscriptions. You change them to a free plan instead / cancel and then resume later. This flow would not result in deletion of the entry. You can also update to change the billing date, so no overlapping subscriptions are needed. Neither test would result in the described bug.
Or add some debug logging? 5 days into a revenue-block bug, if I can't repro manually or via tests, I would have logged the hell out of this code. No code path or metric would be spared.
What? This doesn't make any sense:
1. First, if you look at the code they posted, they had the same bug on line 45 where they create new Stripe customers.
2. The issue is not multiple subscriptions per user (again, if you look at the code, you'll see each Subscription has one foreign key user_id column). The problem is if you had multiple subscriptions (each from different users) created from the same backend instance then they'd get the same PK.
Not every user needs a stripe customer. I'm creating the stripe entries only on subscription in my app.
Your second point is true, but I don't see what it changes. Most automated unit/integration testing would just wipe the database between tests and needing two subscribed users in a single test is not that likely.
Apparently not.
Yeah, we have quite literally caught bugs like this in 5 minutes in prod not bc we made a mistake, but bc a customer’s internal API made a schema change without telling us and our database constraints and logging protected us.
But it took about 5 minutes, and 4 of those minutes were waiting for Kibana to load.
Why? In fact, not having good isolation would have caught this bug. Generate random emails for each test. Why would you test on a completely new db as if that is what will happen in the real world?
It makes your tests more robust. Generally you don’t want tests that are too sensitive to external state since they will fail spuriously and become useless.
Of course your tests shouldn't be sensitive to external state. Why would other tests running affect your test?
They shouldn't. But they do. We're not perfectly spherical developers and we all make mistakes. Sometimes it's also extremely tricky to figure out what state is leaking, especially if it's an access race issue and happens only for some tests and very rarely. If you haven't seen that happening, you just need to work on larger projects.
I've worked at several large companies. In our e2e tests we did not create isolated dbs per test. If the test failed because of other tests running that's a bug in the test and that person would get automatically @mentioned on slack and they would have to fix the build.
It's extremely common. You want to know that objects/rows from one test don't bleed into another by accident. It allows you to write stricter and simpler assertions - like "there's one active user in the database" after a few changes, rather than "this specific user is in the database and active and those other ones are not anymore".
Leaking information between test runs can actually make things pass by accident.
...who didn't know how the ORM they were using worked. That's what makes them look so bad here: nobody knew how it worked, not even at the surface level of knowing what the SQL actually generated by the tool looks like.
In their defense, I find SQLAlchemy syntax quite horrible, and I always have to look up everything. It also got a 2.0 release recently which changes some syntax (good luck guessing which version ChatGPT will use), and makes the process even more annoying.
SQLAlchemy syntax is ridiculously obvious and straightforward as long as you're not doing anything weird.
The takeaway here is that they weren't mature enough to realize they were, in fact, doing something "weird". I.e. Using UUIDs for PKs, because hey "Netflix does it so we have to too! Oh and we need an engineering blog to advertise it".
Edit. More clarity about why the UUID is my point of blame: If they had used a surrogate and sequential Integer PK for their tables, they would never have to tell SQLAlchemy what the default would be, it's implied because it's the standard and non-weird behavior that doesn't include a footgun.
Unfortunately, UUID as PK is an extremely common pattern these days, because devs love to believe that they’ll need a distributed DB, and also that you can’t possibly use integers with such a setup. The former is rarely true, the latter is blatantly false.
looking at the query logs for the nighttime period should have made the bug fairly obvious
They even said they had sentry set up.. they'd notice the duplicate key error immediately.
I read the part where they said they pored through "hundreds of sentry logs" and immediately was like "no you didn't."
This is not an error that would be difficult to spot in an error aggregator, it would throw some sort of constraint error with a reasonable error message.
agree, but this sounds like it would produce logs/error messages which could then lead to a solution, quicker... if the logs were captured and propagated sufficiently
Oh definitely. Their ops experience seems very low. But I've found it extremely uncommon to see anything better in smaller projects.
Volume/load testing (or really, any decent acceptance testing) would catch it.
Load testing - yes, but it's not that usual unfortunately. (Even though it should be.)
Acceptance testing - again, maybe, if they use 20 or so subscriptions in one batch, which may not be the case.
A functioning dev env would have caught this if they manually tested more than once. Typically you don't run 40 dev instances. Or a staging environment.
More importantly, what was the motivation behind a rewrite from TypeScript to Python? From the article
Seems like this entire mess could've been avoided if they had stuck with their existing codebase, which seemed to have been satisfying their business requirements.
There is well-hidden vendor lock-in when using NextJS, at least.
There is no vendor lockin with nextjs
Simply put - if you want to get the best out of the framework, you need to host it on Vercel. Otherwise, there are better options for frameworks. No need to "fight it".
You will find many issues on GitHub which are not addressed, even though resolving them would make the framework "better" or easier to use on other clouds.
Making a free offering worse so the same company can profit from its premium offerings is the pinnacle of capitalism. It reminds me of a recurring joke I have with a friend while playing Call of Duty: that they will get greedier and soon will sell not only character skins but also shaders/textures for the maps. Oh, so you want to see something better than placeholder textures? We have the DLC just for you!
More likely outcome: ads on static textures and between lobbies, oh but you can pay to turn them into _theme of the week_.
Don't give them ideas...
Environment destruction gameplay but there's always another ad under the ads, except when it's a lootbox.
I can imagine worse, too! They haven't even really started turning that knob yet.
The Video game metaphor stretches pretty far.
Madden has a monopoly license for NFL content. For a decade the biggest complaint was how they gatekept rosters behind the yearly re-release. Eventually they allowed roster sharing, but they put it behind the most god-awful, inept UI you could possibly imagine, such that casual gamers practically wouldn't bother with it.
Then Madden came out with Madden Ultimate Team (like trading cards MTX) and have been neglecting non-MUT modes ever since. They don't explicitly regress the rest of their game, they just commit resources to that effect.
It's like malicious compliance. They don't embrace, extend, extinguish, but they get a similar effect just with resourcing, layoffs, whatever.
Do you mind naming some NextJS alternatives without potential vendor lock in? Mulling a change in my fe.
For example, Remix has got a lot of traction recently. It is also backed by Shopify where the business model does not conflict.
https://remix.run/
There is no explicit vendor lock-in, but features of the framework are designed heavily towards Vercel-specific features.
The SST team actually has an open-next project[1] that does a ton of work to shim most of the Vercel-specific features into an AWS deployment. I don't think it has full parity, and it's a third-party adapter from a competing host. The fact that it's needed at all is a sign of how closely tied Next and Vercel are.
[1] https://github.com/sst/open-next
that’s a bold claim, could you give an example?
Probably the most known example https://github.com/vercel/next.js/discussions/19065
It is not an issue if you host in Vercel.
Implementing the requested feature would make the framework much better and easier to use when self-hosted elsewhere. But the issue has been neglected. This is just one case.
FWIW, we host on Cloudflare and use their API to resize images on the fly and we're fine. Not so much a "lock-in" if other vendors can fill-in, is it?
You likely needed to do more work than you otherwise would have, compared to some other options.
The lock-in here is the added developer time and complexity vs. just paying premium.
I disagree with your threshold for what makes something a lock-in but I admire your ideology of less friction in portability
What API do you use to resize images? Cloudflare? I know Vercel and NextJS have an <Image> resizing/optimization component that gets pricy.
If you need to change it when switching vendors and only they offer it/it's proprietary, it's vendor lock-in.
I’m sure they are influenced by the likes of Reddit and Twitter rewriting their stack. I mean, that’s what has to be done, right? /s
Is there a trend there of moving from Next to FastAPI? I would be surprised.
Perhaps they are doing some AI thing and want to have python everywhere.
My guess, when I read it, was this would permit them to independently scale the backend on some commodity capacity provider, and then their Nextjs frontend becomes just another React app. OP didn’t mention what their product was, but if it’s AI-adjacent then a python backend doesn’t sound like a terrible idea.
Or if you really want to rewrite your back end, why not just use Express? It would be wildly quicker to rewrite than switching languages. That along with the article makes me question the competency of the company. They got customers, sure, but in the long run these decisions will pile up.
This is a pretty common mistake with sqlalchemy whether you’re using ChatGPT or not. I learned the same lesson years ago, although I caught it while testing. I write plenty of python and I just don’t often pass functions in as parameters. In this case you need to!
For something like this where you’re generating a unique id and probably need it in every model, it’s better to write a new Base model that includes things like your unique id, created/changed at timestamps, etc. and then subclass it for every model. Basically, write your model boilerplate once and inherit it. This way you can only fuck this up once and you’re bound to catch it early before you make this mistake in a later addition like subscription management.
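A rough sketch of that pattern (class and column names invented here), in case it's useful:

```python
import uuid
from datetime import datetime, timezone
from sqlalchemy import Column, DateTime, String
from sqlalchemy.orm import declarative_base

class BaseModel(declarative_base()):
    """Shared boilerplate: if the callable-vs-call mistake happens, it happens
    here exactly once, not in every model that needs an id and timestamps."""
    __abstract__ = True  # no table is created for the base itself

    id = Column(String(36), primary_key=True,
                default=lambda: str(uuid.uuid4()))  # callable, not a call
    created_at = Column(DateTime, default=lambda: datetime.now(timezone.utc))
    updated_at = Column(DateTime, default=lambda: datetime.now(timezone.utc),
                        onupdate=lambda: datetime.now(timezone.utc))

class Subscription(BaseModel):
    __tablename__ = "subscription"
    # Only model-specific columns live here.
    plan = Column(String, nullable=False)
```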
Depending on your viewpoint, ORM are by nature a mistake.
The time people spend learning the quirks of an ORM is much better put into learning SQL.
Honestly same can be said about a lot of frameworks. You will pry my vanilla JS debugged with print statements hand-coded in vi from these hands only when they're cold and dead.
Yeah, I have a workflow where I inject JavaScript into arbitrary webpages and store the results. There’s no substitute for vanilla js knowledge when you’re in the weeds.
I wonder whether there are linters to detect those types of mistakes for SQLAlchemy. Even though I'm aware of such pitfalls, it's nice if linters can catch them, because I'm not confident I'd catch them all the time during code review.
Some linters like pyright can identify dangerous defaults in a function definition, like `def hello(x=[]): pass` (mutable values shouldn't be a default). Linter plugins for widely-used and critical libraries like SQLAlchemy are nice to have.
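For anyone who hasn't been bitten by it, the footgun that check flags has the same "evaluated once at definition time" shape as the ORM default bug above:

```python
# The default list is created once, when the def statement runs, and is then
# shared by every call that doesn't pass its own list.
def append_bad(item, items=[]):
    items.append(item)
    return items

print(append_bad(1))  # [1]
print(append_bad(2))  # [1, 2]  <- same list as the first call

# The usual fix: use a sentinel and build the fresh value inside the call.
def append_good(item, items=None):
    if items is None:
        items = []
    items.append(item)
    return items
```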
The mutable-default-arguments issue is easy for a third-party linter to catch because it doesn't require any specific knowledge about primary keys and databases. Are there static typing plugins for other common packages that would catch issues like this?
I personally use pylint, which I've found to be the most aggressive python linter.
Ah, but it’s not a characteristic of SQLAlchemy tho. It’s how Python evaluates statements. Both Peewee and the Django ORM work on the same principle with default values.
The intent is to pass a callable, not to call a function and populate an argument with what it returns.
Correct, it's not specific to sqlalchemy - I'm just saying I notice this a lot with sqlalchemy. Probably because it was the first significant bug I had to figure out how to fix when I introduced it in one of my first apps. I guess we never forget the first time we shot ourselves in the foot.
Yea, it's more an issue of how this Python library can cause misunderstandings, with ChatGPT falling into the same misunderstanding that would have been made by an engineer who lacks experience with that library.
This mistake would have happened even if they did not use ChatGPT.
That explains where ChatGpt is getting that from. I guess it is only as smart as the average coder.
The fact that they couldn't find it by looking at error logs is weird to me.
This is an entirely forgivable error but should have been found the first time they got an email about it:
"Oh, look, the error logs have a duplicate key exception for the primary key, how do we generate primary keys.... (facepalm)"
Funnily enough, I saw the error in their snippet as soon as I read it, but dismissed it, thinking there was some new-fangled Python feature which allowed that to work -- like the function signature declaring that default= accepts only callables, so the callable gets passed? I haven't kept up with modern Python, that sounded cool, and I figured the bug couldn't be THAT simple.
I was wondering that too. Why wouldn’t the error be in the logs?
Guess: the logs were on an ec2 instance that was thrown away regularly, and the overnight reports didn't give reproduce steps or timestamps; so when they checked it "works fine".
There's value in having your backtrace surfaced to end users rather than swallowing an exception and displaying "didn't work".
I don't think showing stack traces to users is good practice? Every time one of my users gets a didn't work message I log the stack trace instead.
Why would you show them a stack trace? This should be logged.
It was on some temporary AWS service like Lambda or something? ("We had eight ECS tasks on AWS, all running five instances of our backend") -- but regardless, logs should be somewhere persistent.
If they weren't, that should be the first thing you fix.
Yeah this is not a good thing to advertise.
- They were under large time constraints, but decided a full rewrite to a completely different stack was a good idea.
- They copy-pasted a whole bunch of code, tested it manually once locally, once in production, and called it a day.
- The debugging procedure for an issue so significant it made them dread waking up involved... testing it once and moving on. Every day.
The bug is pretty easy to miss, but should also be trivial to diagnose the moment you look at the error message, and trivial to reproduce if you just try more than once.
I'd rather they admit a mistake and learn a lesson from it, even if it isn't a good thing to advertise. That said, I agree that you are identifying a more important issue here, but I also think you are being a bit too subtle about it, even if I agree with what you are saying. The real lesson that they should have learned from this ordeal is to never push code directly into production --- period. The article never mentions using a testbed or sandbox beforehand, and I kinda feel like they learned a good lesson, but it may in fact be the wrong lesson to learn here.
I don't see how testbed/sandbox would have helped, unless they'd also have a dedicated QA person _and_ configured their sandbox so have dramatically fewer instances.
Because I can see "create a new subscription" in the manual test plan, but not "create 5x new subscription".
> trivial to reproduce if you just try more than once
A lot more than once: they had 40 instances of their app, and the bug was only triggered by getting two requests on the same instance.
A bunch of developers including me once spent a whole weekend trying to reproduce a bug that was affecting production and/or guess from the logs where to look for it. Monday morning, team lead called a meeting, asked for everything we could find out, and… Opened the app in six tabs simultaneously and pressed the button in question in one of the tabs. And it froze! Knowing how to reproduce on our computers, we found and fixed the bug in the next 30 minutes.
Ironically one of my frequent GPT questions is “X is supposed to do Y, but is doing Z. What logs should I look at and what error messages to keep an eye out for?”
That's an alright takeaway: the team made a rookie mistake and then they made a PR mistake by oversharing.
Otherwise, I think this comment thread is a classic example why company engineering blogs choose to be boring. Better ten articles that have some useful information, than a single article that allows the commentariat to pile on and ruin your reputation.
I think it’s an unfair takeaway. I have over a decade of experience and still had to stare at the line to find the bug. If that makes them incompetent, I stand with them. It’s a bug I’ve seen people make in other contexts, not just chatbots.
The AI angle is probably why people are piling on. There’s a latent fear that AI will take our jobs, and this is a great way to drive home that we’re still needed. For now.
The one thing I will say is that it probably wouldn’t take me days to track it down. But that’s only because I have lots of experience dealing with bugs the way that The Wolf deals with backs of cars. When you’re trying to run a startup on top of everything else, it can be easy to miss.
I’m happy they gave us a glimpse of early stage growing pains, and I don’t think this was a PR fumble. It shows that lots of people want what they’re making, which is roughly the only thing that matters.
Eh, I think it speaks fairly well for them.
On the one hand it does seem like a fairly inexperienced organization with some pretty undercooked release and testing processes, but on the other hand all that stuff is ultimately fixable. This is a relatively harmless way of learning that lesson. Admitting a problem is the first step toward fixing it.
A culture of ass-covering is much harder to fix, and will definitely get in the way of addressing these types of issues
A mistake is interesting if the mistake itself or the RCA is interesting - using sloppy methods isn't really that interesting on its face.
Pile-on aside, the problem with this blog article is that it doesn't really have much of a useful takeaway.
They didn't even really talk about the offending line in detail. They didn't really talk about what they did to fix their engineering pipelines. It was just a story about how they let ChatGPT write some code, the code was buggy, and the bug was hard to spot because they relied on customers e-mailing them about it in a way that only happened when they were sleeping.
It's not really a postmortem, it's a story about fast and loose startup times. Which could be interesting in itself, except it's being presented more as an engineering postmortem blog minus the actionable lessons.
That's why everyone is confused about why this company posted this as a lesson: The lesson is obvious and, frankly, better left as a quiet story for the founders to chuckle about to their friends.
The bad thing is what they did, not that they disclosed it.
I agree that this is probably to their disadvantage, but I would much rather have people admitting their faults than hiding them. If everyone did this the world would be better.
Of course the best solution is to not have faults but that is like saying that the solution to being poor is to have lots of money. It's much easier to say than do.
The bad thing is their engineering culture and not anything technical. We all make mistakes; the question is how we fix them. Look at the last sentences of the post:
None of those are unconditionally bad! Every project I've worked on could use more testing; we all copy-pasted code at least occasionally, and pushing to main is fine in some circumstances.
The real problem is that they went live, but their tooling (or their knowledge of how to use it) was so bad it took 5 days to solve a simple issue; and meanwhile, they kept pushing new code ("10-20 commits/day") while their customers were suffering. This is what really causes the reputation hit.
In a way, it does give you an opportunity to think about what you appreciate in a detailed postmortem - not just a single cause, but human and organizational factors too, and an attempt to figure out explicit mitigations. I’ll admit the informality and the breezy tone here made me go “woah, they’re a bit cavalier…”
That blog post is all they have, so not much to worry about company-wise.
I appreciate the author's honesty. It's better to see transparently what happened so customers know the problem is fixed.
These criticisms about engineering PR are too heavy handed. Great engineers solve problems and describe problems without finger pointing to place blame. In fact I think that the worst engineers I’ve worked with are the ones most often reaching for someone to place it on.
This is embarrassing, I’d honestly consider pulling this post for your reputation.
CEO thoughts: "Oh, post-mortems are always well received. I should write one for that really basic bug we had and how we took 5 days to find it, and forget to mention how we fixed it or how we've changed our structure so that it never happens again."
Also the CEO: "Remember to be defensive in the Reddit comments, saying how we are a small, 1-million-dollar-backed startup and how it's normal to make this kind of rookie mistake in order to move fast."