"We ran out of columns" – The best, worst codebase

PreInternet01
35 replies
4h57m

I miss that direct connection. The fast feedback. The lack of making grand plans.

There's no date on this article, but it feels "prior to the MongoDB-is-webscale memes" and thus slightly outdated?

But, hey, I get where they're coming from. Personally, I used to be very much schema-first, make sure the data makes sense before even thinking about coding. Carefully deciding whether to use an INT data type where a BYTE would do.

Then, it turned out that large swathes of my beautiful, perfect schemas remained unoccupied, while some clusters were heavily abused to store completely unrelated stuff.

These days, my go-to solution is SQLite with two fields (well, three, if you count the implicit ROWID, which is invaluable for paging!): ID and Data, the latter being a JSONB blob.

Then, some indexes specified by `json_extract` expressions, some clever NULL coalescing in the consuming code, resulting in a generally-better experience than before...
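
For the curious, the shape is roughly this (table and field names are made up for illustration):

    CREATE TABLE items (
        ID   INTEGER PRIMARY KEY,   -- alias for the ROWID, which makes keyset paging cheap
        Data BLOB                   -- the JSON(B) payload
    );

    -- Add an index on an individual JSON field after the fact:
    CREATE INDEX idx_items_kind ON items (json_extract(Data, '$.kind'));

    -- Filter on the indexed expression, paging by ROWID:
    SELECT ID, Data
    FROM items
    WHERE json_extract(Data, '$.kind') = 'order'
      AND ID > :last_seen_id
    ORDER BY ID
    LIMIT 100;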

jsonis
9 replies
4h20m

> These days, my go-to solution is SQLite with two fields (well, three, if you count the implicit ROWID, which is invaluable for paging!): ID and Data, the latter being a JSONB blob.

Really!? Are you building applications by chance or something else? Are you doing raw SQL mostly or an ORM/ORM-like library? This surprises me because my experience dabbling in json fields for CRUD apps has been mostly trouble stemming from the lack of typechecks. SQLite's fluid type system has been a nice middle ground for me personally. For reference, my application layer is Kysely/TypeScript.

PreInternet01
8 replies
3h37m

> my experience dabbling in json fields for CRUD apps has been mostly trouble stemming from the lack of typechecks

Well, you move the type checks from the database to the app, effectively, which is not a new idea by any means (and a bad idea in many cases), but with JSON, it can actually work out nicely-ish, as long as there are no significant relationships between tables.

Practical example: I recently wrote my own SMTP server (bad idea!), mostly to be able to control spam (even worse idea! don't listen to me!). Initially, I thought I would be really interested in remote IPs, reverse DNS domains, and whatever was claimed in the (E)HLO.

So, I designed my initial database around those concepts. Turns out, after like half a million session records: I'm much more interested in things like the Azure tenant ID, the Google 'groups' ID, the HTML body tag fingerprint, and other data points.

Fortunately, my session database is just 'JSON(B) in a single table', so I was able to add those additional fields without the need for any migrations. And SQLite's `json_extract` makes adding indexes after-the-fact super-easy.

Of course, these additional fields need to be explicitly nullable, and I need to skip processing based on them if they're absent, but fortunately modern C# makes that easy as well.

And, no, no need for an ORM, except `JsonSerializer.Deserialize<T>`... (And yeah, all of this is just a horrible hack, but one that seems surprisingly resilient so far, but YMMV)
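
On the database side, the "skip if absent" part can even be pushed into a partial index, since `json_extract` returns NULL for missing paths (the index and field names here are illustrative, not my actual schema):

    -- Only rows that actually carry the newer field end up in the index:
    CREATE INDEX idx_sessions_azure_tenant
        ON sessions (json_extract(Data, '$.azureTenantId'))
        WHERE json_extract(Data, '$.azureTenantId') IS NOT NULL;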

Vampiero
4 replies
3h9m

> Well, you move the type checks from the database to the app, effectively, which is not a new idea by any means (and a bad idea in many cases), but with JSON, it can actually work out nicely-ish, as long as there are no significant relationships between tables.

That way you're throwing away 50% of the reason you use a relational database in the first place. Has it occurred to you that MongoDB exists?

Also I don't understand why you're afraid of migrations, especially since you're the only developer on your own SMTP server.

throwup238
0 replies
2h46m

> That way you're throwing away 50% of the reason you use a relational database in the first place. Has it occurred to you that MongoDB exists?

Did you miss that he’s using sqlite? The dev experience with a sqlitedb is way better than running yet another service, especially for personal projects.

SQLite is used as an application file format just as much as it is as a relational database.

throwaway173738
0 replies
1h23m

How does MongoDB handle someone pulling the power cord out of the server? Because that’s another reason to use something like SQLite, and it often gets used in embedded systems.

randomdata
0 replies
1h4m

> Has it occurred to you that MongoDB exists?

What gain would MongoDB offer here?

You certainly would lose a lot of things, like a well-supported path to linking to the database engine, and a straightforward way to start introducing relational tables as the project matures. Nothing completely insurmountable, of course, but it carries a lot of extra effort for what benefit?

PreInternet01
0 replies
2h45m

> Has it occurred to you that MongoDB exists?

My original comment started with "but it feels "prior to the MongoDB-is-webscale memes""

So, care to take another guess? And, while we're here, does MongoDB run fully in-process these days? And/or allow easy pagination by ROWID?

throwup238
0 replies
3h19m

> And, no, no need for an ORM, except `JsonSerializer.Deserialize<T>`... (And yeah, all of this is just a horrible hack, but one that seems surprisingly resilient so far, but YMMV)

I do the same thing with serde_json in Rust for a desktop app's SQLite db, and it works great, so +1 on that technique.

In Rust you can also tell serde to ignore unknown fields, use individual view structs to deserialize part of the JSON instead of the whole thing, and use string references to make it zero-copy.

jvans
0 replies
3h13m

Why is a migration such a burden in that scenario?

Izkata
0 replies
43m

> Fortunately, my session database is just 'JSON(B) in a single table', so I was able to add those additional fields without the need for any migrations. And SQLite's `json_extract` makes adding indexes after-the-fact super-easy.

Our solution for a similar situation involving semi-structured data (in postgres) was to double it up: put all the json we send/receive with a vendor into a json field, then anything we actually need to work on gets extracted into regular table/columns. We get all the safety/performance guarantees the database would normally give us, plus historical data for debugging or to extract into a new column if we now need it. The one thing we had to monitor in code reviews was to never use the json field directly for functionality.
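
A sketch of one way to wire that up in Postgres (generated columns over the raw json field; the extraction could equally happen in application code, and all names here are invented):

    CREATE TABLE vendor_messages (
        id        bigserial PRIMARY KEY,
        payload   jsonb NOT NULL,  -- everything sent/received, kept verbatim for debugging
        -- the fields we actually work on, extracted into real columns:
        order_ref text GENERATED ALWAYS AS (payload ->> 'orderRef') STORED,
        status    text GENERATED ALWAYS AS (payload ->> 'status') STORED
    );
    CREATE INDEX idx_vendor_messages_order_ref ON vendor_messages (order_ref);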

chrisldgk
6 replies
4h12m

I actually love your approach and haven’t thought of that before. My problem with relational databases often stems from the fact that remodeling data types and schemas (which you often do as you build an application, whether or not you thought of a great schema beforehand) often comes with a lot of migration effort.

Pairing your approach with a "version" field where you can check which version of a schema this row's data is saved with would actually allow you to be incredibly flexible with saving your data while also being able to be (somewhat) sure that your field's schema matches what you're expecting.
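
In the single-table layout from upthread, that could be as simple as a schemaVersion key inside each JSON blob (a sketch; the key name is invented):

    -- Rows written under an older schema, which the app must read
    -- through the matching adapter (or upgrade on read):
    SELECT ID, Data
    FROM items
    WHERE json_extract(Data, '$.schemaVersion') < 2;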

dvdkon
3 replies
4h5m

Having to write and perform migrations for every small schema change is a bore, but it means your software doesn't have to worry about handling different versions of data. Going "schemaless" with version numbers means moving code from "write-and-forget" migrations to the main codebase, where it will live forever.

I think not doing database migrations only makes sense when you can make do without version numbers (or if you can't do atomic migrations due to performance constraints, but that's only a problem for a very small number of projects).

JadeNB
1 replies
4h0m

> Not having to write and perform migrations for every small schema change is a bore, but it means your software doesn't have to worry about handling different versions of data.

Is that "not" at the front supposed to be there?

dvdkon
0 replies
3h4m

Thanks, edited.

chrisldgk
0 replies
1h59m

You're correct there. I mostly work on CMSes with page builder functionality, which often bake the content schema into the database columns, which makes changing that schema (for new frontend features or reworking old ones) difficult and often prone to losing content, especially in dev environments. Best case is obviously that you never have to version your changes, but I'd prefer making a new schema and writing an adapter function in the codebase depending on the schema's version to spending a lot of time migrating old content. That might just be due to me not being too comfortable with SQL and databases generally.

layer8
1 replies
4h0m

> remodeling data types and schemas (which you often do as you build an application, whether or not you thought of a great schema beforehand)

This is not my experience, it only happens rarely. I’d like to see an analysis of what causes schema changes that require nontrivial migrations.

zo1
0 replies
3h13m

Same here. If your entities are modelled mostly correctly you really don't have to worry about migrations that much. It's a bit of a red herring and convenient "problem" pushed by the NoSQL camp.

On a relatively neat and well-modelled DB, large migrations usually happen when relationships change, e.g. a one-to-many becomes a many-to-many.

Really the biggest hurdle is managing the change control to ensure it aligns with your application. But that's a big problem with NoSQL DB deployments too.

At this point I don't even want to hear what kind of crazy magic and "weird default and fallback" behavior the schemaless NoSQL crowd employs. My pessimistic take is they just expose the DB onto GraphQL and make it the frontend's problem.

jrochkind1
5 replies
4h48m

Oh, good question on when the essay was written -- put dates on your things on the internet, people!

Internet Archive has a crawl from today but no earlier, which doesn't mean it can't be earlier, of course. My guess is it was written recently though.

__MatrixMan__
1 replies
4h26m

But it was clearly written in retrospect. It sounds like some of the things I was doing in 2009.

chatmasta
0 replies
2h39m

It includes a reference to Backbone and Knockout JS, which were released in 2010, so presumably it was around that era. The database, though, was probably much older...

stavros
1 replies
3h19m

At least we know it wasn't written after today!

shermantanktop
0 replies
41m

Falsehoods Programmers Believe…?

klysm
4 replies
4h40m

I'm still in the "think hard about the schema" camp. I like to rely on the database to enforce constraints.

ibejoeb
3 replies
4h0m

Yeah, a good database is pretty damn handy.

Have you had the pleasure of blowing young minds by revealing that production-grade databases come with fully fledged authnz systems that you can just...use right out of the box?

scythmic_waves
2 replies
3h24m

Can you say more? I’m interested.

Merad
0 replies
27m

Databases have pretty robust access controls to limit a SQL user's access to tables, schemas, etc. Basic controls like being able to read but not write, and more advanced setups like being able to access data through a view or stored procedure without having direct access to the underlying tables.

Those features aren't used often in modern app development, where one app owns the database and any external access is routed through an API. They were much more commonly used in old-school enterprise apps where many different teams and apps would all directly access a single db.
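
A small taste of what that looks like, in Postgres-flavored syntax (every major database has an equivalent; names invented):

    -- A reporting user that can read but not write:
    CREATE ROLE reporting LOGIN PASSWORD 'secret';
    GRANT SELECT ON orders, customers TO reporting;

    -- An app user that only sees data through a view:
    CREATE VIEW active_customers AS
        SELECT id, name FROM customers WHERE status = 'active';
    CREATE ROLE app_user LOGIN PASSWORD 'secret';
    GRANT SELECT ON active_customers TO app_user;
    -- note: no grants on the underlying customers table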

teaearlgraycold
1 replies
4h19m

Byte vs. Int is premature optimization. But indexing, primary keys, join tables, normalization vs. denormalization, etc. are all important.

RaftPeople
0 replies
2h16m

> Byte vs. Int is premature optimization

I think you can only judge that by knowing the context, like the domain and the experience of the designer/dev within that domain.

I looked at a DB once and thought "why are you spending effort to create these datatypes that use less storage, I'm used to just using an int and moving on."

Then I looked at the volumes of transactions they were dealing with and I understood why.

hobs
1 replies
4h28m

This is perfectly fine when you are driving some app that has a per-user experience that allows you to wrap up most of their experience in some blobs.

However I would still advise people to use third normal form - it helps you, constraints help you, and often other sets of tooling have poor support for constraints on JSON. Scanning and updating every value because you need to update some subset sucks.

Your first point is super valid though - understanding the domain is very useful, and you can easily get 10x the performance by designing with proper types involved. But importantly, don't just build out the model before devs and customers have a use for anything; this is a classic mistake in my eyes (and then skipping cleanup when that part ends up basically unused).

If you want to figure out your data model in depth beforehand there's nothing wrong with that... but you will still make tons of mistakes, lack of planning will require last-minute fixes, and the evolution of the product will leave your original planning gathering dust.

jsonis
0 replies
4h13m

> Scanning and updating every value because you need to update some subset sucks.

Mirrors my experience exactly. Querying JSON to get info out of the db can get complex. SQLite is kind of forgiving because sequences of queries (I mean query, modify in application code that fully supports JSON, i.e. JS, then query again) are less painful, meaning it's less important to do everything in the database for performance reasons. But if you're trying to do everything in 1 query, I think you pay for it at application-writing time over and over.

someuser2345
0 replies
2h58m

So, you're basically running DynamoDB on top of a sql server?

gonzo41
0 replies
3h32m

This is essentially just a data warehousing style schema. I love me a narrow db table.

But I do try and get a schema that fits the business if I can.

arnorhs
0 replies
2h6m

I think they are referring to the fact that software development as a field has matured a lot and there are established practices and experienced developers all over who have been in those situations, so generally, these days, you don't see such code bases anymore.

That is how I read it.

Another possible reason you don't see those codebases anymore is that such teams/companies don't offer competitive comp, so it's mostly junior devs, or people who can't get a job on a more competent team, who get hired in those places.

Nexialist
25 replies
3h59m

My worst codebase story:

In my first real job, I worked for a company that maintained a large legacy product programmed in a combination of COBOL and Java.

In order to work on the Java side of the product, you checked out individual files from source control to work on, which 'locked' the files and prevented other developers from checking out the same files. This functionality was not part of our actual source control system, but was instead accomplished with a series of csh shell scripts you could run after ssh'ing into our development server.

Each of our customers had a 'master' jar file that represented the actual final compiled product (a jar file is really a zip file archive, which bundles together the resulting compiled java class files).

Once you had finished implementing your code changes, you ran another set of scripts which found the master jar file for each customer, unzipped it, copied the compiled files from your local machine into it, and zipped it back up again. Finally, the source control lock was released.

This means, effectively, that the codebase was never compiled as a whole at any point in the process; instead, we just manually patched the jar file over time with individually compiled class files.

Over the years, small errors in the process allowed a huge amount of inconsistencies to creep into the codebase. Race conditions would allow two developers to lock the same file at once, or a developer would change a class that was a dependency of some other code that somebody else was changing. Sometimes code changes would make it into some of the customer jar files, but not others. Nobody knew why.

It took a small team two years to migrate the entire codebase to git with proper CI, and a huge chunk of that time was reproducing a version of the codebase that actually compiled properly as a whole. After the project was finished, I resigned.

werdnapk
6 replies
3h2m

Visual SourceSafe would show that a file was checked out, hinting to maybe stay away. Good times.

heywire
2 replies
2h48m

We still use VSS for a couple codebases. We’re a small enough team that conflicts are rare, but the occasional Teams message “hey, let me know when you’re done with module.cpp” is not unheard of.

varjag
0 replies
7m

Fascinating. An anachronism worthy of steampunk novels!

cebert
0 replies
20m

I’m impressed VSS still works. I used it as part of my first professional software engineering job in 2008 and it felt old then.

SoftTalker
1 replies
1h37m

IIRC SourceSafe could be configured with strict locking or advisory locking? I might be wrong about that.

The admin user could override or unlock locked files. We had to do this if a developer left a file checked out after they left the company, or were on vacation. Nobody knew the admin password for the SourceSafe repository. That was OK though: all you had to do was make a local account on your PC named the same as the SourceSafe admin account, and you'd have admin access in SourceSafe.

masklinn
0 replies
46m

> IIRC SourceSafe could be configured with strict locking or advisory locking? I might be wrong about that.

IIRC SourceSafe could be configured with either strict locking (you had to lock a file in order to edit it) or no locking à la SVN (the tool would check if your file was up to date when trying to commit it).

I recall that I spent a while suffering under strict locking at my first internship before a colleague discovered this mode existed and we were able to work more reasonably (there were no editable binary files so locking was never really needed).

telgareith
0 replies
1h58m

Did you know there's a GitHub repo called [you-poor-bastard](https://github.com/SirkleZero/you-poor-bastard)? It converts a VSS repo to git (not very well, but well enough), ignoring the VSS "passwords."

mikeocool
4 replies
3h11m

I recall something similar from my first job, except the shared file locking was a full-on feature in Macromedia Dreamweaver.

CSS was just starting to get adopted and every project we worked on just had one "global.css" file. When someone else had global.css locked, you'd call dibs if you needed it next. Inevitably, every day someone would leave the office and forget to unlock global.css and no one else could get anything done.

evan_
1 replies
1h33m

It just did this by creating a sentinel file, so if you needed to, you could just delete the file manually.

masklinn
0 replies
46m

SVN provided the ability to steal locks, and locks were opt-in so e.g. you wouldn't have made CSS need locks because it's a text file and merges fine. Mandatory locking was mostly for binary work files e.g. PSD and the like.

q7xvh97o2pDhNrh
0 replies
2h49m

...and whoever did that "accidentally" when they had to leave around lunchtime on Friday was, presumably, celebrated as a local legend.

mnahkies
0 replies
2h40m

Macromedia flash / .fla files weren't particularly ideal for collaboration either, though I still feel a bit nostalgic for working with flash in general

cjbgkagh
3 replies
3h6m

So they had a problem, got 2 years of approved development effort from a small team to solve it properly, which they did successfully, and then you resigned? After they fixed the problem?

Of course where they started was just awful, but a place that recognizes its problems, commits to fixing them, and has sufficient competency to actually fix them sounds rather nice to me. Many orgs get stuck at step 1.

I presume there were other reasons for resigning, or you just like massive refactoring projects.

Nexialist
2 replies
2h51m

It was a little tongue in cheek, but yes. I had large grievances with the software culture there, but after I got sign off on the project to modernise our build process, I couldn't bring myself to abandon ship in the middle of trying to fix it.

After everything was finished up, I was feeling burnt out and realised that I'd held on for too long at a company with a fundamentally bad culture that wasn't going to change just because the tech did, so I moved on.

gnat
1 replies
2h12m

Thank you for the clarification. Because you said "it took a small team … and then I resigned", it was unclear that you were part of that small team; it instead made it sound like you left because the problem was fixed.

dfee
0 replies
1h29m

For what it’s worth, it wasn’t unclear when I read it.

ufmace
1 replies
54m

I think a lot of people who've only learned programming in the last 10 years or so don't realize that Git, and even more so the large-scale popularity of Git, is actually pretty new. A lot of programming happened before Git was created, and before it became mature and popular enough to be the default for everyone.

Many of these earlier systems sound terrible and hacky to us, but they were the best that was available at the time. Systems to "lock" files you were working on were pretty common because basically nobody did merges well. Most of them were based on having a server to manage the whole thing too, so they were only really common in corporations and larger and more mature hobbyist teams - this was also before you could spin up a cloud server for most things with a few keystrokes and $5 a month. It's low-effort to spin up a local Git repo to track your work on a tiny personal project, but nobody's going to set up a CVS server for that.

Anybody remember Joel On Software's 12 Steps [0], written in the year 2000, 5 years before the Git project was started? Where "use source control" is Step 1? There's a reason why that's there - at the time it was written, source control was clunky and a pain in the ass to set up, so a lot of smaller companies or ones with less mature development teams never got around to it.

I may be getting a little "old man yells at clouds" here, but be really thankful that Git is FOSS, ubiquitous, works great for 98% of projects, and is superlatively awesome compared to everything that came before it.

[0] https://www.joelonsoftware.com/2000/08/09/the-joel-test-12-s...

fiddlerwoaroof
0 replies
43m

I used CVS, RCS and Subversion on my personal projects for several years before I learned git. I don't remember any of them being a pain to set up for small projects, and SourceForge provided free CVS hosting for small projects.

stavros
1 replies
3h21m

We all take too much for granted how much git "just works", even when it doesn't.

Izkata
0 replies
1h13m

Even those before git - what they're describing sounds a bit like RCS, the precursor to CVS, which came before SVN.

I've never used RCS or CVS myself, but I remember the file locking thing in descriptions of it, and that it was why CVS was named "concurrent" - it fixed that limitation.

wredue
0 replies
1h54m

I've worked on "check out" codebases very similar to this. I mean, nothing so insane as scripts patching class files into jars willy-nilly, but it seems the "just check out a file so nobody else can work on it" thing is a "common" "solution" to this.

I am interested in how you managed it when someone had to hold code for years (as I've seen). Basically, we also had a way to share control with one other person, and the person taking the second copy was responsible for updating both versions at the same time manually (which never happened, so implementing a large project that touched hundreds of source files meant dedicating a couple weeks to manually hand-comparing files, manually implementing the changes, and then manually retesting).

whatever1
0 replies
3h20m

I bet now they deal with broken pipelines and dependency hell.

Tools do not fix bad design.

sir-dingleberry
0 replies
3h51m

This is insane. Thanks for the post.

pelagicAustral
0 replies
3h42m

This sounds so much like dealing with MS Access databases... Unfortunately, part of my responsibility in my current role is to manage a handful of Access applications... they are ancient, I am talking early 2000's. They are the most unruly thing I have ever had to work with, and I will not go into details but your story reminds me so much of having to work on these apps...

brightball
0 replies
1h39m

Had a similar thing on a small team at a bigger company. We didn't have any type of version control so the team did all development on a shared dev server that we SSH'd into. We'd use vim to edit files one at a time and rely on vim's file locking to make sure that we weren't editing anything at the same time as anybody else.

Oddly, the project was one of the best I've ever worked on.

hleszek
12 replies
1h23m

When I started at my first company, they had a very complex VB application running at dozens of customers around the country, each having some particular needs of course. There were a LOT of global variables (seemingly random 4 uppercase letters) controlling everything.

At some point, the application had some bugs which did not appear when the application was run in debug mode in Visual Studio. The solution was obvious: install Visual Studio on site for each customer and teach the users to run the app in debug mode from Visual Studio. I don't know how they convinced the users to do this or how they managed the licensing, but it was done.

What happened next was even worse.

There was no version control of course, the code being available on a shared disk on the local network of the company with the code copied over in multiple folders each having its own version, with no particular logic to it either, V1, V2, V2.3, V2a, V2_customer_name, V2_customer_name_fix, ...

After that, when there was a problem for a customer, the programmer went there to debug and modified the code on site. If the bug/problem was impacting other customers, we had to dispatch some guys for each customer to go copy/edit the code for all of them. But if the problem was minor, it was just modified there, and probably saved on the shared folder in some new folder.

What happened next was to be expected: there was no consensus on what was the final version, each customer having slightly different versions, with some still having bugs fixed years before for others.

fouronnes3
11 replies
1h14m

This is amazing. I can so well imagine a bright young hire joining that team, helpfully offering to "setup this thing called git" only to be laughed out of the meeting by all the "senior" staff.

djbusby
2 replies
58m

I was one of those once. Tried to get CVS in a project.

Then some other dev committed 9MB of tabs 0x09 at the end of a file. Then the site was "slow" (cause the homepage was 10MB). And the blame went to...CVS somehow.

I left.

qup
1 replies
30m

Then they probably fired the guy who deleted 2MM lines of tabs for not meeting LoC metrics.

djbusby
0 replies
20m

Many years later there was a scene in Silicon Valley where their VC set a tequila bottle on the delete key and caused havoc.

That's when I figured out what happened in 1998.

mleo
1 replies
1h5m

The anecdote seems to be from long before git's creation, so Visual SourceSafe maybe. Which did not work well over a WAN. You needed other tools to replicate and synchronize VSS.

masklinn
0 replies
52m

You can remove the "over a WAN" part: VSS had been designed as a local VCS, so until the addition of a proper server in the mid-aughts, using it over a network share was the only way to actually use it. And it really wasn't good.

I don't know if that made it better - I assume not much, VSS was really trash.

llmblockchain
1 replies
25m

I've been that person a few times.

1. The only developer on the team with GitHub, putting forward the idea of the company not hosting its own source code with TFS.

2. The only developer using branches with git when the co-founder asked (demanded) everyone to only use master.

The list goes on!

malux85
0 replies
1m

Ahhh yes, the beauty of large organisations full of dead weight!

Here’s a few of my horror stories where I was a consultant at various companies:

1. Previous consultants left no documentation or anything, and a running Hadoop cluster handling (live!) 300 credit card transactions a second. Management hired 8 junior sysadmins - who were all Windows sysadmins, had never used Linux before, and were expected to take over running this Linux cluster immediately. They all looked at me white as ghosts when I brought up an SSH prompt; that's the point where I learned they were all Windows sysadmins.

2. Another company: all Java and MySQL developers who were trying to use Spark on Hadoop. Refusing to learn anything new, they ended up coding a Java app that sat on a single node, with a MySQL database on the same node, that "shelled out" to a single trivial hello-world-type function running in Spark, then did the rest of the computation in Java on the single node. Management celebrated a huge success of their team now using "modern cluster computing" even though the 20-node cluster did basically nothing and was 99.99% idle.

3. Another company: set up a cluster, then was so desperate to use the cluster for everything that they installed monitoring on the cluster itself, so when the cluster went down, monitoring and all observability went down too.

4. A Cassandra cluster run by junior sysadmins and queried by junior data scientists had this funny arms race: the data scientists did what was effectively "select * from *" for every query, and the sysadmins, noticing the cluster was slow, kept adding more nodes. Rather than talking to each other, things just oscillated back and forth with costs spiralling out of control.

And many more!

This might sound like I'm ragging on juniors a bit, but that's definitely not the case - most of these problems were caused by bad management being cheap and throwing these poor kids into the deep end with no guidance. I did my best to upskill them rapidly and I'm still friends with many of them today, even though it's nearly 10 years later now.

Good times!

The_Colonel
1 replies
59m

At this level of dysfunction, installing git won't do anything. You need a holistic change in thinking which starts with convincing people there's a problem.

parpfish
0 replies
46m

Yeah, this level of dysfunction takes years to cultivate.

You need the “Dead Sea effect” to be in effect long enough that not only have the good people left, but for them to have been gone long enough that people rising into management have never even worked with somebody competent so they don’t know there’s a better way

kolanos
0 replies
59m

I'm sure I'm not alone in actually having lived such an experience.

I joined a dynamic DNS provider once that had been around since 1999. Their tech, sadly, had not progressed much beyond that point. Showing the higher-ups version control was like showing cavemen fire. Of course, once the higher-ups arranged to have training sessions for the entire dev team led by the new hire, the VP of Engineering couldn't handle it and had me fired. Fun times.

hleszek
0 replies
51m

I started in 2008. This is what I did eventually. Over the years I introduced the small company to Linux, git, defensive programming, linting, continuous integration, Scrum..., but only for the new projects, and I stayed 13 years there.

That old project was never fixed though, and it's probably still running that way now.

dmd
11 replies
2h37m

Probably some of the worst code I ever worked on was a 12k+ line single file Perl script for dealing with Human Genome Project data, at Bristol-Myers Squibb, in the late 1990s.

The primary author of it didn't know about arrays. I'm not sure if he didn't know about them being something that had already been invented, or whether he just didn't know Perl supported them, but either way, he reimplemented them himself on top of scalars (strings), using $foo and $foo_offsets. For example, $foo might be "romemcintoshgranny smithdelicious" and $foo_offsets = "000004012024", where he assumes the offsets are 3 digits each. And then he loops through slices (how does he know about slices, but not arrays?) of $foo_offsets to get the locations for $foo.

By the time I was done refactoring that 12k+ was down to about 200 ... and it still passed all the tests and ran analyses identically.

sandebert
2 replies
2h12m

Nice. But I have to ask, considering it was Perl: Could an outsider understand it after you reduced it to 200 lines?

randomdata
0 replies
1h27m

199 of the lines were comments.

dmd
0 replies
2h10m

Certainly better than they could when it was 12000.

hinkley
2 replies
2h9m

We should use the Antarctic highlands as a prison colony for people found guilty of writing Stringly Typed Code. Siberia isn’t awful enough for them. I thought people who stuffed multiple independent values into a single database column were the worst and then I saw what people can accomplish without even touching a database.

dmd
1 replies
2h6m

Ha - we did that too at BMS. We were paying Oracle by the column or something like that, so people would shove entire CSV rows into a single value (because corporate said everything HAD to be in Oracle) and then parse them application-side.

nyokodo
0 replies
1h6m

> people would shove entire CSV rows into a single value

So you invented a “no-sql” document database in Oracle.

dezsiszabi
1 replies
2h31m

That's a very impressive achievement, reducing that to around 200 lines. Congrats :)

itsameputin
0 replies
2h21m

Well, the lines could very well be 1000s of bytes long. Perl one-liners used to be a thing at a company I used to work for.

rokkamokka
0 replies
2h34m

Goes to show that solutions don't need to be good to be impressive. Because damn, I'm impressed he did that

harry_ord
0 replies
29m

Admittedly I've somehow only worked in Perl, but the worst code I tried to fix felt similar. They knew about arrays, but every map and grep used Perl's default $_, and there was enough nesting that the function was near 1k lines, if I remember right.

apwell23
0 replies
13m

> and it still passed all the tests

All his sins can be forgiven because he made it so easy to refactor by writing comprehensive tests.

mnahkies
3 replies
4h32m

I think the intriguing part was purposefully using the same sequence value for rows in multiple tables.

I've worked with globally unique (to our application) integer keys, and per table integer sequences (which obviously aren't globally unique), but I don't recall seeing anyone use a global sequence but purposefully reuse elements of the sequence before.
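
In Postgres terms, the pattern would look something like this (a sketch of the idea, not the article's actual schema):

    CREATE SEQUENCE entity_id_seq;
    CREATE TABLE orders   (id bigint PRIMARY KEY, placed_at timestamptz);
    CREATE TABLE invoices (id bigint PRIMARY KEY, total numeric);

    -- Draw one value, then deliberately reuse it across related tables,
    -- so order 42 and invoice 42 describe the same logical entity:
    SELECT nextval('entity_id_seq');  -- say it returns 42
    INSERT INTO orders   VALUES (42, now());
    INSERT INTO invoices VALUES (42, 99.50);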

collinvandyck76
1 replies
4h19m

I've seen this fairly recently and I was surprised to also see it in this article because I think that makes it twice that I've seen it in ~25 years.

airstrike
0 replies
3h17m

They may even be the same codebase!

linux2647
0 replies
2h59m

It’s kind of like an idempotency key assigned either by the request or generated at the beginning of processing that request

MarceColl
2 replies
4h37m

Yes, but both of these have very different properties. He said (I don't know if it's the case) that the db didn't have an auto-incrementing type. Postgres uses these sequence objects to implement the auto-incrementing ids he was referring to; they are implemented in-engine, are very fast, and have already solved data races.

In the article, what he complains about is not what a sequence is, but implementing one manually with a table that is read, incremented and then saved. This is more expensive, and depending on how it was implemented, you need to take care of the whole data flow so you never allocate the same id twice. That's what he considers odd.
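
The manual version being described is roughly this (a sketch; the whole point is that the read-increment-save must be atomic, or you hand out the same id twice):

    CREATE TABLE id_sequence (next_id bigint NOT NULL);
    INSERT INTO id_sequence VALUES (1);

    -- A naive read-then-update is racy. In Postgres (or SQLite 3.35+)
    -- the whole allocation collapses into one atomic statement:
    UPDATE id_sequence
    SET next_id = next_id + 1
    RETURNING next_id - 1 AS allocated_id;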

pbronez
1 replies
2h47m

The scary thing to me about that setup is how the global value is updated. Every individual script must successfully increment the value to avoid duplicate keys.

Really hope they had a “get key” stored procedure to handle that.

SoftTalker
0 replies
1h31m

It's possible they had some requirement for gapless ID numbers. You can't do that with a sequence.

_elf
1 replies
4h35m

Databases have built-in features for this now. What the author is talking about is a regular table.

In reality, that wasn't too unusual to see, because frameworks would use that technique as a lowest common denominator across RDBMSes.

ssdspoimdsjvv
0 replies
2h27m

Does SQLite have sequences yet?

williamdclt
0 replies
4h37m

yeah, I've seen tables with a single row and a single column a few times, for alright reasons. Sometimes you do just want to store a single global value!

masklinn
0 replies
41m

I think you missed the crucial part: every related record across all tables would have the same sequence item (the same id). That's really not normal in the database world. It sounds a lot like ECS though.

trte9343r4
6 replies
4h15m

Slicing columns into multiple tables is a fairly common type of sharding. Sort of a SQL way to do a columnar store.

pelagicAustral
4 replies
3h40m

This is such a horrendous practice, I have yet to find a database where this makes sense. Maybe my brain is not wired for it.

zo1
3 replies
3h26m

The problem is maybe not so much the splitting and putting extra columns in a separate table. It's that you even have a table so large that it necessitates such a thing. Worst case, you have a main table and a detail table with a one-to-one correlation to the main entity table.

magicalhippo
2 replies
3h1m

Why is that worse than a couple of dozen joins?

zo1
1 replies
2h39m

Because that means your data is highly denormalized and has plenty of duplicates. But in all likelihood it means no one knows wtf this table actually represents, and you should be firing people.

I've seen this play out. Usually the many columns are there because everyone misuses the table and eventually their special little business scenario or "filter" needs to be a column. Bonus points: whoever has to reference this table has to copy over whatever the hell your PK seems to be, and the cycle repeats, this time a bit worse.

The last place I did a brief project in had this. Cue 1000 tables spread across 25 schemas, each table having wide PKs, 20 redundant indexes on each table, and despite all this the database performs poorly. No one can tell you what each table represents, the table names are meaningless, and the same data is everywhere. In order to get anything done you have to ask a small cabal of priests that knows the processes that write between these tables. After about 10 years, a partial rewrite happens and you now have 1/3rd of the data on each side with plenty of duplicates and overlap, because hey.

I feel torn, I really wanna name&shame this company as a warning to all developers thinking about working there.

pelagicAustral
0 replies
1h59m

I feel you... I mean, "...you should be firing people" is my day-to-day way of thinking.

My thoughts are that hyper-specialization and grind breed this type of data structure. But so many companies are forced to choose, and generally tend to sacrifice on the database side of things. Then you end up with this type of unruly structure.

Database theory and practice should be a MUST in all software development courseware.

mort96
0 replies
1h42m

But splitting into multiple tables because you hit the 1024 column limit is probably not a common type of sharding...

Waterluvian
6 replies
4h58m

The first line really hits me hard. There’s something so incredibly freeing about being a kid and doing stuff like coding. There’s simply no expectations. Even the smallest project felt like such an achievement. But now I code professionally and I don’t know how to turn off engineering brain. I don’t know how to be okay doing something poorly, but on my terms.

neilv
1 replies
3h56m

I think you can add a mode to engineer brain, in which it's aware when it's kludging something, but kludging is the appropriate thing to do.

Might help to have enough experience that you have (well-placed) confidence in your gut feel for how much engineering a particular thing needs.

(More common is to not have an engineer mode, or to not have enough experience to do it well.)

If you don't have the kludge mode, try some things where kludges are outright necessary. One time I recall this was when writing Emacs extensions. For example, it used to be that you'd hit a point where normal programmatic navigation/parsing of the buffer was just too slow, and a well-placed regexp operation that worked 99.99% of the time (and the other 0.01% of the time it's OK enough, and the user understands) made the rest of the code viable.

Another example is personal open source projects, when you really want to experiment with an unusual implementation approach, and you give yourself permission.

(Maybe don't do this when the open source is done only for resume purposes, where you're trying to demonstrate you follow all the current fashionable conventions. You're almost guaranteed that someday a new-grad on a public forum or not-very-technical hiring manager will stumble across that bit of code, and fixate on the one thing they think they recognize as a bad practice. Documenting the kludge and rationale, in, say, a code comment, is great practice, but, again, it will also draw the attention of the least-skilled. Much like, if you're relating an anecdote in a job interview, and implied are 20 things you did right and 1 thing you did brilliantly, and you note as an aside one mistake you made, most of that will whoosh over the head of the dimmest FAANG interviewer, but they'll latch onto that mistake you spelled out, as a wise insight they have, and that's what's going in their report. :)

Then, armed with an experienced multi-skilled brain, when your startup's imminent MVP launch is facing crazy constraints, you triage what bits need to be rock-solid so they'll absolutely work and not fail, what bits need creative kludging so you can hit your launch window for other key requirements, and what bits you need to mitigate any compromises. Ideally, you have the wisdom to know which is which, and can activate different skills for each kind of work.

That Turbo Encabulator needs to be well-machined, but the new sensor you just decided it needs for a test doesn't need a week to be drilled into the engine block, nor a mount to be designed and CNC'd or 3D-printed, but the nearest Velcro or zip-tie or maybe wad of chewing gum will do.

Waterluvian
0 replies
3h52m

All such great advice. The one time I’ve found success is when I figure out the entire scope of the little project up front. Then I find myself not even worrying about maintenance or scalability. I know I won’t need that extra class abstraction or modularity or whatever. I just don’t like planning at the start, but it’s probably, counterintuitively, part of the answer.

P.S. you opened a parenthesis and forgot to close it so now everything I read all day is part of your anecdote)

jimmyhmiller
0 replies
3h24m

This is something I've had to train myself to overcome. The first step for me was having a place where I clearly signaled that this wasn't my best work. Where the rules were allowed to be broken. That place is my junk drawer of code[1].

I have a zsh alias `alias changes='git add . && git commit -am "Changes" && git push'` that I use for my commits in the repo.

This all may feel silly, but it's what I needed to get back to that time of playing. Where I never worried about the code. But where I also didn't feel I was wasting my time working on code no one could see. I'd definitely recommend trying something like that if you are struggling with it.

[1]: https://github.com/jimmyhmiller/PlayGround

datavirtue
0 replies
4h20m

It's because you don't get to write apps from scratch enough. Everything is a POC. You get it working. Optimization is just dumb until the project is so big it doesn't fit in the developer's brain anymore. Then you write tests for the core part so you can keep iterating fast. You will see parts that clearly need refactoring before you can move on.

Also, be aware of, but don't get hung up on, engineering best practices. Chances are that someone is going to see your POC and put it in front of a client or promise it to someone. It will enter service being less than perfect, if you are lucky.

When you pull this off people will tell stories about you and brag about you when you are not around. None of them will know anything about the code engineering.

captn3m0
0 replies
4h40m

Writers often say that your first N books will be crap, and the answer to that problem is to just write more than N books. I feel that’s true of software as well - the first years of the magical programming I did without any expectations meant those projects were utter crap yet functional.

actionfromafar
0 replies
4h44m

Whatever small success I had outside work has been whenever I managed to turn off Enterprise brain.

surfingdino
4 replies
4h2m

I had the dubious pleasure of working with similar codebases and devs. I'll remember one of those guys forever, because whenever he wanted to work on a new branch he would clone the repo, make changes to the master branch, and push the code to a new repo numbered repo0001, repo002, ... He refused to change his ways, because "I have a PhD so you are wrong".

Another WTF moment was realisation that MS SQL Server does not support BOOLEAN type. That made porting code fun.

marcosdumay
3 replies
3h52m

> Another WTF moment was realisation that MS SQL Server does not support BOOLEAN type.

The standard does not have a boolean type. It's a postgres extension that the other open source databases adopted (because, yeah, it's obvious). But the proprietary ones insist on not having.

The official recommendation is using byte on MS SQL and char(1) on Oracle. Both are ridiculous.

nolist_policy
1 replies
3h18m

C programmer joins the chat.

mort96
0 replies
1h45m

Hey we've had _Bool and <stdbool.h> since 1999!

bdcravens
0 replies
2h28m

Pedantic, but the type is bit in MSSQL.

jancsika
4 replies
1h34m

Oh man, this article reminds me of an article that was a parody of some horrid business logic.

Something like this: a "genius" programmer was somehow, for some reason, using svn commits as a method dispatcher. Commit ids were sprinkled throughout the codebase. A new hire broke the entire system by adding comments, and comments weren't compatible with the bespoke svn method dispatcher.

Does anybody remember this article? I really want to read it again.

jancsika
1 replies
50m

Yes!

Has anyone tried implementing it? If not I'm going to give it a shot. :)

masklinn
0 replies
30m

It'd be way easier (and actually not completely insane, though still far from sane) in git: you'd store the functions in blobs instead of misusing svn commits. It'd probably be a lot faster too, as git is fairly good at giving you object content.

Admittedly the GC would be an issue, but I think that would be fixable: instead of a `functions` key in a JSON, link all the functions in a tree object, then link that tree object from a commit; that way you have the author for free. And then the class name can be a ref in a bespoke namespace.

Well, there's the issue that it's not necessarily clear which commits go together to compose the system, so it might actually be better to replace the ref per class with a tree for the entire thing, with each class being a commit entry, and another commit capping the entire thing. Of course git will not understand what's happening, as commit entries in trees are normally for submodules, but that's fine; git is just used as a data store here.

tryauuum
3 replies
3h20m

Would be great to work in such a company as a Linux guru.

There are so many entangled services and machines that you feel like Indiana Jones. You ssh into a machine and feel century-old dust beneath your footsteps. And you never know what you will find. Maybe a service which holds the company together. Maybe a CPU-eating hog which hasn't done anything useful in the last 3 years.

I don't enjoy writing new code much. But in such an environment, even with my limited skills, I can make decent improvements, especially from a security point of view. Feels great.

arnorhs
1 replies
2h4m

Well, it sounded like there are exactly 0 Linux machines running there... it's all Windows .NET / C# and a bunch of native Windows apps, as I understood the article.

But maybe you can replace your statement with "Windows guru" and SSH with "remote desktop" and perhaps that would be fun.

tryauuum
0 replies
53m

well, with Linux at least I can get the source code of the kernel and of MySQL when given a database server which hasn't been restarted for 7 years

noisy_boy
0 replies
2h6m

Basically the entire application is loaded with low-hanging fruit and things that look like fruit but are bombs that explode on contact.

dakiol
3 replies
4h6m

Honestly, I like to work on such systems because:

- there is so much stuff to improve. There’s nothing better than the feeling of improving things with code (or by removing it)

- it's back to school again and anything goes. You can implement features any way you want because of the constraints the system imposes. Granted, sometimes it's painful to add functionality

- there's usually no room for subjective topics like clean code and architecture. The most important thing with these systems is correctness (and that is usually an objective topic)

- nobody can blame you for something that doesn’t work. It’s always the fault of the legacy system

I wouldn’t recommend working on such systems to junior engineers, though.

I don't really like to work on "perfect" codebases where everyone follows the same pattern, with linters, where if something breaks it's because of your shitty code (because the codebase is "clean"). It's very frustrating and limiting.

poikroequ
0 replies
1h45m

> there is so much stuff to improve. There's nothing better than the feeling of improving things with code (or by removing it)

And there is nothing worse than a crappy codebase the company won't let you improve.

korhojoa
0 replies
2h38m

I mean, it is kind of nice to notice that something isn't going to work because you have the linters and tests. When your developer count goes up, the chance that erroneous behavior gets included also goes up.

I've created proofs-of-concept that worked perfectly but would make you cry if you looked at how they worked. Eventually they became that mess. Everything is a self-contained unit so it doesn't mess anything else up. Of course, there is always time to keep adding new stuff but never to refactor it into what it should be.

I prefer the way with linters and tests, it at least lessens the chances of whatever is put in being broken (or breaking something else). (Then again, somebody putting "return true" in a test _will_ surprise you sooner or later)

SoftTalker
0 replies
1h29m

You also get to see some genuinely creative stuff that works well, written by people who aren't indoctrinated in a particular approach.

codetrotter
3 replies
4h19m

> went by the name Munch

How do you pronounce that?

Was it like the word munch in “munching on some snacks”?

Or like the name of the painter Edvard Munch? https://www.nrk.no/kultur/nrk-endrer-munch-uttale-1.539667 (note: this link is in Norwegian)

jimmyhmiller
2 replies
3h44m

As in munching on snacks

thrwaway1985882
1 replies
2h54m

Hey former colleague - just had to say hello on a comment where you might see. I started reading this article and everything started feeling so familiar... as soon as you told me Munch was the resident shaman, everything clicked.

My favorite factoid for others was that when I was there, we used split-horizon DNS to squat on an in-use domain name for tons of internal services, including Github. I kept wondering what would happen if the owner realized & set up his own services to catch people who weren't on the VPN.

rendall
0 replies
2h13m

::Edvard Munch scream emoji::

Twirrim
3 replies
3h0m

Back about 15 years ago, I worked for a web hosting company that provided some sysadmin consultation services. Customer paid us, and I would take a look.

I had one customer who came back with the same request, slightly differently worded, every single month, and every single month I'd say the same thing. They had this site they were running that was essentially a Yellow Pages type site. They had a large set of companies with contact details, each with multiple business categories associated with it. You'd choose a category, and they'd return a list of matching companies.

The problem was the site was really slow. I took a quick look around and saw that all the time was lost querying the database. Taking a quick look at the schema, I discovered that their approach to categorisation was to have a TEXT column with semicolon-separated 4-character strings in it. Each 4-character string mapped to a business category.

So when someone wanted to load up, say, all pest control companies, it would check the category mapping table, get the 4 character string, and then go to the companies table and do:

    SELECT * FROM companies WHERE categories LIKE "%PEST%"
So on every load of the site's main page type, it did a full-text search over the category field for every single record in the companies table.

I guess that's probably okay for a developer without real-world-scale data and real-world traffic counts to worry about. But they had lots of data in the database, and that category field could have dozens of categories against a company. As soon as they had more than about 4-5 simultaneous customers, performance started tanking.

I could never get them to accept that they needed to rethink the database schema. One month they were bleating about how it was possible that Google could manage to do such a search across a much larger amount of data, much faster. They really didn't like my answer, which amounted to "By having a sane database schema". All they were willing to do was pay over the odds for our most powerful server at the time, which had enough capacity to hold the entire database in memory.

throwup238
2 replies
2h53m

In case anyone is looking for a performant way to implement categories like that in Postgres: https://news.ycombinator.com/item?id=33251745

I stumbled across that comment a few years back and it changed the way I handle tags and categories so just sharing it here. If anyone has an equivalent for Sqlite, I’d love to hear it!
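
For anyone who doesn't want to click through: a common performant pattern for tags in Postgres is an array column with a GIN index. A sketch along those lines (whether or not it matches the linked comment exactly):

    CREATE TABLE companies (
        id   bigserial PRIMARY KEY,
        name text NOT NULL,
        tags text[] NOT NULL DEFAULT '{}'
    );
    CREATE INDEX idx_companies_tags ON companies USING gin (tags);

    -- "All companies tagged pest-control" hits the GIN index:
    SELECT id, name FROM companies WHERE tags @> ARRAY['pest-control'];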

wizzwizz4
1 replies
2h5m

It's a relational database: why not just use a PageCategories table with two foreign keys?

Twirrim
0 replies
1h24m

That's what my suggestion came down to: a two-column table with company id and category id. They had ids already. They could index off category and get the results in a split second.
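
i.e. the classic junction-table fix for the `LIKE "%PEST%"` query above; a sketch:

    CREATE TABLE company_categories (
        company_id  INTEGER NOT NULL REFERENCES companies(id),
        category_id INTEGER NOT NULL REFERENCES categories(id),
        PRIMARY KEY (company_id, category_id)
    );
    CREATE INDEX idx_cc_category ON company_categories (category_id);

    -- "All pest control companies" becomes an indexed join:
    SELECT c.*
    FROM companies c
    JOIN company_categories cc ON cc.company_id = c.id
    WHERE cc.category_id = :pest_category_id;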

gamepsys
2 replies
4h5m

> All that remained were ragtag interns and junior developers.

For many people, their first job in software engineering is the worst codebase they will deal with professionally, for this reason. The first job hires a lot of people with little/no experience. As soon as someone gains some experience, they can move on to better-paying jobs, where there are better developers with better standards.

prudentpomelo
0 replies
3h40m

This is the exact boat I am in. I always say the best thing about our codebase is the worst thing: junior developers can do whatever they want.

mleo
0 replies
48m

The worst codebases are often ones taken over from IT consultancies. They drive young, inexperienced developers to work many hours to deliver functionality. While the project may start out "clean" using whatever is the current hotness in technology, at some point getting stuff developed and thrown over the wall to QA becomes the important part.

lemme_tell_ya
1 replies
2h27m

My first day at my first full-time programming gig, I was asked to look at some reporting job that had been failing to run for months. I logged in, found the error logs, found that it needed a bit more memory assigned to the script (just a tweak in the php.ini), and let the team lead know it should run fine that night. He was shocked: "Dude, if you just fixed that report you probably just got a promotion, no one has been able to figure that out for months." He was joking about the promotion, but my boss was just as shocked. I'd realize later that most of the other people on the dev team didn't like Linux and wanted to rewrite everything in .NET and move everything to Windows, so no one even tried with anything related to any of the Linux machines.

mleo
0 replies
31m

I know things have gotten somewhat better, but the amount of time wasted on the latency of RDP and the Windows UI for development, testing, and production maintenance is insane. Throw in security requirements like RDPing into host 1 just to make a second RDP jump into host 2, and companies are simply paying for latency. There is often no appreciation of the administrative cost of delivery: not necessarily sysadmin costs, but the developer and QA time tied up in delivering and in ongoing maintenance.

throwaway93982
0 replies
1m

When it comes to building things, the outcome is dictated by the constraints, but not in the way most people realize.

The first way constraints affect outcomes is by absolute limitation. You can't exceed the speed of light, so nothing you make will ever be faster than that.

The second way constraints affect outcomes is by the path of least resistance. You have a mountain with two slopes: a long gradual one, and a short steep one. Most people will climb the long gradual one, but it will take them 4x as long to climb it.

The third way constraints affect outcomes is by strategy. If you are given very limited materials and time to achieve a goal, you will have to think up a novel way to achieve it that maximizes your resources.

Constraints can be good and bad. A common problem occurs when people build something without the correct goals and with very little stake in the outcome: they apply their constraints incorrectly. They may make something that works, but not nearly as well as it could have if they were properly motivated. The leadership and vision that guide the building are much more important than the people doing the building, because they determine the outcome.

telgareith
0 replies
1h55m

First real job: stayed for 10 years (5 years too long), putting VB6 COM objects into the ASP.NET codebase.

tbm57
0 replies
1h21m

Working on something like that would drive me absolutely batty. I am happy you were able to find your zen in the middle of that chaos. This post truly speaks to the human condition

rendall
0 replies
2h15m

I'm glad OP was able to maintain a good sense of humor about it. Such discoveries as the nested classes of empty methods have sent me into teeth-gnashing fury followed by darkness and despair. One such experience is why I would rather change careers if my only option were to build on the Salesforce platform, for instance.

qxmat
0 replies
2h24m

Jira's now-discontinued server version had a sequence table to stop you from sharding it. It also made disaster recovery from a hard shutdown awful. I have nothing good to say about Atlassian.

omoikane
0 replies
2h0m

This codebase sounds like a haunted graveyard[1], where everyone just fixes their own local corner of things and avoids the risk of untangling the existing mess.

Not needing to conform to some company-wide standard is probably really pleasant while it lasts, but every such effort adds to the haunted graveyard, and the lack of consistency will eventually come back to bite whoever is still around.

[1] https://www.usenix.org/sites/default/files/conference/protec...

nikodunk
0 replies
1h17m

What a beautiful, humorously written, bitter-sweet piece of writing!

mrighele
0 replies
2h23m

Now the story I heard at the time was that once upon a time SQL Server didn't support auto-incrementing ids. This was the accepted, correct answer.

At a company where I used to work, they heard the same rumor, so instead of using identity columns or sequences, they kept a table with a number of "available" ids (one row per id). Whenever a unique id was needed, the table would be locked, and an id selected and marked as used. If there were no ids available, more would be added first and then one used. A scheduled job would remove the used ids from time to time. Note that there was a single "sequence table" shared among all of the entities.
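
In SQL Server dialect, the pattern amounts to something like this sketch (all table and column names are guesses):

    BEGIN TRANSACTION;

    DECLARE @next_id INT;

    -- lock the pool and take the lowest free id
    SELECT TOP (1) @next_id = id
    FROM id_pool WITH (UPDLOCK, HOLDLOCK)
    WHERE used = 0
    ORDER BY id;

    -- (refill step elided: if @next_id is NULL, insert a fresh batch first)

    UPDATE id_pool SET used = 1 WHERE id = @next_id;

    COMMIT TRANSACTION;

Every caller, across every entity, serializes on that one table, which is exactly the contention that identity columns and sequences exist to avoid.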

That was not even the weirdest part. That id was unique, but NOT the primary key of the entity, only part of it.

The structure of the database was fairly hierarchical, so you had for example a table CUSTOMER in 1-to-many relation with a USER table, with a 1-to-many relation with an ADDRESS table.

While the primary key of the CUSTOMER table was a single CUSTOMER_ID column, the primary key of the USER table was (CUSTOMER_ID, USER_ID), and the primary key of the ADDRESS table was (CUSTOMER_ID, USER_ID, ADDRESS_ID). There were tables with 5 or 6 columns in their primary key.
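
In DDL, the hierarchy would look roughly like this (reconstructed from the description; USER is bracketed because it's a reserved word in SQL Server):

    CREATE TABLE CUSTOMER (
        CUSTOMER_ID INT NOT NULL PRIMARY KEY
    );

    CREATE TABLE [USER] (
        CUSTOMER_ID INT NOT NULL REFERENCES CUSTOMER (CUSTOMER_ID),
        USER_ID     INT NOT NULL,
        PRIMARY KEY (CUSTOMER_ID, USER_ID)
    );

    -- every level drags all of its ancestors' keys along
    CREATE TABLE ADDRESS (
        CUSTOMER_ID INT NOT NULL,
        USER_ID     INT NOT NULL,
        ADDRESS_ID  INT NOT NULL,
        PRIMARY KEY (CUSTOMER_ID, USER_ID, ADDRESS_ID),
        FOREIGN KEY (CUSTOMER_ID, USER_ID)
            REFERENCES [USER] (CUSTOMER_ID, USER_ID)
    );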

mg
0 replies
1h58m

That is why people these days tend to use a single JSON blob instead of multiple columns. And because it is so popular, SQLite and other DBs are building better and better JSON support into the engine.

I wonder if better support of EAV tables would solve this issue better.

If one could do "SELECT price,color,year FROM cars! WHERE status='sold'" and the "!" would indicate that cars is an EAV table ...

    entity attribute value
    1      price     20750
    1      color     red
    1      year      2010
    1      status    sold
... and the result of the query would be ...

    20750 red 2010
That would solve most of the use cases where I have seen people use JSON instead.
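
Until some database grows that "!" sugar, the closest standard-SQL equivalent is a manual pivot over the EAV table (using the names from the example above):

    SELECT
        MAX(CASE WHEN attribute = 'price' THEN value END) AS price,
        MAX(CASE WHEN attribute = 'color' THEN value END) AS color,
        MAX(CASE WHEN attribute = 'year'  THEN value END) AS year
    FROM cars
    GROUP BY entity
    HAVING MAX(CASE WHEN attribute = 'status' THEN value END) = 'sold';

Every extra attribute costs another CASE branch, which is a big part of why people reach for JSON instead.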

jkestner
0 replies
2h11m

Early on in my career, at a large company, I encountered someone who took “codebase” a little too literally. At the time every department had their own developers, sometimes just employees who had an aptitude for computers.

This one guy established himself by making an Access database for their core business, and when the web became a thing, built a customer site. But not on it—in it. He simply served ASP pages directly from the database, inserting dynamic content in queries. When I was asked to help improve their terrible design, I was forced to untangle that unholy mess of queries, ASP (new to me) and HTML. It was easiest to write all the HTML and insert their ASP right before I sent the files back (because I wasn’t given access to their DB/web server). Thinking “I could do better than this” got me into programming.

He was a Microsoft-everything head. Finally went too far when he presented a new web interface starring a Clippy-like parrot using Microsoft’s DirectX avatar API. The executives were unimpressed and then I noted that 20% of our customers couldn’t use his site. (I probably still had a “best viewed with IE” badge on the main site, lol)

jbkkd
0 replies
1h20m

Sounds awfully like my first job, with the addition of not having _any_ sort of test - functional, integration or unit. Nothing.

A few months in, when I approached the CTO and asked if I could start writing a test framework, he deemed it a waste of time and said "by the time you'd commit the test, it would go out of date and you'd need to rewrite it".

Naturally, the build would break about 5 times a week.

Boeing was a major customer of this system, so when shit hit the fan at Boeing a while ago, I wasn't surprised.

interactivecode
0 replies
2h24m

Sure, it might be a mess, but at least it's purpose-built. I love those kinds of performance gains. Honestly, most companies die before purpose-built code like that becomes a problem.

deathanatos
0 replies
31m

Two early databases I worked on.

The first contained monetary values. These were split over two columns: a decimal column holding the magnitude of the value, and a string column containing an ISO currency code. Sounds good so far, right? Well, I learned much later (after, of course, having relied on the data) that the currency code column had only been added after expanding into Europe … but not before expanding into Canada. So by the time it was added, there were already mixed USD/CAD values with no currency code column to distinguish them, and they just defaulted everything to USD. So any USD value could really be CAD; you "just" needed to parse the address column to find out.
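
The shape of it, sketched (the table and column names are mine):

    CREATE TABLE payments (
        id       INT            NOT NULL PRIMARY KEY,
        amount   DECIMAL(19, 4) NOT NULL,
        -- added after the fact and backfilled with a blanket default,
        -- so rows that were really CAD silently became 'USD'
        currency CHAR(3)        NOT NULL DEFAULT 'USD'
    );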

Another one was a pair of Postgres DBs. To provide "redundancy" in case of an outage, there were two such databases. But no Postgres replication strategy was used between them; rather, IIRC, the client did the replication. There was no formal specification of the consensus logic, if it could even be said to have such logic; I think it was just "try both, hope for the best". Effectively, this was a rather poorly specified multi-master setup. They'd noticed some of the values hadn't replicated properly, and wanted to know how bad it was: could I find the places where the databases disagreed?

I didn't know the term "split brain" at the time (that would have helped!), but that's what this setup was in. What made matching the data up worse is that, while every column containing text was a varchar, IIRC the character set of the database was just "latin1". The client ran on Windows, and it was shipping the values from the Windows API "A" functions directly to the database. Windows has two sets of APIs for like … everything with a string: an "A" version and a "W" version. "W" is supposed to be Unicode¹, but "A" is "the computer's locale", which is nearly never latin1. Worse, the company had some usage on machines that were set to, say, the Russian locale or the Greek locale. So every string value in the database was, effectively, in a different character set, and nowhere was it recorded which. The assumption was that the same bytes would always get shipped back to the same client, or something? It wasn't always the case, and if you opened a client and poked around enough, you'd find mojibake easily enough. Now remember we're trying to find mismatched/unreplicated rows? Some rows were mismatched in character encoding only: the values on the two DBs were logically the same, just encoded differently. (Their machines' Python setup was also broken, because Python was ridiculously out of date. I'm talking 2.x where the x was too old; this was before the problems of Python 3 were relevant. Everything in the company was C++, so this didn't matter much to the older hands there, but … god, a working Python would have made the character set issues so much easier to work with.)

¹IIRC, it's best described as "nearly UTF-16"

brunoarueira
0 replies
1h58m

A while back, I worked on 4 projects that were literally copies of the first one, with parts changed for each project's customers and internal users. The main problem: any bug found in one project existed in the other 3, and I had to fix the same bug in each!

Codebases like this one, or like the OP's, are good for learning how not to do certain things.

bdcravens
0 replies
2h23m

A couple of weeks ago I had a flat on the way to the airport. It was a total blowout, and my car doesn't include a spare. We were already over budget on our trip, so I had the car towed to the closest tire shop and had them put on the cheapest tire that could get me to the airport. I know I'll need to replace other tires, as it's an AWD, and I know it's not a tire I really want. I made a calculated choice to make that a problem for future me due to the time crunch I was under.

Programming is a lot like this.

arnorhs
0 replies
1h55m

I really love these kinds of stories. Does anybody know if there's a collection of similar stories/code bases anywhere?

I guess there is The Daily WTF, but that's mostly bugs. Probably good enough, though.

airstrike
0 replies
3h16m

This is glorious. I can imagine people could write versions of this for every industry out there and they would all be equally fun to read.

agentultra
0 replies
1h10m

The ending is pure gold. Some of the best times in my career were working on a codebase for an application serving folks I knew on a first name basis and had had lunch with.

I could talk through pain points they were having, we’d come up with a solution together, I’d hack up a quick prototype and launch it just to them to try out. We’d tweak it over a couple of weeks and when it was good I’d launch it to all customers.

Messy, ugly code base. But it worked well because it wasn’t over-managed. Just developers doing development. Amazing what happens when you get out of the way of smart, talented people and let them do their work.