As a scientist who ends up working closely with actual professional software engineers... lots of the stuff they do looks like this to me, and I can't for the life of me make sense of why you'd do it.
I have seen a single line of code passed through 4 "interface functions" that call each other sequentially (each, of course, in a separate file in a separate folder) before it is finally called.
It makes reading the code to figure out what it does exhausting, and a few levels in you start to wonder if you're even looking at the right area, and if it will ever get to the part where it actually computes something.
This is actually really bad practice and a very “over eager junior engineer” way of writing software. You’re not off base at all that it seems excessive and confusing. It’s the kind of thing that seems technically complex and maybe even “elegant” (in isolation, when you first write the “interesting” code) at first but becomes a technical nightmare when used in real software that has to grow around and with it. You’re actually more on point in worrying about the understandability and debuggability this introduces.
I spent the better part of two years unfucking some Go software that (among other things) misused channels. The problem with channels is that you rarely actually need them, but can use them for a lot of different things without too much initial difficulty.
I think a good litmus test for proper use of channels is answering no to "could this be done with a direct function call instead?" and "could I use a wait group or mutex instead?", and yes to "am I really benefiting from concurrency/parallelism enough to justify the technical complexity of debugging concurrent code?" (zooming out a bit to think about what kind of earlier decisions led you to reach for channels in the first place).
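A minimal sketch of the first test, with made-up names: if the receive sits right next to the send and nothing useful runs in between, a direct call does the same job with less machinery.

    // Channel version: a goroutine and channel used to produce a single result.
    func sumViaChannel(xs []int) int {
        ch := make(chan int)
        go func() {
            total := 0
            for _, x := range xs {
                total += x
            }
            ch <- total
        }()
        return <-ch // blocks immediately, so the concurrency buys nothing
    }

    // Equivalent direct call: same behaviour, nothing to deadlock or leak.
    func sum(xs []int) int {
        total := 0
        for _, x := range xs {
            total += x
        }
        return total
    }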
I saw some code in a job I was just starting where they had added several abstractions that I found...confusing.
After taking an extra long time to understand what the code actually did, I realized that some junior engineer had been using some design pattern they didn't really understand, and that added zero actual value to the routine.
After deleting all of that code and refactoring it to use completely different abstractions, everything was suddenly much easier to read and to extend.
Design is a hard skill to learn, and junior developers profoundly haven't learned that skill yet. But that's what we need to teach them as senior engineers, right?
Not that I could teach the author of the code I changed, since I think it was written by an intern that no longer worked for the company. But you do what you can.
There is also the fact that it's much easier to write something when you know where you are going. When you start, you often make lots of things general in nature to improve later on.
I mean... some do, some don't. With experience comes appreciation for simplicity and flexibility.
yep. good judgment comes from experience. experience comes from bad judgment.
It's easier to know where you're going when you've gone to similar places a hundred times.
I just think about a problem for two seconds and then have the entire path mapped out.
£3.50p says it was the Generic Repository pattern implemented over Entity Framework dbContext, right?
--------
Speaking of design-patterns, I subscribe to the opinion that _Design-patterns are idioms to work around missing features in your programming language_, which explains why Java has no end of them, and why us jaded folk find happiness in modern languages that adopt more multi-paradigm and FP (the post-Java cool-kids' club: Kotlin, Rust, Swift, TypeScript, (can C# join?)) - so my hope is that eventually we'll have a cohort of fresh-faced CS grads entering industry who only know of Facades/Decorator/Adapter as something a language designer does to spite their users because any reasonable compiler should handle interface-mapping for you - and the Visitor-pattern as a great way to get RSI.
FP has design patterns too, just different ones, and they don't all have tidy names.
Also some GoF design patterns map pretty closely to FP equivalents... pattern-matching on ADTs + traverse/fold + ReaderT ends up looking a lot like the visitor pattern.
...that sounds like hand-written tedium; isn't `replicate` meant to avoid that?
Nah, it was something unrelated to databases.
I can't even remember the details. It's like trying to remember nonsense sentences; they don't stick because they don't really make sense.
To the best I can remember, it was something like the use of an adapter pattern in a class that was never going to have more than one implementation? And it was buried a couple layers deep for no particularly good reason. Or something.
And yes, modern languages like the ones you list make many of the original GoF Design Patterns either absolutely trivial (reducing them to idioms rather than patterns) or completely obsolete.
As someone in leadership, my ‘strong opinion held loosely’ on this, is that there’s absolutely no way to meaningfully build this skill in people, in a theoretical setting.
You can, at best, make them aware that there is such thing as “too much”, and “the right tool for the job”, and keep reminding them.
But nothing, nothing, comes remotely close to the real-world experience of needing to work with over-engineered spaghetti, and getting frustrated by it. Especially if it’s code that you wrote 6 months prior.
Juniors will always do this. It’ll always be the senior’s job to…let it happen, so the junior learns, but to still reduce the blast radius to a manageable amount, and, at the right moment, nudge the junior toward seeing the errors in their ways.
At least plant a seed of doubt early on.
when I was learning Go, I read a guide that told you to fire off a goroutine to walk a tree and send the values back to the main goroutine via a channel. I think about that "just an example" guide a lot when I see bad channel code.
For me the biggest red flag is somebody using a channel as part of an exported library function signature, either as a param or a return value. Almost never the right call.
Okay, I gotta ask - what exactly is wrong with this approach? Unless you're starting only a single goroutine[1], this seems to me like a reasonable approach.
Think about recursively finding all files in a directory that match a particular filter, and then performing some action on the matches. It's better to start a goroutine that sends each match to the caller via a channel so that as each file is found the caller can process them while the searcher is still finding more matches.
The alternatives are:
1. No async searching, the tree-walker simply collects all the results into a list, and returns the one big list when it is done, at which point the caller will start processing the list.
2. Depending on which language you are using, maybe have actual coroutines, so that the caller can re-call the callee continuously until it gets no result, while the callee can call `yield(result)` for each result.
Both of those seem like poor choices in Go.
[1] And even then, there are some cases where you'd actually want the tree-walking to be asynchronous, so starting a single goroutine so you can do other stuff while walking the tree is a reasonable approach.
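For reference, a sketch of that channel-based shape (hypothetical names, error handling omitted): the walker streams matches as it finds them, and the caller processes them while the walk is still running.

    package main

    import (
        "fmt"
        "io/fs"
        "path/filepath"
        "strings"
    )

    // findMatches walks root in its own goroutine and sends matching paths
    // back to the caller as they are found.
    // Note: the caller must drain the channel, or this goroutine leaks.
    func findMatches(root string, match func(string) bool) <-chan string {
        out := make(chan string)
        go func() {
            defer close(out)
            filepath.WalkDir(root, func(path string, d fs.DirEntry, err error) error {
                if err == nil && !d.IsDir() && match(path) {
                    out <- path
                }
                return nil
            })
        }()
        return out
    }

    func main() {
        // The caller processes each match while the walker keeps searching.
        isGoFile := func(p string) bool { return strings.HasSuffix(p, ".go") }
        for path := range findMatches(".", isGoFile) {
            fmt.Println("found:", path)
        }
    }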
Assuming the latest Go (1.23+), I would write an iterator and use goroutines internally.
The caller would do:
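Something along these lines, presumably; a sketch assuming a Go 1.23 range-over-func iterator, with made-up names:

    package walker

    import (
        "io/fs"
        "iter"
        "path/filepath"
    )

    // walkMatches exposes an iterator; whether it uses goroutines and a
    // channel internally is an implementation detail the caller never sees.
    func walkMatches(root string, match func(string) bool) iter.Seq[string] {
        return func(yield func(string) bool) {
            filepath.WalkDir(root, func(path string, d fs.DirEntry, err error) error {
                if err == nil && !d.IsDir() && match(path) {
                    if !yield(path) {
                        return filepath.SkipAll // caller broke out of its loop
                    }
                }
                return nil
            })
        }
    }

    // The caller just ranges over it, with no channel in the signature:
    //
    //  for path := range walkMatches(root, isMatch) {
    //      process(path)
    //  }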
Better than exposing a channel. But my first question would be: is it really necessary? Are you really scanning directories large enough for async traversal to be beneficial?
I actually did that once, but that was for a program that scanned the whole drive. I wouldn't do it for scanning a local drive with 100 files.
Finally, you redefined "traversing a tree" as "traversing a filesystem".
I assume that the post you're responding to was talking about traversing a tree structure in memory. In that context, using goroutines is overkill. Harder to implement, harder to use, and slower.
I agree with this, but my assumption was very different to yours: that the tree was sufficiently large and/or the processing was sufficiently long to make the caller wait unreasonably long while walking the tree.
For scanning < 1000 files on the local filesystem, I'd probably just scan it and return a list populated by a predicate function.
For even 20 files on a network filesystem, I'd make it async.
Before Go had iterators, you either had callbacks or channels to decompose work.
If you have a lot of files on a local ssd, and you're doing nothing interesting with the tree entries, it's a lot of work for no payoff. You're better off just passing a callback function.
If you're walking an NFS directory hierarchy and the computation on each entry is substantial then there's value in it because you can run computations while waiting on the potentially slow network to return results.
In the case of the callback, it is a janky interface because you would need to partially apply the function you want to do the work or pass a method on a custom struct that holds state you're trying to accumulate.
Now that iterators are becoming a part of the language ecosystem, one can use an iterator to decompose the walking and the computation without the jank of a partially applied callback and without the overhead of a channel.
The only time I've seen it work with channels in the API is when it's something you'd realistically want to be async (say, some sort of heavy computation, network request, etc). The kind of thing that would probably already be a future/promise/etc in other languages.
And it doesn't really color the function because you can trivially make it sync again.
Or a coroutine (the callee calls `yield(item)` for each match, and `return`s when done).
Yes, but this goes both ways: You can trivially make the sync function async (assuming it's documented as safe for concurrent use).
So I would argue that the sync API design is simpler and more natural. Callers can easily set up their own goroutine and channels around the function call if they need or want that. But if they don't need or want that, everything is simpler and they don't even need to think about channels.
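A sketch of that caller-side wrapping, with a made-up synchronous findAll standing in for the library function:

    package main

    import "fmt"

    // findAll is a stand-in for a plain synchronous library function.
    func findAll(root string) []string {
        return []string{root + "/a.txt", root + "/b.txt"}
    }

    func main() {
        // A caller that wants concurrency adds it around the sync call itself.
        resultCh := make(chan []string, 1)
        go func() { resultCh <- findAll("/data") }()

        fmt.Println("doing other work while the search runs")

        fmt.Println(<-resultCh) // collect the results when ready
    }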
I've used that pattern to write tools to e.g. re-encrypt all whatever millions of objects in an S3 bucket, and examine 400m files for jars that are or contain the log4j vulnerable code. I had a large machine near the bucket/NFS filer in question, and wanted to use all the CPUs. It worked well for that purpose. The API is you provide callbacks for each depth of the tree, and that callback was given an array of channels and some current object to examine; your CB would figure out if that object (could be S3 path, object, version, directory, file, jar inside a jar, whatever) met the criteria for whatever action at hand, or if it generated more objects for the tree. I was able to do stuff in like 8 hours when AWS support was promising 10 days. And deleted the bad log4j jar few times a day while we tracked down the repos/code still putting it back on the NFS filer.
The library is called "go-treewalk" :) The data of course never ends up back in main; it's for doing things or maybe printing out data, not doing more calculation across the tree.
At the time, that was one of the only ways to write decent-looking generic code.
I get your point, but a wait group or a mutex can be removed in favor of a clean usage of channels if the proper concerns are isolated at first. And I would personally much rather reason about channels than mutexes and wait groups. Wait groups and mutexes are just begging for deadlocks and race conditions, where a proper channel, used correctly, eliminates both of those by design.
By that same logic, if you just use wait groups and mutexes correctly, you should also not worry about deadlocks and race conditions. It's also quite trivial to introduce a deadlock with a channel.
Regardless, channels are basically a more expressive/flexible type than mutexes, waitgroups, and function calling, but in the same family as all of them. You can implement any of those rather trivially with a channel, but there are things you can do with a channel that are quite complex or impossible to implement using those. Such a flexible tool allows you to start doing things that are "easy" to implement yet poor design decisions. For example, instead of direct function calling you can now start passing data over channels, which "works" just as well except it incurs some scheduling overhead (not always a concern depending on how perf sensitive you are), makes debugging and interpreting stack traces more difficult (increasingly so as the logic on both sides of the channel increases over time), and allows the software to start evolving in an unintended way (specifically into an overly complex Actor model with tons of message-passing that is impossible to untangle, rather than a directed tree of direct function calling). Or you have a hard time understanding the state and properties of a piece of data throughout the program lifetime because it doesn't "belong" anywhere in particular.
---
Something I thought about recently: perhaps the biggest balancing act in software is between specificity and expressiveness. To solve a problem you might be able to find something that is perfectly tailored to your needs, where you just click something or run something and your problem is solved, but it's highly likely that it will only solve that specific problem and not other ones. Alternatively, a lot of software (like Jira, SAP, and much enterprise software) is highly configurable and expressive but requires a lot of effort to set up and may not be particularly good at solving your specific task. At its most extreme, you could technically say a computer containing only a brainfuck compiler and a basic text editor is able to solve any problem solvable by a computer, because it's a programmable Turing machine.
This extends even into the weeds of programming, especially when you're working on software with other people or over long periods of time, where you might struggle to enforce or maintain your particular mental model for how the software should work. When faced with implementing something with an expressive approach vs a specific one, you want to be expressive enough to be able to modify the code later to do things you plan to do or think you have a high probability of doing, but you want to be specific enough that the purpose and function of something (be it a library, class, binary, or entire distributed software system) is clear - if it isn't clear, the people using it will struggle with it or avoid it, and the people working on it will start taking it in a direction you didn't intend.
Channels are the type of thing that are expressive enough to be broadly applicable, but are easily misinterpreted (you might be using them to implement parallelism, but your coworker Bob might think you're using them because you want to design your software under a message-passing actor model) and easily misused. They also make it very, very easy to "code yourself into a corner" by introducing inscrutable logical/data paths that can't be untangled. You might be able to use them safely in lieu of a mutex but it only takes one Bob to start taking them in the direction of unmaintainability. And sometimes you might be that Bob without knowing it. That's why I think it's best to avoid them unless your other options are even worse.
I agree with most everything else you said - especially about software being a trade off between specificity and expressiveness - but I can’t agree with this.
The problem being that a mutex can hide things that a channel can’t. Channels will always give you what you expect, but that is not the case for mutexes or wait groups or error groups or whatever.
Honestly, the older I get, the more I understand that joke about “you must be this tall to write concurrent programs” and the mark is at the ceiling.
"Do not communicate by sharing memory; instead, share memory by communicating." - https://go.dev/blog/codelab-share
Wait groups are preferred to channels for the purposes they serve. Mostly waiting for goroutines to finish. You can use a channel but wait groups are much cleaner.
Mutexes for shared memory are less preferred than channels. There are always exceptions.
But yeah, if all you have is a hammer then everything looks like a nail. Go has mutexes and wait groups and channels and all of these have their right place and use case. If you're using mutexes to effectively re-implement what channels support then you're doing it wrong. If you're using channels for something that can be a function call then you're also doing it wrong. Software is hard.
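For the goroutine-waiting case, a small sketch (made-up work): a sync.WaitGroup says exactly what it means, where the channel version would be counting done-signals by hand.

    package main

    import (
        "fmt"
        "sync"
    )

    func main() {
        jobs := []string{"a", "b", "c"}

        var wg sync.WaitGroup
        for _, j := range jobs {
            wg.Add(1)
            go func(job string) {
                defer wg.Done()
                fmt.Println("processed", job)
            }(j)
        }
        wg.Wait() // block until every goroutine has finished
    }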
Using channels where mutexes would suffice has by far been the main cause of bad concurrent code I've encountered.
Using more than 2 'semantic' channels plus one ctx.Done() channel? There's probably a bug. So far that has been well over 50% accurate, across dozens of libraries.
When they're used like this, chans often break into non-blocking algorithm details, because they don't ensure mutual exclusion. And non-blocking algorithms are freakin hard - few things are guaranteed without great care.
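The boring alternative, for comparison (hypothetical counter): shared state behind a mutex, rather than an owner goroutine plus request/response channels just to bump an integer.

    package main

    import (
        "fmt"
        "sync"
    )

    // Counter guards its state with a mutex; mutual exclusion is explicit.
    type Counter struct {
        mu sync.Mutex
        n  int
    }

    func (c *Counter) Inc() {
        c.mu.Lock()
        defer c.mu.Unlock()
        c.n++
    }

    func (c *Counter) Value() int {
        c.mu.Lock()
        defer c.mu.Unlock()
        return c.n
    }

    func main() {
        var c Counter
        var wg sync.WaitGroup
        for i := 0; i < 100; i++ {
            wg.Add(1)
            go func() {
                defer wg.Done()
                c.Inc()
            }()
        }
        wg.Wait()
        fmt.Println(c.Value()) // 100
    }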
Back at uni, we had a 200-level ‘software engineering’ unit, largely introducing everyone to a variety of ‘patterns’. Reading the Gang of Four book, blah blah blah. You get the idea.
Our final assignment for this unit was to build a piece of software, following some provided specification, and to write some supplementary document justifying the patterns that we used. A mature-aged student who had a little bit of industry experience under his belt didn't use a single pattern we learned about the entire semester. His code was much simpler as a result. He put less effort in, even when taking into account his prior experience. His justifying documentation simply said something to the effect of "when considering the overall complexity of this problem, and the circumstances under which this software is being written, I don't see any net benefit to using any of the patterns we learned about".
He got full marks. Not in a “I tricked the lecturer!” way. I was, and still am, a massive fan of the academic that ran the unit. The feedback the student received was very much “you are 100% correct, at the end of the day, I couldn’t come up with an assignment that didn’t involve an unreasonable amount of work and ALSO enough complexity to ever truly justify doing any of the stuff I’ve taught you”. All these years later, I still tell this story to my team. I think it’s such a compelling illustration of “everything in moderation”, and it’s fun enough to stick with people.
Nice to have such teachers :-) What's a 200-level unit? Does it mean it's "very advanced"?
So Grug was right all this time?
Joking aside, this is a really interesting example. In a job interview I was once asked if I ever regretted something I did, and I couldn't quite word it on the spot, but my first project definitely included extra complexity just so that it "looked good", and in the end it would have been more reliable had I kept it simple.
In the multi-year series of blaming (us) juniors for every ill in the programming world, they now also get blamed for over-architecting.
I took the opportunity to share that quote with some others on our project, because this is a pattern that we recognize from our boss and a few of the productive consultants in the past. Not juniors, but people who have/had the weight to set the code-style tone of the project, which has led to hours of all of us scratching our heads when having to read the code.
But thanks for souring me on this whole thread.
To recycle a brief analysis [0] of my own youthful mistakes:
[0] https://news.ycombinator.com/item?id=41219130
I once had to deal with a HTTP handler that called `Validate` on interface A which called `Validate` on interface B which called `Validate` on interface C which called `Validate` on interface D which finally did the actual work. There was a lot of profanity that month.
I mean to a point that makes sense; you got your base data types like idk, a bank account number which can be validated, which is inside a bank account which can be validated, which is in a customer which can be validated, etc etc. Visitor pattern style, I believe?
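Roughly that shape, sketched with made-up types, where each layer only knows how to validate its own children:

    package main

    import (
        "errors"
        "fmt"
    )

    type AccountNumber string

    func (a AccountNumber) Validate() error {
        if len(a) != 8 {
            return errors.New("account number must be 8 characters")
        }
        return nil
    }

    type Account struct{ Number AccountNumber }

    // Each layer delegates to the things it contains.
    func (a Account) Validate() error { return a.Number.Validate() }

    type Customer struct{ Account Account }

    func (c Customer) Validate() error { return c.Account.Validate() }

    func main() {
        c := Customer{Account: Account{Number: "1234"}}
        fmt.Println(c.Validate()) // account number must be 8 characters
    }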
imagine allowing invalid values to exist
imagine enforcing invariants as part of the design of a software system
I’m just saying enforce invariants at construction time / type-designing instead of with the validity checks
This wasn't that kind of validation - it was "is this token allowed to do this thing?" Like "validate your parking" kind of scenario.
(And yes, it should probably have been "CheckAuthorisation")
naming things is so hard.
The nerve, taking good jobs away from young qa testers. Wait till the IT Union hears of this!
This is actually my preferred approach. If you want to put a 4gb base64 as your phone number, go right on ahead; best believe I will truncate it to a sensible length before I store it, but sure. Who am I to question your reality.
Sadly, people abuse shit like that to pass messages (like naming Spotify playlists with messages to loved ones/friends/colleagues while in jail), and maybe we have to assert a tiny bit of sanity on the world.
How do prisoners have access to Spotify?
Presumably some feature/jailbreak of JPay (and the like) tablets.
https://offers.jpay.com/jp5-tablets/
Pretty common, for example when using databases as a mostly dumb store with all the logic in application code, and then a second application (or big refactor) appears and they introduce a subtle bug that results in an invalid `INSERT` (or whatever), and the database happily accepts it instead of rejecting it.
Sounds awful.
That would make sense but this was one piece of data being validated against a lookup based off that data. The previous devs just had a, uh, Unique™ style of development. I swear they must have been on some kind of "editor tab count" bonus scheme.
This can happen when some of the interface operations have added logic, but others (like Validate here) don't, so just get delegated as-is.
One typical example is a tower of wrapped streams, where each layer applies additional transformations to the streamed data, but the Close operation is just passed across all layers to the lowest one (which closes the underlying file or whatever).
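For example (hypothetical wrapper type): a layer that adds logic to Read, but whose Close is a pure pass-through to whatever actually owns the resource.

    package main

    import (
        "io"
        "strings"
    )

    // countingReader adds behaviour to Read, but Close just gets delegated
    // down to the layer below, much like the Validate chain above.
    type countingReader struct {
        inner io.ReadCloser
        n     int
    }

    func (c *countingReader) Read(p []byte) (int, error) {
        n, err := c.inner.Read(p)
        c.n += n // the "interesting" per-layer logic lives here
        return n, err
    }

    func (c *countingReader) Close() error {
        return c.inner.Close() // pure pass-through
    }

    func main() {
        bottom := io.NopCloser(strings.NewReader("hello"))
        // Two wrapper layers; calling Close just falls through to the bottom.
        var stack io.ReadCloser = &countingReader{inner: &countingReader{inner: bottom}}
        _, _ = io.ReadAll(stack)
        _ = stack.Close()
    }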
It gets worse (this isn't too much of an exaggeration): https://github.com/EnterpriseQualityCoding/FizzBuzzEnterpris....
Sometimes there's a scary lack of understanding and competency where you'd expect to find it.
As an undergrad, I once spent about half an hour pair programming with a computer science PhD - it was enlightening.
He didn't have the slightest understanding of software - calling me out for things like not checking that the size of a (standard library) data structure wasn't negative.
But other times these things are done for a reason; sometimes it's actually sane and sometimes it's just a way to deal with the lunacy of a codebase forged by the madmen who came before you.
the fizzbuzz repo hurt. It is oh so true though.
I often wonder if, unbeknownst to me, I am writing similarly over-complicated software. People seem to be unable to tell, so I doubt I can either. It makes me second-guess my code a lot.
Is there any reliable/objective/quantitative way to evaluate such a thing? The repo is a great example of what not to do, but it's so extreme it's hardly useful in practice... (nor should it care to be, as a joke)
I think it's circumstantial - do your abstractions make it easier or harder to implement and integrate the sort of features that the application generally requires?
There's sometimes significant use in having some powerful abstractions in place to allow for code reuse and application customisation - but all too often people build fortresses of functionality with no idea how or if it's ever going to be used.
Foresight is useful here; if you can look at new features and break them up into feature specific business logic and generalisable application logic, then similar features can be cleanly integrated with less work in the future.
Sometimes however, the level of customisability far exceeds what is strictly necessary; the complexities involved no longer help, but rather hinder - not only understanding, but feature implementation and integration as well.
If you want to know, and you have a project that you can do this with - pick a reasonably complex project, back it up, don't touch it for a year. Can you work out what the hell is going on? If yes - you're probably doing ok.
Just delete some code and see if it still works. Bonus points if you can delete abstractions.
If you have abstractions that you weren't forced to make after exhausting every other option, then you can improve your code. If you begrudgingly add abstractions only when you can no longer convince yourself that there must be a way to avoid it, then you're likely doing well.
IMO the question hinges on stakeholder-dynamics and future-predictions of where the software will go, and both of those inputs are highly subjective so the result will also be subjective.
A simple example would be the difference between code consumed by an in-house application managed by a single team, versus code in an open-source utility library used by 50+ companies.
In the first case, extra layers of indirection tend to have bad cost/benefit tradeoffs, particularly if your programming stack makes automatic refactoring fast and safe.
In the second case, layers of indirection are basically required for backwards-compatibility, because you can't just "fix the caller's code" or have a hard-break as conditions change.
And IsEven! (the not-JS one) https://github.com/Benjlet/IsEven
It looks really innocent. But then you browse it...
this is the classic over-abstraction problem: splitting things up so that you can change things behind an interface at some point down the line if you ever need to, while being totally opaque to any consuming code.
A lot of languages force you to start with this from day one, unless you want to go refactor everything to use an interface later on, so people just do it even when there will literally never be a reason to (and for testability, sometimes).
The cool thing about Go is the interface system is inverted kind of like duck-typing, so if you write purely idiomatic Go, then the thing receiving an arg in a function call is the one specifying the interface it must meet, rather than the implementing code having to declare every interface that some implementation meets.
People screw this up a lot though, especially if they came from Java/C# backgrounds.
Do you have a concrete example by any chance?
Most C# education will teach you to always make an interface for everything for some reason. Even in academia they’ll teach CS students to do this and well… it means there is an entire industry of people who think that over-engineering everything with needless abstractions is best practice.
It is what it is though. At least it’s fairly contained within the C# community in my part of the world.
Isn't that "for some reason" in C# being it's the standard way of doing dependency injection and being able to unit test/mock objects?
I've found it easier to work in C# codebases that just drank the Microsoft Kool-Aid with "Clean Architecture" than in Frankenstein-esque C# projects that decided they could do it better, or didn't care, or didn't know better.
Abstraction/design patterns can be abused, but in C#, "too many interfaces" doesn't seem that problematic.
I agree, for the most part. There's a little bit of a balance: if you just drink the kool-aid for top level stuff, but resist the urge to enter interface inception all the way down, you can get a decent balance.
e.g. on modern .NET Core, literally nothing is stopping you from registering factory functions for concrete types without an interface with the out-of-the-box dependency injection setup. You keep the most important part, inversion of control. `services.AddTransient<MyConcreteClass>(provider => { return new MyConcreteClass(blah, blah, blah); });`
I agree with you on this, my issue is mainly when they bring this thinking with them into other languages. I can easily avoid working with C# (I spent a decade working with it and I’d prefer to never work with it again), but it’s just such a pain in the ass to onboard developers coming from that world.
It may be the same for Java as GP mentioned it along with C#, but they tend to stay within their own little domain in my part of the world. By contrast C# is mostly used by mid-sized stagnant to failing companies which means C# developers job hop a lot. There is also a lot of them because mid-sized companies that end up failing love the shit out of C# for some reason and there are soooo many of those around here. Basically we have to un-learn almost everything a new hire knows about development or they’ve solely worked with C#.
Basically, if the Jump function takes an Animal interface, the package that defines the Jump function is the one that defines the Animal interface (and thus the Jump method it requires). So if you provide it with any entity that has this method, it will just work. You don't have to define the Animal interface in your Cat package. But if your cat does not Jump, it won't be accepted. Of course, you can also pass in a box that Jumps.
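A sketch in code of that consumer-side interface, reusing the Animal/Cat/Box names from above (everything else made up):

    package main

    import "fmt"

    // The consuming package declares only what it needs...
    type Animal interface {
        Jump() string
    }

    func MakeItJump(a Animal) {
        fmt.Println(a.Jump())
    }

    // ...and anything with a matching method satisfies it implicitly,
    // without declaring or importing the interface.
    type Cat struct{}

    func (Cat) Jump() string { return "cat jumps" }

    type Box struct{}

    func (Box) Jump() string { return "box jumps, somehow" }

    func main() {
        MakeItJump(Cat{})
        MakeItJump(Box{})
    }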
Over-engineering is a common cause: simple solutions can be deceitfully difficult to find. That being said, additional indirection layers are usually justified by the overall architecture, and — assuming they're reasonable — can't always be appreciated locally.
"I apologize for such a long letter - I didn't have time to write a short one." - Mark Twain
Twain? https://news.ycombinator.com/item?id=769624
Had a math teacher full of pithy, paradoxical, sardonic and witty quotes. He told us that if we only take one thing away from his class, it's that you can never go wrong attributing an unknown quote to Mark Twain or Benjamin Franklin.
I believe you'll find that's a direct quote from Winston Churchill. Or was it Dorothy Parker?
As a software engineer, this is something I get onto with my team. There is a such thing as too much abstraction and indirection. Abstraction should serve a purpose; don’t create an interface until you have more than one concrete implementation (or plan to within a PR or two). Premature abstraction is a type of premature optimization, just in code structure instead of execution.
You deal with getting a disparate bunch of people to persuade a mostly documented and mostly functional (as purchased) collection of IT systems to do largely what is required according to an almost complete specification (which changes on a daily basis). All of that is exhaustively and nearly documented correctly.
There's the weird bits where 2=3 but you don't talk about that too often. James was really clever but a bit strange even by your terms and left a lot of stuff that we can get away with describing as legacy. We sometimes have to explain ourselves to management why the system buys a bunch of flowers every Hallowe'en (piss off Google and your wiggly red line - it's hallowed evening and I know how to abbreviate that phrase correctly) and ships them to a graveyard plot in NOLA. We generally blame James but no-one really knows what on earth is going on. We wrote an automated call closer for that one with a random dialogue.
"There is a such thing as too much abstraction and indirection"
Yes there is. You got two words out of sequence!
"Premature abstraction is a type of premature optimization, just in code structure instead of execution."
I will try to follow your suggestion but it sounds like advice to a teenage boy.
English is a programming language too. You can make people do things by using it. How you deploy it is up to you. I try to come across as a complete wanker on internet forums and I'm sure I have been successful here.
(EDIT: sp)
I have very few words to you, as this is a barely coherent reply...
I'm not sure what that means, to be honest. But generally speaking, junior programmers are still learning to be adults, so I guess that makes sense?
"I have very few words to you, as this is a barely coherent reply..."
Sorry, I've obviously been very rude towards someone for whom English is a second language.
For that: I apologise.
I think something people forget is that computer programming is a craft which must be honed.
A lot of people are introduced to it because computers are such an important part of every discipline, but unfortunately the wealth of mistakes maintaining many a code base occur from those who, quite honestly, simply lack experience.
In the author's case, they don't explain the simple understanding that every pointer is simply adding a dimension to the data.
int *data; // one dimensional vector
int **data; // two dimensional matrix
int ***data; // three dimensional matrix
Which is one way to interpret things. The problem is that when folks learn computer programming using Python (or another high level language), it’s like using power tools bought from the hardware store compared to some ancient, Japanese wood working technique. The latter takes time to understand and perfect.
Ten thousand hours, so I’ve heard ;)
”every pointer is simply adding a dimension to the data.”
No, it’s not. The concept of having a pointer to a pointer has nothing to do with the concept of dimensionality.
You must be thinking of lists of lists (of lists…), which can be implemented using pointers. The dimensionality, however, comes from the structure of the list, not from the pointers.
I guess you skimmed over: “Which is one way to interpret things”
I’m certainly not thinking of a list-of-lists. In actuality, pointers ARE dimensionality, regardless of the intended use.
I’m not going to spell that out further. If you can’t see that, then you’ve got some work to do if you’re maintaining a C codebase.
Right. -ish. A two dimensional array can be modeled as an array of pointers to one dimensional arrays or as one pointer to a two dimensional array. Both have use cases. It's probably a rite of passage for someone new to the C language to understand that difference (I learnt C as a teenager and it took me some time, months, to comprehend all the different permutations of pointers and square brackets).
The coder spectrum, scientists on one end, software engineers on the other. Only balance can save us.
I have read code used in research papers. The theoretical math usually goes beyond my comprehension, so I always dive into the code to better understand the logic, only to find... it's way worse... unintelligible.
At the end of the day we are used to what we do, and anything different will be foreign to us.
No kidding. Most of the academic/scientist's code is such a mess i would rather go through the paper and reimplement algorithm by myself :P
A lot of the code written in math, physics or data analysis settings is really written for the authors alone to understand. They benefit from tens of pages of documentation (papers) plus decades of previous experience from the readers. None of which commercial software systems have.
I agree- I'd like to think I'm somewhere in the middle despite being a scientist. I try to write the best code I can, and keep up on best practices, but try to keep things simple.
A lot of the scientific code out there in research papers is so bad, that when you look at it, you realize the whole paper is actually B.S.. What the paper claims was never even implemented, but they just did the most kludgy and quick thing possible to produce the plots in the paper. The whole paper will often hinge on the idea that they did something that generalizes, and are showing specific examples, when actually they just skipped to producing those specific examples - cherry picked no doubt from other ones that they couldn't get to work - and did nothing else. As a reviewer if I notice this and really tear into the authors, it will usually get published and not fixed anyways.
This happens because you get inexperienced new people doing the actual coding work without any proper mentors or instruction, and then they're under enormous pressure for results from a PI that doesn't understand or care about coding at all. It makes the whole thing a house of cards.
Trying to do it 'properly' as a scientist is an uphill battle, because funders, collaborators, etc. expect the quick (and fake) stuff other people seem to be doing. The realities of developing good reusable software and maintaining it long term are not possible to fund through scientific grants.
People writing usable code in academia are doing it for free on the weekends.
It’s called Clean Architecture, Clean Code or SOLID and it’s extremely stupid. It’s widely used because the man behind it, and a lot of other grifters, are extremely good at selling their bullshit. You also have crazy things like the Agile Manifesto to thank “Uncle Bob” for.
What is the most hilarious, however, is that these things are sold by people who sometimes haven’t coded professionally since 15-20 years before Python was even invented.
Anyway, if you want to fuck with them, ask them how they avoid L1/L2/L3 cache misses with all that code separation. They obviously don't, but you're very likely to get a puzzled look, as nobody ever taught them how a computer actually works.
It hardly even matters now because each major function will have to wait on the scheduler queue until the cluster manages to assign it a container, then incur a dozen kinds of network slowness spinning up and initializing, then cold-start an interpreter, just to check a value in a painfully-slowly serialized-then-deserialized structure that was passed in as its arguments + context, only to decide based on that check it doesn’t need to do anything after all and shut down.
Processor cache? Lol.
So why would you want to add to that? A loop in which you change a few attributes on a thousand entities will run 20 times slower when you cause cache misses, even worse if your cloud provider isn’t using fast ram. Then add to that your exponential slowness as your vtable class hierarchy grows and you’re adding a load of poor performance to your already poor performance.
Which might have made sense if spreading your code out over 20 files in 5 projects gave you something in return. But I'd argue that it didn't just cause your CPU, but also your brain, to have memory issues while working on the code.
Nah, instead you’d have them invoke that Donald Knuth quote saying you should never attempt any optimization ever.
Contrary to the "over-engineering" claims, I'll put this explanation up for consideration: it's a result of fighting the system without understanding the details. Over-engineering absolutely exists and can look just like this, but I think it's mostly a lack of thought instead of too much bad thinking.
You see the same thing with e.g. Java programmers adding `try { } catch (Exception e) { log(e) }` until it shuts up about checked exceptions (and not realizing how many other things they also caught, like thread interrupts).
It's a common result of "I don't get it but it tells me it's wrong, so I'll change random things until it works". Getting engs in this state to realize that they're wasting far more time and energy not-knowing something than it would take to learn it in depth has, so far, been my most successful route in dragging people into the light.
(Not surprisingly, this is one of my biggest worries about LLM-heavy programmers. LLMs can be useful when you know what you're doing, but I keep seeing them stand in the way of learning if someone isn't already motivated to do so, because you can keep not-understanding for longer. That's a blessing for non-programmers and a curse for any programmer who has to work with them.)
How would you describe a path to learn this kind of thing? (Even just dropping a link would be appreciated.)
Indeed, typical education is about algos and programming paradigms (like procedural, functional, OO, etc.) and contexts (systems, native apps, web, data), but I don't remember/understand much about what you describe (though I definitely faced it on toy projects and reacted in the "junior way" you describe). Heck, we even did some deep stuff like language grammars / compiler design / that thing with Petri nets, but I find it a lot less practical and actionable.
Frankly: a comprehensive book about the language / subject is generally the best source. Fixing those foundational knowledge gaps takes time, because it's often not clear to anyone exactly what the gaps are - better to be exhaustive and fix it for real rather than thinking the "ah hah!" moment they just had was the only issue.
Not because I think ink on paper is superior somehow, but because books go in depth in ways that blog posts almost never do - if they did, they'd be as large as a book, and nobody reads or writes those. Narrow, highly technical ones exist and are fantastic, but they largely assume foundational knowledge, they don't generally teach it.
---
Learners are stuck in a weird place with programming. At the extreme beginning there's an unbelievable amount of high-quality information, guided lessons, etc, it's one of the best subjects to self-learn on period. I love it.
Experts also have a lot of excellent material, because there are a lot of highly technical blogs about almost anything under the sun, and many of them stay relevant for years if not decades. Programmers are extremely open about sharing their knowledge at the fringes, and the whole ecosystem puts in a lot of effort to make it discoverable.
The middle ground though, where you know how to put words in a text file and have it run, but don't know how to go beyond that, is... pretty much just "get some experience". Write more code, read more code, do more coding at work. It's highly unstructured and highly varied because you've left the well-trodden beginning and have not yet found your niche (nor do you have the knowledge needed to even find your niche).
It's this middle-ground where I see a lot of people get stuck and churn out, or just haphazardly struggle forever, especially if they lack a solid foundation to build on, because every new thing they learn doesn't quite fit with anything else and they just memorize patterns rather than thinking. Which I do not claim is the wrong choice: if that's all you need, then that's likely (by far) the best effort/reward payoff, and I think that's where the vast majority of people can stop and be happy.
But if you want to go further, it's soul-draining and often looks like there's no escape from the chaotic drudgery. Making completely sure the basics are in place and that you think about everything added on top of that is the only real way I've seen people make progress. Whether that's through a mentor, or a book, or just brute-forcing it by hand on your own doesn't seem to matter at all, you just have to find one that works for you. The good news though is that after you've got "I can put words in a text file and it runs" figured out, it goes a lot faster than it does when you're starting in the beginning. And a lot of what you've already learned will be reinforced or slightly corrected in ways that often make immediate sense, because you have a lot of context for how it has failed or worked in the past.
Thanks a lot for the detailed answer. Would you recommend specific publishers, or is it a book-by-book basis? I've heard good things about Manning.
On the contrary, and I do agree that software engineers take the abstraction too far when they don’t know better, I don’t hold the code produced by people who aren’t software engineers by profession in particularly high esteem either.
You’re looking at two extremes: the codebase that is spread out too much with too much abstraction, and the codebase with zero abstraction that is basically a means to an end. In both cases they are difficult to work with.
I’ve certainly dealt with enough python, JS and PHP scripts that are basically written with the mindset of ‘fuck this, just give me what I want’, whereas people working in the code day to day need the abstractions to facilitate collaboration and resilience.
Agree with this. Abstraction and design patterns when used in a well-thought out manner should make large or complex codebases easier to work with.
And like you, have experienced code bases that tried to throw every design pattern in the book at you, even for a relatively simple application, and made it a pain to work with.
But have also seen them used carefully, in a standard company-wide way that made all the code easier to understand - worked on a high-volume website with a large codebase, where they had micro-services that all used a common 3-tier architecture, security services, tooling... Really well thought out, and you could work on any one of their ~100 microservices and already have a good understanding of its design, how to build and debug it, how its security worked, its caching...
Yeah, agreed, it's how these techniques are used that determines if they are useful or just add complexity.
Yeah, neither's great. If given a choice though, I'm absolutely going to take the latter. Yeah, changing something cross-cutting is going to be rough, but my need to do that is usually orders of magnitude less than my need to change specifics.
On a long enough timeline, both will bite me, but the former is much more likely to bite me today.
There's an element of what you might call "taste" in choosing abstractions in software.
Like all matters of taste, there are at least two things to keep in mind:
(1) You can't develop good taste until you have some experience. It's hard to learn software abstractions, and we want engineers to learn about and practice them. Mistakes are crucial to learning, so we should expect some number of abstraction mistakes from even the smartest junior engineers.
(2) Just because something is ugly to a non-expert doesn't mean it's necessarily bad. Bebop, for example, has less mass appeal than bubblegum pop. One of the things that makes bebop impressive to musicians, is the amount of technical and musical skill it takes to play it. But if you're not a musician those virtues may be lost on you whereas the frenetic noise is very apparent.
One of the things Google does better than other huge tech companies (IMO) is demonstrate good taste for abstractions. Those abstractions are often not obvious.
[Obviously the bebop comparison breaks down, and showing off technical skills isn't a virtue in software. But there are other virtues that are more apparent to experts, such as maintainability, library reviews, integration with existing tooling or practices etc.]
A crucial thing to remember in a shared code base is that taste is ultimately subjective, and to let it go when people do something different from you so long as the code is understandable and not objectively incorrect. Remember you're making something functional at the end of the day, not ASCII art.
Yes absolutely. I think harmonizing with things around you is part of good taste. It's one of the things that separates for example a professionally designed interior from a college dorm room filled with an eclectic array of things the occupants like.
I mean we are in “midwit meme” territory here. 4 levels of indirection look fine to an idiot. A “midwit” hates them and cleans them up. But a very seasoned engineer, thinking about the entire system, will happily have 4 layers of indirection.
Like anyone using a HashSet in Rust is already doing four layers of indirection: Rust's HashSet is actually wrapping a hashbrown::HashSet. That set is wrapping a HashTable, and the HashTable is wrapping an inner Table.
If you’re having trouble comprehending such code then a good IDE (or vim) that can navigate the code on a keypress should help.
Most of the midwit memes I see on programming are:
- beginner: "let's write simple, to the point code"
- midwit: "noo, let's have many layers of abstraction just in case"
- jedi: "let's write simple, to the point code"
I enjoy reading about new languages on HN. It’s weird because the grammar is weird. It’s like seeing start var ! oper addition sub var !! mltplctn var % close close or something and eventually you realize it means a + b*c. Why invent a new language? Why not just write list<int> if you want a list of integers or write Channel<string> or one of the idioms that everyone knows? I don’t know, but it can be fun to play with communication and meaning and grammar, and maybe even more fun if you get to be part of a group that gets paid for it.
What "everyone knows" depends on who everyone is. There was syntax prior to templates with angle brackets.
I chalk it up to "complexity envy". If they have real problems, why can't we?
source me: CS by edu, science geek by choice.
Procrustination. n. the act of writing an infinite regression of trivial nested 'helper' functions because you're not sure how to actually attack the problem you're trying to solve.
I worked with a guy that did this.
A lot of people suffer from the tendency to do a lot of pre-abstraction. They also split everything up way too much without needing to.
It is the Law of Demeter. It sounds like a good idea, until you get the chain of wrappers.
https://en.wikipedia.org/wiki/Law_of_Demeter
The author says as much and more at the end of the article.
This is the much reviled ravioli code:
https://wiki.c2.com/?RavioliCode
ITT people who have no context passing judgement on code they've never seen described by someone who doesn't know what they're describing.
Many software engineering adjacent courses, starting with AP Computer Science A, are heavy on the Java-style OOP. And you're never designing an actually complex system, just using all the tools to "properly" abstract things in a program that does very little. It's the right idea if applied right, but they don't get a sense of the scale.
The first place this bites a new SWE in the rear, the database. "Let's abstract this away in case we ever want to switch databases."
Ugh, dealing with stuff like this right now for project config. Why can't I just use yaml files in the build to generate environment files based on region/app and then read those values in the code? Instead, it's that plus a few layers of interfaces grouping multiple levels of config items.
Sometimes when you see things split up to what seems like an excessive degree it's because it can make the code more testable. It's not always the case that you split up a function for the purposes of "re-use" but that you might start by writing a thing that takes input X and transforms it to output Y, and write a test for lots of different inputs and outputs.
Then you write a function that gets some value that one might feed into this function from source Z, and separately test that function.
Then you create another function that reacts to a user event, and calls both functions.
And if a codebase is doing reasonably complicated things sometimes this can result in having to track through many different function calls until you find the thing you're actually looking for.
Sometimes it's also auto generated by an IDE or a framework as boilerplate or scaffolding code, and sometimes it's split up in a seemingly irrational way in one part of the code because you are actually using the same function you wrote somewhere else, so in your case of "4 interface functions" what you might find is that there is a function somewhere that accepts TypeA, and that TypeA is a specialisation of TypeB, and that TypeB implements InterfaceD.
Or maybe you're just reading some shitty code.
Either way lots of stuff that can look kind of dumb turns out to not be as dumb as you thought. Also lots of stuff that looks smart sometimes turns out to be super dumb.
In conclusion: yes.
I spent several years as a front-end contractor. I saw a lot of the same thing on the front-end, especially with JS. Even when I saw stuff like inserting a single blank space on a line with JS, or similar things where it's just a mess of calls back and forth before something actually gets executed, I asked a senior dev WTF was going on and why this stuff got written like this.
His answer was as simple as it was dumbfounding. He said, "Its contractors. They get paid by the hour. Make sense now?" So basically someone was writing a ton of code just to take up time during the day so they could charge the hours back to the company and prove they were working their 40 hours. They DGAF about the code they were writing, they were more concerned with getting paid.
Completely maddening. My senior dev at the time said they won't even spend time refactoring any of it because it would waste too much time. He said they just made sure they have an FTE write the code next time.
It was my "welcome to the wonderful world of contracting" wakeup call.
This is a popular pattern in apps or other "frameworky" code, especially in C++.
I can think of at least two open source C++ apps I used where every time I checked in the IRC channel, the main dev wasn't talking about new app features, but about how to adopt even newer cooler C++ abstractions in the existing code. With the general result that if you want to know how some algorithm backing a feature works, you can't find it.
Honestly it might not be you. Simplicity is highly valued in good code. So if it's hard to understand, it's probably not good code.
Ignoring inexperience/incompetence as a reason (which, admittedly, is a likely root cause), domain fuzziness is often a good explanation here. If you aren't extremely familiar with a domain and don't know the shape of the solution you need a priori, all those levels of indirection allow you to keep lots of work "online" while (replacing, refactoring, experimenting) with a particular layer. The intent should be to "find" the right shape with all the indirection in place and then rewrite with a single correct shape without all the indirection. Of course, the rewrite never actually happens =)
It happens so that we don't have "a single 15,000 line file that had been worked on for a decade". We don't have the luxury of asking the GitHub team and John Carmack to fix our code when we are forced to show it to the stakeholders.
This perfectly summarizes a lot of production code I've seen. We once replaced like 20 Java files with like a single page of easy to understand and "to the point" code.