
Push ifs up and fors down

crabmusket
35 replies
10h0m

I was initially surprised by the pushback this article is getting.

Then I remembered that this is data-oriented design advice, and I imagine most people on this forum (myself included most of the time) are writing line-of-business web apps where this advice seems like nonsense. I had already internalised the context, and wasn't planning to go apply this to my Laravel backend code.

A heuristic: if in your usual daily work you don't need to think about the instruction cache, then you should probably ignore this advice.

If you haven't yet and want to get a taste of when this advice matters, go find Mike Acton's "Typical C++ Bullshit" and decipher the cryptic notes. This article is like an understandable distillation of that.

Despite what Casey Muratori is trying to argue (and I'm largely sympathetic to his efforts) most line-of-business software needs to optimise for changeability and correctness ("programming over time") not performance.

GuestHNUser
20 replies
9h22m

Yeah, data-oriented design (DOD) always seems to get people riled up. I think it largely gets this type of reaction b/c DOD implies that many aspects of the dominant object-oriented approach are wrong.

most line-of-business software needs to optimise for changeability and correctness, not performance.

It's a shame that so many see changeability and performance in opposition with each other. I've yet to find compelling evidence that such is the case.

gpderetta
8 replies
6h5m

There is a point where performance optimizations get in the way of clarity, but that's after a long plateau where simple software == fast software. And simple software is the most amenable to changeability. It might not be the fastest way to initially write the software though, as leveraging existing large complex frameworks can give a head start to someone familiar with them.

gnuvince
4 replies
5h13m

Somewhere, somewhen, we, as software developers, started thinking that other programmers would rather extend code than modify it. This has led us to write code that tries to predict the use cases of future programmers and to pre-emptively include mechanisms for them to use or extend our code. And because it has seeped so deeply into our culture, if we don't do this -- engage in this theater -- we get called out for not being good software engineers.

Of course, the extra hooks we put in to allow re-use and extensibility usually result in code that is slower and more complex than the simple thing. Worse, very often, when a customer needs a new feature, the current extension hooks did not predict this use case and are useless, and so the code has to be modified anyway, but now it's made 10x more difficult because of the extra complexity and because we feel that we have to respect the original design and not rip out all the complexity.

I like Knuth's quote [1] on this subject:

I also must confess to a strong bias against the fashion for reusable code. To me, “re-editable code” is much, much better than an untouchable black box or toolkit. I could go on and on about this. If you’re totally convinced that reusable code is wonderful, I probably won’t be able to sway you anyway, but you’ll never convince me that reusable code isn’t mostly a menace.

[1] https://blog.jj5.net/blog/2021/05/21/knuth-on-reusable-code/

mschuster91
2 replies
4h4m

Somewhere, somewhen, we, as software developers, started thinking that other programmers would rather extend code than modify it.

That was when stuff like "proper testing" was deemed to be too expensive. It's unlikely you'll break existing workflows by extending something, but very easy to do so during a modification.

Companies used to have hordes of manual testers/QA staff, who all got replaced by automated tools of questionable utility and capability.

wizzwizz4
1 replies
3h50m

The tools are very useful, and they have well-known capability. That capability is strictly less than the capability of most manual testers / QA staff, but it's a lot faster at it, and gets much closer to being exhaustive.

Automation should mean you can do a better job, more efficiently, more easily. Unfortunately, ever since the Industrial Revolution, it seems to mean you can do a quicker job with less money spent on labour costs.

mschuster91
0 replies
1h30m

That capability is strictly less than the capability of most manual testers / QA staff, but it's a lot faster at it, and gets much closer to being exhaustive.

That's if you put the effort in to write good tests. When I look at the state of gaming in general, it's ... pretty obvious that this hasn't worked out. Or the GTA Online JSON debacle - I'm dead sure that this was known internally for a long time, but no one dared to modify it.

And even then: an automated test can't spot other issues unrelated to the test that a human would spot immediately. Say, a CSS bug causes the logo to be displayed in grayscale. The developer who has accidentally placed the filter on all img elements writes a test case that checks if an img element in content is rendered in grayscale, the tests pass, the branch gets merged without further human review... and boom.

mattacular
0 replies
4h53m

Writing generally "reusable code", aka a library, warrants a different approach to software development than application code in many areas.

1. Application code = Fast-changing, poorly specified code. You need to have a rapid development cycle that supports "discovering" what the customer wants along the way. Your #1 job is pleasing the customer, as quickly, and as reliably, as possible.

2. Library code = Slow-changing, highly specified code. You have a long, conservative development cycle. Your #1 job is supporting application programmers (the customers of your library).

brabel
2 replies
5h9m

Simplest is definitely not more amenable to changes. We've just implemented a large feature where some devs tried to "hardcode" all the logic of a kind of rules engine. I was horrified because the whole thing was coupled to the rules we currently needed, but we all know this is just the beginning: we plan to add more rules, and even to allow custom rules to be defined by our customers.

So, even though what they were trying to do is often lauded on HN and other forums because it applies KISS and YAGNI, in this case adding a new rule would mean, basically, changing everything: the engine, the data that the engine persists, potentially the end result... everything! Now, perhaps this was indeed simpler. However, it's the opposite of modifiable (and by the way, implementing it with abstract rules which store their own data, which the engine need not know about, is actually much cleaner, and the de-coupling comes almost for free).

funcDropShadow
0 replies
4h24m

This doesn't sound like a simple solution from your fellow devs. It appears to have been an easy solution. If the distinction isn't familiar to you, there is a great talk by Rich Hickey [0] that explains that distinction and more. The talk is definitely not only applicable to Clojure.

YAGNI is a great slogan, but it must not become dogma. If you already know that you are going to need custom rules, then prepare for it. But if, for example, the current rules are almost trivial and you don't know yet which rule engine will be the right fit later on, then it might be a good idea to postpone the decision about the rule engine until you know more. In the meantime a hard-coded solution could be enough.

[0]: https://www.youtube.com/watch?v=LKtk3HCgTa8

bluefirebrand
0 replies
2h47m

YAGNI definitely doesn't apply in cases where you do actually know that you are gonna need it.

If the task is "build a rules engine with X, Y, and Z rules, that can add A, B, and C rules next year" then delivering hardcoded XYZ rules engine is an absolute failure to engineer and a very braindead use of "YAGNI"

rkangel
2 replies
7h3m

It's a shame that so many see changeability and performance in opposition with each other. I've yet to find compelling evidence that such is the case.

I think I agree with some of the sibling comments that you can write in a default style that is both maintainable and performant. Regularly though, when profiling and optimising, the improvement can reduce maintainability.

Two examples from the last 6 months:

A key routine is now implemented in assembly for our embedded target because we couldn't reliably make the compiler generate the right code. We now have to maintain that in parallel with the C code implementation we use on every other target.

We had a clean layering between two components. We have had to "dirty" that layering a bit to give the lower layer the information needed to run fast in all cases.

With thought you can often minimise or eliminate the impact, but sometimes you take the trade-off.

wizzwizz4
1 replies
3h53m

If you had good tooling, you could annotate the assembly routine as being equivalent to the C implementation; then maintenance would be a lot easier, since the compiler would yell at you if they ever diverged.

Depending on how you've dirtied your abstractions, language tooling might help with that (e.g. if you could decouple the abstraction from the implementation, and map multiple abstractions to the same data), but I don't know whether that could work, or if it'd even be an improvement if it did.

Lvl999Noob
0 replies
40m

If you had good tooling, you could annotate the assembly routine as being equivalent to the C implementation; then maintenance would be a lot easier, since the compiler would yell at you if they ever diverged.

Is that something actual compilers can do right now? I don't think I have heard of anything like that, though I don't work that closely with compilers. Furthermore, if the compiler can recognise that the assembly routine is equivalent to the C implementation, wouldn't it also be able to generate the same routine?

crabmusket
2 replies
9h11m

Well it's hard to argue about that tradeoff in general, but I think the existence of languages like Python, Ruby and PHP is compelling. Though I'd accept the argument that they help optimise for neither performance nor changeability!

My perspective is necessarily limited, but I often see optimisation as a case of "vertical integration" and changeability as about "horizontal integration".

To make something fast, you can dig all the way down through all the layers and do the exact piece of work that is required with the minimum of useless faffing about for the CPU[0]. But to make something robust, you might want to e.g. validate all your inputs at each layer since you don't know who's going to call your method or service.

Regarding the DOD/OOP wars, I really love this article, which argues that OOP doesn't have to be bad[1]. I also think that when performance is a requirement, you just have to get more particular about your use of OOP. For example, the difference between Mesh and InstancedMesh[2] in THREE.js. Both are OOP, but have very different performance implications.

[0] Casey Muratori's "simple code, high performance" video is an epic example of this. When the work he needed to do was "this many specific floating point operations", it was so cool to see him strip away all the useless layers and do almost exactly and only those ops.

[1]https://www.gamedev.net/blogs/entry/2265481-oop-is-dead-long...

[2]https://threejs.org/docs/index.html?q=instanc#api/en/objects...

GuestHNUser
1 replies
7h53m

Thanks for the thoughtful reply.

Well it's hard to argue about that tradeoff in general, but I think the existence of languages like Python, Ruby and PHP is compelling. Though I'd accept the argument that they help optimise for neither performance nor changeability!

I see the point you're making, and I don't disagree with it. But I should be a bit more clear in what I really meant in my parent comment:

Given whatever language a developer is working in, whether it is fast like C++ or slow like Python (or perhaps even Minecraft Redstone), I think the programmer that takes a data-oriented approach (meaning they write their code thinking about the kinds of data the program can receive and the kind it will likely receive, along with what operations will be the most expensive) will have better code than a programmer that makes a nice object model following all the SOLID principles. The majority of the OOP code I've worked with spends so much time caring about abstractions and encapsulation that performance is lost and the code is no better to work with.

Regarding the DOD/OOP wars, I really love this article, which argues that OOP doesn't have to be bad[1]. I also think that when performance is a requirement, you just have to get more particular about your use of OOP. For example, the difference between Mesh and InstancedMesh[2] in THREE.js. Both are OOP, but have very different performance implications.

Absolutely agree here. Classic gamedev.net article. ECS != DOD and I think the next parts of the article illustrate how DOD isn't necessarily in opposition with programming paradigms like OOP and FP.

With that said, I think it can be argued that common patterns within both OOP and FP circles are a hurdle at times to utilizing hardware to its fullest. Here's Casey Muratori's argument against SOLID principles[0] for instance.

---------------------

I think the point still stands: performance isn't in opposition to making a maintainable/changeable program.

[0] https://youtu.be/tD5NrevFtbU?si=sED7befZRnVnQIxQ

Jach
0 replies
3h43m

A missing element to the conversation is another interpretation of DOD, that is, domain-oriented design. My favorite writing on the matter is "Programming as if the Domain (and Performance) Mattered" (https://drive.google.com/file/d/0B59Tysg-nEQZSXRqVjJmQjZyVXc...)

When OOP has bad performance, or is otherwise another instance of ball of mud architecture, it often stems from not modeling the domain correctly. Only using or being forced to use an inferior conception of OOP (i.e. OOP as merely fancy structs + function pointers) doesn't help either, and ends up encouraging the sorts of patterns found in GOF even preemptively before they can maybe earn their keep. And what's especially insidious about Kingdom of Nouns style OOP is that just because you have named something, or created a type hierarchy, does not actually make it a particularly good model of anything. If you interview enough people, you'll find some thinking it's entirely reasonable to do the equivalent of making a Car a subclass of a Garage just because garages contain cars. When bad modeling infects production code, it's difficult to remove, especially when it's so overgrown that a bunch of other code is created not to fix the modeling but to try and wrangle a bit of sanity into localized corners (often at the further expense of performance -- frequently from a lot of data copying and revalidating).

On the other hand, modeling things too close to the hardware, or (as the linked paper goes through) too close to one ideal of functional programming purity (where you want to just express everything with map and reduce on sequences of numbers if you can, in theory giving high performance too), can severely get in the way of changing things later, because again you're not actually modeling the domain very correctly, just implementing one clever and specific mapping to raw numbers. When the domain changes, if your mapping was too coupled to the hardware, or too coupled to a nice map/reduce scheme, the change throws a big wrench in things. Sometimes, of course, such direct hardware and code exploitation is necessary, but we live in a fat world of excess and most business software isn't so constrained. An interesting example from the past could be Wolfenstein 3D, where its doors and especially its secret push walls are handled as special cases by the raycaster. Carmack initially refused to add push wall functionality to his engine, it seemed like it would be too big of a hack and possibly hinder performance too much. Big special cases as ifs down in the fors usually suck. After much needling and thinking though, and realizing the game really did need such a feature, he eventually got it done, surprising everyone when at a final attempt at convincing him to add them, he revealed he already did it.

RHSeeger
2 replies
3h32m

It's a shame that so many see changeability and performance in opposition with each other. I've yet to find compelling evidence that such is the case.

In the article, when I saw this

For f, it’s much easier to notice a dead branch than for a combination of g and h!

My first thought was "yes, but now if anyone _else_ calls h or g, the checks never happen (because they live in f). I'd much rather have h and g check what _they_ need in order to run correctly. That way, if another call to one of them is added, we no longer need to rely on _that_ call correctly checking the conditions. Plus it avoids duplication."

But... and this goes back to the original point from your post... this is a matter of code being correct over time; changeability. If you're worried about performance, then having the same check in 2 different places is a problem. If you're not (less) worried, then having the code less likely to break later as changes are made is helpful.
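
To make the tradeoff concrete, here is a minimal Rust sketch reusing the article's f/g/h names (the Walrus type and the bodies are placeholders, not the article's code):

    struct Walrus { happy: bool }

    // Defensive style: each callee re-checks what it needs. Robust if someone
    // adds a new caller, but the same branch now lives in two places.
    fn g_checked(w: Option<&Walrus>) {
        let Some(w) = w else { return };
        println!("g: happy = {}", w.happy);
    }

    fn h_checked(w: Option<&Walrus>) {
        let Some(w) = w else { return };
        println!("h: happy = {}", w.happy);
    }

    // "Push the if up": f owns the single branch, g and h assume a valid Walrus.
    // A dead branch is easy to spot in f, but every caller of g/h must uphold it.
    fn g(w: &Walrus) { println!("g: happy = {}", w.happy); }
    fn h(w: &Walrus) { println!("h: happy = {}", w.happy); }

    fn f(w: Option<&Walrus>) {
        if let Some(w) = w {
            g(w);
            h(w);
        }
    }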

Gadiguibou
1 replies
3h23m

You can also often encode those checks in the type system like in the Option example near the start of the article.

RHSeeger
0 replies
2h15m

Assuming your type system is robust enough to support such things. That's not the case for many mainstream languages.

rcxdude
0 replies
6h10m

They don't need to be in opposition: it's enough that the changeability, correctness, and performance of solutions are uncorrelated for you to frequently need to trade off between them, especially given the inevitable tradeoff of all of these against time to write.

Gravityloss
0 replies
8h24m

Yeah. Is there an article showing how a clever one-liner that is hard to read, and "inefficient-looking" but easy-to-understand multi-line code with proper variable names etc., will result in the same compiled code?

I would assume compilers would be sufficiently advanced nowadays...

simiones
2 replies
7h3m

Honestly, both of these I think are pretty applicable to line-of-business apps as well.

The "push loops down" advice most especially: for any CRUD app, handling creation and updates in bulk when possible will typically save huge amounts of time, much more than in CPU bound use cases. The difference between doing `items.map(insertToDb/postToServer) ` vs doing `insertToDb/postToServer(items)` is going to be orders of magnitude in almost all cases.

I have seen optimizations of this kind take operations from seconds or minutes down to milliseconds. And often the APIs end up cleaner and logs are much easier to read.
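
A sketch of the shape of that change (insert_one and insert_many are hypothetical stand-ins for a real DB client or HTTP API, not any particular library):

    struct Item { id: u64 }

    fn insert_one(_item: &Item) { /* one network round-trip per call */ }
    fn insert_many(_items: &[Item]) { /* one round-trip for the whole batch */ }

    fn save_slow(items: &[Item]) {
        for item in items {
            insert_one(item); // N round-trips: the loop sits above the expensive call
        }
    }

    fn save_fast(items: &[Item]) {
        insert_many(items); // the for is pushed down into the bulk operation
    }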

crabmusket
0 replies
4h58m

Oh definitely. I've seen the same kind of benefits. Avoiding 1+n queries is a similar change - push the loop "down" into the database.

atherton33
0 replies
5h21m

Great point. The n+1 sql queries antipattern comes to mind as a very common "push loops down" application in line of business stuff. Let the db do the work/join.

GuB-42
2 replies
4h23m

most line-of-business software needs to optimise for changeability and correctness ("programming over time") not performance

These are not mutually exclusive; in fact, more often than not, they are correlated.

Maybe the most important aspect of performance is to make things small. Small code, small data structures, small number of executed instructions. Writing small code is what "thinking about the instruction cache" essentially is, btw.

And as it turns out, the smaller the code, the less room there is for bugs, the more you can understand at once, and the easier it is to get good coverage, good for correctness. As for changeability, the smaller the code, the smaller the changes. The same applies to data.

Now, some optimization techniques can make the code more complicated, for example parallelization, caching, some low level optimization, etc... but these only represent a fraction of what optimizing for performance is. And no serious performance conscious programmer will do that without proper profiling/analysis.

Then there are things that make the code faster with limited impact (positive and negative), and this is what the article is about. Functionally, if/for is not really different from for/if, but one is faster than the other, so why pick the slow one? And even if the compiler optimizes that for you, why rely on the compiler if you can do it properly at no cost? Just like looping over 2D arrays, it is good to know that there are two ways of doing it, and while they look equivalent, one is fast and one is slow, so that you don't pick the slow one by accident.
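
A minimal sketch of both points (the loop bodies are just placeholders):

    fn scale_branch_inside(data: &mut [f32], enabled: bool) {
        for x in data.iter_mut() {
            if enabled { // for/if: the condition is re-evaluated every iteration
                *x *= 2.0;
            }
        }
    }

    fn scale_branch_hoisted(data: &mut [f32], enabled: bool) {
        if enabled { // if/for: same behaviour, but the condition is checked once
            for x in data.iter_mut() {
                *x *= 2.0;
            }
        }
    }

    // The 2D-array point: both functions visit every element, but the first walks
    // memory contiguously (cache-friendly) while the second strides across rows.
    fn sum_row_major(grid: &[Vec<f32>]) -> f32 {
        grid.iter().map(|row| row.iter().sum::<f32>()).sum()
    }

    fn sum_column_major(grid: &[Vec<f32>]) -> f32 {
        let cols = grid.first().map_or(0, |row| row.len());
        let mut total = 0.0;
        for c in 0..cols {
            for row in grid {
                total += row[c];
            }
        }
        total
    }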

Obscurity4340
1 replies
3h19m

Is for/if faster because loops get started right away, whereas ifs need to check conditionals constantly on top of whatever action they are supposed to execute?

tremon
0 replies
1h51m

Loops are fastest when they fit in the processor's instruction cache (and preferably, only touch data that fits in the data cache). Similarly, code is fastest when it has to execute the least amount of instructions. In the first example, the walrus(Option) function is designed to be executed unconditionally, only to return early when there is no walrus. That's an unnecessary function call that can be removed by changing the method signature (in Rust, because it has non-nullable types. In other languages you would need to do the non-null check anyway for safety reasons).

valand
1 replies
7h54m

Not even strictly DOD, this trivial principle produces better semantics and drives better code separation down the line.

Several years ago I was exposed to DOD (and then this principle) when working on complex JS/TS-based long-running systems. It results in better code navigability, accurate semantic synthesis, and easier subsequent refactors.

The side effect: some people remarked that the code looked like C

gnuvince
0 replies
5h4m

The side effect: some people remarked that the code looked like C

Did you understand this comment to be positive or negative?

thesnide
0 replies
4h26m

There are indeed 2 religions. Data oriented or Code oriented.

Both are making very different choices as they have diverging trade offs.

I wrote something along the lines. Not the same, but very similar given enough abstraction

https://blog.pwkf.org/2022/08/07/two-programming-religions-c...

mdavidn
0 replies
4h6m

I have practiced this advice in line-of-business software to great effect. Popular ORM libraries operate on individual records, but that is quite slow when software needs to import a large number of objects. Restructuring the code to operate on batches—push fors down—results in significantly fewer database queries and network round-trips. The batched routines can resolve lookups of foreign keys in batches, for example, or insert multiple records at once.

Granted, that is more difficult to maintain. I only batched the import routines. The rest still uses the more maintainable ORM.
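
As a rough sketch of the batched shape (the row types and fetch_customers_by_ids are hypothetical stand-ins for the real schema and a single bulk "WHERE id IN (...)" query):

    use std::collections::HashMap;

    struct OrderRow { customer_id: u64, amount: u32 }
    struct Customer { id: u64, name: String }

    fn fetch_customers_by_ids(ids: &[u64]) -> Vec<Customer> {
        // imagine one bulk query here instead of one query per row
        ids.iter().map(|&id| Customer { id, name: format!("customer-{id}") }).collect()
    }

    fn import(rows: &[OrderRow]) {
        // resolve every foreign key in a single batch...
        let ids: Vec<u64> = rows.iter().map(|r| r.customer_id).collect();
        let by_id: HashMap<u64, Customer> =
            fetch_customers_by_ids(&ids).into_iter().map(|c| (c.id, c)).collect();

        // ...then build all the records and hand them to one bulk insert
        for row in rows {
            if let Some(customer) = by_id.get(&row.customer_id) {
                let _record = (customer.name.clone(), row.amount);
            }
        }
    }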

loup-vaillant
0 replies
2h48m

Despite what Casey Muratori is trying to argue (and I'm largely sympathetic to his efforts) most line-of-business software needs to optimise for changeability and correctness ("programming over time") not performance.

Then consider listening to John Ousterhout instead: turns out that in practice, changeability and correctness are not nearly as at odds with performance as we might initially think. The reason being, simpler programs also tend to run faster on less memory. Because in practice, simpler programs have shorter call stacks and avoid convoluted (and often costly) abstractions. Sure, top-notch performance will complicate your program. But true simplicity will deliver reasonable performance most of the time.

By the way, while pushing fors down is mostly data-oriented advice, pushing ifs up is more about making your program simpler. Or more precisely, increasing its source code locality: https://loup-vaillant.fr/articles/source-of-readability. That's what concentrating all the branching logic in one place is all about.

fl0ki
0 replies
2h22m

I almost agree except that I may have in mind a different kind of engineer than you do: people who don't think about the instruction cache even though they are writing performance-critical code. This post won't change that, but I hope that people writing about performance know that audience exists, and I dearly hope that people in this situation recognize it and strive to learn more.

At a large enough scale, even line of business code can become a bottleneck for real-world business activity. And unsurprisingly, engineers who don't think about performance have a way of making small scales feel like large ones because they use architectures, algorithms, and data structures that don't scale.

This happens even in the FAANG companies famed for their interviewing and engineering. I've seen outages last for hours longer than they should have because some critical algorithm took hours to run after a change in inputs, all the while costing millions because one or more user-facing revenue-critical services couldn't run until the algorithm finishes (global control, security, quota, etc. systems can all work like this by design and if you're lucky the tenth postmortem will acknowledge this isn't ideal).

I've had to inherit and rework enough of these projects that I can definitively say the original developers weren't thinking about performance even though they knew their scale. And when I did inherit them and have to understand them well enough to make them as fast as they needed to be, some were at least written clearly enough that it was a joy, and others were a tangled mess (ironically, in some cases, "for performance" but ineffectively).

See also: the evergreen myth that "you don't need to learn algorithms and data structures, just learn to find them online" resulting in bullshit like a correct but exponential algorithm being put into production when a simple linear one would have worked if the engineer knew more about algorithms.

There's so much wrong with how many people do performance engineering that I don't think pushing fors down is even in the top 10 tips I would give, I just think that folks posting and commenting in this space recognize how large and impactful this section of the audience is.

bryanrasmussen
0 replies
8h51m

Maybe - although most of the time, especially in the last few years, I write web apps, and I push ifs up all the time because it allows for early returns, like the article says.

Determining that you need to do nothing should be done as soon as possible, especially in any system where performance is essential - and as a general rule, web apps make more money the better their performance.

berniedurfee
0 replies
5h5m

I think this is still very much applicable in OOP.

Developers tend to break complex business logic within classes down into smaller private methods to keep the code DRY. The “push ifs up” mantra is really useful here to ensure the branching doesn’t end up distributed amongst those methods.

The “push fors down” is also very relevant when most call threads end up at an expensive database query. It’s common to see upstream for loops that end up making many DB calls downstream somewhere when the looping could have been replaced by a “where” clause or “join” in the SQL.

In fact, the “push fors down” mantra is useful even at the architecture level, as it’s usually better to push looping logic for doing aggregations or filtering into a DAO where it can be optimized close to the DB, rather than pulling a bunch of objects out of the database and looping through them.

I love simple and clear design principles like this!

Though, as with all design principles, one needs to consider it deliberately vs applying it as dogma!

bcrosby95
33 replies
16h14m

The first example is bad for reasons not related to ifs and fors. In general, if you can, when you have a "container" for something, you should write functions on the contained, domain-level "Thing" rather than on the container holding the domain-level thing.

As an example I work with - Clojure. Sometimes I use agents. I don't write functions for agents, I write functions for things agents might contain.

Similar rules for Elixir. My primary domain level functions don't work off a PID. They work off the underlying, domain-level data structures. GenServer calls delegate to that where necessary.

This makes them more flexible, and tends to keep a cleaner distinction between a core domain (frobnicate the Walrus) and broader application concerns (maybe the Walrus is there, maybe not... oh yeah, also frobnicate it).

crdrost
29 replies
15h52m

Yeah. I think the given advice probably takes validation logic and floats it too high. It is of course nice to have early validation logic, but it is also nice when your functions don't mysteriously crap out with some weird error but instead shout a validation error at you.

Haskell solves this with newtypes, “here is a transparent container that certifies that you did the appropriate validation already,” that helps for this.
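
The same trick is available in Rust; a minimal sketch with a made-up ValidatedEmail wrapper, where the only way to obtain one is through the validating constructor:

    pub struct ValidatedEmail(String);

    impl ValidatedEmail {
        pub fn parse(raw: &str) -> Result<Self, String> {
            if raw.contains('@') {
                Ok(ValidatedEmail(raw.trim().to_string()))
            } else {
                Err(format!("not an email address: {raw}"))
            }
        }
    }

    // No validation ifs needed down here; the type certifies the precondition.
    fn send_welcome(email: &ValidatedEmail) {
        println!("sending welcome mail to {}", email.0);
    }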

The advice that I really want to hammer into people's heads is, prefer “sad ifs.” That is, I will almost always find this

    if (something_is_wrong_in_way_1) {
      // fix it or abort
    }
    if (something_is_wrong_in_way_2) {
      // fix it or abort
    }
    if (something_is_wrong_in_way_3) {
      // fix it or abort
    }
more readable and maintainable than this

    if (things_are_ok_in_way_1) {
      if (things_are_ok_in_way_2) {
        if (things_are_ok_in_way_3) {
          // do the happy path!
        } else {
          // fix or abort thing 3
          // if fixed, do the happy path
        }
      } else {
        // fix or abort thing 2
        // if fixed, test way 3 again
        // if way 3 is good do the happy path, else fix it
        // ...
      }
    } else {
      // ...
    }
I feel like it's in human nature to focus on the expected case, I want everyone whose code I meet to do the exact opposite, focus primarily on the unexpected. Every “if” imposes a mental burden that I am keeping track of, and if you have to go to an external system to fetch that information, or you need to exit early with an error, I can immediately discharge that mental burden the moment I know about it, if the handling and the detection are right next to each other.

btown
8 replies
11h45m

I once had a CS teacher who hated early returns and break statements; they would always prefer the "nested ifs" approach. And they're far from being alone - many others believe that path is cleaner.

I'm with the parent poster, though: if you can use control flow to keep the happy path the unindented path, and you have syntax highlighting to help you show that each guard clause actually does interrupt the control flow, it can often be significantly cleaner.

I also like to think of it in terms of "railway oriented programming" (https://fsharpforfunandprofit.com/rop/ - which is a must-read for anyone trying to wade into monads and functional programming IMO) - the "happy path" should be the path that it's easiest for the train to go down, namely a straight line down the page. Only give yourself the headache of a sharp turn when you're interrupting that path!
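
A minimal Rust sketch of that shape, with made-up parse_header/parse_body helpers; each `?` is a guard that switches onto the error track, while the happy path reads as a straight, unindented line down the page:

    struct Request { header: String, body: String }

    fn parse_header(raw: &str) -> Result<String, String> {
        raw.lines().next().map(|l| l.to_string()).ok_or_else(|| "missing header".to_string())
    }

    fn parse_body(raw: &str) -> Result<String, String> {
        match raw.split_once('\n') {
            Some((_, body)) if !body.is_empty() => Ok(body.to_string()),
            _ => Err("missing body".to_string()),
        }
    }

    fn parse_request(raw: &str) -> Result<Request, String> {
        let header = parse_header(raw)?; // guard: bail out early
        let body = parse_body(raw)?;     // guard: bail out early
        Ok(Request { header, body })     // happy path stays on the main track
    }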

jaza
6 replies
7h14m

I don't mind early returns, but I'm infamously anti "break" and "continue" statements (and all my colleagues to date have disagreed with me and have found my stance hilarious and/or infuriating). My rationale is that they turn a loop into a mutant abomination that's a loop and conditionals all mushed up together. I also can't shake this gut feeling that they're crude and antiquated control flow mechanisms, only one step away from the fiery inferno of "goto". So instead of this:

    for thing in things:
        if thing.is_yellow():
            continue

        if thing.is_rainbow():
            break

        thing.thingify()
I'll do:

    things_iter = iter(things)

    thing = next(things_iter, None)
    is_rainbow_thing_found = (
        thing.is_rainbow()
        if thing is not None
        else False
    )

    while (
        thing is not None
        and not is_rainbow_thing_found
    ):
        if not thing.is_yellow():
            thing.thingify()

        thing = next(things_iter, None)
        is_rainbow_thing_found = (
            thing.is_rainbow()
            if thing is not None
            else False
        )

toxik
0 replies
5h45m

Surely you are joking? Your version is four times longer and repeats a large section of itself.

The way to avoid control flow here is to simply find the rainbow first, take the sequence up until then, filter out the yellow things, and thingify the rest.

    rainbow_idx = next(
        (i for i, t in enumerate(things) if t.is_rainbow()),
        len(things),
    )
    non_yellow = (t for t in things[:rainbow_idx] if not t.is_yellow())
    for t in non_yellow:
        t.thingify()
Excuse formatting, cellphone.

Not sure this is any better than the obvious solution either.

sethammons
0 replies
5h56m

I almost never say this: you are wrong. Code needs to be maintained. Everyone can understand the first example. Everyone has to slowdown and reason about your second example. It is objectively worse. And your argument is flawed: break and continue are not the goto-like behavior that goto is criticized for as they are limited to the scope of that block of code.

The problem traditionally cited with goto is it can go anywhere (even into the body of another function where expectations on which variables are set may not hold).

If you don't like break and continue, how do you justify exceptions? The handling code for those may be in a different file or nowhere. Much closer to goto.

reroute22
0 replies
5h53m

I'm afraid I'm with your colleagues.

I value being able to grasp what the code does at a glance while scrolling through. Your first snippet enters my brain as if without even reading, all at once and instantly.

Your second example absolutely does not and I'm sure you know why.

indiv0
0 replies
5h54m

Minimizing control flow in loops is a good idea, but if you can FP-ize the loop entirely that keeps it pretty readable IMO.

    things
        .iter()
        .filter(|t| !t.is_yellow())
        .take_while(|t| !t.is_rainbow())
        .for_each(|t| t.thingify());

indigo945
0 replies
6h47m

Considering break and continue to be too powerful as control flow statements is a somewhat common opinion, in my experience. (I still use them, but only sparingly, and where I am quite sure that they're not confusing.)

That said, your second code block is unreadable to me -- even knowing what it does, it's just confusing. No way that that's more readable than making use of break here. If you don't want so many conditionals in the loop body, you can factor the first check out:

    no_yellow_things = (t for t in things if not t.is_yellow())
    for t in no_yellow_things:
        if t.is_rainbow():
            break
        t.thingify()

KineticLensman
0 replies
6h57m

I’m with your colleagues I’m afraid. The intent of the all-mushed-up-together version is really clear. I had to spend a long time looking at the longer version to figure out what was going on. I know which I’d prefer to maintain if I had to.

WesolyKubeczek
0 replies
6h47m

Do they come from the times when structured programming was hot and new, SESE had been the buzzword of the year, and that doodad called "transistor" had just been invented?

quickthrower2
7 replies
14h15m

In short: avoid `else`.

Both what you mention, and also the case with:

   if (...) {
     ...
   } else { 
     return 0; 
   }
Which can become

   if (...) {
     ...
   }

   return 0;

theteapot
2 replies
11h54m

What about `else if`? Not a fan?

stouset
0 replies
10h47m

Personally I only either use if statements that wrap a single line of logic, or a switch/case statement where each branch is a single line of logic.

Most situations are handled by guard clauses, exhaustive type switching, or switching on logical cases. Very occasionally do I have to break this practice for a legitimate reason, but this keeps functions simple to read and to reason about.

I occasionally am unable to avoid an else, but I will always avoid else if. It puts too much reliance on remembering how this condition interacts with the one on the if statement above it, which can often be many lines away. I’d infinitely rather use a case statement where all the alternatives are in an easily-understood table with every condition aligned on successive lines.

quickthrower2
0 replies
11h35m

Occasional vice for me.

hedora
1 replies
14h6m

Even better, if you're using a language with pattern matching (rust, any ML, haskell), just put it in a single match with the happy case on top, and the error cases below.

To do this, you need to encode the distinction between happy and sad path in your types. That's good for all sorts of other reasons, so it's not really a drawback.
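
A minimal sketch of that shape, with a made-up Response enum:

    enum Response {
        Ok(String),
        RateLimited { retry_after_secs: u64 },
        Failed(String),
    }

    fn handle(r: Response) -> Result<String, String> {
        match r {
            // happy case on top
            Response::Ok(body) => Ok(body),
            // error cases below, each handled in one obvious place
            Response::RateLimited { retry_after_secs } =>
                Err(format!("rate limited, retry in {retry_after_secs}s")),
            Response::Failed(reason) => Err(reason),
        }
    }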

KineticLensman
0 replies
6h53m

Yes, pattern matching really helps with clarity, especially when there are multiple happy paths, like in a parser

jstanley
0 replies
5h36m

I tend to prefer a style more like:

    if (!...) return 0;

    ...
Because otherwise you can end up with lots of nested conditions which are very confusing to follow, even though they boil down to "do the thing if these are true, otherwise don't do the thing".

Xeamek
0 replies
10h42m

In functional languages like Elixir there is no return though, and you are forced to use an if-else structure.

Izkata
3 replies
12h54m

The advice that I really want to hammer into people's heads is, prefer “sad ifs.”

I think this is the first time I've heard that name for this idea. You'll find more under the term "guard clauses" - it even has its own Wikipedia page: https://en.wikipedia.org/wiki/Guard_(computer_science)

softfalcon
2 replies
11h48m

Yeah, same, also know this as guard clauses. But "sad if's" is kind of adorable sounding, so I might start using it.

philbo
1 replies
11h3m

pessimifs

867-5309
0 replies
9h15m

depressifs?

blandflakes
2 replies
13h17m

At first I had the opposite reaction, which is that I spend a ton of time fighting inverted ifs! My reaction is not incompatible with yours, but it is related: I HATE reading:

    if (!someCondition) {
        // short amount of code
    } else {
        // longer happy path
    }

The contextual overhead of having to invert a negative (i.e. !!someCondition) is just annoying.

I do agree that if (happy) { /* tons of code */ } else { /* I forgot how we got here */ } can also be an issue.

Izkata
1 replies
12h52m

The key to the idea above is that you don't have any "else" clause. It's an early return, raise an exception, or modify the data so it's no longer bad.

blandflakes
0 replies
12h44m

Yes, that's why I said I realized they weren't quite the same situation, while adding another example of a situation where people create hard to follow structures.

softfalcon
0 replies
11h50m

TIL: another word for "guard clauses" is "sad if's".

slowmovintarget
0 replies
10h47m

I recall avoiding multiple method exits (in Java) like the plague. They were hell to get through in a debugging session. But I guess we don't use debuggers much any more.

When I began writing C#, I recall my other team members "refactoring" my code to be shot through with multiple exits instead of my collecting variable, because "it looked nicer." When they asked me why I wrote it the way I did, I mentioned the Java debugger thing, and they kind of looked at me blankly and said, "Huh, never thought of that."

Times change.

sachahjkl
0 replies
6h36m

Great insight. Many programmers fail to think of their errors as valid values to return as output. Golang got this right.

mcronce
0 replies
10h56m

The author called out the newtype solution specifically in the first paragraph:

it could push the task of precondition checking to its caller, and *enforce via types* (or an assert) that the precondition holds

(Emphasis mine)

at_a_remove
0 replies
14h9m

I usually move this up in an ETL process into a function I usually call "Pre_Flight_Checklist." Over the years this has gotten its own helper functions. For example, if I am going to use a file, I have a function that not only checks for the existence of the file, but that it is a file. This function can also be fed an expected size range and date if "freshness" is expected. If something is out of whack, an error message will mention that a file was expected at /Adirectory/Bdirectory/Cdirectory/filename.ext, but only /Adirectory/Bdirectory was found, nothing under it. I believe in overly-explanatory error messages.

Now, I do run the risk of having things change out from under me, but putting all of the defensiveness up front allows the happy path and some alternatives to be more succinct in the code.

I keep finding new things to be paranoid about, though!

runeks
1 replies
9h0m

As an example I work with - Clojure. Sometimes I use agents. I don't write functions for agents, I write functions for things agents might contain.

I'm not following you. Do you have a better example?

bcrosby95
0 replies
51m

Similar to the example of the OP:

  (def walrus (agent {}))

  ; good
  (defn frobnicate [w]
    (assoc w :frobnicated true))
  (send walrus frobnicate)

  ; bad
  (defn frobnicate [w]
    (send w #(assoc % :frobnicated true)))
  (frobnicate walrus)
It might make sense to also have a function that executes something like (send walrus frobnicate). But IMHO code should basically never look like the bad example.

GuestHNUser
0 replies
9h56m

Correct me if I am misunderstanding, you are saying the first example would be better if it was `walrus.frobnicate()`? Isn't that a syntax preference more than an issue with the point the author is trying to make?

jmull
29 replies
18h53m

The rule of thumb is: put ifs and fors where they belong -- no higher or lower. And if you're not sure, think about it a little more.

I don't think these rules are really that useful. I think this is a better variation: as you write ifs, fors and other control flow logic, consider why you're putting it where you are and whether you should move it to a higher or lower level. You want to think about the levels in terms of the responsibility each has. If you can't think of what the demarcations of responsibility are, or they are tangled, then think about it some more and see if you can clarify, simplify, or organize it better.

OK, that's not a simple rule of thumb, but at least you'll be writing code with some thought behind it.

lazypenguin
15 replies
18h13m

This reply is akin to "just think about it and do it perfectly dummy" which might sound "obvious" and "smart" but is really just unhelpful commentary. The ideas provided in this blog are actually a good heuristic for improving code in general, even if they don't apply in all cases. Simple and practical rules of thumb can go a long way for those without enough experience yet to have internalized these kinds of rules.

guntherhermann
5 replies
17h47m

Our job is to think about problems and solve them. I'm in complete agreement with GP.

flir
4 replies
16h53m

If you were going to generalize your hard-won knowledge and pass it on to a junior, what would you say?

ryandrake
3 replies
16h33m

"Draw the rest of the fucking owl"[1]

1: https://knowyourmeme.com/memes/how-to-draw-an-owl

mjevans
2 replies
15h36m

There isn't an easy shortcut. Experts exist for reasons. Just like Doctors and other specialists, find a good expert you trust and evaluate their evaluation of the topic.

llimllib
1 replies
15h16m

Doctors rely on an enormous number of heuristics and shortcuts exactly like this.

My wife teaches doctors, and so much of what she does is giving them rules of thumb much like this one.

edit: I want to note that I'm pretty ambivalent on the actual advice of the article, just commenting that doctors in my experience have a truly astounding collection of rules of thumb

mjevans
0 replies
7h45m

An expert in a field tends to be an expert of the rules of thumb, the references to consult when the rule of thumb doesn't deliver the desired results, and the nuances and exceptions that are the art part of 'science and useful arts' for their field.

makeitdouble
3 replies
17h6m

Trying to help too much, and giving shortcuts to people who cannot otherwise evaluate when those shortcuts shouldn't be applied, has ill effects.

On the same note, I find the reasoning behind the advice way more interesting than the advice itself. It gives a good framing of the different issues to consider when writing conditional and iterative code.

Basically it should help newcomers to identify the issues they're facing and build their own personal heuristics.

civilized
2 replies
16h54m

A lot of replies in this tree read to me as "don't concisely express a general heuristic principle, because someone might read it and apply it without nuance".

This way of thinking is infantilizing. It is also self-defeating, because it is itself a failure to apply with nuance a general heuristic principle (don't oversimplify). It's doing what it tells you not to do.

Heuristics are only heuristics, but you have to articulate what they are before you can append "and this is only a heuristic, don't apply it blindly and universally".

Appending "everything depends on context" is the trivial part. The heuristics are the substantial contribution.

makeitdouble
1 replies
15h55m

I see your side, but take it the other way: the basic message is really "you'll have to think for yourself anyway, heuristics from random strangers won't help", which I don't see as infantilizing.

I see post mortems and issue discussions on public projects as a better resource and contribution than sharing personal heuristics.

darkerside
0 replies
14h28m

Better phrasing might be: this is a really good point, and while you can't rely on it alone, the more of these you learn, the better a programmer you'll be.

collingreen
3 replies
17h50m

I don't agree it is unhelpful commentary. I think it makes an important point that just following some rules you found won't be enough because the right answer isn't a result of something arbitrary/universal but is a side effect of a better understanding of the responsibility of each piece of code you're writing.

Aeolun
1 replies
15h37m

I think you’ll have better code on average if you always apply these rules than if you don’t.

It’s not always helpful to give people the option to do it as they think is best.

zeven7
0 replies
14h53m

than if you don’t

If by “don’t” you mean “do the opposite” then I agree. The third option is “don’t follow the rule blindly but think about the situation”, and for that case it depends entirely on the person and how well they think.

darkerside
0 replies
14h33m

I do think it is shortsighted, and equally helpful and harmful. Helpful in that it reminds the reader this is only one of many considerations that should be taken into account. Harmful because it borders on implying that this shouldn't even be a consideration taken into account because it is irrelevant.

louthy
0 replies
14h35m

In reality these ‘rules of thumb’ become dogma and people forget why the rules of thumb existed in the first place, or make up other bullshit rules to justify it.

This industry is full of dogma and self-appointed experts and it’s utterly tedious.

crazygringo
8 replies
18h32m

Exactly -- write code that matches clear, intuitive, logical, coherent organization.

Because easy counterexamples to both of these rules are:

1) I'd much rather have a function check a condition in a single place, than have 20 places in the code which check the same condition before calling it -- the whole point of functions is to encapsulate repeated code to reduce bugs

2) I'd often much rather leave the loop to the calling code rather than put it inside a function, because in different parts of the code I'll want to loop over the items only to a certain point, or show a progress bar, or start from the middle, or whatever
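
A minimal sketch of what I mean, with made-up names: the per-item work lives in one function and each caller owns its own loop.

    struct Item { size: u64 }

    fn process(item: &Item) -> u64 {
        item.size * 2 // the reusable per-item work
    }

    fn process_first_n(items: &[Item], n: usize) -> u64 {
        items.iter().take(n).map(process).sum()
    }

    fn process_with_progress(items: &[Item]) -> u64 {
        let mut total = 0;
        for (i, item) in items.iter().enumerate() {
            total += process(item);
            println!("processed {} of {}", i + 1, items.len());
        }
        total
    }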

Both of the "rules of thumb" in the article seem to be motivated by increasing performance by removing the overhead associated with calling a function. But one of thetop"rules of thumb" in coding is tonot prematurely optimize.

If you need to squeeze every bit of speed out of your code, then these might be good techniques to apply where needed (it especially depends on the language and interpreted vs. compiled). But these are not at all rules of thumb in general.

pests
3 replies
18h17m

I think a key thing software engineers have to deal with opposed to physical engineers is an ever changing set of requirements.

Because of this we optimize for different trade-offs in our codebase. Some projects need it, and you see them dropping down to handwritten SIMD assembly for example.

But for the most of us the major concern is making changes, updates, and new features. Being able to come back and make changes again later for those ever changing requirements.

A bridge engineer is never going to build abstractions and redundancies on a bridge "just in case gravity changes in the future". They "drop down to assembly" for this and make assumptions that _would_ cause major problems later if things do change (they won't).

I guess my point is: optimizing code can mean multiple things. Some people want to carve out of marble - it lasts longer, but is harder to work with. Some people want to carve out of clay - it's easier to change, but it's not as durable.

chiefalchemist
1 replies
16h22m

Whether marble or clay, both ideally take into consideration the fact that he/she who writes it today may not be he/she who maintains it tomorrow.

When stuck between the long-lasting and the less durable option, maintainability should be the deciding factor.

pests
0 replies
13h32m

Part of my point though was that the bridge builder of today does not need to take into consideration that the person maintaining it 20 years from now will have to deal with gravity changing. So they can make certain assumptions that will be impossible for future maintainers to ever undo.

Software doesn't have these set-in-stone never-changing requirements. I think we are making similar points.

couchand
0 replies
16h28m

I've been impressed watching the crews meticulously replace each cable assembly of the George Washington Bridge over the past year or so. All the work that doesn't require disconnected cables is done in parallel, so you can get a sense of the whole process just driving across once (they've finished the north side so you want to drive into Manhattan for the current view).

It's more or less the same as code migrations we're doing on a regular basis, done far more diligently.

usrusr
0 replies
16h24m

Conditions inside the function are also in line with Postel's law, if we drag it from networking to API design. And in large parts of programming the entire "enforce it with types" (say null check without saying null check) isn't a thing at all. It only gets worse with languages where api evolution and compatibility is done by type-sniffing arguments. Those will just laugh at the idea of pushing an if up.

But it's an interesting discussion nonetheless. What I picked up, even if it wasn't directly mentioned (or I might have missed it?), is that a simple check on the caller side can be nice for the reader. Almost zero cost reading at the call site because the branch is a short one, and chances are the check provides some context that helps to understand what the call is all about:

   if(x is Walrus) frobnicate(x);
is not just control flow, it doubles as a friendly reminder that frobnication is that thing you do with walruses. So my takeaway is the check stays in the function (I also don't agree with the article), but make it a part of the naming consideration. Perhaps "frobnicateIfWalrus" wouldn't be so bad at all! I already do that occasionally, but perhaps it could happen more often?

stouset
0 replies
17h34m

1) I'd much rather have a function check a condition in a single place, than have 20 places in the code which check the same condition before calling it -- the whole point of functions is to encapsulate repeated code to reduce bugs

That's fine, but it's often a good idea to separate "do some work on a thing" and "maybe do work if we have a thing". Using the example in the article, it is sometimes useful to have multiple functions for those cases:

    fn frobnicate(walrus: Walrus) {
        ...
    }

    fn maybe_frobnicate(walrus: Option<Walrus>) {
        walrus.map(frobnicate);
    }
But also… in languages like Rust, most of the time that second one isn't needed because of things like Option::map.

foota
0 replies
18h17m

I think the argument here could be stated sort of as push "type" ifs up, and "state" ifs down. If you're in rust you can do this more by representing state in the type (additionally helping to make incorrect states unrepresentable) and then storing your objects by type.
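
One minimal sketch of that reading, with a made-up Shape enum: the per-variant "type if" happens once, up front, and the loops under it run over homogeneous data with no branch inside.

    enum Shape {
        Circles { radii: Vec<f64> },
        Squares { sides: Vec<f64> },
    }

    fn total_area(shape: &Shape) -> f64 {
        match shape {
            Shape::Circles { radii } => radii.iter().map(|r| std::f64::consts::PI * r * r).sum(),
            Shape::Squares { sides } => sides.iter().map(|s| s * s).sum(),
        }
    }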

I have a feeling this guide is written for high-performance code; while it's true that premature optimization is the devil, I think following this sort of advice can prevent you from suffering a death from a thousand cuts.

calvinmorrison
0 replies
17h31m

A good rule of thumb is to validate early and return early. It prevents endless if-else nesting.

metadat
0 replies
18h30m

Yes, this advice has the scent of premature optimization, with the sacrifice being readability/traceability.

demondemidi
0 replies
18h21m

You also want to avoid branches in loops for faster code. But there is a tradeoff between readability and optimization that needs to be understood.

chiefalchemist
0 replies
16h57m

OK, that's not a simple rule of thumb, but at least you'll be writing code with some thought behind it.

This reflects one of my answers to the question: What separates an engineer from a developer?

Engineers are intentional, or at least hope to be as often as possible. On the other hand, developers may arrive at similar or same ends but they're generally more reactive and less intentional.

benatkin
0 replies
18h41m

If you know where they belong, this post isn't for you.

Tainnor
26 replies
11h9m

The more experience I get, the more I feel that too many programmers worry about making things pretty "in the small", but not enough care about proper design of an entire codebase.

I'll admit that I like a concise and well-crafted function just as much as many others, but articles like this one are probably the things that can lead to the kind of unproductive bikeshedding that is sometimes experienced in PRs and other discussions. I don't care that much about whether your function is messy - or about where you put your ifs and fors (or if you use maps and filters instead), as long as the function is properly named, has a good interface (including expressive types), a clear purpose, is properly documented, doesn't make excessive use of side effects etc.

Leo_Germond
9 replies
10h53m

The advice about pushing ifs up is not bikeshedding though; it is the exact kind of architectural choice you're saying one should decide on. Don't believe me? Well, imagine you have inputs: where should you validate them? According to this rule of thumb it's at the topmost level, when they are received. That seems super sensible, and it's typically something that helps with understanding the code (rather than checking them at the very last moment). Also, for proofs it's technically necessary to allow the preconditions to "percolate up", which has the same effect of moving the if up.

So the first piece of advice is definitely not bikeshedding; the second one I'm not so sure about though ;)

Tainnor
6 replies
10h49m

But the article wasn't about "validate inputs at the boundaries" (which is a semantic concern), it was about "push your ifs up" (which is a syntactic concern).

I agree that in the provided example, those two seem to somewhat coincide (although it's hard to say, given that the author makes an effort to give their functions names such as "frobnicate" that don't indicate their purpose), but in the general case that doesn't have to be true.

pjc50
3 replies
4h5m

"If" is semantic!

Tainnor
2 replies
3h33m

Its location in the codebase isn't.

patmorgan23
1 replies
2h32m

If it changes the behavior of the program it is.

Tainnor
0 replies
2h2m

Well, let's consider the three examples the author gives for "pushing ifs up":

The first one isn't even a proper example, because they write "push the if to the caller" but don't actually show the caller. So the good and bad examples aren't even straightforwardly comparable. For the second example, calling g() has exactly the same behaviour as calling f(), and for the third example, the behaviour also remains identical with or without the enum.

I don't see how this is any more than just pushing code around without having more context about these methods and who calls them.

mike_hock
1 replies
10h18m

Are we reading the same article? There is zero concern for syntax.

The first point is literally what you said. Write functions with clear interfaces. If your function doesn't handle None, make its argument type reflect that (Walrus instead of Option<Walrus>).

The second point is about performance. Hoist branches out of loops, and process batches so any constant cost is amortized over the batch. Is that even controversial?

the author makes an effort to give their functions names such as "frobnicate"

Yes, and that's good, because it keeps the focus on his actual points rather than the minutiae of a contrived example.

Tainnor
0 replies
5h32m

Are we reading the same article? There is zero concern for syntax.

By "syntactical" I mean "concerned with the surface level appearance of low-level code structure".

Yes, and that's good, because it keeps the focus on his actual points rather than the minutiae of a contrived example.

Only if you think that guidelines about how to write code should be devoid of the context of the problem you're trying to solve. I happen to think that this is mostly wrong.

mpweiher
1 replies
10h7m

it is the exact kind of architectural choice you're saying one should decide on.

While I can't speak to what the OP had in mind, architectural concerns are definitely not inside a function. Even connecting individual functions/procedures barely registers at the smallest, most granular scale of architecture.

Of course, our programming languages for the most part don't let us express architectural concerns...so here we are.

Tainnor
0 replies
5h45m

Yes, this is what I was talking about. Debating which function should include an "if" condition seems rather immaterial to me.

joelthelion
4 replies
9h2m

The more experience I get, the more I feel that too many programmers worry about making things pretty "in the small", but not enough care about proper design of an entire codebase.

I've seen the reverse a lot, too. People who enjoy designing over-engineered cathedrals and can't be bothered to think about the low-level efficiency of their algorithms.

maeln
0 replies
8h48m

In general, I feel like too many developers get caught up in dogma and "best practices" that are often effectively unrealistic, and tend to adopt extremist ways of thinking about the big and/or the small picture. A lot of the "dev culture" feels like it lacks a sense of pragmatism.

berkes
0 replies
5h30m

People who enjoy designing over-engineered cathedrals and can't be bothered to think about the low-level efficiency of their algorithms.

When that's deliberate, I'd say it's good.

Quite often we need to optimize for maintenance. Or for high churn in developers. For readability. Adaptability. And so on. Each optimization is a trade-off. It's not always "performance vs readability", but quite often it is.

If we spend time and effort on the low-level efficiency of algorithms, AND that comes at the cost of readability, adaptability, maintainability or any other -ility, we should very deliberately make a choice: does performance matter more than X? The answer is surprisingly often "no". And it's one of the many reasons why today's software is slow, bloated, battery-eating and unstable.

Well, the times when that choice is not made deliberately are probably more to blame.

antupis
0 replies
8h39m

I think this tells more about company culture than the people themselves, some companies like small things, and others like cathedrals.

Tainnor
0 replies
5h21m

"Over-engineered cathedrals" aren't what I mean by "good, proper design".

I hate huge, messy meta-frameworks where everything happens implicitly and you can debug nothing. But I hate piles of ad-hoc code where everything calls everything and you can't change a single thing without breaking 2000 other things just as well. You usually don't need complicated code constructs to organise your code well - but you do need to think.

Designing maintainable code is a difficult skill and one that involves careful thinking about tradeoffs. Unfortunately, as an industry we mostly never seem to be able to make time for it.

teo_zero
2 replies
9h11m

Poor choice of words in TFA and a cursory read by a reader generated this misunderstanding. "Up" means "to the caller" for the author, but was probably understood as "towards the beginning (of the function or file)".

Woshiwuja
0 replies
6h56m

From the title I would have guessed "Use more ifs and fewer fors", and it didn't make sense at the beginning.

Tainnor
0 replies
5h44m

No, I did understand the point the author was making. I just think it's a weak one.

t8sr
2 replies
4h55m

I work in security, and get to interact with a lot of area tech lead and architect types (L7-L9 in FAANG terms). I have found that the people who most concern themselves with “proper design” are generally about 5-10 years into their career and trying to prevent juniors from shooting themselves in the foot, but don’t yet have a proper appreciation for the cost of complexity and how code changes over the long run. Conversely, after you’ve been doing this for 20+ years you learn to value simplicity over most other technical factors.

Questions like “how early do we branch” and “what is the actual job this has to do” end up being the ones whose answers tend to have the most long-term value. Questions about abstractions and encapsulation tend to lead to the type of argument you describe. And the “big picture” people are the ones with the most security issues in their code, because they don’t really understand what the properly architected codebase is really doing.

xtracto
0 replies
1h17m

This resonates with my experience. Apparently now the "inversion of control", "dependency injection" patterns are in vogue. So my Sr. Developers are building new code with all that crap. So much that when you want to make a simple change, you have to hunt down 20 files of `xxxCreator`s, `xxxHandler`s and `xxxProvider`s that are glorified functions invoked by `.execute` or `.call`.

And this is in TypeScript ... we were laughing about that kind of complexity in Java 20 years ago... on Slashdot.

Tainnor
0 replies
3h28m

Questions like [...] “what is the actual job this has to do” end up being the ones whose answers tend to have the most long-term value.

How is that not "big picture", proper design and abstraction/encapsulation?

You seem to be arguing against a strawman. I didn't write "complexity". But if you want to fix security issues in your database access patterns, you better not have SQL distributed across your whole codebase.

valand
0 replies
7h8m

but articles like this one are probably the things that can lead to the kind of unproductive bikeshedding that is sometimes experienced in PRs and other discussions

I'd counterargue that if these trivial principles are established beforehand, unproductive bikeshedding of this type will happen less on PRs and other discussions.

as long as the function is properly named, has a good interface (including expressive types), a clear purpose, is properly documented, doesn't make excessive use of side effects etc.

In performance-sensitive software where DOD thrives, more care needs to be put into these "in the small" things because that is how compiler optimization works. Then, it changes the rules of the game: statement semantics become crucial, self-documenting code makes more sense, in-code comments are for reasoning exclusively, and "documentation" becomes "specification and user manual".

tangjurine
0 replies
11h1m

Like there's the stuff that can be refactored because the changes are local, or the places to change are clear.

And then the stuff that is a lot harder to refactor because you would need to change code in a lot of places, and the places that need to be changed aren't clear. That's probably more the proper design of a codebase stuff.

pjmlp
0 replies
10h9m

To go along the same line of thought, too many think about the beauty of the flower instead of the whole forest.

The hours occasionally wasted discussing this in PRs add zero value to what the customer actually cares about, e.g. whether pressing the button prints their monthly invoice or not.

Sure, there is a balance to be had between completely unmaintainable code and a piece of art to expose in the Louvre, however too many focus on the latter instead of on what the customer is getting out of their beautiful flower.

mpweiher
0 replies
10h4m

... but not enough care about proper design of an entire codebase.

One reason is that we don't have ways of meaningfully expressing components much larger than functions or that connect differently from functions in our programming languages. At most we can aggregate functions together. After that we're on our own.

kubb
0 replies
10h19m

Programming language design also tends to focus on the small (e.g. optimizing the number of language keywords).

p4bl0
18 replies
18h39m

I'm not convinced that such general rules can really apply to real-world code. I often see this kind of rule as an ill-placed dogma, because sadly, even if this particular blog post starts by saying these are rules of thumb, they're not always taken this way by young programmers. A few weeks ago YouTube was constantly pushing a video at me called "I'm a never-nester", apparently by someone arguing that one should never nest ifs, which is, well, kind of ridiculous. Anyway, back to the specific advice from this post: for example, take this code from the article:

    // GOOD
    if condition {
      for walrus in walruses {
        walrus.frobnicate()
      }
    } else {
      for walrus in walruses {
        walrus.transmogrify()
      }
    }
    
    // BAD
    for walrus in walruses {
      if condition {
        walrus.frobnicate()
      } else {
        walrus.transmogrify()
      }
    }
In most cases where code is written in the "BAD"-labeled way, the `condition` part will depend on `walrus`, and thus the `if` cannot actually be pushed up. And when it can be pushed up, it is quite obvious to anyone that you would otherwise be re-evaluating the same expression (the condition) over and over in the loop, and programmers have a natural tendency to avoid that. But junior programmers or students reading dogmatic, wise-sounding rules may produce worse code just to strictly follow this kind of advice.

ToValueFunfetti
6 replies
17h21m

Re: 'never-nesting', I'm not especially dogmatic, but I've never empirically seen a situation where this:

  match (condition_a, condition_b){
   (true, true) => fn_a()
   (true, false) => fn_b()
   (false, true) => fn_c()
   (false, false) => fn_d()
  }
isn't preferable to this:

  if condition_a {
     if condition_b {
       fn_a()
     } else {
       fn_b()
     }
  } else if condition_b {
     fn_c()
  } else {
     fn_d()
  }
   
(Assuming the syntax is available)

makeitdouble
1 replies
16h47m

I think the lines gets blurred when doing early exits and guard conditions.

For instance

  if !condition_a && !condition_b {
    return fn_d() // probably an error condition ?
  }

  if condition_a && condition_b {
    return fn_a()
  }

  if condition_a {
   fn_b()
  } else {
   fn_c()
  }

1letterunixname
0 replies
15h52m

These 3 examples optimize to the same thing because they generate identical instructions:

https://godbolt.org/z/a1Yq9rceE

Note: Inverting and rearranging conditions changes what LLVM decides to output, sometimes for the worse. --opt-level=s is counterproductive here.

cratermoon
1 replies
14h43m

Oh no, hard disagree. The match implementation is far easier to reason about. I can see at a glance that if both condition_a and condition_b are true, call fn_a(). For the nested if version I have to trace each expression by hand.

Izkata
0 replies
12h47m

They said "isn't", not "is". Double negative. You two agree.

titzer
0 replies
16h17m

I think I would like the former syntax. Witness one of the several nested matching situations I run into:

https://github.com/titzer/wizard-engine/blob/master/src/engi...

This would be much, much better if Virgil had pattern matching on tuples of ADTs.

agumonkey
0 replies
16h5m

I really prefer the flat variant cause it helps me ensure exhaustiveness. In the end people probably trained their brain to read the nested variant to the point it's transparent to their neurons.

tacitusarc
2 replies
14h32m

I think this is actually a great example of why `if` should be "pushed up." The objective of the code is to perform a particular operation on the walrus, given a condition. This is actually ifs being used instead of polymorphism and a type system. Why does the walrus have these two functions, which must be called in different scenarios? Why not have a single function, and two types, and the correct type is passed in? Even given the current structure, we could accomplish this:

    // There are better ways to accomplish this, depending on the language
    func frobnicate(walrus) {
        return walrus.frobnicate();
    }

    func transmogrify(walrus) {
        return walrus.transmogrify();
    }

    // Somewhere high in the code
    if condition {
        processing_function = frobnicate
    } else {
        processing_function = transmogrify
    }


    // Somewhere else in the code
    for walrus in walruses {
        processing_function(walrus)
    }
If the decisions are made as early as possible, they do not need to be littered throughout the code. The guts of the code can run without branches, performing the same operations every time, the output only modified through the construction graph.

Of course, this is not a new idea:https://www.youtube.com/watch?v=4F72VULWFvc

It was old 15 years ago.

hedora
1 replies
13h58m

That pattern is not zero-cost: It breaks inlining and other compile-time optimizations. It also can hurt maintainability and readability. If the type of processing_function() is significantly generic, figuring out what is being called in the loop can be hard (as in undecidable; it is equivalent to the halting problem).

In an extreme case that I've seen in real code, processing_function() is called send(t: Object) -> Object or recv(u: Object), and all non-trivial control flow in the system is laundered through those two function names.

So, you have 100's or 1000's of send, recv pairs, and grep (or even a fancy IDE) can't match them up.

tacitusarc
0 replies
1h1m

For a non-trivial codebase, this style is incredibly valuable for readability. With it, decisions about execution are made in a single place, rather than distributed throughout the code. If the conditionals are embedded deeply in the call graph, sure a reader may be able to understand a single function when looking at it, but they won't know the implications of that function or be able to reason about the program as a whole. I've seen a huge class of bugs that are eliminated by properly using polymorphism rather than embedded conditionals.

That being said, yeah, it's possible to just layer unnecessary abstraction and way overcomplicate things, making the layers utterly incomprehensible. But that seems to be the result of a misunderstanding: the point here isn't to _abstract_, it's to properly model polymorphic behavior. Any abstraction that takes place to do that is suspicious at best.

I think the performance aspect is negligible for 98% of programs. The compiler may or may not be able to inline the underlying function, depends on the compiler. Depends on the optimizations. And in almost all cases, if you're relying on compiler optimizations to fix your performance problems, you're looking in the wrong place. Not to say such cases don't exist, and there's a slew of people who work in those areas, but I trust they know their business well enough to evaluate the trade offs here and make the right decisions.

swsieber
2 replies
13h44m

But junior programmers or students reading dogmatic-like wise-sounding rules may produce worse code to strictly follow these kind of advices.

Yes.

I for one am happy such articles exist. This one articulated something I've run into quite a few times without being able to fully articulate the issues. It seems like a nice mental model to have in my back pocket.

That said, I appreciate comments like yours because I hope the dogmatic junior dev comes across it and hopefully becomes a little more nuanced.

matklad
0 replies
4h31m

Thanks! You are exactly the target audience for this article! It really is only useful when:

* you already have the intuitive understanding of problems being described

* but you haven’t quite yet verbalized the problem and connected the dots

This clicked for me yesterday, happy that it helped you to articulate what you already knew today!

Olreich
0 replies
4h27m

I’m close to giving up on dogmatic junior devs. They don’t ever seem to give up their dogma, even in the face of direct damage caused by their dogma. The closest I’ve seen is a dogmatic junior dev saying they understand and will relax on the dogma, but turning around and enforcing it when they think you aren’t looking.

I’ve seen dogmatic junior devs turn into dogmatic senior devs, but I’ve never seen a dogmatic junior dev turn into a pragmatic senior dev.

whartung
0 replies
16h44m

saying these are rule of thumbs they're not always taken this way by young programmers.

The important thing to learn about all “rules of thumb” and “best practices” is the WHY behind them. Programmers especially should not copy/paste these things and apply them by rote.

RoT and BP blindly applied may well not be a good idea. As with everything “it depends”.

tenahu
0 replies
13h27m

That was my thought too. This example would only work in specific circumstances, so why present it as a rule?

keithnz
0 replies
14h48m

given it looks state mutating, I'd write this something like

    walruses.apply(condition ? frobnicate : transmogrify)

where apply does the looping.

hollerith
0 replies
18h29m

Agree. Also, most of the time, the form that is easier to modify is preferred, and even if `condition` does not currently depend on `walrus`, it is preferable for it to be easy to make it depend on `walrus` in the future.

gsuuon
0 replies
18h12m

The GOOD refactor would only work if the condition didn't depend on `walrus` and helps to make that fact explicit. If you apply "push fors down" again you end up with:

  if condition {
    frobnicate_batch(walruses)
  } else {
    transmogrify_batch(walruses)
  }

nerdponx
12 replies
18h42m

Pushing "ifs" up has the downside that the preconditions and postconditions are no longer directly visible in the definition of a function, and must then be checked at each call site. In bigger projects with multiple contributors, such functions could end up getting reused outside their intended context. The result is bugs.

One solution is some kind of contract framework, but then you end up rewriting the conditions twice, once in the contract and once in the code. The same is true with dependent types.

One idea I haven't seen before is the idea of tagging regions of code as being part of some particular context, and defining functions that can only be called from that context.

Hypothetically in Python you could write:

  @requires_context("VALIDATED_XY")
  def do_something(x, y):
      ...

  @contextmanager
  def validated_xy(x, y):
      if abs(x) < 1 and abs(y) < 1:
          with context("VALIDATED_XY"):
              yield x, y
      else:
          raise ValueError("out of bounds")

  with validated_xy(0.5, 0.5) as (x_safe, y_safe):
      do_something(x_safe, y_safe)

  # Error!
  do_something(0.5, 0.5)
The language runtime has no knowledge of what the context actually means, but with appropriate tools (and testing), we could design our programs to only establish the desired context when a certain condition is met.

You could enforce this at the type level in a language like Haskell using something like the identity monad.

But even if it's not enforced at the type level, it could be an interesting way to protect "unsafe" regions of code.

imron
3 replies
18h14m

Pushing "ifs" up has the downside that the preconditions and postconditions are no longer directly visible in the definition of a function

You're missing the second part of the author's argument:

"or it could push the task of precondition checking to its caller,and enforce via types"

The precondition is therefore still directly visible in the function definition - just as part of the type signature rather than in an if statement.

The "enforce preconditions via types" is a common pattern in Rust (the language used in the article), and unlike checking with if statements, it's a strict precondition that is checked at compile time rather than at runtime and you won't even be able to compile your program if you don't meet the pre-condition.

scns
0 replies
7h9m

The "enforce preconditions via types" is a common pattern in Rust

Called the Type State Pattern. Here are two links from my bookmarks:

https://cliffle.com/blog/rust-typestate/

https://yoric.github.io/post/rust-typestate/
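
Very roughly, the shape of the pattern those posts describe (a toy example of my own, not taken from either link):

    use std::marker::PhantomData;

    struct Closed;
    struct Open;

    struct Door<State> {
        _state: PhantomData<State>,
    }

    impl Door<Closed> {
        fn open(self) -> Door<Open> {
            Door { _state: PhantomData }
        }
    }

    impl Door<Open> {
        // Calling this on a Door<Closed> is a compile error, not a runtime if.
        fn walk_through(&self) {}
    }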

nerdponx
0 replies
12h37m

Definitely, that's a pattern in many programming languages (including, increasingly, in Python). And in dependently-typed languages, you actually can obtain a proof of validity that gets erased at runtime, leaving nothing but basic unboxed data to be executed.

However that's not always an option in all languages or situations, and it's hard to encode conditions like "must only be called while an event loop is running" that way.

I'm envisioning this partly as kind of a catch-all/fallback mechanism that can be bolted onto a language that doesn't have these kinds of features. But it's not always the case that you can effectively communicate contextual requirements through the types of function parameters, and this would cover that case as well.

If you want to reduce this to anything, it's a workaround for not having monads in your language, with some tradeoffs around composability and dynamic extent.

funcDropShadow
0 replies
4h16m

"or it could push the task of precondition checking to its caller, and enforce via types"

Or the enforcement could be done via a design-by-contract discipline. Either using classical assertions, or something like reusable specifications in Clojure's spec library.

jon-wood
2 replies
6h47m

You can use types for this in Python:

  def do_something(position: ValidatedPosition):
    ...

  position = Position(x, y)
  if abs(position.x) > 1 or abs(position.y) > 1:
    raise ValueError("out of bounds")

  position = ValidatedPosition(position.x, position.y)
  
  do_something(position)
In practice I'd probably have ValidatedPosition encapsulate that validation in its constructor, but you get the idea. If you attempt to pass a Position to do_something then mypy is going to shout at you for passing an object of the wrong type to it.

Python's type checking definitely isn't as comprehensive as something like Rust, but I have found it increasingly useful for situations like this where you want to ensure the data being passed in has been appropriately handled.

nerdponx
1 replies
1h11m

Sure, but now you have to rewrite your code to accept a ValidatedPosition as input and not two floats. In Rust that might be "zero cost", but in Python it certainly is not, even if you implement all this in Cython. And even in the context of a Cython/Rust/whatever extension, what if this is calling a BLAS function or some other external procedure that has specific preconditions?

It's usually fine to use the type state pattern as described by sibling commenters, but sometimes it's expensive or infeasible.

Another situation is less related to performance and more related to managing dynamic state. Consider the following annoyance that arises when we try to attach static types to an old-school Python idiom:

  class AppClient:
      def __init__(self, url: str, http_client: httpx.Client | None = None) -> None:
          self.url = url
          self.http_client = httpx.Client() if http_client is None else http_client
          self._auth_token: str | None = None

      def login(self) -> None:
          self._auth_token = ...

      def _post(self, body: dict[str, Any]) -> None:
          self.http_client.post(
              self.url, headers={"Authorization": f"Bearer {self._auth_token}"}, json=body
          )

      def submit_foo(self, foo: Foo) -> None:
          self._post(foo.to_dict())
Users will end up with a difficult-to-debug auth failure if they forget to call `login` first. Also, Mypy won't catch this without contorting your code.

We naturally would want to add a test case like this:

  def test_error_if_not_logged_in():
      app_client = AppClient(url="https://api.example.net/v1")
      with pytest.raises(RuntimeError):
          app_client._post({})
Thus we'll need to guard every single usage of `self._auth_token` with:

  if self._auth_token is None:
      raise RuntimeError(...)
And there are three usual approaches for handling this:

1. Disregard the article's advice and push the "if" down, into the _post method, to ensure that we only actually need to write that guard condition once.

2. Write a _check_logged_in method that encapsulates the if/raise logic, use that to at least make it easier to apply the guard where needed.

3. Refactor the internals to use something akin to the type state pattern, which is a funky and cool idea that I support. But it could easily end up turning into an "enterprise fizzbuzz" exercise where your little client library expands into a type-driven onion that obfuscates the business logic.

I'm proposing a 4th way:

4. Declare that _post may only be called from inside a specific dynamic context, and ensure that that dynamic context is only established by login().

Is it better? I'm not sure. But I've never seen it before, and it's kind of appealing as I continue to think about it.

tetha
0 replies
36m

Something I've seen a few times in Go and java: Make login() a constructor for the AppClient, or declare what's essentially a factory called "AppClientConfig" with a login() function to create the AppClient.

I've only recently started to think about using factories like that, so I'm very sure there are patterns of dynamic state I wouldn't have a recipe for, but it's a pretty appealing idea to me.

aeonik
1 replies
17h7m

Isn't the intention of public vs private supposed to cover the "tagging of code for a particular context"?

Or maybe we need more specific semantics around this that can cross cut the domain that public and private (and protected in .NET ecosystems) cover?

nerdponx
0 replies
12h40m

Sure, this could definitely be a generalization of "public" and "private".

timeon
0 replies
18h14m

In bigger projects with multiple contributors, such functions could end up getting reused outside their intended context. The result is bugs.

Yes, but in this particular example, `fn frobnicate(walrus: Walrus)`, if you pass anything other than an owned Walrus then the program would not compile. Even if it was something generic, the passed argument would have to satisfy trait bounds. The definition of those bounds in the function definition would be required by the compiler, based on how the argument is used inside the function.

tengbretson
0 replies
17h35m

Branded types in typescript are an interesting way to do this.

garethrowlands
0 replies
6h35m

The “push ifs up” advice is not about checking preconditions, it’s about selecting the right code path. If you have a function with a precondition, by all means assert it at the beginning of that function. For example, the Java type system allows nulls, so a function that requires an object should throw an exception if a null is passed.

Each call site is absolutely responsible for ensuring that it only calls a function if its precondition is true. This follows from the definition of precondition. Nothing can change that.

Calling a function in violation of its precondition is a bug (in the caller). And, sure, functions often need code that checks that (to prevent undefined behaviour). But we should distinguish these assertions from our program’s actual control flow. The article is about the latter.
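
In code, the distinction looks roughly like this (a sketch with made-up names): the assertion guards the precondition inside the callee, while the selection between code paths lives with the caller.

    struct Walrus;

    // Precondition: stays in the function and fails loudly on a caller bug.
    fn frobnicate(walruses: &[Walrus]) {
        assert!(!walruses.is_empty(), "caller must pass at least one walrus");
        // ...
    }

    fn transmogrify(_walruses: &[Walrus]) { /* ... */ }

    // Control flow: the "which operation?" if is pushed up into the caller.
    fn handle(walruses: &[Walrus], condition: bool) {
        if condition {
            frobnicate(walruses);
        } else {
            transmogrify(walruses);
        }
    }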

torstenvl
11 replies
18h39m

I wouldn't quite say this is bad advice, but it isn't necessarily good advice either.

I think it's somewhat telling that the chosen language is Rust. The strong type system prevents a lot of defensive programming required in other languages. A C programmer who doesn't check the validity of pointers passed to functions and subsequently causes a NULL dereference is not a C programmer I want on my team. So at least some `if`s should definitely be down (preferably in a way where errors bubble up well).

I feel less strongly about `for`s, but the fact that array arguments decay to pointers in C also makes me think that iteration should be up, not down. I can reliably know the length of an array in its originating function, but not in a function to which I pass it as an argument.

lytigas
8 replies
17h0m

A C programmer who doesn't check the validity of pointers passed to functions and subsequently causes a NULL dereference is not a C programmer I want on my team.

I disagree. Interfaces in C need to carefully document their expectations and do exactly that amount of checking, not more. Documentation should replace a strong type system, not runtime checks. Code filled with NULL checks and other defensive maneuvers is far less readable. You could argue for more defensive checking at a library boundary, and this is exactly what the article pushes for: push these checks up.

Security-critical code may be different, but in most cases an accidental NULL dereference is fine and will be caught by tests, sanitizers, or fuzzing.

jrockway
7 replies
16h12m

I agree with that. If a function "can't" be called with a null pointer, but is, that's a very interesting bug that should expose itself as quickly as possible. It is likely hiding a different and more difficult to detect bug.

Checking for null in every function is a pattern you get into when the codebase violates so many internal invariants so regularly that it can't function without the null checks. But this is hiding careless design and implementation, which is going to be an even bigger problem to grapple with than random crashes as the codebase evolves.

Ultimately, if your problem today is that your program crashes, your problem tomorrow will be that it returns incorrect results. What's easier for your monitoring system to detect, a crashed program, or days of returning the wrong answer 1% of the time? The latter is really scary, depending on what the program is supposed to do. Charge the wrong credit card, grant access when something should be private, etc. Those have much worse consequences than downtime. (Of course, crashing on user data is a denial of service attack, so you can't really do either. To really win the programming game, you have to return correct results AND not crash all the time.)

wavemode
5 replies
15h29m

Yeah but, not checking for null in C can cause undefined behavior. One possible outcome of undefined behavior is that your program doesn't even crash, but rather continues running in a weird state. So such a bug doesn't always "expose itself".

If we accept that bugs are inevitable, and that accidentally passing a null pointer to a function is a possible bug, then we also conclude that your code really should include non-null assertions that intentionally abort() the program. (Which run in debug/staging mode but can be disabled in release/production mode.)
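
In Rust the same idea is spelled debug_assert!, which is checked in debug builds and compiled out in release builds (a sketch; in C you'd reach for assert() from <assert.h> instead):

    fn bump(ptr: *mut i32) {
        // Checked in debug/staging builds, compiled out in release builds.
        debug_assert!(!ptr.is_null(), "bump() called with a null pointer");
        unsafe { *ptr += 1 }
    }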

johncowan
1 replies
7h16m

That raises a more general point. When you can't or don't have compile-time checks, removing run-time checks in production amounts to wearing your seat belt only when driving around a parking lot and then unbuckling when you get on the highway. It's very much the Wrong Thing.

wavemode
0 replies
2h4m

I wouldn't really characterize it that way. You (ideally) shouldn't be hitting code paths in production that you didn't ever hit in testing.

But, in any case, if you are fine with the slight performance hit (though many C/C++ projects are not), you can always just keep assertions enabled in production.

LegionMammal978
1 replies
14h37m

Indeed, Rust's own standard library uses this method. There are lots of public-facing unsafe functions that can result in undefined behavior if called incorrectly. But if the standard library is compiled in debug mode (which currently requires the unstable flag -Zbuild-std), then it will activate assertions on many of these unsafe functions, so that they will print a message and abort the program if they detect invalid input.

afdbcreid
0 replies
5h35m

The Rust compiler has even started recently to put extra checks on unsafe code in codegen, e.g. on raw pointer dereference to check it is aligned.

jrockway
0 replies
14h49m

Very good point. For C, I like the idea of sticking an assertion in there.

gpderetta
0 replies
5h41m

assert_always(ptr != nullptr);

(custom assert_always macro, so it doesn't get compiled out in release builds)

bregma
0 replies
5h16m

In C a NULL is often a valid non-dereferencable pointer value. If you're checking for invalid pointer values you need to check for any of the 0xfffffffffffffffe possible invalid values. If all you're doing is checking for NULL you're not checking for invalid values at all.

If the precondition of your function is "parameter p can not be NULL" that's fine, go ahead and check for it. If the precondition is "parameter p must be a valid pointer" well then, good luck to you finding the appropriate assertion condition.

branko_d
0 replies
10h18m

My rule of thumb is: if the type system doesn't prevent an invalid value, it's your responsibility to prevent it at run-time.

I've been writing a lot of T-SQL lately, which doesn't let you declare a parameter or variable as NOT NULL. So it's a good idea to check for NULLs as early as reasonable - usually at the top of the stored procedure (for parameters). Otherwise, a NULL might propagate unexpectedly deep into the call hierarchy and cause less-than-obvious problems.

Fortunately, the data in the table can be declared as NOT NULL, so these kinds of bugs will usually not corrupt the data, but catching them as early as possible makes life easier. However, if there is a piece of logic that writes something to the database depending on the value of some parameter, and that parameter is unexpectedly NULL, that might lead to the wrong thing being written, or a necessary thing not being written at all, effectively corrupting the data.

So, defensive programming all the way, baby!

flashback2199
8 replies
18h56m

Don't compilers, cpu branch prediction, etc all fix the performance issues behind the scenes for the most part?

roywashere
1 replies
18h41m

Function call overhead can be a real issue in languages like Python and JavaScript. But you can or should measure when in doubt!

hansvm
0 replies
15h15m

It's a real issue in most compiled languages too if you're not careful (also a sort of opposite issue; too few functions causing unnecessary bloat and also killing performance).

malux85
0 replies
18h44m

No, compilers (correctly) prefer correctness over speed, so they can optimise “obvious” things, but they cannot account for domain knowledge or inefficiencies further apart, or that “might” alter some global state, so they can only make optimisations where they can be very sure there’s no side effects, because they have to err on the side of caution.

They will only give you micro-optimisations, which can cumulatively speed things up sometimes, but the burden of holistic program efficiency is still very much on the programmer.

If you’re emptying the swimming pool using only a glass, the compiler will optimise the glass size and your arm movements, but it won't optimise "if you're emptying the correct pool" or "if you should be using a pump instead" - a correct answer to the latter two could be 100,000 times more efficient than the former two, which are the ones a compiler can actually answer.

jchw
0 replies
18h43m

The short answer is absolutely not, even when you are sure that it should. Even something as simple as a naive byteswap function might wind up generating surprisingly suboptimal code depending on the compiler. If you really want to be sure, you're just going to have to check. (And if you want to check, a good tool is, of course, Compiler Explorer.)

ezekiel68
0 replies
17h4m

This is just a myth promoted by Big Compiler designed to sell you more compiler.

bee_rider
0 replies
18h45m

Some of the moves seemed to change what an individual function might do. For example they suggested pulling an if from a function to the calling function.

Could the compiler figure it out? My gut says maybe; maybe if it started by inlining the callee? But inlining happens based on some heuristics usually, this seems like an unreliable strategy if it would even work at all.

Thaxll
0 replies
18h46m

Well compilers are good and dumb at the same time.

Izkata
0 replies
12h43m

There's a pretty famous StackOverflow question/answer about branch prediction failure:https://stackoverflow.com/questions/11227809/why-is-processi...

gumby
7 replies
14h51m

The idea of hoisting precondition ifs into the caller is terrible! Sure, there are special cases where it's a good idea (if nothing else it skips a function call), but in the common case you don't want to do this.

In a library you want to check preconditions at the external boundary so the actual implementation can proceed knowing there are no dangling pointers, or negative numbers, or whatever the internal assumptions may be. Depending on the caller to do the check defeats the purpose.

Also in many cases you would need to violate encapsulation/abstraction. Consider a stupid case: `bool cache_this (T obj)`. Let the cache manager itself check to see if the object is already there as it can probably touch the object fewer times.

I agree on the `for` case but it's so trivial the article barely talks about it. Basically it's the same as the encapsulation case above.

LegionMammal978
4 replies
14h19m

In a library you want to check preconditions at the external boundary so the actual implementation can proceed knowing there are no dangling pointers, or negative numbers, or whatever the internal assumptions may be. Depending on the caller to do the check defeats the purpose.

I think the idea is to instead address this with a type-safe interface, designed so that the external boundary physically cannot receive invalid input. The caller would then be responsible for its own if statements when constructing the input types from possibly-invalid raw values.

Also in many cases you would need to violate encapsulation/abstraction. Consider a stupid case: `bool cache_this (T obj)`. Let the cache manager itself check to see if the object is already there as it can probably touch the object fewer times.

I don't see the suggestion as encouraging such a thing: the "cache_this" check should only ever be performed when it's known for certain that the user wants to access the cached object, so the entry point of the cache abstraction acts as a kind of boundary that the if statement depends on. And the if statement clearly shouldn't be pushed above its own dependency.

hedora
1 replies
14h10m

In cases where that doesn't make sense:

   let f = Some(get_a_u16());
   foo(f);

   ... 

   fn foo(f: Option<u16>) -> u16 {
      match f {
          None => 0,
          Some(f) => f * 1234
      }
   }
I'd expect any reasonable compiler to include enough link time optimization and dead code elimination to compile the whole mess down to a single multiply instruction.

LegionMammal978
0 replies
13h58m

I don't see what you're trying to show with your example? The typical case is, you start with a raw type in the caller (u16), and then you create a derived type (Option<MultipliedU16>) as a result of validation. Also, you'd ideally want to handle overflow errors in the multiplication, especially with something as small as a u16.

(And in case it helps anyone, for a trivial function to be inlined across crates in Rust, either the function must be marked with #[inline], or a special LTO option must be explicitly set in the Cargo profile. So often it's a good idea to mark all public trivial utility functions in a crate with #[inline].)

gumby
1 replies
4h5m

I think the idea is to instead address this with a type-safe interface, designed so that the external boundary physically cannot receive invalid input.

Ah, an idea that people have broken their picks on for 50 years.

We will pay out bonuses to employees who worked at least 9 months last year. Actually current employees and former employees. I should create a special type for this case and then depend on the caller to construct such types only for qualifying recipients? That’s just “moving the if” back into the caller. Any responsible developer would validate those entries at the interface…and is it complete?

LegionMammal978
0 replies
3h19m

The idea is, we wouldn't be "depending on the caller": the constructor itself would indicate failure instead of producing an object if the data is invalid, so the caller can't mess up without completely violating the abstraction.

So we would still be doing the validation "at the interface". It's just that the interface would be split into two separate functions, one checking for qualification and another operating on qualified employees.

Also, an alternative would be to push the for loop down into the interface, so that it enumerates each employee, immediately filters them for qualification, and then performs whatever other work is needed for paying out. (It could even include a caller hook to perform additional filtering before paying out.)
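
A rough sketch of that split, with invented names (the months threshold included):

    struct Employee {
        months_worked_last_year: u32,
    }

    // The qualification "if" lives in exactly one place, at the interface.
    struct QualifiedEmployee<'a>(&'a Employee);

    impl<'a> QualifiedEmployee<'a> {
        fn check(e: &'a Employee) -> Option<QualifiedEmployee<'a>> {
            if e.months_worked_last_year >= 9 {
                Some(QualifiedEmployee(e))
            } else {
                None
            }
        }
    }

    // Can only ever be handed a qualified employee; no re-validation inside.
    fn pay_bonus(_employee: &QualifiedEmployee) { /* ... */ }

    fn pay_all(employees: &[Employee]) {
        for q in employees.iter().filter_map(|e| QualifiedEmployee::check(e)) {
            pay_bonus(&q);
        }
    }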

quickthrower2
0 replies
14h22m

I think there is a spirit to this. I.e. is it an "if" in spirit.

Preconditions are not really branches. They are usually `if (bad) throw;` type of thing. They could be replaced with `assert(!bad);`.

A branch would be a function like add_todo_or_calendaritem(is_todo: bool, ...) which would need to branch on is_todo.
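
A small illustration of the difference (hypothetical names):

    struct Item;

    fn add_todo(_item: Item) { /* ... */ }
    fn add_calendar_item(_item: Item) { /* ... */ }

    // A branch "in spirit": the bool forces an if inside the function.
    fn add_todo_or_calendar_item(is_todo: bool, item: Item) {
        if is_todo {
            add_todo(item);
        } else {
            add_calendar_item(item);
        }
    }
    // Pushing that if up just means callers pick add_todo or
    // add_calendar_item directly, and the bool parameter disappears.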

behnamoh
0 replies
13h43m

A similar example of this is OpenAI's API calls which don't do response validation when you do function calling. Essentially, validating the response against the given function(s) is left to the user, leading to various implementations that just make the code noisy.

As an alternative, OpenAI could just make sure the true function call is run (first, validate that the response is a JSON, then make sure it's a valid JSON against the provided JSON schema) in n tries, after which the API raises an error or returns None.

wyager
3 replies
18h12m

Modern compilers and branch predictors means this doesn't matter 99.9% of the time.

If you take the same branch every time 100 times in a row, the processor will optimize the cost of the branch away almost entirely.

If the branch condition is not volatile, compilers will usually lift it.

titzer
1 replies
16h31m

It's not just the direct cost of a branch, it's the downstream costs as well. Removing (or automatically folding branches in a compiler) can lead to optimizations after the branch.

E.g.

    if (foo > 0) {
      x = 3;
    } else {
      x = 7;
    }
    return x * 9;
If the compiler (or programmer) knows foo is greater than zero (even if we don't know what it actually is), then the whole thing folds into:

    return 27;
That also means that foo is not even used, so it might get dead-code eliminated.

If that gets inlined, then the optimizations just keep stacking up.

(not that the article was about that, it's just one implication of removing branches: downstream code can be optimized knowing more).

So, in summary, compilers matter.

RenThraysk
0 replies
14m

That code shouldn't generate any branching (unless on an odd cpu). A serious compiler should recognise it can be done with a conditional move.

bhuber
0 replies
17h54m

This is mostly true, but sometimes the cost of evaluating the condition itself is non-trivial. For example, if a and b are complex objects, even something as trivial as `if (a.equals(b)) ...` might take a relatively long time if the compiler/runtime can't prove a and b won't be modified between calls. In the worst case, a and b only differ in the last field checked by the equality method, and contain giant collections of some sort that must be iterated recursively to check equality.

smokel
3 replies
18h28m

Without a proper context, this is fairly strange, and possibly even bad advice.

For loops and if statements are both control flow operations, so some of the arguments in the article make little sense. The strongest argument seems to be about performance, but that should typically be one of the latest concerns, especially for rule-of-thumb advice.

Unfortunately, the author has managed to create a catchphrase out of it. Let's hope that doesn't catch on.

koonsolo
0 replies
10h49m

Unfortunately, the author has managed to create a catchphrase out of it. Let's hope that doesn't catch on.

In your next pull request: "Hey, can you push this if up?" :D.

aktenlage
0 replies
7h32m

The strongest argument seems to be about performance

It may be an argument, but it's not a strong one. If the improved code can be written like the author puts it in their example (see below), the condition is constant over the runtime of the loop. So unless you evaluate an expensive condition every time, you are good. Branch prediction will have your back. If condition is just a boolean expression using const values, I'd even guess the compiler will figure it out.

  if condition {
    for walrus in walruses {
      walrus.frobnicate()
    }
  } else {
    for walrus in walruses {
      walrus.transmogrify()
    }
  }
Branch prediction should have you covered here. If you can easily rewrite it in

actionfromafar
0 replies
18h18m

    try
        let’s hope
    catch
        not on

wg0
2 replies
14h36m

This advice falls flat in case of validations. If a function is given some input and is supposed to work with it, how can it avoid if/else and how can we move this logic one level up to the caller to ask the caller to verify every parameter before calling the function?

And if we keep pushing up (thus deferring the decision making), wouldn't the topmost function become a lot more complicated, having a lot more logic pushed up from far down below?

That's bad and impractical advice but now will pollute many pull requests with needless arguments.

garethrowlands
0 replies
7h10m

No, the advice is good. If the wrong function was called, or a function is called with the wrong input, it’s just a bug and no amount of ‘validations’ can fix it. The article is written in the context of Rust, which has a type system, so the compiler can help check.

In cases where a function’s precondition can’t be expressed in the type system, the function should check at the start and bail. For example, it’s reasonable in Java to check if parameters are null (since the compiler cannot do this).

For more on this, Google “parse, don’t validate”.

LegionMammal978
0 replies
14h29m

If a function is given some input and is supposed to work with it, how can it avoid if/else and how can we move this logic one level up to the caller to ask the caller to verify every parameter before calling the function?

The usual way in idiomatic Rust would be to use type safety for this purpose: have the function accept special types for its input, and provide the caller secondary interfaces to construct these types. The constructors would then be responsible for inspecting and rejecting invalid input. This way, the caller can continue pushing the construction, and thus the if/else statements for validation errors, upward to the ultimate source of the possibly-invalid values.

(This is also possible in C/C++/Java/C#/..., if not so idiomatic.)

vjk800
2 replies
9h46m

I'm surprised by how often programmers coming from software engineering background do this wrong. I started programming in science and there it's absolutely necessary to think about this stuff. Doing for loops in a wrong order can be the difference between running your simulation in one hour instead of one week.

With this background, I instinctively do small-time optimization in all my code by ordering for's and if's appropriately. Code that doesn't do this right just looks wrong to me.

crabmusket
1 replies
9h36m

Watch out for a visit from the premature optimisation police!

topaz0
0 replies
1h26m

I try to avoid the premature optimisation police just as much as the regular police. Neither of them tend to be serving the public interest, whatever they may say.

ryanjshaw
2 replies
18h32m

I wrote some batch (list) oriented code for a static analyzer recently.

It was great until I decided to change my AST representation from a tuple + discriminated union to a generic type with a corresponding interface, i.e. the interface handled the first member of the tuple (graph data) and the generic type the second member (node data).

This solved a bunch of annoying problems with the tuple representation but all list-oriented code broke because the functions operating on a list of generics types couldn't play nice with the functions operating on lists of interfaces.

I ended up switching to scalar functions pipelined between list functions because the generic type was more convenient to me than the list-oriented code. The reality is you often need to play with all the options until you find the "right" one for your use case, experience level and style.

aeonik
1 replies
17h11m

I'm curious, why couldn't the list of generic types play nice with functions operating on lists of interfaces?

ryanjshaw
0 replies
11h53m

Have a look here:https://onecompiler.com/fsharp/3ztmx2uhr

Basically we want a "Yes" or a "No" when the family has children:

    let kidsYN = family |> numberOfChildren |> yesOrNo
But we get:

    error FS0001: Type mismatch. Expecting a
      INode list    
    but given a
      Node<Person> list    
    The type 'INode' does not match the type 'Node<Person>'
Forcing us to do:

    let familyAsINode = family |> List.map (fun n -> n :> INode)
Sure you can wrap this up in a function but it's ugly and annoying to have to use this everywhere and takes away from your logic. It ends up being better to split your "batch" and "scalar" operations and compose them e.g. by introducing a "mapsum" function:

    let kidsYN2 = family |> mapsum numberOfChildrenV2 |> yesOrNo

Waterluvian
2 replies
19h1m

This kind of rule of thumb usually contains some mote of wisdom, but generally just creates the kind of thing I have to de-dogmatize from newer programmers.

There’s just always going to be a ton of cases where trying to adhere to this too rigidly is worse. And “just know when not to listen to this advice” is basically the core complexity here.

gavmor
0 replies
10h42m

De-dogmatizing needs to happen, so what? I think these kinds of rules are helpful to play with; adopt them, ride them as far as they go, invert them for a day or year, see where that takes you. You learn their limits, so what? More grist for the palimpsest.

actionfromafar
0 replies
18h14m

I think this article could be useful as a koan in a larger compilation.

Some of the koans should contradict each other.

PaulDavisThe1st
2 replies
17h42m

Contrarily:

Push ifs down:

  BAD:
    if (ptr) delete ptr;
  GOOD:
    delete ptr;
Polymorphize your fors:

  frobnicate (walrus);
  frobnicate (walruses) { for walrus in walruses frobnicate (walrus); }

BenFrantzDale
1 replies
13h17m

I agree with them and with you. It looks like they work in some poor language that doesn't allow overloading. Their example of `frobnicate`ing an optional being bad made me think: why not both? `void frobnicate(Foo&); void frobnicate(std::optional<Foo>& foo) { if (foo.has_value()) { frobnicate(*foo); } }`. Now you can frobnicate `Foo`s and optional ones!

leduyquang753
0 replies
7h10m

Indeed they are working in a poor language that doesn't allow overloading. It's Rust. :-)

theteapot
1 replies
15h20m

Interesting that the dependency inversion principle [1] is like an extreme case of pushing ifs up, by encoding the if into the type interface implementation. Ultimately what you get with DI is pushing an if up to... somewhere.

  # Somewhere:
  walruses = [new TransWalrus(), new FrobWalrus(), ...]
  ...
  for(walrus in walruses) { 
   walrus.transfrobnicaterify()
  }
[1]https://en.wikipedia.org/wiki/Dependency_inversion_principle

tubthumper8
0 replies
11h52m

This only works when `frobnicate` and `transmogrify` have the same argument and return types

jasonjmcghee
1 replies
15h47m

I really thought the whole article was building up to a code example like

  [fwalrus, twalrus] = split(walruses, condition)
  frobnicate_batch(fwalrus)
  transmogrify_batch(twalrus)
And instead went for

  if condition {
    for walrus in walruses {
      walrus.frobnicate()
    }
  } else {
    for walrus in walruses {
      walrus.transmogrify()
    }
  }

crabmusket
0 replies
9h57m
dg44768
1 replies
12h7m

Thanks for the article. Maybe I’m confused, but why in the section near the end about how the two recommendations go together, why is the code this:

    if condition {
      for walrus in walruses {
        walrus.frobnicate()
      }
    } else {
      for walrus in walruses {
        walrus.transmogrify()
      }
    }
and not this?

    if condition {
      frobnicate_batch(walruses)
    } else {
      transmogrify_batch(walruses)
    }

BenFrantzDale
0 replies
4h42m

I thought that too. I think the point there was that you don't have to push this notion as far as in/out of functions: just flipping them within a function can be beneficial.

clausecker
1 replies
17h46m

This reads like the author is very close to rediscovering array oriented programming.

ezekiel68
0 replies
17h5m

In his (for he is a 'he') defense, I believe much of the whole industry has been doing so over the past five years as well. Everything old is new again!

vladf
0 replies
14h23m

And yet, a Rust Option (or really any option) can just be viewed as a list of one or zero elements.https://rust-unofficial.github.io/patterns/idioms/option-ite...

In fact, in Haskell, operating on an option conditionally has the exact same functor as a list: `map`.

So what am I to do, with an iterator? It's conflicting advice! An if is a for for an option.
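
A small illustration (my own toy example) of how an Option really does behave like a zero-or-one element collection in Rust:

    fn main() {
        let maybe_walrus: Option<&str> = Some("wally");

        // An Option can be iterated like a collection with at most one element:
        for w in &maybe_walrus {
            println!("frobnicating {w}");
        }

        // ...and map on an Option reads the same as map on a list.
        let shouted = maybe_walrus.map(|w| w.to_uppercase());
        println!("{shouted:?}");
    }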

theK
0 replies
8h19m

If there’s an if condition inside a function, consider if it could be moved to the caller instead

Haha, it is important to have logic where it is relevant. If performance is more relevant than semantics or maintainability do that. In all other cases favor locality, filter early and fscking kiss. Why is this news?

stevage
0 replies
14h11m

  // GOOD
  if condition {
    for walrus in walruses {
      walrus.frobnicate()
    }
  } else {
    for walrus in walruses {
      walrus.transmogrify()
    }
  }

What? They literally just said in the previous paragraph that the `for` should be pushed down into a batch function.

stephc_int13
0 replies
14h49m

A beneficial side effect of this strategy (operating in batches with control logic out of the loop) is that you can also relatively easily distribute the work on many worker threads without touching the interface or the code structure.
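
For example, with something like the rayon crate, the batch function can be parallelised without changing its interface (a sketch, assuming rayon as a dependency and a Walrus type):

    use rayon::prelude::*;

    struct Walrus;

    impl Walrus {
        fn frobnicate(&mut self) { /* ... */ }
    }

    // Same signature as the sequential batch version; only the body changes.
    fn frobnicate_batch(walruses: &mut [Walrus]) {
        walruses.par_iter_mut().for_each(|w| w.frobnicate());
    }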

sharno
0 replies
27m

What I noticed too for pushing the for loops down is that more functions start taking a list of things instead of one thing. This makes it easier down the road when a requirement changes and we need to handle multiple things instead of one thing with the same code logic. It decreases the overall amount of change in the codebase.

sharas-
0 replies
3h15m

In other words: write small reusable functions, and an orchestrator "script" function to glue them all together. The orchestrator has all the "ifs" that are domain/application-specific.

shanghaikid
0 replies
10h31m

No one likes 'if/else', and moving 'if/else' inside or outside a function is not a solution. If you are writing business logic or UI logic, we should try to avoid it as much as possible, except when you are writing a complex algorithm where that computational complexity is genuinely required.

pipeline_peak
0 replies
11h36m

Please stop using Rust in coding style examples as if it’s something people understand as widely as C.

I don’t know your egg head => symbols and idc.

nurettin
0 replies
11h42m

I like my ifs down. In fact, after decades of forcing myself to use parameter checklists and avoiding any nesting, I started to appreciate code that is nested just a couple of times intentionally to get rid of a bunch of conditionals that become implied as a result. It all depends on what feels natural and easy to read at the given stage of your life.

norir
0 replies
14h58m

These heuristics feel to me like a corollary to a simpler, more general rule: avoid redundant work.

nivertech
0 replies
4h29m

The real answer is "it depends", but in the vast majority of cases it is better to "push down" all control flow structures: both conditional branching and loops. This will make it easier to write the code as a pipeline, and therefore more readable.

Some argue that higher-order functions (HoFs) should be "pushed up" even if they make the code more verbose (that's a standard code style in Elixir); the alternative is more readable, but it creates lots of microfunctions, which also clutter the code.

For programming languages with either succinct control structures (the APL family) or syntactic sugar for them, I see no problem with "pushing them up".

Of course there are exceptions, e.g. when the code needs to be optimized for speed, memory usage, security, or formal correctness.

nialv7
0 replies
14h1m

I am just really tired of seeing articles like this. Sure, you find some rules that are helpful in some specific cases, but those rules almost never generalize (yeah the irony of me making a generalizing statement here is not lost on me, but I did say "almost").

Imagine you got a corporate programming job, and your manager comes to you and says "here, in this company, we keep _all_ the if statements in one function, and no ifs are allowed anywhere else". I would just walk out on the spot.

Just stop, stop writing these articles and please stop upvoting them.

myaccountonhn
0 replies
17h22m

This is a variation of the “golden rule of software quality”

https://www.haskellforall.com/2020/07/the-golden-rule-of-sof...

layer8
0 replies
18h12m

One variation on this theme is to use subclass or function polymorphism. This lets you decouple (in time) the code that detects/tests the condition from the code that performs some operation based on the condition. In TFA’s enum example, instead of the enum values you could pass the foo/bar functions around as values (or instances of different subclasses implementing a common interface method as either foo or bar), and the place where the operation finally needs to be performed would invoke the passed-around function (or object). I.e., f() would return foo or bar as a function value, and g() would be passed the function value and simply invoke it instead of doing the `match` case distinction.

The drawback is that it’s then less clear in some parts of the code which implementation will be invoked. But the alternative is having to invoke the specific operation (directly or indirectly) immediately when the condition is determined. It’s a trade-off that depends on the situation.
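A minimal sketch of that decoupling in Rust, reusing the article's foo/bar/f/g names (the exact signatures here are invented): f() tests the condition once and returns the operation as a plain function value, and g() invokes whatever it was handed, with no case distinction of its own.

    fn foo() { println!("foo"); }
    fn bar() { println!("bar"); }

    // The condition is tested here, once...
    fn f(condition: bool) -> fn() {
        if condition { foo } else { bar }
    }

    // ...and the operation is performed here, later, with no `match`.
    fn g(op: fn()) {
        op();
    }

    fn main() {
        let op = f(true);
        // ... time passes, op gets passed around ...
        g(op);
    }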

jampekka
0 replies
18h11m

Fors down sounds like bad advice, and the rationale for it seems to be The Root of all Evil.

"Fors up" allows for composition, e.g. map. Fors down makes it clunky at best.

jackblemming
0 replies
16h15m

“Put ifs where they minimize the net total cyclomatic complexity”

This is exactly what the factory design pattern is trying to achieve. Figure out the type of object to create and then use it everywhere vs a million different switch statements scattered around.

Also don’t create batch functions unless you need to. Functions that work on a single item compose better with map-reduce.
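A small sketch of that last point, with made-up names: a single-item function slots directly into an iterator pipeline, which is itself a little map-reduce.

    // Single-item function: easy to reuse in any pipeline.
    fn shell_weight(walrus_weight: u32) -> u32 {
        walrus_weight / 10
    }

    fn main() {
        let weights = vec![100, 250, 330];

        // map (the per-item function) + filter + sum (the reduce):
        let total: u32 = weights
            .iter()
            .map(|&w| shell_weight(w))
            .filter(|&s| s > 10)
            .sum();

        println!("{total}");
    }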

ilitirit
0 replies
11h34m

I'm glad many people have identified why "pushing ifs up" is often bad advice. This article should give examples of when and why you should use either approach. Furthermore, I would argue that there's far too little information and context presented here to even make a decision like that. What do `frobnicate` and `transmogrify` do? How many callers would need to perform these conditional checks? Do these if statements convey domain logic that should actually belong in the walrus class? If these checks need to be made often, would it make better sense to capture the call as a lambda and then only perform the check once instead of having a conditional for loop? Etc etc.

cubefox
0 replies
9h28m

The push-ifs-up advice assumes you are using a language where objects aren't nullable. That doesn't hold for most languages: without non-nullable types, skipping the check inside the function just trades it for a potential null pointer exception.
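For contrast, a minimal sketch of what the advice leans on, in the style of the article's examples (the types and names here are invented): with a non-optional parameter, the compiler forces the caller to have handled the missing case before the call.

    struct Walrus;

    impl Walrus {
        fn frobnicate(&self) { /* ... */ }
    }

    // Takes a definitely-present Walrus: there is no null to forget to check.
    fn frobnicate_walrus(walrus: &Walrus) {
        walrus.frobnicate();
    }

    fn caller(maybe_walrus: Option<Walrus>) {
        // The check is pushed up to the caller, and the type system enforces it.
        if let Some(walrus) = maybe_walrus {
            frobnicate_walrus(&walrus);
        }
    }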

chatmasta
0 replies
15h52m

tl;dr Branch as early as possible (move ifs up) and as infrequently as possible (move loops down, to minimize the number of loops calling something that branches)

It probably actually is a good rule of thumb, in that it will naively force you into some local maxima of simplified logic. But eventually it becomes equivalent to saying "all of programming is loops and branches," which is not a very useful statement once you need to decide where to put them...

cbogie
0 replies
6h15m

i first interpreted this in terms of organization policies, and then clicked the link.

functions, they’re just like us.

btbuildem
0 replies
1h59m

This usually culminates with the patient trying to cram all if/else logic into a type system, an effort that eventually gives them an aneurysm and gets them carried away on a stretcher.

bobmaxup
0 replies
15h13m

As with much programming advice, this is language dependent.

You might want branching structures when you have no overloading. You might want guards and other structures when your type checking is dynamic.

blumomo
0 replies
7h55m

I wrongly assumed this article was about readability and maintainability, two high-cost factors if done poorly.

But I was wrong; here’s the motivation for “push fors up”:

The primary benefit here is performance. Plenty of performance, in extreme cases.
bee_rider
0 replies
18h40m

It seems like a decent general guideline.

It has made me wonder, though—do there exist compilers nowadays that will turn if’s inside inner loops into masked vector instructions somehow?
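Not an answer, but a rough illustration of the kind of loop being asked about: LLVM-based compilers can often lower a branch like this to a vector compare-and-select (blend) rather than a jump, though whether that actually happens depends on the target, data type, and optimization level, so checking the generated assembly is the only real proof.

    // An inner-loop `if` that auto-vectorizers can usually turn into a
    // masked select rather than a branch, when the loop vectorizes at all.
    fn clamp_negatives(xs: &mut [f32]) {
        for x in xs.iter_mut() {
            if *x < 0.0 {
                *x = 0.0;
            }
        }
    }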

assbuttbuttass
0 replies
16h22m

I love this advice, moving if statements "up" is something I've observed makes a big difference between code that's fun to work with and easy to maintain, and code that quickly gets unmaintainable. I'm sure everyone's familiar with the function that takes N different boolean flags to control different parts of the behavior.

I think it really comes down to: functions that have fewer if statements tend to be doing less, and are therefore more reusable.

anyonecancode
0 replies
15h27m

A good example of this I see a lot in a code base I'm currently working in is React components that conditionally render or not. I really can't stand this pattern, and whenever I can I refactor that into having the component ALWAYS render, but have the caller decide whether or not to call the component.

andyferris
0 replies
16h51m

This is kinda "just" predicate push-downs, for imperative code. Makes sense that the author is thinking about it given he is working on databases (tigerbeetle) and part of the motivation is performance.

Interesting that we push the ifs up but we push the predicates down! (And a "predicate pushup" sounds like you are adding some randomness to your exercise routine - one, two, skipping this one, four, ...).

amelius
0 replies
7h43m

While performance is perhaps the primary motivation for the for advice, sometimes it helps with expressiveness as well.

Compilers can do this. And I don't think the second part of that sentence is applicable very often.

Vt71fcAqt7
0 replies
4h25m

The way he explains "pushing `for`s down" is obvious. Restating it would be: only do something you need in a for loop, and not something you could do once. But that has nothing to do with what he means by "pushing `if`s up", which is about making `if`s appear as early as possible in the control flow. It doesn't matter whether `for`s are early or late in the control flow; what matters is that only things that need to be done n times (with n being the loop count) are in the for loop, and everything else is not. And I disagree with "push ifs up" stated unqualified.
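A tiny sketch of that restatement (the discount example is invented): anything that does not depend on the loop variable is computed once before the loop instead of n times inside it.

    fn total_discounted(prices: &[f64], discount_percent: f64) -> f64 {
        // Loop-invariant work, done once:
        let factor = 1.0 - discount_percent / 100.0;

        let mut total = 0.0;
        for &p in prices {
            // Only per-item work stays inside the loop.
            total += p * factor;
        }
        total
    }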

1letterunixname
0 replies
16h26m

`condition` is an invariant. Unless you're using cranelift or gcc, it's going to get optimized away by LLVM unless rustc gives it some non-invariant constraints to solve for. Most compilers, JITs, and interpreters can and do perform invariant optimization.

Another way to look at the class of problem: if you're using too many conditionals too similarly in many places, you may have created a god type or god function with insufficient abstraction and too much shared state that should be separated.

---

Prime directive 0. Write working code.

Prime directive 1. Choose appropriate algorithms and data structures suitable for the task being mindful of the approximate big O CPU and memory impact.

Prime directive 2. Write maintainable, tested code. This includes being unsurprisingly straightforward.

Prime directive 3. Exceed nonfunctional requirements: write code that is economically viable. If it's too slow, it's unusable. If it's somewhat too slow, it could be very expensive to run or will cost N people M time.

Prime directive 4. If it becomes too slow, profile and optimize based on comparing real benchmark data rather than guessing.

Prime directive 5. Violate any rule for pragmatic avoidance of absurdity.