Code duplication is sometimes a good thing, but that doesn't prove clean code is a bad thing.
I think you overcommitted a bit on the refactor; if you had replaced those "10 repetitive lines of math" with a function, it would have been cleaner.
Let's be crystal clear about it: your teammate did a terrible job by not creating a function for those 10 repetitive lines. I would reject this PR, but I would never rewrite it.
Rewriting a PR is a quick way to insult someone: you're not open to debate, and many times you don't have the full picture of why the code was written the way it was. PRs have to be reviewed and rejected when they don't meet coding standards. The last thing I want is people committing code and saying "clean code is not important"; that mindset only leads to a codebase that no one wants to work with in a matter of months.
Communicating better and having some tact is the lesson you should be taking from your experience; clean code has nothing to do with it.
I've found over a long time (20+ years) that this sentiment is usually held by people who are focusing on the wrong things in code bases (and who quite often aren't actually solving real problems, but spend their time solving non-problems, with the additional side effect of creating more of them for the future). The first implementation of this most definitely should not abstract away anything like those 10 lines (which is minuscule). It's trivial to take something that does exactly (and only) the thing and modify it later, and it's pointless to abstract away something as small as 10 lines for a gain you haven't yet proven or tested.
"Clean Code" is absolutely not important and most rules/"principles" of the same character as those that make up Clean Code are absolutely not important either, but are things that people with nothing better to do hold on to in order to validate the hornets nests they accumulate in code bases over time. It leads to over-abstracted, hard-to-change code that runs badly and is much harder to understand, generally speaking.
The only thing you should use as a guiding principle, if anything, is to express your data transformations in a clear and easily changed way, since that's the only real thing a program actually does. If something doesn't have to do with improving the expression of data transformation it's very likely it's bullshit that's been stacked on top of what you're doing for no real reason.
Most of the SOLID principles have nothing to do with or in fact make it harder to see what data transformations take place and how, which is why they are largely useless for understanding what is actually going on in a program.
So much this. It's hard not to lose the forest for the trees (we are craftspeople, after all), but at the end of the day the overall structure of a program is so much more important than whether there's a bit of duplication here or there. Better to let the structure emerge and then reduce duplication instead of trying to guess the right structure up front. And yeah, I'm still not convinced SOLID is real (but I'm also not convinced classes are useful much of the time, for that matter).
Letting the structure emerge requires people thinking in depth about the underlying principles of what the code does or should do.
As for classes: they are merely a construct that many languages have come up with for organizing code, and in my opinion a very debatable one. Some newer languages don't deal in classes at all (Rust, for example), and with good reason. If one says we need classes for having objects: no, we don't. Objects are a concept for managing state over the lifetime of whatever the object represents, so that might be a worthy concept, but a class? I mostly find classes being used as a kind of module, not actually doing anything but grouping functionality that could simply be expressed by writing ... functions ... in a module, a construct for grouping that functionality.
I think what many people actually want is modularity, which, in contrast to classes, is a concept that truly seems to be well accepted; almost every language tries to offer it in some way or another. It is just that many people don't realize that this is what they are chasing when they write class after class in some mainstream language that possibly does not even provide a good module system.
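To make that concrete, here is a minimal sketch (the module name and functions are invented for illustration) of plain functions grouped in a Rust module, which gives the organization people often reach for classes to get, without any instance state:

```rust
// A module groups related functions without introducing a class or any
// instance state; callers use the module path as the "namespace".
mod geometry {
    pub fn area(width: f64, height: f64) -> f64 {
        width * height
    }

    pub fn perimeter(width: f64, height: f64) -> f64 {
        2.0 * (width + height)
    }
}

fn main() {
    // No object has to be constructed just to call grouped functionality.
    println!("{}", geometry::area(3.0, 4.0));      // 12
    println!("{}", geometry::perimeter(3.0, 4.0)); // 14
}
```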
When you first write a struct, and then below it write an impl block for that struct containing functions that can only be applied to that struct, it really looks like a class to me; it just has a syntax that differs from what we're used to in other languages. Why wouldn't you call that a class?
The idea is that it decouples state from behavior, while a class tries to group them together. Other people can implement traits later for your struct. (I think Clojure has something similar.) They do not need to subclass your struct, and in fact cannot. This design encourages, and in a way forces, composition over inheritance.
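A rough sketch of that split, with made-up names: the struct holds the data, an inherent impl attaches functions that only apply to it, and a trait can be implemented for it after the fact, without any subclassing:

```rust
// Data lives in the struct; behavior is attached separately.
struct Point {
    x: f64,
    y: f64,
}

// Inherent impl: functions that only make sense for Point.
impl Point {
    fn new(x: f64, y: f64) -> Self {
        Point { x, y }
    }

    fn distance_from_origin(&self) -> f64 {
        (self.x * self.x + self.y * self.y).sqrt()
    }
}

// A trait can be implemented for the struct later; nothing subclasses
// Point, so reuse happens through composition rather than inheritance.
trait Describe {
    fn describe(&self) -> String;
}

impl Describe for Point {
    fn describe(&self) -> String {
        format!("({}, {})", self.x, self.y)
    }
}

fn main() {
    let p = Point::new(3.0, 4.0);
    println!("{} is {} from the origin", p.describe(), p.distance_from_origin());
}
```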
I would not name it a class, because I don't want people to fall back into thinking: "Ah I know! Inheritance!".
Classes are about data abstraction and encapsulation, which have nothing to do with implementation inheritance. They're about providing an interface that preserves any required invariants and does not depend directly on how the data is represented. A "structure" that either preserves useful invariants or is intended to admit of multiple possible representations that nonetheless expose the same outward behavior is effectively a class, whether you call it one or not.
I think the discussion of what to call that is a bit pointless. For some people through their Java/C++/whatever-tinted glasses, these things will be classes. To others they might be called something else. You personally call them "classes". I personally do not. Rust the language does not. Lots of people behind the development of the language thought that structs and traits are better names.
I appreciate the programming language design behind them and hope that Rust will not devolve into an ecosystem where everyone thinks they must put everything into classes or similar, needlessly maintaining state therein and requiring users to mutate that state through accessors and whatnot, when simply a couple of functions (and I mean strictly functions, not procedures) would have done the job.
I never stated that I personally think classes necessarily mean inheritance. But guess who thinks so: lots and lots of developers working with mainstream languages, where inheritance is frequently used in combination with what those languages call a "class". That is why I am saying that I don't want those people to fall back into their previous thinking, and why I would not want to call them classes. It gives many, many people the wrong ideas.
What other people call it is their own choice. I am merely stating my own preference here and probably the preference of the language designers, whom I mostly deem to be quite a lot more competent in drawing the distinction than myself.
There are plenty of cases where this makes sense, such as when working with sub-word data (bitfields), which is common in the embedded domain and often found in efficient code more generally. In fact, it may be rarer to have structs where one definitely wants to provide direct access (e.g. via pointers/refs) to the underlying data and thus cannot just rely on getters/setters.
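A small sketch of that sub-word case, with an invented register layout: the "fields" are packed into a single byte, so there is nothing you could hand out a reference to, and accessors are the only reasonable interface:

```rust
// Hypothetical control register: bit 0 = enable flag, bits 1..=3 = prescaler.
// The fields are packed into one byte, so getters/setters are the natural
// interface; there is no addressable field to take a pointer/ref to.
struct ControlReg(u8);

impl ControlReg {
    fn enabled(&self) -> bool {
        self.0 & 0b0000_0001 != 0
    }

    fn set_enabled(&mut self, on: bool) {
        if on { self.0 |= 0b0000_0001 } else { self.0 &= !0b0000_0001 }
    }

    fn prescaler(&self) -> u8 {
        (self.0 >> 1) & 0b0000_0111
    }

    fn set_prescaler(&mut self, value: u8) {
        self.0 = (self.0 & !0b0000_1110) | ((value & 0b0000_0111) << 1);
    }
}

fn main() {
    let mut reg = ControlReg(0);
    reg.set_enabled(true);
    reg.set_prescaler(5);
    assert!(reg.enabled());
    assert_eq!(reg.prescaler(), 5);
}
```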
Well, having classes doesn't mean that you will necessarily use inheritance. There are programmers (ab)using it a lot, but for me, like for many others, classes are primarily a way to organize code: they provide a convenient way of representing something and specifying functions that only make sense in the context of that something, while ensuring that you don't accidentally re-use those functions for something else or lose track of which functions are meant for which component. They also provide a way to make collaboration easier, as you can agree on classes' interfaces and then each person goes on to implement their own classes without having to worry about the implementation details of other classes. It is true that you usually also have inheritance with classes, but I'm unsure if having it is a requirement to call something a class. IIRC, from a theory perspective, classes are just an abstraction of some concepts, and the fact that a class's instance is called an object reflects this.
I think the person you're replying to was trying to address that point: classes are primarily a way to organise code, when other possibly equally good or better options, like modules, exist. An F# module might (for example) look like the one below.
```fsharp
module Hello =
    let say() = "hi" // returns the string "hi"
```
There are mechanisms to encapsulate implementation details (private functions), to have multiple modules for different "domains", and to specify public contracts.
A class seems to imply more than that: each class specifies how to create an object with a constructor (where an object is something carrying the class's methods, which operate on state owned by and accessible only to the object itself).
But in Rust you've got constructors as well. The only thing that really seems to be missing is inheritance. But I understand that one might say that classes without the possibility of using inheritance don't look fully right.
That's specious reasoning at best.
A class basically means a way to specify types that track specific state and behavior. This then materializes in other characteristics, like interfaces and information hiding.
Don't tell me Rust does not support those.
Also, C++ precedes Rust and has supported free functions and objects without member functions from day one. Rust is hardly relevant or unique in this regard. What Rust supports or not is not progress, nor is the definition of progress whatever Rust supports.
You're showing some ignorance here. Classes and objects are entirely different concepts. An object is an instance of a class. A class is a blueprint of an object. This is programming 101.
Right, yes, but those principles are often still very much in flux in the early days of a feature. Once a feature is more mature, it’s easier to confidently say what the code should do, and so that becomes a good time to refactor. Early on in the development lifecycle I think it’s rarely a good idea to worry about code duplication, underabstraction, etc.
And yes, I agree with you that classes are an organizational concept with parallels in functional languages. Modularity is very important, but as you say, there's no reason that modularity implies classes. Sometimes I find classes to be ergonomic, and when they are, using them makes sense, but plenty of other times a struct will do, as long as there's some type of module system to keep different things different.
If you implement one single function, then it's easy to refactor later on.
If over 10 years, dozens of people add feature after feature without thinking carefully about the structure of the code, you get huge messes that are very hard to clean up. Then suddenly an old, unsupported dependency needs to be replaced (e.g. because of a vulnerability) and it takes weeks to remove it because there is zero separation of concerns anywhere in your codebase.
You say that overabstracted code is hard to change. I agree, but so is underabstracted code - or simply code with bad abstractions. I don't care particularly about SOLID, since I'm not too fond of OOP anymore, but encapsulation (which is a concept independent of paradigms) is definitely crucial when you want to make sure a code base is maintainable in the future.
I am not against abstraction, but the attitude that the person's colleague should've *definitely* abstracted away this minuscule amount of lines immediately even in their first implementation is not just abstraction, it's premature abstraction and the kind of dogmatic thinking that leads to the problems I outlined. The blog post scenario is not unique; the correct solution is almost always to leave the code less abstracted and only abstract it when there are actual proven use cases.
Exactly: "premature abstraction and the kind of dogmatic thinking".
This thread is filled with nit-picking this code, when the biggest problem was actually team communications and overall discussions about future direction. Many young programmers will spin their wheels on these types of things, but then sales/marketing might be like 'dude, this has a 1-2 year life span at most'. Or, this is being replaced by product-x next year.
If I had a nickel for every piece of "will be turned off in 6 months, tops" software that my workplace still maintains a decade later, I'd probably double my salary.
Definitely this. I've seen it too, and I'd wager the vast majority of IT professionals have as well.
At one particular company I spent a decade at, I was brought in specifically to help in the decommissioning of a particular set of legacy databases. One particular database was still on life support because a particular trading desk wouldn't move off of it, and since they made a shit-ton of money for the firm, they got a free pass on it, while IT was constantly dinged every year for the legacy database still being alive. When mandates compete between a profit center and a cost center, the cost center always loses. Which is fine, but then don't blame the cost center for not being able to make their goals because the profit center is dragging their heels. (Which was _not_ done at this firm).
Why didn't the IT folks step in and provide the labor for the conversion? This helps them accomplish their goals. Trading desks love free labor. This shows a novel way in which cost centers can attribute their costs to lines of business. Finally, this demonstrates leadership and initiative by the IT department and its heads--especially if they have to fight an uphill battle about IP and succeed in convincing the desk.
The code should be flexible. The first iteration wasn't, simple as that; this is the code juniors in high school write. It always leads to a mess later on. Now, if you are a beginner, like you seem to be, then it's good code. Sure.
I just want to point out that the "if you are a beginner like you seem to be" line could come across as condescending, even if not intended that way.
You don't know anything about the person you are replying to except what they posted, and assuming that someone with a different view (which I happen to agree with, from my own experience) automatically has less experience, instead of acknowledging that others may have good reasons for views one hasn't considered or run into, doesn't reflect well.
The first iteration is plenty flexible where it needs to be. I've been programming since 2001 and honestly, the reason I hold this view and you don't is very likely the exact opposite of what you think: in my experience it's hard to find experienced and knowledgeable programmers who think the best time to abstract something is in its first iteration.
We don't really know what the actual implementation looks like, so I find it hard to make a definite judgement. I think the refactoring introduced by the author probably wasn't good, because it looks like a very brittle abstraction. But I find it hard to believe that among those 10 lines of maths each, there aren't any functions one can extract that form some natural abstraction (e.g. "translate vector" or whatever), so that every handle function ends up being maybe 1 or 2 lines instead of 10.
That's beside the point, isn't it? Just because you can in theory extract functions doesn't mean you should, or that your codebase will be in better shape if you do. There's the issue of coupling, there's the issue of introducing dependencies, and there's the issue of whether the apparent duplication holds the same semantics and behavior across the project. I'm talking about things like: does it make sense to change the behavior on all 10 code paths if you need to change it in only two or three? Having a surface-level similarity between code blocks doesn't make them the same behavior, does it?
There are no definites in software development, but if a formula is repeated 10 times you probably have a good name for it, and at that point it probably should be in a function.
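As a rough sketch (the actual formula from the post isn't shown here, so the one below is invented), the point is just that a repeated, nameable computation can live in one place:

```rust
// Hypothetical repeated formula: clamp a proposed size to sane bounds.
// Extracted once, it gets a name and a single place to fix bugs.
fn clamp_size(size: f64, min: f64, max: f64) -> f64 {
    size.max(min).min(max)
}

fn resize_rectangle(width: f64, height: f64) -> (f64, f64) {
    (clamp_size(width, 1.0, 1000.0), clamp_size(height, 1.0, 1000.0))
}

fn resize_oval(rx: f64, ry: f64) -> (f64, f64) {
    (clamp_size(rx, 1.0, 1000.0), clamp_size(ry, 1.0, 1000.0))
}

fn main() {
    assert_eq!(resize_rectangle(2000.0, 0.5), (1000.0, 1.0));
    assert_eq!(resize_oval(50.0, 60.0), (50.0, 60.0));
}
```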
I'm not on the train of applying "Clean" Code always and everywhere. However:
I have to disagree here a little.
How are you going to "prove" lower maintenance cost? How are you going to prove that fewer bugs happened? How are you going to prove that ahead of making the change? How would you prove it even after making the change? You cannot go back in time and live the same period again with a different approach. So what you are demanding as a kind of proof is never going to be possible, and as such you are never going to get the refactoring. This argument could be had about any kind of refactoring, not only the one described in the blog post.
If one plasters the same code in many places, I have to doubt their basic understanding of computer programming. Have fun bugfixing N places instead of 1 place once a bug is discovered. Hopefully you won't forget any of the places.
This is not to say that no duplication should ever exist. Let's not be extremists.
That is a very broad over-generalization. Maybe some or even many people do as you say, but not everyone. There are indeed good ideas in Clean Code. We just need to not be dogmatic about them.
The existence of a situation where the code has to be changed in several places will prove the connection between all of those pieces of code. This is best proven in actual practice, without the guesswork that any premature abstraction amounts to.
The first time this happens the programmer will undoubtedly be much better informed about the connections between the pieces of code and will have an actual case of them being related, which is why it's much better to delay any abstraction until that point. Future-proofing and playing oracle about the future needs of code is generally a dead end, and the programmers least capable of playing that game (i.e. junior programmers and/or Clean Code enthusiasts) are also the most likely to do so.
In a perfect world, maybe. And even then it would be better not to accumulate issues until they start hurting. But in the real world there are often people who never had these issues on their calendar and will not be willing to allow for sudden fix-it time.
I don't need to be an oracle to tell that the same code repeated 10 times will mean changing 10 places instead of 1. I don't need to be an oracle to write code that, from the beginning, can accommodate some change in business needs.
I am fed up with people thinking that developers are like little babies who never learn anything from their past experiences. All in good measure, of course. If I tell someone that I see a problem for the maintainability of some project's code, because it cannot easily be extended in ways that are likely to be needed due to business requirements, I expect them to take me seriously and not belittle my experience. Engineering is my discipline, and often not theirs.
I have built systems that anticipated changing requirements, and they did change just like I thought they would. How did I do it? By making parts of the code work in "additive" ways. I cannot state exact details, but there was some project about checking values of variables and giving some result of whether the values were correct. The values were of various types, so they required various kinds of checks, for example string edit distance. In such a situation it is already foreseeable that at some point you might want to add checks that check the values in different ways, for example number equivalence, and then later on equivalence with some epsilon, some allowed deviation from the correct value. So I implemented it in a way that lets me register checks. That way one can add a check later and register it without changing existing code. These things can be anticipated, and a good design will allow for change later easily. No need to be an oracle. Just need some experience.
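A bare-bones sketch of that kind of registration (the names and the two checks are invented based on the description above): new checks get added by registering another implementation, not by editing existing code:

```rust
// Each check knows how to judge an actual value against an expected one.
trait Check {
    fn name(&self) -> &str;
    fn passes(&self, expected: f64, actual: f64) -> bool;
}

struct ExactCheck;
impl Check for ExactCheck {
    fn name(&self) -> &str { "exact" }
    fn passes(&self, expected: f64, actual: f64) -> bool { expected == actual }
}

// Added later without touching ExactCheck or the runner: equality within epsilon.
struct EpsilonCheck { epsilon: f64 }
impl Check for EpsilonCheck {
    fn name(&self) -> &str { "epsilon" }
    fn passes(&self, expected: f64, actual: f64) -> bool {
        (expected - actual).abs() <= self.epsilon
    }
}

fn main() {
    // The registry is just a list of trait objects; registering a new check
    // is additive and leaves existing code untouched.
    let checks: Vec<Box<dyn Check>> = vec![
        Box::new(ExactCheck),
        Box::new(EpsilonCheck { epsilon: 0.01 }),
    ];

    for check in &checks {
        println!("{}: {}", check.name(), check.passes(1.0, 1.005));
    }
}
```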
I'm not sure you and the OP are even in significant disagreement. When exactly did the OP say developers are like babies who don't learn from experience? From my perspective, the OP seems to be critiquing the culture of indoctrinating inexperienced developers into the cult of clean code. It is the clean code camp that is more likely to dismiss experienced engineers as unenlightened dinosaurs.
The first time this happens, the programmer will know about one or two of those places, and insert difficult-to-find bugs all over your software.
Indeed.
The first time this happens the programmer will spend hours looking for the first few places that need to be changed. Then he will spend days debugging, and hopefully will find the last few places.
Then, even though he is now informed about the connections, he will NOT refactor this code because he (a) has no time to refactor as he already spent a week on what seemed like a one-line fix, and (b) is now terrified to touch this code any more than absolutely necessary.
Then the cycle will repeat.
The advantage of using an abstraction, even if the abstraction is wrong, is that you can use tool automation to look up all the uses of the abstraction and replace them with a new one as needed, without having to memorize the codebase.
Hodgepodge copy-template code that has minor context-sensitive structural differences is relatively difficult to find all instances of in a codebase without tracing execution paths, unless you were the original author and have the uses memorized. In contrast, most IDEs have a find-all-references feature, and even grep can find all the common names of something in a codebase.
Abstractions, even bad abstractions, are far easier to deal with than unabstracted code. Abstractions are cheap and disposable. You're not married to them. They only seem hard to change if you're stuck in the mindset that, to implement a change to a piece of code using an abstraction, you have to change that abstraction, which is incorrect. If an abstraction no longer fits, you should more often than not stop using it in that part of the code rather than alter the abstraction to fit the new use case. Abstractions are also easier to understand by virtue of communicating their intent, unlike unabstracted code. It takes a lot of effort, on the other hand, to read through and understand a chunk of unabstracted code, even more to change it or write a slightly differing variant of it for a new use case, and even more to extract any kind of abstraction from a bloated unabstracted mess. With this being the case, it always makes sense to eagerly abstract if you can, even if your abstractions are bad, because the result will still be more maintainable than unabstracted code.
Although I agree with this, there is also another, more subtle thing going on. Rewriting code that works and that someone else is maintaining is a waste of the rewriter's time and is unprofessional. It also denies the original person an opportunity to learn, because they don't see how lower-quality code wastes time in practice (the time wasted in the refactor doesn't count, because Abramov did it on his own initiative).
The clean code approach is better, but the issue here isn't code. It is that he was wasting his time while not providing useful feedback to the unnamed coder. He was making terrible choices. He wasn't maximising the business or team outcomes. The best outcome is the original coder raising their standards and him going off to work on something that actually needs work. He should have angled for that. In other words, the correct option was to do a code review.
Rewriting existing functioning code is not a waste of time in general. We don't know the full picture, so one cannot make such a generalized statement. I can write perfectly functioning code in the most terrible way that makes maintaining the code a herculean task. Very easily so even. Making things worse is almost always easy. Rewriting that code might make maintenance much easier, even if the code was previously working. It might make onboarding easier, since others can better understand what is going on. It is not in general unprofessional. To judge that, you need a way more complete picture. It can even be unprofessional to leave horrible code as is, instead of fixing it, leaving future people to plaster over it with more leaky abstraction.
But doing it without talking to the other person is the problem.
This made it sound like it was checked in, and this guy jumped in and re-wrote it the very same week without talking to the other person. That is probably the biggest problem here.
So, there was a project happening, one person did a section, someone else jumped in and re-did it (double work), without talking about it (risk another future triple re-write).
Not specifically for this blog post, but this is more common than it should be, and it's a symptom of organizational failure most of the time.
Either the code was merged without the people that would be directly impacted by it reviewing it, they had two teams touching the same code-path almost simultaneously and not talking to each other or some other lack of communication.
Sometimes the person rewriting the code is simply cutting through the bullshit because they know the original committer simply sucks at their job and you can't do anything about it (yes, I've seen that before).
Ultimately it could just be the guy being a jerk, but more often than not, it's not as simple as it seems.
I disagree slightly; I think there’s an important thing here which was a little bit glossed over in the article.
The original code established a pattern and left space for where and how to add code to handle edge cases, when they arose, whereas the clean code solution was implemented as though it was the complete and final solution.
A first draft is often incomplete, and in an oversimplified first draft, it’s easy to pull out patterns and remove “duplication” which is a result of the simplification, not an inherent feature of the problem being solved.
Wow. Just review the rewrite and, if it's good, merge it, resubmit it to a different reviewer, whatever your process is. The reviewer/re-writer is helping get work done, and being offended is counter-productive.
What the GP explained is commonly true when it comes to interacting with other humans, with only a few exceptions. You can complain about human nature being counter productive all you want, but refusing to adapt to this reality and foregoing fundamental soft skills is, ironically, even more counter productive.
Chesterton's Fence [0]
[0] https://fs.blog/chestertons-fence/
I think the original blog post is badly worded. We can see from the proposed refactor that those 10 lines were in fact not identical; they were only similar. The formulas for resizing the top-left corner of a rectangle are different from the formulas for modifying the bottom-right of an oval. They look quite similar, but one will have a plus instead of a minus here, and another one there, etc.
The complexity is built into the geometry itself in this case, and "abstracting" it away only moves the mess around; it fundamentally can't remove it (perhaps there is some clever mathematical abstraction that could, using some special number group, but unless you have that built in, implementing the special arithmetic on top of built-in ints will probably cost more than you gain).
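A rough illustration of that sign flip (the post's actual code isn't reproduced here, so the rectangle maths below is invented): the two handlers look nearly identical, but dragging the top-left handle moves the origin while dragging the bottom-right one doesn't:

```rust
// A rectangle defined by its top-left corner plus width/height.
struct Rect { x: f64, y: f64, w: f64, h: f64 }

// Dragging the top-left handle: the origin moves with the drag, and the
// size shrinks by the same amount (note the minus signs on w and h).
fn drag_top_left(r: &Rect, dx: f64, dy: f64) -> Rect {
    Rect { x: r.x + dx, y: r.y + dy, w: r.w - dx, h: r.h - dy }
}

// Dragging the bottom-right handle: the origin stays put, and the size
// grows with the drag (plus signs instead of minus).
fn drag_bottom_right(r: &Rect, dx: f64, dy: f64) -> Rect {
    Rect { x: r.x, y: r.y, w: r.w + dx, h: r.h + dy }
}

fn main() {
    let r = Rect { x: 10.0, y: 10.0, w: 100.0, h: 50.0 };
    let a = drag_top_left(&r, 5.0, 5.0);
    let b = drag_bottom_right(&r, 5.0, 5.0);
    assert_eq!((a.x, a.w), (15.0, 95.0));
    assert_eq!((b.x, b.w), (10.0, 105.0));
}
```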
Source domain checks out
Absolutely. The lesson is "goodbye to over-applying a rule without considering context-specific trade-offs", which is a lesson that takes time to learn.
Assuming this story is real, there are several smells:
* Colleague writes quite a bit of code, and merges it without anyone reviewing it or providing feedback.
* OP rewrites the code from a colleague without communicating, and again merges it without a review.
* The manager calls a meeting with OP and asks them to revert their changes.
The initial code might have been messy or not, and the refactor might have been a bad or good idea. Nevertheless I think OP is taking away the wrong lesson.
To be fair, he does say that he took that away as a lesson. He listed it first, so I’m charitably going to call it the “primary lesson”