This is a new summary of the real-life case that inspired me to go do PhD studies in information systems. I wanted to understand why IT incompetence still could exist to this degree in this day and age after all the knowledge the world had developed about good IT and software practices over decades of experience. Of course, I quickly found out that IS research had already figured most of this out, and that perhaps, people were just people and crappy organizations were just crappy organizations, and perhaps that's something that will never change because bell curve distributions exist for almost everything.
If all people could be selfless, humble, intelligent, competent at everything they ever try to do, AND be people of integrity, I'm sure many of the world's problems would disappear, in addition to just IT problems. People suck, organizations suck, societies suck, and we all suck in our own way. I think organizational research like I'm trying to do still tries to point us towards a better path nevertheless, but sometimes it's a throw your hands in the air thing. Some people will never care.
Needless to say, I'm no longer focused on researching this topic, as it seems really well-researched already. But it's still interesting to see that this particular example still pops up in a news report now and then. There are plenty of other big examples that pop up every year, but this one seems to have real staying power in the news media.
And yet you don't see very many buildings collapse, bridges fail, or dams flood the valley below. The idea is that liability should be assigned for important systems, putting the onus on the creator/owner to build in safeguards, checks, or other protections to prevent disasters.
Why this isn't applied to software engineering is a whole other story, but I think it probably should be. Move fast and break things is not something I wanted to hear tbh.
There is something about how buildings, bridges, and dams are tangible and therefore easier for people to conceptualize and appreciate. There is a natural literacy there in that you don't have to understand what goes into making a bridge to understand the consequences of having a bad bridge.
Meanwhile, software's inner workings are encapsulated so that they're not self-evident as to consequences if things go awry. Furthermore, software is inconsistent in how it works, and this trains people to think that software is just naturally glitchy, but the glitches aren't a big deal. See the joke about different types of engineers also: https://www.reddit.com/r/Jokes/comments/pqr8t3/four_engineer...
The mechanical engineer says: “It’s a broken starter”
The electrical engineer says: “Dead battery”
The chemical engineer says: “Impurities in the gasoline”
The IT engineer says: “Hey guys, I have an idea, how about we all get out of the car and get back in”
If that's what the IT guy recommends, how do you get users, never mind corporate managers and executives, to take software system quality seriously? Obviously, this isn't what a proper software guy will think; the proper software guy knows that things are a bit more complex than that. But this is what the software guy communicates to non-technical people. And only a few rare non-technical people will tell the software guy, "OK, look, just tell me what's really going on here, how deep the problem is, and what we need to do to fix it, no matter what it takes." Most people won't have the time for that, because the fact of the matter is that the system satisfices needs until it doesn't, and then it's too late.
SW engineering is still in its infancy compared to other disciplines that have hundreds of years of knowledge accumulated and built into their education process. SW education differs from country to country and institution to institution. In no other engineering discipline would it be tolerated that you could be self-educated. The lack of standards and a governing body you would be expected to be accredited by is a major gap.
In Europe, in order to call yourself an Engineer you need a Bachelor's Degree as a minimum. If not, it's illegal to work as an Engineer, because being one carries civil accountability for casualties. You might work as one only if a colleague signs off on your project. And yet, depending on the case, your colleague will refuse, because he or she would be the one to be sued if something goes wrong.
Not everywhere in Europe. Not in the UK for example.
There is legal accountability though, and a much better understanding of who can be held to blame for what. Software suppliers are pretty much immune to consequences of carelessness.
However, I think the real problem is that software is far more complex than a bridge or a building. IT systems have complex hardware (far more complex than any mechanical device) running even more complex software (counting all the layers from the OS up). On top of that, people (meaning users/buyers) have no idea how to evaluate safety/reliability/security and mostly seem to regard it as a nice-to-have to be traded off against other nice-to-haves, not an essential baseline.
Once it's implemented, everyone assumes the computer must be right, and acts accordingly. The presumption that computers are right is even enshrined in UK law (the intent was to stop people getting out of things like speeding tickets by claiming speed cameras were faulty), but I think everyone has come across situations where the final word on a dispute was "the computer says so".
How about the safe code in avionics/nuclear sites, Ada/SPARK, MISRA C? Not magic, or as foolproof as physical engineering where it's far easier to predict how a system will behave, OFC, but it's a good start.
Yes, it gets done when necessary which shows we can do it.
It does not get done everywhere because it is not a high enough priority.
I'm pretty sure that the Bologna Process abolished 'Engineer' titles.
In Germany, the protected title Ingenieur is still around, but most bachelor degrees in a technical field grant you the right to use it. Whether or not you are one doesn't change the liability situation (although you can't perform certain works at all without one). It's not really relevant in software development.
The English term "Engineer" can be used by anybody though.
I think not, because the difference between being an actual one and not is spending several years in jail (you and your employer) in case of damage or harm to life or health.
Not in Sweden. There are a lot of creatively named engineers around here.
This is true in the US in regulated industries as well
Belief in software glitchiness should have saved these people from erroneous prosecution, but instead what we see is people trusted the computers to be infallible.
No they trusted management to be infallible, even when management couldn't redo the calculations of the software manually.
This sort of thing is inevitable once a company is controlled by people who don't understand what the company does.
Management replaced know-how with software, and it mostly worked. Then they probably fired the few remaining people that had the know-how. Then, when it became painfully clear something was wrong, management chose to blame anyone, including to the point of pursuing prison sentences, rather than admitting they didn't know their business and someone made a mistake. They chose to destroy people's lives rather than admit they couldn't do the ONE job they claim they can.
I think we should examine why this doesn't happen with structural engineering firms.
For the third time on this thread I'm going to say it is because you have to have proper qualifications to do that for a living.
Ok but, for example, Boeing has recently come under fire for poor quality control, which people are blaming on management decisions taking power away from technical experts.
And the enormous Millennium Tower in San Francisco is sinking and tilting.
So it's not unique to software.
I previously worked in factory automation back when computers were first getting introduced into factories. After I spent a day tracking down and correcting some software glitch that was causing a problem, the factory manager would ask what the problem had been. I was instructed to say "bad valve, we replaced it," which I noticed they found far more satisfying to hear than "software writing to wrong register."
In one of the books I got on technology (early '00s?) there was an anecdote about a computer at a military facility crashing each night (probably in the 70s or 80s). They brought in a consultant to look at it and after an extended period of time looking for the problem in the code, the consultant was staying late in the data center wracking his brain over the problem.
At 10pm, a janitor came in, unplugged the computer, plugged in his floor cleaner, cleaned the floor, and then, when done, unplugged the floor cleaner and plugged the computer back in.
The consultant then suggested using the free outlet on the other wall (and got a 'difficult to unplug' cover for the outlet with the computer).
The next night he stayed late again and the janitor used the outlet on the other wall.
The consultant then told management that the problem had been solved and it was a buffer problem.
There's a book called The Devouring Fungus: Tales of the Computer Age [1] that features a story like this, was that the book?
Apparently there are tons of these stories [2] [3], many of them probably urban legends.
But there was one last year that was definitely real [4], when a cleaner removed power to a freezer holding decades of samples, apparently because they were annoyed by an alarm sound it was making.
[1] https://www.goodreads.com/book/show/3227607-the-devouring-fu...
[2] https://old.reddit.com/r/talesfromtechsupport/comments/5yrs1...
[3] https://www.logikalsolutions.com/wordpress/information-techn...
[4] https://www.theguardian.com/us-news/2023/jun/27/cleaner-coll...
That was the book. It's been a few years since I've dug it out... and looking it up on Internet Archive...
https://archive.org/details/devouringfungust00jenn/page/96/m... (pages 97 to 98) is the proper telling of the story.
I have worked with ECU programming. When test drivers got stuck and called me it was the first thing I told them. "Restart the vehicle". If that didn't work, "disconnect the battery and wait 5 minutes".
They quickly learned.
1. This is garbage, we should do better.
2. Early implementations of Britain's TPWS on railways are exactly like this. Suppose you're pulling into a terminus station and, for whatever reason, you happen to stop such that your train's sensors are right on top of the TPWS "toast rack" (basically think radio transmitter). When the next driver turns the train on, it can see a TPWS transmission. Now, it knows perfectly well it isn't moving, so we're not in some terrifying near-death scenario, and it could let you, a human train driver, sort this out by, you know, not crashing the train into the station buffers. Nope, in v1.0 the firmware just considers starting in this situation to be a fatal error and won't let you move at all. The only authorised solution is to get rid of any passengers who've boarded your train, switch the train into its (unsafe for passenger service) maintenance mode, drive it away from the beacons a few yards, stop, turn off the power, and then reboot the computer, where it now can't see a troubling TPWS transmitter and it's a working train again.
This happened to us with a first model year vehicle multiple times until the dealership received the necessary software upgrades. It was quite frustrating to be stuck at a busy fuel pump for 10 minutes until the software systems restarted properly.
"The IT person says". IT is not an engineering discipline (nor is software development in most cases), although they really should be.
And "turn it off and on again" is the canonical "joke".
Software is more like blueprints/processes for manufacturing. When something doesn't work right it's a yield rate issue. Oops the process failed, try it again.
Software can, and should, also be smarter and more helpful. Where there's a chance of invalid inputs, they should be validated. That would help with errors like:
* Authentication failed, check your account credentials.
* Could not connect to remote server.
* Invalid data input, failed to parse around byte/octet 1,234,666, near line 56789 (printable filtered) text: 'Invalid String Example Here'
Yes, those are my take on 'software' versions of the starter, battery, and impurity.
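Here's a minimal sketch (in Rust, with hypothetical names) of the kind of validation and error reporting described above: parse the input and, on failure, report exactly where it failed and what the offending (printable-filtered) text was.

    // Sketch only: a parser that reports position and offending text on failure.
    #[derive(Debug)]
    struct ParseError {
        byte_offset: usize, // where in the input the bad data starts
        line: usize,        // 1-based line number of the failure
        snippet: String,    // printable-filtered excerpt of the bad line
    }

    fn parse_amounts(input: &str) -> Result<Vec<u64>, ParseError> {
        let mut amounts = Vec::new();
        let mut offset = 0;
        for (i, line) in input.lines().enumerate() {
            match line.trim().parse::<u64>() {
                Ok(n) => amounts.push(n),
                Err(_) => {
                    return Err(ParseError {
                        byte_offset: offset,
                        line: i + 1,
                        // keep only printable characters, capped at 40 chars
                        snippet: line.chars().filter(|c| !c.is_control()).take(40).collect(),
                    });
                }
            }
            offset += line.len() + 1; // +1 for the newline separator
        }
        Ok(amounts)
    }

    fn main() {
        match parse_amounts("100\n250\nInvalid String Example Here\n") {
            Ok(v) => println!("parsed {} records", v.len()),
            Err(e) => println!(
                "Invalid data input, failed to parse around byte {}, near line {}, text: '{}'",
                e.byte_offset, e.line, e.snippet
            ),
        }
    }

The point isn't the parsing itself; it's that the error carries enough context that a non-specialist can act on it instead of shrugging and calling the software "glitchy".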
Lots of well-written software will do this; good compilers are a prime example.
Poorly written software often hides these errors, if it even collects them at all. PHBs and other less professional people seem allergic to thinking; for them it's easier to call in a specialist or just not do the task with the broken tool.
This is a British scandal, and we’re quite capable of avoiding consequences for malfeasance in every sector of our economy, not just IT thank you very much.
The Grenfell Tower fire comes to mind, as someone was mentioning above how this sort of stuff rarely happens when it comes to buildings and the like. As far as I know there were no real consequences for the people that were directly responsible for that tragedy.
Not only no consequences, it looks like they're trying to cash in by rinsing the occupants for the cost to remove the cladding on similar buildings!
They're protecting property owners at every step. Why haven't they condemned large/tall buildings with inadequate stairwells for fire evacuation (in fact, given the timeline of the fire, Grenfell's single staircase would have been amply adequate to evacuate the building before people started dying)? Because that would totally hose the people who own such buildings. And why aren't they requiring large buildings to have adequate staircases? Because that would eat into profitable floorspace. The UK prioritizes property owner profit.
They kinda have - check out the cladding scandal [1]. It's pretty major, a lot of properties are essentially unmortgageable until they get inspected to make sure they're safe.
[1] https://en.wikipedia.org/wiki/United_Kingdom_cladding_crisis
>The UK prioritizes property owner profit.
So the US didn't fall far from the tree. It's just like 'daddy'. <wipes away imperial tear>
It was deregulation in my mind that caused Grenfell.
You used to have to get your design cleared by a Local Authority Buildings Inspector. Now you can opt out and get a private company to do it!
Grenfell revealed loads of failings.
Some were deregulation, but some were outright deliberate fraud, like when Kingspan and Celotex scammed the test process and produced misleading safety documents to get their flammable insulation used where it wasn't legal.
And local authorities don't come out of it well either - the building was owned and maintained by a local authority, they hired the architects, the design-and-build contractor, and the guy whose job it was to carry out fire risk assessments. And they chose a bidding process that drove the price so low the bidders who wanted to use non-flammable cladding all dropped out.
We definitely need stricter regulation though - the idea that we could trust the construction industry to use flammable cladding safely has proven false. Flammable cladding should be banned on tall buildings altogether.
I think we are in agreement. From what I remember, Kingspan changed their formulation and didn't repeat the fire testing.
Again, how is it you are allowed to do your own testing, rather than submit samples regularly to a test house? Or perhaps products on higher risk buildings should be spot checked. A sample from the jobsite itself could be tested.
Self-regulation is a joke.
Well, they did submit samples to a test house - the Buildings Research Establishment - for fire safety testing. And they got safety certifications from independent bodies like LABC.
Just you know, it turns out while the test rigs were being built, the BRE might have looked the other way while someone put fireproof magnesium oxide boards over the temperature sensors.
And when the manufacturer sent 'all the test results' to LABC for the certification, they didn't include the tests that ended in "a raging inferno". You see, that test was terminated early, so it wasn't a completed test. And when the manufacturer got the certificate granted, which included a bunch of caveats about how the material specifically had to be used, they immediately quoted it in their marketing materials without any of those caveats.
Of course all this fraud was done by some 18 year old junior employee, who was told it was 'industry standard behaviour' by a mid-level employee (who has conveniently since died) and the CEO had 'absolutely no idea' this was going on right under his nose.
Exactly. And go take a swim in a river if you want to experience actual consequences of the mismanagement of our water infrastructure.
There is still hope for Grenfell prosecutions after the inquest, but justice repeatedly delayed for no good reason is justice denied, obviously.
Can't stop laughing. For some strange reason this sentence reminded me of Yes Minister.
computer says no
It's easier for software to have a catastrophic failure than a building, bridge, or dam. It's not really a fair comparison when those are like easy mode compared to software.
I don't think it's "easy mode" and more that they're just very different domains.
As time goes on, I feel more and more strongly that we shouldn't compare software engineering with other types of engineering at all. My younger brother works in the structural engineering space and the types of issues he's faced across different roles is very different to the types of issues I face in software.
Building design and construction have to handle the physical world and dealing with the realities of physically building the structure. Once it's built, however, the occupants don't make drastic changes to the building itself. In comparison, something like Facebook or Twitter needed to change drastically just to remain usable as they grew. "Just stop adding users" isn't a sensible solution to the problem.
Just to be clear, I do not excuse the, frankly, shit software design of Horizon and the fucking appalling behaviour of the Post Office throughout this scandal. I do think, however, that comparing software engineering to other types of engineering does a disservice to all and doesn't take us closer to actually improving the craft.
I've got ~15 years of software development under my belt. And I've also experienced a 2ish year stint where I was exposed to a lot of structural engineering.
The structural engineers take a piece of steel out into the desert and they dump sand on it until it collapses. Then they divide whatever weight the sand was by 3 and write it down into a book which is then published for all the other structural engineers. This type of steel with this type of configuration with this length can hold up to this much weight.
Meanwhile, in software engineering the limiting factor that prevents collapse is often "these sets of constraints and requirements are too complicated for the team to keep track of in their heads in this messy codebase". Not only do we not have a way to test this and publish the results, it's not even obvious what exactly you're even trying to measure.
[Cyclomatic complexity isn't very well correlated with defects (IIRC lines of code is better correlated). Design patterns, best practices, and code smells are all just poetry that we use to storytell our way out of blame. And Weyuker's 9 Properties are at best a waste of time. There is currently no way to measure bad code that fails under load until it does so in production.]
FWIW, techniques to perform such measures have existed for ~30 years in PL research (some of Rust's type system comes from this kind of research), but I don't know of any industrial code that makes use of it.
Really? I would love a link to the research then.
So far the only[] thing I've found was Weyuker's 9 Properties from measurement theory, and that did not seem particularly compelling.
Now, I get that linear/affine types and dependent types (and theorem provers, etc) can be used to prove that you're following the spec. But this doesn't prove that the code is easily comprehended by the people interacting with it. Code that is provably correct (even in the event where the spec is actually exactly what you want) can still fail because it's too complicated for anyone to modify it anymore.
For example, the function composition operator type in agda is kind of intimidating:
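For reference, the dependently-typed composition operator from Agda's standard library (Function.Base) looks roughly like this:

    _∘_ : ∀ {a b c} {A : Set a} {B : A → Set b} {C : {x : A} → B x → Set c} →
          (∀ {x} (y : B x) → C y) → (g : (x : A) → B x) →
          ((x : A) → C (g x))
    f ∘ g = λ x → f (g x)

The definition is a one-liner, but the type has to track how each function's output type may depend on its input value, which is exactly the sort of thing that intimidates newcomers.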
[] - There are also random blog posts et al. scattered throughout the net, however I haven't found anything that seemed to actually work. For example, https://www.sonarsource.com/resources/cognitive-complexity/ . Weyuker and cyclomatic complexity seemed to be the only things from academia.
I'm thinking of languages such as Camelot or Hobbes. I'm sure that there are new things, I haven't followed the field actively for ~15 years.
It looks like Camelot is just a variant of SML. I couldn't really find any information about Hobbes (at least, what I found is probably not what you're talking about).
Can you mention the features of these languages that you think are worth looking into?
Both of them use linear types and something kinda vaguely like Rust move semantics (and, in the case of Hobbes, time & space sandboxes) to guarantee statically that code will execute within given time and/or space bounds. I also wrote (~20 years ago) an extension of Erlang that could do something similar (for small subsets of Erlang).
Granted, this doesn't fulfill all the requirements for being resilient, but if we manage to make it usable (let's say as usable as Rust), it would be a pretty large step forward.
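As a rough illustration of the mechanism (this is plain Rust move semantics, not Camelot or Hobbes themselves, and the names are hypothetical): an affine "budget" token must be consumed to perform a unit of work, and the compiler statically rejects any path that would spend it twice.

    // Sketch of the affine-resource idea: `Budget` is a permission token.
    struct Budget;

    fn do_work(_b: Budget) {
        // taking `Budget` by value consumes it
        println!("one unit of work done");
    }

    fn main() {
        let b = Budget;
        do_work(b);
        // do_work(b); // error[E0382]: use of moved value: `b`
    }

Languages built for resource bounds extend this basic idea so that time and space budgets, not just values, are tracked by the type system.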
Spoken like someone who doesn’t know how buildings, bridges, or dams are built.
For me the biggest joke is that these people call themselves "software engineers" in the first place.
You've been downvoted by people who value the word "engineer" in their job title even though they haven't earned it. The title is valued because real engineers have earned a great deal of respect through their diligence, and software "engineers" are for the most part parasites leeching off that respect earned by others.
Yes. I come from a country where 'engineer' (German: 'Ingenieur') is a title with strict requirements by law. You cannot just call yourself 'engineer' without the appropriate degree or certificates.
These people have been called 'software developers' or just 'programmers' back in the day. In fact, I'd argue that most commercial software development is more like plumbing than engineering.
Here's a hint: if you're writing control software for airplanes, medical devices or industrial plants, you're an engineer; if you're developing a UI frontend for a website, you're probably not.
What if it’s a UI/frontend for a website that manages a safety-critical system?
Then it might be :)
So what is the difference? Is it an Engineering degree from an accredited Engineering school? Or do you just need a PE license?
I downvoted because this sanctification of engineering (guild, not practice) is as tedious as the idea that credentialing is a silver bullet for software quality.
I've worked on enough projects with "real engineers" to see that rigor varies significantly with the person. e.g. I've seen a dropout with more engineering rigor than a Waterloo grad (granted, this was at a company with a very selective interview process...)
In practice, you do whatever and then pay a licensed PE with an absolutely massive liability insurance policy to stamp your design. If something goes wrong, they take the fall and go sip drinks on a beach somewhere.
As usual, incentives rule everything around us.
The unfortunate reality is that everyone else has accepted that they are software engineers. That unfortunately trumps what "real" software engineers think.
Moving safely means moving slow, and that costs more - it's as simple as that. Plus it would also automatically disqualify like 80% of the workforce, due to the need for a proper formal education and certification, which would literally paralyze the industry.
And the current solution is not that different from what we have e.g. in civil engineering: the building codes for high-rise buildings are way more strict and complex than the regulations for a garden shed - in many places no permit is needed at all for one. And sometimes that means the shed will collapse in a catastrophic way, too.
In the US, professional engineering licenses tend to be required when you're signing off on things for regulators. Which is common in Civil Engineering and Architects; it's not actually that common with Mechanical Engineers and isn't even available any longer for Software.
But you're right, it generally requires you have a 4-year degree and have worked under a licensed engineer for some number of years.
In Europe it is. At least in potentially life-endangering environments. Maybe not for a web page for the government to apply for trivial documents such as the national ID; but for stuff like healthcare backends, any self-styled Engineer from the US couldn't even join the project without a Bachelor's degree.
I've worked in this area (in Europe) and that is nonsense. There is no requirement, other than on the software itself. Bachelor-ish people are generally recruited, but not for their degree, rather for their skills (and low cost).
I'm from Spain. In lots of places you need a proper degree because of the accountability for civil casualties.
It depends on what you mean by 'a lot of places', but for coding up healthcare data systems or firmware for devices where you'll have no interaction with patients or hospitals, it's the company that's liable.
Them hiring only people with degrees is almost certainly orthogonal to legal requirements in such cases.
Any sort of professional interaction with patients, sure, whole different ballgame.
Would that be so bad, in the grand scheme of things?
At the very least qualifications and licensing should be required for anything critical, or for any engineers writing software that does work that would otherwise be done by a legally liable licensed professional.
Components at every stage of the design and manufacturing of a commercial airplane are signed off by licensed engineers who are legally liable for negligence, and yet Boeing could outsource the 737 Max's software to $9/hr programmers in India. Obviously physical engineering failures still happen, but it's ridiculous that there's no liable professional responsible for software that keeps a plane full of hundreds of people from falling out of the sky.
If a licensed accountant was caught lying about hundreds of people stealing from their employer to the point of getting some sent to jail he could easily do jail time for negligence or perjury. Instead some seemingly anonymous dev wrote shoddy software that ruined hundreds of lives and the Post Office gets to just say that the computer did an oopsie.
When a dam or a bridge breaks it is game over.
When software breaks you restart and it's back to "working" order. It would be a different scenario if the computer would set itself on fire on failure.
Except when it's critical software in an aircraft, nuclear power plant, submarine, financial institution etc etc
Of course, or something as mundane as BIOS.
Computers have, at best, existed for only 200 years. That's counting from the difference engine that Babbage never finished building; if we are talking about the first modern computers that contributed significantly, that's around the time ENIAC and the Z3 appeared, just around 80 years.
Structural engineering has existed for over 2000 years. With that came a lot of people dying in natural disasters, and material science advancing to improve structural integrity. But even the Japanese to this day still can't solve earthquakes and the problems their fallout brings.
I actually think it is. That's why real-time operating systems and mission-critical hardware exist for humans to fly in the air and explore space, not counting all the time-sensitive industrial robotics software that controls actuators in real time as well.
That said, the strict requirements of real-time programming demand a lot of expertise: CPU cycle counting, CPU slack reduction, timing requirements, and choice of algorithm (such as EDF scheduling and real-time memory allocators like TLSF; you won't see jemalloc on embedded devices, right?), plus stack-versus-heap allocation decisions that further complicate the programming. Removing malloc might free you from OOM, but it means a lot of functions need to accept extra parameters to state the output location, and that means a lot of normal software can't be used. (You can go for a hybrid approach by using arena allocation, but that still isn't a perfect solution.)
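To illustrate the "caller states the output location" style, here is a minimal Rust sketch with hypothetical names: no heap allocation happens inside the function, so worst-case memory use is fixed at the call site.

    // Sketch: the callee formats into a caller-supplied buffer instead of
    // allocating; the caller decides the worst-case size up front.
    fn format_reading(out: &mut [u8], sensor_id: u16, value: i32) -> Result<usize, ()> {
        use core::fmt::Write;
        // A tiny writer over a fixed buffer instead of an allocating String.
        struct Buf<'a> { buf: &'a mut [u8], len: usize }
        impl core::fmt::Write for Buf<'_> {
            fn write_str(&mut self, s: &str) -> core::fmt::Result {
                let bytes = s.as_bytes();
                if self.len + bytes.len() > self.buf.len() { return Err(core::fmt::Error); }
                self.buf[self.len..self.len + bytes.len()].copy_from_slice(bytes);
                self.len += bytes.len();
                Ok(())
            }
        }
        let mut w = Buf { buf: out, len: 0 };
        write!(w, "sensor {}: {}", sensor_id, value).map_err(|_| ())?;
        Ok(w.len) // number of bytes written
    }

    fn main() {
        let mut buf = [0u8; 64]; // caller owns and sizes the output
        if let Ok(n) = format_reading(&mut buf, 7, -12) {
            println!("{}", core::str::from_utf8(&buf[..n]).unwrap());
        }
    }

Every function in the call chain now carries the extra output parameter, which is exactly the ripple effect described above.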
As you can see, even for soft real-time engineering (which is all I've actually mentioned so far; I don't know much about the real hardcore "hard real-time engineering"), the sheer complexity already means there are a lot of design decisions that simply make people stay away and just go for normal software engineering (but in the end, if everything is predictable, it's fine).
When Zuckerberg said that, he was referring to his startup mindset back when Facebook was really just a startup. In a highly competitive market environment, startups have to fight desperately for their own survival, even with lots of funding and VC rounds.
Startups can move very fast, and their agility is the only weapon against the old dogs. Now that you've become one of the old dogs, you don't break things.
Also for much of that 2000 years, most bridges that weren't big landmarks were somewhat consumable. It's not uncommon in small towns in Europe to have the bridge from the first half of the twentieth century, built because the nineteenth century bridge washed away in a storm, and that bridge was built because the seventeenth century bridge fell down from lack of maintenance.
That wasn't the issue for the bridges; it came from centuries of foolish people founding villages near a river or stream, with names like "Villar del Río" (village by the river). At least that's the case in Spain. Floods happened really often, and everyone ranted about the supposedly bad materials.
Even Facebook realized it was dumb and Mark Zuckerberg apologized to us all on stage at f8 a decade or so ago and said they were now going to say "move fast with stable infra", yet I still see tons of people fetishize and emulate the original.
Honestly though, FB wouldn't have won without the move fast and break things approach. But different techniques work at different scales of company.
You're doing super well if you're successful enough that some of your values need to be re-written.
And even then they didn't pivot to "don't break anything." They are still willing to break user workflows to quickly test new features that have a high risk of being rolled back. The key is that they are able to set more explicit tradeoffs on the risk they want to take relative to the benefits, and which scenarios they want to optimize. The same applies to all engineering disciplines - arguably, managing requirements and tradeoffs to produce a working system is what makes something a work of engineering versus another discipline.
I think one of the issues is that software engineers are building on sand. Civil engineers have Newton and Einstein to fall back on.
It feels like there is no ground truth in computing, each layer has to assume the layer below is generating random errors and try to account for that, because humans are not perfect.
The laws of physics stay the same, technology is constantly changing.
Yes, and instead of numerical values that can be corrected with straightforward mathematical techniques, you have contracts (APIs) that are either broken or not, and there's nothing you can do about it - I mean, unless you want to write thousands of extra lines of code to account for every combination of the lower layer misbehaving.
When you get trained as a Chartered Engineer/Surveyor/Accountant/Purchaser etc you get schooled in ethics, professionalism, whistle blowing, how to deal with being unfairly pressurised etc. Never mind Newton, professional standards would have really helped here.
This isn't a story about software engineering, it's a story about a shitty governmental department in a mediocre country not fixing something that everyone knows is broken, and threatening people with jail time instead.
Show me a similar story about stuff this bad happening at Google and I'll believe that there's something wrong with the field as practiced at its best.
The software bugged out and said they committed fraud; that's still half the problem, and it's a good time to repeat "stop putting software in things". We still need 40 more years before software "engineers" grow up and start doing things even anywhere near correctly. The other half of the problem is the people who just want tech for every problem. They just want shiny buttons, regardless of whether they work. I don't know why, but everything from these companies like Fujitsu, HP, Dell, Acer, Asus, Sony, Samsung, etc. is complete shit. I would vote to remove the government if they told me they're going to start using some Fujitsu product to decide who should go to jail. I just noticed a $6000 vent hood where, if you adjust the fan speed, you can't toggle the light within the next few seconds. Even lay people understand that there is no reason for this aside from what could only be reasonably understood as absolute shit engineering.
The problem is people asking idiots who can't program even a simple button to program serious enterprise, government, and military systems.
The wider 'mood music' is also key here; the government was in the process of privatizing the Post Office i.e. so that they weren't the sole equity owner. Every decision (or non-decision, more appropriately) stemmed from that.
This is why the CEO was rewarded with a CBE and a cushy NHS post in 2019 when the scandal was in full swing. She was acting like a good girl for her political masters in managing the crisis, stringing it out long enough in the hope that everyone would forget
I’ll bite.
It’s that way because software is much much much more complex than that.
Building is a good example. One architect after university has enough knowledge and mental capability to design a building that won't topple: there is a discrete number of parameters to work with, and fallible elements are identified and understood very well. There's a manageable number of degrees of freedom the architect is working with. It's also not very common for physics to get an annual update.
But when you go to funny-cat-pictures.com there are layers upon layers of complexity. There's TCP, UDP, traffic encryption, an indefinite number of hops in routing, an indefinite number of electronic devices of varying quality, etc. The request comes from an operating system of millions of lines of code, and generation happens on a different operating system of millions of lines of code, with thousands of lines of software code that interacts with databases, proxies, CDNs, and load balancers while templating, serving, translating, transpiling, and compiling, all while defending against adverse attacks, managing the cache, and optimizing for network topology and the requester's rendering software. And that changes slightly every single day.
In 99% of cases the worst outcome is: your picture won't load. But when you get to more serious software development (in the sense of a critical domain; not that "serve cat pictures" isn't a serious job, mind you) all of it is very, very visible.
Every element is so unimaginably complex that it has literal tomes of knowledge written and published about it. Describing every element of a transaction between an end user and a cat-serving website would take hundreds of pages - and it would be different from any other cat website.
Most buildings, bridges, etc. were made either by one person or a small group of people working on them.
Software is rarely written that way; many well-known products have tens of thousands of inconceivably smart engineers behind them, and we can still make fun of their - not so rare - failures.
So yeah. Systems suck, organizations suck, people suck. But they suck in relatively safe but also complex environment.
For years software engineering was made fun of because it’s not real engineering. But when you look close enough the environment is orders of magnitude more difficult. Civil engineers might be offended but they can’t hold a candle to structures software is keeping straight.
Software is a bridge built on a raft, floating on an ocean, which tries to get shot down by armed pirates and semi-controlled by a cost-saving manager trying to look good in annual progress report.
Lol. Sounds like someone was very upset about being called "not a real engineer".
Because bits don't rot and while you need to build a new bridge now and then which will include your learnings from the old one, you never have to replace software because it can be copied ad infinitum.
Metaphorically speaking, bits do rot, this is an expression for software stopping to work as the environment changes over time: http://www.catb.org/jargon/html/B/bit-rot.html
Maybe that's survivor bias? Crappy organizations fail at infrastructure projects before they even get off the ground.
Conversely, softer endeavors involving just money, people, and services can operate for a long time on finger-crossing.
Because those engineers are required to have rigorous qualifications. Please see my comment linked below
https://news.ycombinator.com/item?id=39014625
I always figure the major difference between software engineering and engineering physical things, is that software is virtual and fluid.
Once a physical thing is built or being manufactured, the cost of change is exponentially higher than with software.
So software engineers can experiment more frequently and can get away with low quality, as it can always be fixed later.
Of course, this same mentality can carry into industries where the stakes are life and death, which is where accountability is very much lacking.
Hi! Let me tell you about my wonderful country. Norway! I'm sure you've heard about it. The socialist utopia in Scandinavia! ;-)
The Braskereidfoss dam failed during a flood last fall. Luckily just a 'river dam' and not a reservoir dam. Still rather bad.
Oh, and during the same freak weather event, but in a different part of the country, the Randklev railway bridge failed over the river Lågen, just by Ringebu.
And during the last 8 years, we've had two road bridges collapse. One by Sjoa in 2016 (Perkolo bridge) and then, 16 months ago, Tretten bridge. We had the fantastic idea of building wooden bridges for road traffic. Here's a nice article on Tretten bridge: https://en.wikipedia.org/wiki/Tretten_Bridge
But yeah, not too many building collapses. Just a bunch of infrastructure collapses.
The biggest issue isn't the glitchy software. It's the legal system that threatened innocent people with prison for theft unless they admitted to crimes they didn't commit and paid for damages they didn't cause.
This case was brought to public attention, and redress was attempted, only because it's huge and involved hundreds or thousands of people.
How many disparate cases are there, we have to ask, where people's lives were destroyed and innocents are rotting in jail?
Every skillful programmer knows that all software is crap.
Judges, salesmen, and managers don't understand that.
A big problem is they don't want to do the work to understand that, which is the exact outlook the PO had...
"We need some software, ok let's get a big reputable company in to do it for us, we shouldn't get bogged down with all those horrible technical details"
Not wanting to defend the PO but it wasn't really their decision - it was a PFI (private finance initiative) foisted upon them by the Tory government of the day as one of their recurring "STOP BENEFIT FRAUD!" lunacies.
Wasn't it enacted under Blair?
Started in 1994 under the Tories, rolled out in 1999 under Labour as the reduced system after DSS withdrew.
Thanks
And that skillful programmer will fight with all power to avoid any kind of minimum standard and liability for crap software, continuing the cycle and abuse.
It's always a spectrum, from THE SOFTWARE IS PROVIDED "AS IS" to the high-assurance methods used in aerospace and similar safety-critical fields.
The skillful programmer may accept liability when you give him a verification team with a few PhDs, the ability to withhold signoffs, flexible deadlines, etc. etc. Few are willing or required to pay for that. So they get a mystery box with a 90% chance of crap.
They do understand that. They do not care.
This is a somewhat Sith-like dealing in absolutes.
Laypeople generally understand that software may crap the bed in the sense of "the system is down, please wait, then try again". But few people have experienced subtle changes in stored data.
A judge looking into his document cloud may be ready to see a "sorry, not available right now" notice, but doesn't expect that some sinister program is, in the background, silently editing texts of his judgments and pronouncing people guilty when he intended to free them etc.
The problem with the Horizon scandal is in this sinister manipulation of data. It may also have been done by Fujitsu people themselves, in order to cover some tracks and tamper with evidence. This is a very untypical failure mode.
I think for the courts the issue is a bit more subtle. The question is, whose job is it to prove that the other person is wrong ("burden of proof")? Should it be the job of the prosecutor to prove that Intel's processor produces the right answer when an ADD instruction is executed? Or should it be the job of the defendant to show that Intel's processor doesn't produce the right answer? What about proving that the compiler produced binaries which faithfully represent the algorithm? What about Excel?
In our normal life, if a computer is doing the wrong thing, we don't start by assuming a broken compiler; we start by assuming that the new, not-well-tested code is probably broken.
It seems that in the UK before the 90's, the burden of proof was always on the prosecutor to prove almost everything about the system, which is kind of ridiculous. So they passed a law trying to fix it, but messed it up the other way, putting the entire burden of proof on the defendant, without giving them any real way to disprove it. (I mean, shouldn't "discovery" at least mean I can inspect the source code?)
A more balanced law would say that widely-used software with extensive test suites can generally be assumed to be working properly; but that custom-purpose software needs at least some level of evidence that it's correct, and that defendants have a right to inspect any software that's used against them in court for defects.
I disagree, the software glitch was the problem here.
We are supposed to be able to rely on computers to store and add numbers or report a system failure. This accounting software showed in black and white that some funds that the sub-postmasters were responsible for had gone missing.
What else was the legal system supposed to do? The broken software was simulating crime perfectly.
It wasn't though. If the post office enforcers had taken even a cursory look at the transactions around the 'thefts', they would have noticed obvious errors. One of the bugs basically just duplicated a close-of-day transaction, sometimes many times. This would obviously have looked like an error, it would be a stupid way to commit fraud. It was obvious that the Post Office just preferred to extort money out of the postmasters as opposed to actually work out what was going on (as evidenced by the bonuses for successful payments or convictions)
Except it wasn't; the main problem was how the PO was handling it. ICL/Fujitsu were aware of near-identical bugs in an earlier project[1], and PO employees omitted parts of an audit from 2004 that described similar issues as well[2]
It all goes back to ICL/Fujitsu and the PO being aware of the issue and withholding the information from anyone not already "in the know"; lawyers, judges, changing witness statements to hide incriminating evidence, etc.
[1]: https://archive.vn/ah6K2
[2]: https://archive.vn/fXqx2
I think if a handful of people had been prosecuted then it would still be an outrage but understandable. But this was hundreds of cases. I think the legal system has some responsibility for not maybe thinking "Huh, what are the chances of so many previously law abiding people all committing the same crime in the same time period?".
Absolutely wrong. Mistakes happen. Bugs, fat fingers, laziness, hangovers--whether by human or machine, errors occur. The legal system was supposed to uncover the facts. Because of the Post Office coverup, the judges were told "no, there are no bugs. No, nobody has remote access to these terminals. Yes, the only possible way these figures could turn up is through theft." This despite the fact that at least one Post Office inspector explicitly wrote in a report that there was no evidence of theft. The legal system failed to penetrate the veil of lies and find the truth. That's a legal systemic failure.
Agreed. The bug is a footnote.
The legal system failed these people horribly. And the people who pursued these cases with no direct evidence whatsoever should suffer jail time.
It's correct that this is ultimately a failure of the legal system.
However, the role of software here must not be minimized. Software makes it easier than ever to diffuse responsibility and create opaque processes that leave the least powerful people at the bottom of the hierarchy holding the bag. By rigidly encoding flawed assumptions and executing them without question, software is the ultimate realizer of our Kafkaesque nightmares.
It's not entirely unlike Therac-25, including the deaths (albeit more indirectly caused in this case). There was a certain element of operator error, but that doesn't excuse the faulty programming.
No argument that the software bears fault, too.
However, when you accuse someone of stealing money, you should have to prove that they stole the money. This isn't some invisible crime. There should need to be evidence that the stolen money went into their account, got spent to buy something, got pulled from the till on camera, got transferred to Bitcoin--something.
The fact that all these people got convicted with no evidence that the money was ever in their possession is a gigantic legal problem.
It also doesn't help that the lawyers in the case fabricated evidence and covered up the issues with Horizon to secure the convictions. The people doing this need to spend some time in one of the overcrowded prison cells they sent the sub-postmasters to.
Exactly, and judging by the number of errors among death-row/life convictions - which one would presume are the most sensitive and carefully reviewed ones - the numbers are HUGE, especially in the US.
There are not as many that involve jail, but there are a variety that involve ruined lives and even bankruptcy. A recent example is the Phoenix pay system that was used to pay Canadian federal government employees and contractors. https://en.wikipedia.org/wiki/Phoenix_pay_system And I agree that the problem wasn't the glitches. I personally think it was the corporate governance that failed, not the software development and debugging process. The legal system was complicit and enlarged the overall consequences, but the "but for" test tells me that poor corporate governance was the root cause.
This is not well researched at all. Anything that depends on "if everyone acted intelligently and with good will" is broken. If you're interested in academia, I strongly recommend you go back and look at designing systems that function in the face of incompetence and even adversaries.
You can have your opinion, but I have mine after reading lots of research papers. Obviously, there are ways that things can still be improved. But even the design of systems that function in the face of incompetence and even adversaries will face intractable problems of incompetent leadership and governance. It's one thing to say "this is how it should be done" and it's another to say that we've managed to get people to do it the way it should be done. No matter how much improvement we see on ideas for how it should be done, we're still no closer to solving the second part, despite much effort. In the end, there will always be incompetent leaders in charge at some point for some project somewhere, and more often than we'd prefer. You can lead a horse to water, you can't make it drink.
It’s not an opinion and you can cop out to lame “what if the government is hitler” arguments, but resilient systems are definitely an engineering/science/math problem.
The entire field of cryptography wouldn’t even exist if the boundary of research ended at “good actors”.
And yet we still have people out there trying to create their own cryptography when the golden rule is to not roll your own crypto. For whatever reason, best practices don't get followed 100% of the time, even if they exist. For cryptography, the situation is better than most other domains. I think we're having different conversations here. You seem to be having a technical conversation. I'm having a sociotechnical conversations within the context of organizations and their workers and managers. I'm seeing you discuss technical solutions to sociotechnical issues, which is not what I am discussing. Even when the technical ideas are perfect, organizations still need to implement the ideas. That implementation tends to not follow allegedly perfect specifications for many reasons.
But you are entitled to your opinion and that's fine. We can agree to disagree, nothing wrong with that.
That’s the point. Cryptography is significantly better precisely because the research effort has gone into systems that are hard for people to fuck up.
Just look at the fight was to get everyone to agree that the model should be, “everything including the algorithm should be public, except for the key”. That’s a socioeconomic argument.
And that’s why making safe systems where mistakes are protected against is a critical area of research.
Rust is popular because it protects against whole classes of bugs, despite it being no faster than C/C++.
Surely the field of cryptography relies on conscientious and competent actors developing solutions that are robust in the face of malicious actors.
I am skeptical that there are software development practices that will allow me to hire a team of feckless incompetents and have them develop quality software. If you know of any I'm interested to hear about them.
Are there software dev processes that operate well in the face of incompetent or malicious people? I can think of ways to mitigate the damage, but at the end of the day surely you need some competent and conscientious people on your project.
I could give you a huge list but mostly it is computer programmers having the specification constantly changed by management and stakeholders. Even bad software developers can eventually make the software functional, even good software developers can write bad software if the organisation is going out of its way to break everything they do.
There is definitely a sensemaking process where organizations have to figure out what it is that they really need. But I wouldn't fault organizations for that. Most startups go through the same process trying to figure out product-market fit and you don't see those startups blaming their customers for not knowing what they want.
Most startups die doing this; most large orgs just set fire to loads of money and make their software buggy. They already have product-market fit.
If they had product market fit, they wouldn't have major feature change requests that turn the product upside down and inside out. But either way, startups don't blame their customers for being unable to meet their customers' needs. I think it's poor practice to blame organizations for being unable to meet organizational needs, especially when we already know that organizations and users don't know how to conceptualize software requirements well, let alone create software.
I constantly see large companies trying to reinvent the wheel in really haphazard ways, what has been your experience? Mine is I have been contracting in roughly 20 large companies since 2010 and before that I worked at Yahoo! and others.
But I've had experience too. My career started in a national telecom where I was part of a skunkworks team to develop internal applications because the organization was fed up with the IT department delivering solutions that didn't fit their needs. We approached issues differently from the ground up. Software developers gathered requirements on their own by job shadowing employees, and then delivered MVPs within days, which were then constantly iterated to finally solve the real problems. The software developers had complete control over what was made and why with zero change management or approval processes. We also had complete control over what technologies we used to make our apps. We mostly used .NET, but we also did some Java Swing and Ruby on Rails, and of course everything also used Javascript.
Our relatively small skunkworks team developed apps that changed the end-to-end solution delivery processes for major business units, both consumer and business sectors, saved the company 8 digits in opex and capex each year, and won an international award for "Best Support Team" (the Stevies, sort of known as the Oscars of the business world). Our greatest feat that year, which enabled us to win the award, was keeping the company afloat during a four-month union labour dispute by improvising solutions that automated everything in sight. At the end of the labour dispute, the CEO sent a company-wide email about how important we were and awarded us a made-up award, "Holding the Fort". When the union came back to work, we trained them how to use the new tools, but we unfortunately were also enablers of heavy downsizing, which I always disliked. Some of these people were hardworking people who did nothing wrong and followed the rules. Many of them were elderly and had little chance to go back to school to get new skills (we're talking 50-year-old clerical workers, etc). It drastically changed how I thought about corporate software work. That being said, we were all young cowboys, and it was possibly the best team I've ever experienced in my life.
I experienced the absolute opposite in many ways when I worked overseas for IBM, managing projects that spanned the Asia Pacific. I was the go-to PM for many of their mission-critical infrastructure projects, including helping with a datacenter migration from Japan to Australia necessitated by the 2011 Fukushima earthquake and tsunami. I also experienced a middle ground as a venue technology manager for the Vancouver 2010 Olympics: lots of pressure and set processes, but a lot of extremely competent people too.
Look, I'm not doubting your experience, but I've had mine too, which shaped my views, just as I'm sure that your experiences have shaped yours. We can agree to disagree, nothing wrong with that.
TLDR
It would be really interesting to study The Post Office in particular. Something about this organisation attracts some very sour people. Or perhaps they weren’t always like this but have become so in my adult lifetime over the last few decades?
In the early 2000s there was a TV ad campaign for “The People’s Post Office” where the sub-postmaster role was played by John Henshaw, a character actor known for playing hard bastards and, in his most recent role on The Cops, an exploitative bent copper from Bradford. A strange but apt piece of casting.
A low salary coupled with a customer-facing job creates sour people.
Imagine you are talking every day to weird, bitter, arrogant, rude customers, for years - even with a high salary you will not be so positive.
Sounds relatable. I also think there may be some selection effect going on: if you didn't have a better alternative than a low-paying customer-facing job, you may already have a little sourness to start with.
Don't think like that. Medicine is a complicated field as well, but after evidence-based medicine was invented, death rates went down. It is certainly possible to find and enforce lists of best practices.
The flip side of that is that despite how awesome modern medicine is, we still can't get people to get vaccinated (for free!) and put a mask on their face during an infectious disease emergency.
You will find that people just suck no matter what field you work in.
Reminds me of a saying from a dear friend: „Humanity is a boat with a big hole and we constantly need to shovel out water to keep it from sinking.“
What is "IS" in this context? I did some Operations Research modules at uni and thoroughly enjoyed it, but it had nothing to say about why projects didn't work.
Information systems at a guess
Were there any particular themes that stood out to you in understanding the causes of IT incompetence? I'm hoping there's a less depressing answer than "some people will never care".
I'd say that the biggest thing that stood out to me was what people call communities of practice. Each professional community has their own knowledge and best practices, but these don't generalize well to other professions. So you have boundary spanners who can bridge two professional communities by being good at both, but those types of experts are rare. Also, the amount of experience and learning required to become an expert in two professional communities, rather than just one, requires such a large amount of time that most people can't be bothered to put in the effort.
It's enough work to do one's own job well already. The go getters can of course do it as a natural course of action, but they are outliers. There are a limited number of job opportunities that require developing this experience on the job, so there are limited opportunities to become a good boundary spanner in the first place. Furthermore, people aren't naturally interested in multiple disparate subjects. True renaissance folks like Leonardo da Vinci who are interested in becoming experts in both art and engineering are rare. Elon Musk types that will try to dive deep into multiple unrelated areas are rare. All of this adds up to boundary spanners being rare. As such, leaders who can develop cultures that handle multiple areas simultaneously (see the founders of Flexport who understand both tech and shipping logistics) are rare.
In short, expertise is hard to develop, expertise in multiple areas is rare, and coordination between two areas that understand different worlds is difficult without boundary spanners. As a result, you get failures. See any software engineer who creates a startup to try to revolutionize some old-school industry and then fails dramatically because they don't understand the problems that actually need to be solved. The outliers will figure it out, but not everyone can become an outlier due to reasons discussed, among other reasons.
I did a quick look around for some blogs about this early paper-to-digital transaction register migration; I didn't see much for such a major case.
Just a few basic things that weren't included: no audit/transaction logs, and transactions modified by tech support to keep the system running.
Operators couldn't prove they didn't steal funds, and the British legal presumption that computer systems are to be trusted as fact pretty much convicted them all.
This isn't a software error and it is pretty clear.
There are three problems here. First, the branch manager is responsible for calculated shortfalls, even if the software is broken. Second, there is no way to overturn broken software. Third, the prosecutors were overzealous in trying to shut these people up and convict them straight away.
The software itself was just a convenient medium for abuse of authority.
Most of the time issues like these are from companies that pay like $50k salary for senior positions.
So everything checks out.
Some people suck at management, some people suck at coding, and some people suck at self-awareness.
I can think of two things that I believe would make a difference in any LargeCorp: First, a standardized way to visualise and execute business logic that allows developers and management to reason together. (The no-code movement is on the right track in fostering a common way to interface with code.) And second, a responsible editor for each piece of code.
I think a key factor is that software historically hasn't enjoyed industrialisation to the degree of hardware (or construction for that matter). I can buy a standardized CPU of millions of transistors and integrate it into a standardized motherboard with just a snap. We have managed to standardize software up to the OS level, but after that it's up to the developer and her shortcomings.
https://www.codevalley.com/ does some interesting work.
Our current system of the world quite strongly disincentivises honesty and integrity - rather, being a bombastic charlatan with a flexible relationship with the truth will get you anywhere.
Because we’re not professionals. We don’t profess anything and do not have standards. There is no regulation for our industry and no IT association that can strike you off from practicing this craft. There is no accountability, and when there is no accountability, people naturally regress to either lazy or exciting behaviours.
People who are selfless, humble, intelligent, competent, AND people of integrity are never the people who win the contract for any information system, though.
Hence why we need to keep things simple. The human part will never change, or will change at a rate that takes many generations to improve, if you are an optimist. I actually prefer things to be hybrid rather than all-in digital.