This is a new summary of the real-life case that inspired me to go do PhD studies in information systems. I wanted to understand why IT incompetence still could exist to this degree in this day and age after all the knowledge the world had developed about good IT and software practices over decades of experience. Of course, I quickly found out that IS research had already figured most of this out, and that perhaps, people were just people and crappy organizations were just crappy organizations, and perhaps that's something that will never change because bell curve distributions exist for almost everything.
If all people could be selfless, humble, intelligent, competent at everything they ever try to do, AND be people of integrity, I'm sure many of the world's problems would disappear, in addition to just IT problems. People suck, organizations suck, societies suck, and we all suck in our own way. I think organizational research like I'm trying to do still tries to point us towards a better path nevertheless, but sometimes it's a throw your hands in the air thing. Some people will never care.
Needless to say, I'm no longer focused on researching this topic, as it seems really well-researched already. But it's still interesting to see that this particular example still pops up in a news report now and then. There are plenty of other big examples that pop up every year, but this one seems to have real staying power in the news media.
And yet you don't see very many buildings collapse, bridges fail, or dams flood the valley below. The idea is that liability should be assigned for important systems, putting the onus on the creator/owner to build in safeguards, checks, or other protections to prevent disasters.
Why this isn't applied to software engineering is a whole other story, but I think it probably should be. Move fast and break things is not something I wanted to hear tbh.
There is something about how buildings, bridges, and dams are tangible and therefore easier for people to conceptualize and appreciate. There is a natural literacy there in that you don't have to understand what goes into making a bridge to understand the consequences of having a bad bridge.
Meanwhile, software's inner workings are encapsulated so that they're not self-evident as to consequences if things go awry. Furthermore, software is inconsistent in how it works, and this trains people to think that software is just naturally glitchy, but the glitches aren't a big deal. See the joke about different types of engineers also: https://www.reddit.com/r/Jokes/comments/pqr8t3/four_engineer...
The mechanical engineer says: “It’s a broken starter”
The electrical engineer says: “Dead battery”
The chemical engineer says: “Impurities in the gasoline”
The IT engineer says: “Hey guys, I have an idea, how about we all get out of the car and get back in”
If that's what the IT guy recommends, how do you get users, never mind corporate managers and executives, to take software system quality seriously? Obviously, this isn't what a proper software guy will think; the proper software guy knows that things are a bit more complex than that. But this is what the software guy communicates to non-technical people. And only a few rare non-technical people will tell the software guy, "OK, look, just tell me what's really going on here, how deep the problem is, and what we need to do to fix it, no matter what it takes." Most people won't have the time for that, because the fact of the matter is that the system satisfices needs until it doesn't, and then it's too late.
SW engineering is still in its infancy compared to other disciplines that have hundreds of years of knowledge accumulated and built into their education process. SW education differs from country to country and institution to institution. In no other engineering discipline would it be tolerated that you could be self-educated. The lack of standards and a governing body you would be expected to be accredited by is a major gap.
In Europe, in order to call yourself an Engineer you need a Bachelor's Degree as a minimum. If not, it's illegal to work as an Engineer, because being one carries civil accountability for casualties. You might work as one only if a colleague signs off on your project. And yet, depending on the case, your colleague will refuse, because he or she would be the one to be sued if something goes wrong.
Not everywhere in Europe. Not in the UK for example.
There is legal accountability though, and a much better understanding of who can be held to blame for what. Software suppliers are pretty much immune to consequences of carelessness.
However, I think the real problem is that software is far more complex than a bridge or a building. IT systems have complex hardware (far more complex than any mechanical device) running even more complex software (counting all the layers from the OS up). On top of that, people (meaning users/buyers) have no idea how to evaluate safety/reliability/security and mostly seem to regard it as a nice-to-have to be traded off against other nice-to-haves, not an essential baseline.
Once it's implemented, everyone assumes the computer must be right, and acts accordingly. The presumption that computers are right is even enshrined in UK law (the intent was to stop people getting out of things like speeding tickets by claiming speed cameras were faulty), but I think everyone has come across situations where the final word on a dispute was "the computer says so".
How about the safe code in avionics/nuclear sites, Ada/SPARK, MISRA C? Not magic, or as foolproof as physical engineering where it's far easier to predict how a system will behave, OFC, but it's a good start.
Yes, it gets done when necessary which shows we can do it.
It does not get done everywhere because it is not a high enough priority.
I'm pretty sure that the Bologna Process abolished 'Engineer' titles.
In Germany, the protected title Ingenieur is still around, but most bachelor degrees in a technical field grant you the right to use it. Whether or not you are one doesn't change the liability situation (although you can't perform certain works at all without one). It's not really relevant in software development.
The English term "Engineer" can be used by anybody though.
I think not, because the difference between being an actual one and not is spending several years in jail (you and your employer) in case of damage or harm to life or health.
Not in Sweden. There are a lot of creatively named engineers around here.
This is true in the US in regulated industries as well
Belief in software glitchiness should have saved these people from erroneous prosecution, but instead what we see is people trusted the computers to be infallible.
No they trusted management to be infallible, even when management couldn't redo the calculations of the software manually.
This sort of thing is inevitable once a company is controlled by people who don't understand what the company does.
Management replaced know-how with software, and it mostly worked. Then they probably fired the few remaining people that had the know-how. Then, when it became painfully clear something was wrong, management chose to blame anyone, including to the point of pursuing prison sentences, rather than admitting they didn't know their business and someone made a mistake. They chose to destroy people's lives rather than admit they couldn't do the ONE job they claim they can.
I think we should examine why this doesn't happen with structural engineering firms.
For the third time on this thread I'm going to say it is because you have to have proper qualifications to do that for a living.
Ok but, for example, Boeing has recently come under fire for poor quality control, which people are blaming on management decisions taking power away from technical experts.
And the enormous Millennium Tower in San Francisco is sinking and tilting.
So it's not unique to software.
I previously worked in factory automation back when computers were first getting introduced into factories. After I spent a day tracking down and correcting some software glitch that was causing a problem, the factory manager would ask what the problem had been. I was instructed to say "bad valve, we replaced it," which I noticed they found far more satisfying to hear than "software writing to wrong register."
In one of the books I got on technology (early '00s?) there was an anecdote about a computer at a military facility crashing each night (probably in the 70s or 80s). They brought in a consultant to look at it and after an extended period of time looking for the problem in the code, the consultant was staying late in the data center wracking his brain over the problem.
At 10pm, a janitor came in, unplugged the computer, plugged in his floor cleaner, cleaned the floor, and then, when done, unplugged the floor cleaner and plugged the computer back in.
The consultant then suggested using the free outlet on the other wall (and got a 'difficult to unplug' cover for the outlet with the computer).
The next night he stayed late again and the janitor used the outlet on the other wall.
The consultant then told management that the problem had been solved and it was a buffer problem.
There's a book called The Devouring Fungus: Tales of the Computer Age [1] that features a story like this, was that the book?
Apparently there are tons of these stories [2] [3], many of them probably urban legends.
But there was one last year that was definitely real [4], when a cleaner removed power to a freezer holding decades of samples, apparently because they were annoyed by an alarm sound it was making.
[1] https://www.goodreads.com/book/show/3227607-the-devouring-fu...
[2] https://old.reddit.com/r/talesfromtechsupport/comments/5yrs1...
[3] https://www.logikalsolutions.com/wordpress/information-techn...
[4] https://www.theguardian.com/us-news/2023/jun/27/cleaner-coll...
That was the book. It's been a few years since I've dug it out... and looking it up on Internet Archive...
https://archive.org/details/devouringfungust00jenn/page/96/m... (pages 97 to 98) is the proper telling of the story.
I have worked with ECU programming. When test drivers got stuck and called me it was the first thing I told them. "Restart the vehicle". If that didn't work, "disconnect the battery and wait 5 minutes".
They quickly learned.
1. This is garbage, we should do better.
2. Early implementations of Britain's TPWS on railways are exactly like this. Suppose you're pulling into a terminus station and, for whatever reason, you happen to stop such that your train's sensors are right on top of the TPWS "toast rack" (basically think radio transmitter). When the next driver turns the train on, it can see a TPWS transmission. Now, it knows perfectly well it isn't moving, so we're not in some terrifying near-death scenario, and it could let you, a human train driver, sort this out by, you know, not crashing the train into the station buffers. Nope, in v1.0 the firmware just considers starting in this situation to be a fatal error and won't let you move at all. The only authorised solution is to get rid of any passengers who've boarded your train, switch the train into its (unsafe for passenger service) maintenance mode, drive it away from the beacons a few yards, stop, turn off the power, and then reboot the computer, where it now can't see a troubling TPWS transmitter and it's a working train again.
This happened to us with a first model year vehicle multiple times until the dealership received the necessary software upgrades. It was quite frustrating to be stuck at a busy fuel pump for 10 minutes until the software systems restarted properly.
"The IT person says". IT is not an engineering discipline (nor is software development in most cases), although they really should be.
And "turn it off and on again" is the canonical "joke".
Software is more like blueprints/processes for manufacturing. When something doesn't work right it's a yield rate issue. Oops the process failed, try it again.
Software can, and should, also be smarter and more helpful. Where there's a chance of invalid inputs, they should be validated. That would help with errors like:
* Authentication failed, check your account credentials.
* Could not connect to remote server.
* Invalid data input, failed to parse around byte/octet 1,234,666, near line 56789 (printable filtered) text: 'Invalid String Example Here'
Yes, those are my take on 'software' versions of the starter, battery, and impurity.
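Here's a minimal sketch (in Rust, with hypothetical names) of the kind of validation and error reporting described above: parse the input and, on failure, report exactly where it failed and what the offending (printable-filtered) text was.

    // Sketch only: a parser that reports position and offending text on failure.
    #[derive(Debug)]
    struct ParseError {
        byte_offset: usize, // where in the input the bad data starts
        line: usize,        // 1-based line number of the failure
        snippet: String,    // printable-filtered excerpt of the bad line
    }

    fn parse_amounts(input: &str) -> Result<Vec<u64>, ParseError> {
        let mut amounts = Vec::new();
        let mut offset = 0;
        for (i, line) in input.lines().enumerate() {
            match line.trim().parse::<u64>() {
                Ok(n) => amounts.push(n),
                Err(_) => {
                    return Err(ParseError {
                        byte_offset: offset,
                        line: i + 1,
                        // keep only printable characters, capped at 40 chars
                        snippet: line.chars().filter(|c| !c.is_control()).take(40).collect(),
                    });
                }
            }
            offset += line.len() + 1; // +1 for the newline separator
        }
        Ok(amounts)
    }

    fn main() {
        match parse_amounts("100\n250\nInvalid String Example Here\n") {
            Ok(v) => println!("parsed {} records", v.len()),
            Err(e) => println!(
                "Invalid data input, failed to parse around byte {}, near line {}, text: '{}'",
                e.byte_offset, e.line, e.snippet
            ),
        }
    }

The point isn't the parsing itself; it's that the error carries enough context that a non-specialist can act on it instead of shrugging and calling the software "glitchy".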
Lots of well-written software will do this; good compilers are a prime example.
Poorly written software often hides these errors, if it even collects them at all. PHBs and other less professional people seem allergic to thinking; for them it's easier to call in a specialist or just not do the task with the broken tool.
This is a British scandal, and we’re quite capable of avoiding consequences for malfeasance in every sector of our economy, not just IT thank you very much.
The Grenfell Tower fire comes to mind, as someone was mentioning above how this sort of stuff rarely happens when it comes to buildings and the like. As far as I know there were no real consequences for the people that were directly responsible for that tragedy.
Not only no consequences, it looks like they're trying to cash in by rinsing the occupants for the cost to remove the cladding on similar buildings!
They're protecting property owners at every step. Why haven't they condemned large/tall buildings with inadequate stairwells for fire evacuation (in fact, given the timeline of the fire, Grenfell's single staircase would have been amply adequate to evacuate the building before people started dying)? Because that would totally hose the people who own such buildings. And why aren't they requiring large buildings to have adequate staircases? Because that would eat into profitable floorspace. The UK prioritizes property owner profit.
They kinda have - check out the cladding scandal [1]. It's pretty major, a lot of properties are essentially unmortgageable until they get inspected to make sure they're safe.
[1] https://en.wikipedia.org/wiki/United_Kingdom_cladding_crisis
>The UK prioritizes property owner profit.
So the US didn't fall far from the tree. It's just like 'daddy'. <wipes away imperial tear>
It was deregulation in my mind that caused Grenfell.
You used to have to get your design cleared by a Local Authority Buildings Inspector. Now you can opt out and get a private company to do it!
Grenfell revealed loads of failings.
Some were deregulation, but some were outright deliberate fraud, like when Kingspan and Celotex scammed the test process and produced misleading safety documents to get their flammable insulation used where it wasn't legal.
And local authorities don't come out of it well either - the building was owned and maintained by a local authority, they hired the architects, the design-and-build contractor, and the guy whose job it was to carry out fire risk assessments. And they chose a bidding process that drove the price so low the bidders who wanted to use non-flammable cladding all dropped out.
We definitely need stricter regulation though - the idea that we could trust the construction industry to use flammable cladding safely has proven false. Flammable cladding should be banned on tall buildings altogether.
I think we are in agreement. From what I remember, Kingspan changed their formulation and didn't repeat the fire testing.
Again, how is it you are allowed to do your own testing, rather than submit samples regularly to a test house? Or perhaps products on higher risk buildings should be spot checked. A sample from the jobsite itself could be tested.
Self-regulation is a joke.
Well, they did submit samples to a test house - the Buildings Research Establishment - for fire safety testing. And they got safety certifications from independent bodies like LABC.
Just you know, it turns out while the test rigs were being built, the BRE might have looked the other way while someone put fireproof magnesium oxide boards over the temperature sensors.
And when the manufacturer sent 'all the test results' to LABC for the certification, they didn't include the tests that ended in "a raging inferno". You see, that test was terminated early, so it wasn't a completed test. And when the manufacturer got the certificate granted, which included a bunch of caveats about how the material specifically had to be used, they immediately quoted it in their marketing materials without any of those caveats.
Of course all this fraud was done by some 18 year old junior employee, who was told it was 'industry standard behaviour' by a mid-level employee (who has conveniently since died) and the CEO had 'absolutely no idea' this was going on right under his nose.
Exactly. And go take a swim in a river if you want to experience actual consequences of the mismanagement of our water infrastructure.
There is still hope for Grenfell prosecutions after the inquest, but justice repeatedly delayed for no good reason is justice denied, obviously.
Can't stop laughing. For some strange reason this sentence reminded me of Yes Minister.
computer says no
It's easier for software to have a catastrophic failure than a building, bridge, or dam. It's not really a fair comparison when those are like easy mode compared to software.
I don't think it's "easy mode" and more that they're just very different domains.
As time goes on, I feel more and more strongly that we shouldn't compare software engineering with other types of engineering at all. My younger brother works in the structural engineering space and the types of issues he's faced across different roles is very different to the types of issues I face in software.
Building design and construction have to handle the physical world and dealing with the realities of physically building the structure. Once it's built, however, the occupants don't make drastic changes to the building itself. In comparison, something like Facebook or Twitter needed to change drastically just to remain usable as they grew. "Just stop adding users" isn't a sensible solution to the problem.
Just to be clear, I do not excuse the, frankly, shit software design of Horizon and the fucking appalling behaviour of the Post Office throughout this scandal. I do think, however, that comparing software engineering to other types of engineering does a disservice to all and doesn't take us closer to actually improving the craft.
I've got ~15 years of software development under my belt. And I've also experienced a 2ish year stint where I was exposed to a lot of structural engineering.
The structural engineers take a piece of steel out into the desert and they dump sand on it until it collapses. Then they divide whatever weight the sand was by 3 and write it down into a book which is then published for all the other structural engineers. This type of steel with this type of configuration with this length can hold up to this much weight.
Meanwhile, in software engineering the limiting factor that prevents collapse is often "these sets of constraints and requirements are too complicated for the team to keep track of in their heads in this messy codebase". Not only do we not have a way to test this and publish the results, it's not even obvious what exactly you're even trying to measure.
[Cyclomatic complexity isn't very well correlated with defects (IIRC lines of code is better correlated). Design patterns, best practices, and code smells are all just poetry that we use to storytell our way out of blame. And Weyuker's 9 Properties are at best a waste of time. There is currently no way to measure bad code that fails under load until it does so in production.]
FWIW, techniques to perform such measures have existed for ~30 years in PL research (some of Rust's type system comes from this kind of research), but I don't know of any industrial code that makes use of it.
Really? I would love a link to the research then.
So far the only[] thing I've found was Weyuker's 9 Properties from measurement theory, and that did not seem particularly compelling.
Now, I get that linear/affine types and dependent types (and theorem provers, etc) can be used to prove that you're following the spec. But this doesn't prove that the code is easily comprehended by the people interacting with it. Code that is provably correct (even in the event where the spec is actually exactly what you want) can still fail because it's too complicated for anyone to modify it anymore.
For example, the function composition operator type in agda is kind of intimidating:
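For reference, the dependently-typed composition operator from Agda's standard library (Function.Base) looks roughly like this:

    _∘_ : ∀ {a b c} {A : Set a} {B : A → Set b} {C : {x : A} → B x → Set c} →
          (∀ {x} (y : B x) → C y) → (g : (x : A) → B x) →
          ((x : A) → C (g x))
    f ∘ g = λ x → f (g x)

The definition is a one-liner, but the type has to track how each function's output type may depend on its input value, which is exactly the sort of thing that intimidates newcomers.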
[] - There are also random blog posts et al. scattered throughout the net, however I haven't found anything that seemed to actually work. For example, https://www.sonarsource.com/resources/cognitive-complexity/ . Weyuker and cyclomatic complexity seemed to be the only things from academia.
I'm thinking of languages such as Camelot or Hobbes. I'm sure that there are new things, I haven't followed the field actively for ~15 years.
It looks like Camelot is just a variant of SML. I couldn't really find any information about Hobbes (at least, what I found is probably not what you're talking about).
Can you mention the features of these languages that you think are worth looking into?
Both of them use linear types and something kinda vaguely like Rust move semantics (and, in the case of Hobbes, time & space sandboxes) to guarantee statically that code will execute within given time and/or space bounds. I also wrote (~20 years ago) an extension of Erlang that could do something similar (for small subsets of Erlang).
Granted, this doesn't fulfill all the requirements for being resilient, but if we manage to make it usable (let's say as usable as Rust), it would be a pretty large step forward.
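As a rough illustration of the mechanism (this is plain Rust move semantics, not Camelot or Hobbes themselves, and the names are hypothetical): an affine "budget" token must be consumed to perform a unit of work, and the compiler statically rejects any path that would spend it twice.

    // Sketch of the affine-resource idea: `Budget` is a permission token.
    struct Budget;

    fn do_work(_b: Budget) {
        // taking `Budget` by value consumes it
        println!("one unit of work done");
    }

    fn main() {
        let b = Budget;
        do_work(b);
        // do_work(b); // error[E0382]: use of moved value: `b`
    }

Languages built for resource bounds extend this basic idea so that time and space budgets, not just values, are tracked by the type system.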
Spoken like someone who doesn’t know how buildings, bridges, or dams are built.
For me the biggest joke is that these people call themselves "software engineers" in the first place.
You've been downvoted by people who value the word "engineer" in their job title even though they haven't earned it. The title is valued because real engineers have earned a great deal of respect through their diligence, and software "engineers" are for the most part parasites leeching off that respect earned by others.
Yes. I come from a country where 'engineer' (German: 'Ingenieur') is a title with strict requirements by law. You cannot just call yourself 'engineer' without the appropriate degree or certificates.
These people have been called 'software developers' or just 'programmers' back in the day. In fact, I'd argue that most commercial software development is more like plumbing than engineering.
Here's a hint: if you're writing control software for airplanes, medical devices or industrial plants, you're an engineer; if you're developing a UI frontend for a website, you're probably not.
What if it’s a UI/frontend for a website that manages a safety-critical system?
Then it might be :)
So what is the difference? Is it an Engineering degree from an accredited Engineering school? Or do you just need a PE license?
I downvoted because this sanctification of engineering (guild, not practice) is as tedious as the idea that credentialing is a silver bullet for software quality.
I've worked on enough projects with "real engineers" to see that rigor varies significantly with the person. e.g. I've seen a dropout with more engineering rigor than a Waterloo grad (granted, this was at a company with a very selective interview process...)
In practice, you do whatever and then pay a licensed PE with an absolutely massive liability insurance policy to stamp your design. If something goes wrong, they take the fall and go sip drinks on a beach somewhere.
As usual, incentives rule everything around us.
The unfortunate reality is that everyone else has accepted that they are software engineers. That unfortunately trumps what "real" software engineers think.
Moving safely means moving slow, and that costs more - it's as simple as that. Plus it would also automatically disqualify like 80% of the workforce, due to the need for a proper formal education and certification, which would literally paralyze the industry.
And the current solution is not that different from what we have e.g. in civil engineering: the building codes for high-rise buildings are way more strict and complex than the regulations for a garden shed - in many places no permit is needed at all for one. And sometimes that means the shed will collapse in a catastrophic way, too.
In the US, professional engineering licenses tend to be required when you're signing off on things for regulators. Which is common in Civil Engineering and Architects; it's not actually that common with Mechanical Engineers and isn't even available any longer for Software.
But you're right, it generally requires you have a 4-year degree and have worked under a licensed engineer for some number of years.
In Europe it is. At least in potentially life-endangering environments. Maybe not for a web page for the government to apply for trivial documents such as the national ID; but for stuff like healthcare backends, any self-styled Engineer from the US couldn't even join the project without a Bachelor's degree.
I've worked in this area (in Europe) and that is nonsense. There is no requirement, other than on the software itself. Bachelor-ish people are generally recruited, but not for their degree, rather for their skills (and low cost).
I'm from Spain. In lots of places you need a proper degree because of the accountability for civil casualties.
It depends on what you mean by 'a lot of places', but for coding up healthcare data systems or firmware for devices where you'll have no interaction with patients or hospitals, it's the company that's liable.
Them hiring only people with degrees is almost certainly orthogonal to legal requirements in such cases.
Any sort of professional interaction with patients, sure, whole different ballgame.
Would that be so bad, in the grand scheme of things?
At the very least qualifications and licensing should be required for anything critical, or for any engineers writing software that does work that would otherwise be done by a legally liable licensed professional.
Components at every stage of the design and manufacturing of a commercial airplane are signed off by licensed engineers who are legally liable for negligence, and yet Boeing could outsource the 737 Max's software to $9/hr programmers in India. Obviously physical engineering failures still happen, but it's ridiculous that there's no liable professional responsible for software that keeps a plane full of hundreds of people from falling out of the sky.
If a licensed accountant was caught lying about hundreds of people stealing from their employer to the point of getting some sent to jail he could easily do jail time for negligence or perjury. Instead some seemingly anonymous dev wrote shoddy software that ruined hundreds of lives and the Post Office gets to just say that the computer did an oopsie.
When a dam or a bridge breaks it is game over.
When software breaks you restart and it's back to "working" order. It would be a different scenario if the computer would set itself on fire on failure.
Except when it's critical software in an aircraft, nuclear power plant, submarine, financial institution etc etc
Of course, or something as mundane as BIOS.
Computers have, at best, existed for only 200 years. That's counting from the difference engine that Babbage never finished building; if we are talking about the first modern computers that contributed significantly, that's around the time ENIAC and the Z3 appeared, just around 80 years.
Structural engineering has existed for over 2000 years. With that came a lot of people dying in natural disasters, and material science advancing to improve structural integrity. But even the Japanese to this day still can't solve earthquakes and the problems their fallout brings.
I actually think it is. That's why real-time operating systems and mission-critical hardware exist for humans to fly in the air and explore space, not counting all the time-sensitive industrial robotics software that controls actuators in real time as well.
That said, the strict requirements of real-time programming demand a lot of expertise: CPU cycle counting, CPU slack reduction, timing requirements, and choice of algorithm (such as EDF scheduling and real-time memory allocators like TLSF; you won't see jemalloc on embedded devices, right?), plus stack-versus-heap allocation decisions that further complicate the programming. Removing malloc might free you from OOM, but it means a lot of functions need to accept extra parameters to state the output location, and that means a lot of normal software can't be used. (You can go for a hybrid approach by using arena allocation, but that still isn't a perfect solution.)
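To illustrate the "caller states the output location" style, here is a minimal Rust sketch with hypothetical names: no heap allocation happens inside the function, so worst-case memory use is fixed at the call site.

    // Sketch: the callee formats into a caller-supplied buffer instead of
    // allocating; the caller decides the worst-case size up front.
    fn format_reading(out: &mut [u8], sensor_id: u16, value: i32) -> Result<usize, ()> {
        use core::fmt::Write;
        // A tiny writer over a fixed buffer instead of an allocating String.
        struct Buf<'a> { buf: &'a mut [u8], len: usize }
        impl core::fmt::Write for Buf<'_> {
            fn write_str(&mut self, s: &str) -> core::fmt::Result {
                let bytes = s.as_bytes();
                if self.len + bytes.len() > self.buf.len() { return Err(core::fmt::Error); }
                self.buf[self.len..self.len + bytes.len()].copy_from_slice(bytes);
                self.len += bytes.len();
                Ok(())
            }
        }
        let mut w = Buf { buf: out, len: 0 };
        write!(w, "sensor {}: {}", sensor_id, value).map_err(|_| ())?;
        Ok(w.len) // number of bytes written
    }

    fn main() {
        let mut buf = [0u8; 64]; // caller owns and sizes the output
        if let Ok(n) = format_reading(&mut buf, 7, -12) {
            println!("{}", core::str::from_utf8(&buf[..n]).unwrap());
        }
    }

Every function in the call chain now carries the extra output parameter, which is exactly the ripple effect described above.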
As you can see, even for soft real-time engineering (which is all I've actually mentioned so far; I don't know much about the real hardcore "hard real-time engineering"), the sheer complexity already means there are a lot of design decisions that simply make people stay away and just go for normal software engineering (but in the end, if everything is predictable, it's fine).
When Zuckerberg said that, he was referring to his startup mindset back when Facebook was really just a startup. In a highly competitive market environment, startups have to fight desperately for their own survival, even with lots of funding and VC rounds.
Startups can move very fast, and their agility is the only weapon against the old dogs. Now that you've become one of the old dogs, you don't break things.
Also for much of that 2000 years, most bridges that weren't big landmarks were somewhat consumable. It's not uncommon in small towns in Europe to have the bridge from the first half of the twentieth century, built because the nineteenth century bridge washed away in a storm, and that bridge was built because the seventeenth century bridge fell down from lack of maintenance.
That wasn't the issue for the bridges; it came from centuries of foolish people founding villages near a river or stream, with names like "Villar del Río" (village by the river). At least that's the case in Spain. Floods happened really often, and everyone ranted about the supposedly bad materials.
Even Facebook realized it was dumb and Mark Zuckerberg apologized to us all on stage at f8 a decade or so ago and said they were now going to say "move fast with stable infra", yet I still see tons of people fetishize and emulate the original.
Honestly though, FB wouldn't have won without the move fast and break things approach. But different techniques work at different scales of company.
You're doing super well if you're successful enough that some of your values need to be re-written.
And even then they didn't pivot to "don't break anything." They are still willing to break user workflows to quickly test new features that have a high risk of being rolled back. The key is that they are able to set more explicit tradeoffs on the risk they want to take relative to the benefits, and which scenarios they want to optimize. The same applies to all engineering disciplines - arguably, managing requirements and tradeoffs to produce a working system is what makes something a work of engineering versus another discipline.
I think one of the issues is that software engineers are building on sand. Civil engineers have Newton and Einstein to fall back on.
It feels like there is no ground truth in computing, each layer has to assume the layer below is generating random errors and try to account for that, because humans are not perfect.
The laws of physics stay the same, technology is constantly changing.
Yes, and instead of numerical values that can be corrected with straightforward mathematical techniques, you have contracts (APIs) that are either broken or not, and there's nothing you can do about it - I mean, unless you want to write thousands of extra lines of code to account for every combination of the lower layer misbehaving.
When you get trained as a Chartered Engineer/Surveyor/Accountant/Purchaser etc you get schooled in ethics, professionalism, whistle blowing, how to deal with being unfairly pressurised etc. Never mind Newton, professional standards would have really helped here.
This isn't a story about software engineering, it's a story about a shitty governmental department in a mediocre country not fixing something that everyone knows is broken, and threatening people with jail time instead.
Show me a similar story about stuff this bad happening at Google and I'll believe that there's something wrong with the field as practiced at its best.
The software bugged out and said they committed fraud; that's still half the problem, and it's a good time to repeat "stop putting software in things". We still need 40 more years before software "engineers" grow up and start doing things even anywhere near correctly. The other half of the problem is the people who just want tech for every problem. They just want shiny buttons, regardless of whether they work. I don't know why, but everything from these companies like Fujitsu, HP, Dell, Acer, Asus, Sony, Samsung, etc. is complete shit. I would vote to remove the government if they told me they're going to start using some Fujitsu product to decide who should go to jail. I just noticed a $6000 vent hood where, if you adjust the fan speed, you can't toggle the light within the next few seconds. Even lay people understand that there is no reason for this aside from what could only be reasonably understood as absolute shit engineering.
The problem is people asking idiots who can't program even a simple button to program serious enterprise, government, and military systems.
The wider 'mood music' is also key here; the government was in the process of privatizing the Post Office i.e. so that they weren't the sole equity owner. Every decision (or non-decision, more appropriately) stemmed from that.
This is why the CEO was rewarded with a CBE and a cushy NHS post in 2019 when the scandal was in full swing. She was acting like a good girl for her political masters in managing the crisis, stringing it out long enough in the hope that everyone would forget
I’ll bite.
It’s that way because software is much much much more complex than that.
Building is a good example. One architect after university has enough knowledge and mental capability to design a building that won't topple: there is a discrete number of parameters to work with, and fallible elements are identified and understood very well. There's a manageable number of degrees of freedom the architect is working with. It's also not very common for physics to get an annual update.
But when you go to funny-cat-pictures.com there are layers upon layers of complexity. There's TCP, UDP, traffic encryption, an indefinite number of hops in routing, an indefinite number of electronic devices of varying quality, etc. The request comes from an operating system of millions of lines of code, and generation happens on a different operating system of millions of lines of code, with thousands of lines of software code that interacts with databases, proxies, CDNs, and load balancers while templating, serving, translating, transpiling, and compiling, all while defending against adverse attacks, managing the cache, and optimizing for network topology and the requester's rendering software. And that changes slightly every single day.
In 99% of cases the worst outcome is: your picture won't load. But when you get to more serious software development (in the sense of a critical domain; not that "serve cat pictures" isn't a serious job, mind you) all of it is very, very visible.
Every element is so unimaginably complex that it has literal tomes of knowledge written and published about it. Describing every element of a transaction between an end user and a cat-serving website would take hundreds of pages - and it would be different from any other cat website.
Most buildings, bridges, etc. were made either by one person or a small group of people working on them.
Software is rarely written that way; many well-known products have tens of thousands of inconceivably smart engineers behind them, and we can still make fun of their - not so rare - failures.
So yeah. Systems suck, organizations suck, people suck. But they suck in relatively safe but also complex environment.
For years software engineering was made fun of because it’s not real engineering. But when you look close enough the environment is orders of magnitude more difficult. Civil engineers might be offended but they can’t hold a candle to structures software is keeping straight.
Software is a bridge built on a raft, floating on an ocean, which tries to get shot down by armed pirates and semi-controlled by a cost-saving manager trying to look good in annual progress report.
Lol. Sounds like someone was very upset about being called "not a real engineer".
Because bits don't rot and while you need to build a new bridge now and then which will include your learnings from the old one, you never have to replace software because it can be copied ad infinitum.
Metaphorically speaking, bits do rot, this is an expression for software stopping to work as the environment changes over time: http://www.catb.org/jargon/html/B/bit-rot.html
Maybe that's survivor bias? Crappy organizations fail at infrastructure projects before they even get off the ground.
Conversely, softer endeavors involving just money, people, and services can operate for a long time on finger-crossing.
Because those engineers are required to have rigorous qualifications. Please see my comment linked below
https://news.ycombinator.com/item?id=39014625
I always figure the major difference between software engineering and engineering physical things, is that software is virtual and fluid.
Once a physical thing is built or being manufactured, the cost of change is exponentially higher than with software.
So software engineers can experiment more frequently and can get away with low quality, as it can always be fixed later.
Of course, this same mentality can carry into industries where the stakes are life and death, which is where accountability is very much lacking.
Hi! Let me tell you about my wonderful country. Norway! I'm sure you've heard about it. The socialist utopia in Scandinavia! ;-)
The Braskereidfoss dam failed during a flood last fall. Luckily just a 'river dam' and not a reservoir dam. Still rather bad.
Oh, and during the same freak weather event, but in a different part of the country, the Randklev railway bridge failed over the river Lågen, just by Ringebu.
And during the last 8 years, we've had two road bridges collapse. One by Sjoa in 2016 (Perkolo bridge) and then, 16 months ago, Tretten bridge. We had the fantastic idea of building wooden bridges for road traffic. Here's a nice article on Tretten bridge: https://en.wikipedia.org/wiki/Tretten_Bridge
But yeah, not too many building collapses. Just a bunch of infrastructure collapses.
The biggest issue isn't the glitchy software. It's the legal system that threatened innocent people with prison for theft unless they admitted to crimes they didn't commit and paid for damages they didn't cause.
This case was brought to public attention, and redress was attempted, only because it's huge and involved hundreds or thousands of people.
How many disparate cases are there, we have to ask, where people's lives were destroyed and innocents are rotting in jail?
Every skillful programmer knows that all software is crap.
Judges, salesmen, and managers don't understand that.
A big problem is they don't want to do the work to understand that, which is the exact outlook the PO had...
"We need some software, ok let's get a big reputable company in to do it for us, we shouldn't get bogged down with all those horrible technical details"
Not wanting to defend the PO but it wasn't really their decision - it was a PFI (private finance initiative) foisted upon them by the Tory government of the day as one of their recurring "STOP BENEFIT FRAUD!" lunacies.
Wasn't it enacted under Blair?
Started in 1994 under the Tories, rolled out in 1999 under Labour as the reduced system after DSS withdrew.
Thanks
And that skillful programmer will fight with all power to avoid any kind of minimum standard and liability for crap software, continuing the cycle and abuse.
It's always a spectrum, from THE SOFTWARE IS PROVIDED "AS IS" to the high-assurance methods used in aerospace and similar safety-critical fields.
The skillful programmer may accept liability when you give him a verification team with a few PhDs, the ability to withhold signoffs, flexible deadlines, etc. etc. Few are willing or required to pay for that. So they get a mystery box with a 90% chance of crap.
They do understand that. They do not care.
This is a somewhat Sith-like dealing in absolutes.
Laypeople generally understand that software may crap the bed in the sense of "the system is down, please wait, then try again". But few people have experienced subtle changes in stored data.
A judge looking into his document cloud may be ready to see a "sorry, not available right now" notice, but doesn't expect that some sinister program is, in the background, silently editing texts of his judgments and pronouncing people guilty when he intended to free them etc.
The problem with the Horizon scandal is in this sinister manipulation of data. It may also have been done by Fujitsu people themselves, in order to cover some tracks and tamper with evidence. This is a very untypical failure mode.
I think for the courts the issue is a bit more subtle. The question is, whose job is it to prove that the other person is wrong ("burden of proof")? Should it be the job of the prosecutor to prove that Intel's processor produces the right answer when an ADD instruction is executed? Or should it be the job of the defendant to show that Intel's processor doesn't produce the right answer? What about proving that the compiler produced binaries which faithfully represent the algorithm? What about Excel?
In our normal life, if a computer is doing the wrong thing, we don't start by assuming a broken compiler; we start by assuming that the new, not-well-tested code is probably broken.
It seems that in the UK before the 90's, the burden of proof was always on the prosecutor to prove almost everything about the system, which is kind of ridiculous. So they passed a law trying to fix it, but messed it up the other way, putting the entire burden of proof on the defendant, without giving them any real way to disprove it. (I mean, shouldn't "discovery" at least mean I can inspect the source code?)
A more balanced law would say that widely-used software with extensive test suites can generally be assumed to be working properly; but that custom-purpose software needs at least some level of evidence that it's correct, and that defendants have a right to inspect any software that's used against them in court for defects.
I disagree, the software glitch was the problem here.
We are supposed to be able to rely on computers to store and add numbers or report a system failure. This accounting software showed in black and white that some funds that the sub-postmasters were responsible for had gone missing.
What else was the legal system supposed to do? The broken software was simulating crime perfectly.
It wasn't though. If the post office enforcers had taken even a cursory look at the transactions around the 'thefts', they would have noticed obvious errors. One of the bugs basically just duplicated a close-of-day transaction, sometimes many times. This would obviously have looked like an error, it would be a stupid way to commit fraud. It was obvious that the Post Office just preferred to extort money out of the postmasters as opposed to actually work out what was going on (as evidenced by the bonuses for successful payments or convictions)
Except it wasn't; the main problem was how the PO was handling it. ICL/Fujitsu were aware of near-identical bugs in an earlier project[1], and PO employees omitted parts of an audit from 2004 that described similar issues as well[2]
It all goes back to ICL/Fujitsu and the PO being aware of the issue and withholding the information from anyone not already "in the know"; lawyers, judges, changing witness statements to hide incriminating evidence, etc.
[1]: https://archive.vn/ah6K2
[2]: https://archive.vn/fXqx2
I think if a handful of people had been prosecuted then it would still be an outrage but understandable. But this was hundreds of cases. I think the legal system has some responsibility for not maybe thinking "Huh, what are the chances of so many previously law abiding people all committing the same crime in the same time period?".
Absolutely wrong. Mistakes happen. Bugs, fat fingers, laziness, hangovers--whether by human or machine, errors occur. The legal system was supposed to uncover the facts. Because of the Post Office coverup, the judges were told "no, there are no bugs. No, nobody has remote access to these terminals. Yes, the only possible way these figures could turn up is through theft." This despite the fact that at least one Post Office inspector explicitly wrote in a report that there was no evidence of theft. The legal system failed to penetrate the veil of lies and find the truth. That's a legal systemic failure.
Agreed. The bug is a footnote.
The legal system failed these people horribly. And the people who pursued these cases with no direct evidence whatsoever should suffer jail time.
It's correct that this is ultimately a failure of the legal system.
However, the role of software here must not be minimized. Software makes it easier than ever to diffuse responsibility and create opaque processes that leave the least powerful people at the bottom of the hierarchy holding the bag. By rigidly encoding flawed assumptions and executing them without question, software is the ultimate realizer of our Kafkaesque nightmares.
It's not entirely unlike Therac-25, including the deaths (albeit more indirectly caused in this case). There was a certain element of operator error, but that doesn't excuse the faulty programming.
No argument that the software bears fault, too.
However, when you accuse someone of stealing money, you should have to prove that they stole the money. This isn't some invisible crime. There should need to be evidence that the stolen money went into their account, got spent to buy something, got pulled from the till on camera, got transferred to Bitcoin--something.
The fact that all these people got convicted with no evidence that the money was ever in their possession is a gigantic legal problem.
It also doesn't help that the lawyers in the case fabricated evidence and covered up the issues with Horizon to secure the convictions. The people doing this need to spend some time in one of the overcrowded prison cells they sent the sub-postmasters to.
Exactly, and judging by the number of errors among death-row/life convictions - which one would presume are the most sensitive and carefully reviewed ones - the numbers are HUGE, especially in the US.
There are not as many that involve jail, but there are a variety that involve ruined lives and even bankruptcy. A recent example is the Phoenix pay system that was used to pay Canadian federal government employees and contractors. https://en.wikipedia.org/wiki/Phoenix_pay_system And I agree that the problem wasn't the glitches. I personally think it was the corporate governance that failed, not the software development and debugging process. The legal system was complicit and enlarged the overall consequences, but the "but for" test tells me that poor corporate governance was the root cause.
This is not well researched at all. Anything that depends on "if everyone acted intelligently and with good will" is broken. If you're interested in academia, I strongly recommend you go back and look at designing systems that function in the face of incompetence and even adversaries.
You can have your opinion, but I have mine after reading lots of research papers. Obviously, there are ways that things can still be improved. But even the design of systems that function in the face of incompetence and even adversaries will face intractable problems of incompetent leadership and governance. It's one thing to say "this is how it should be done" and it's another to say that we've managed to get people to do it the way it should be done. No matter how much improvement we see on ideas for how it should be done, we're still no closer to solving the second part, despite much effort. In the end, there will always be incompetent leaders in charge at some point for some project somewhere, and more often than we'd prefer. You can lead a horse to water, you can't make it drink.
It’s not an opinion and you can cop out to lame “what if the government is hitler” arguments, but resilient systems are definitely an engineering/science/math problem.
The entire field of cryptography wouldn’t even exist if the boundary of research ended at “good actors”.
And yet we still have people out there trying to create their own cryptography when the golden rule is to not roll your own crypto. For whatever reason, best practices don't get followed 100% of the time, even if they exist. For cryptography, the situation is better than most other domains. I think we're having different conversations here. You seem to be having a technical conversation. I'm having a sociotechnical conversations within the context of organizations and their workers and managers. I'm seeing you discuss technical solutions to sociotechnical issues, which is not what I am discussing. Even when the technical ideas are perfect, organizations still need to implement the ideas. That implementation tends to not follow allegedly perfect specifications for many reasons.
But you are entitled to your opinion and that's fine. We can agree to disagree, nothing wrong with that.
That’s the point. Cryptography is significantly better precisely because the research effort has gone into systems that are hard for people to fuck up.
Just look at the fight was to get everyone to agree that the model should be, “everything including the algorithm should be public, except for the key”. That’s a socioeconomic argument.
And that’s why making safe systems where mistakes are protected against is a critical area of research.
Rust is popular because it protects against whole classes of bugs, despite it being no faster than C/C++.
Surely the field of cryptography relies on conscientious and competent actors developing solutions that are robust in the face of malicious actors.
I am skeptical that there are software development practices that will allow me to hire a team of feckless incompetents and have them develop quality software. If you know of any I'm interested to hear about them.
Are there software dev processes that operate well in the face of incompetent or malicious people? I can think of ways to mitigate the damage, but at the end of the day surely you need some competent and conscientious people on your project.
I could give you a huge list but mostly it is computer programmers having the specification constantly changed by management and stakeholders. Even bad software developers can eventually make the software functional, even good software developers can write bad software if the organisation is going out of its way to break everything they do.
There is definitely a sensemaking process where organizations have to figure out what it is that they really need. But I wouldn't fault organizations for that. Most startups go through the same process trying to figure out product-market fit and you don't see those startups blaming their customers for not knowing what they want.
Most startups die doing this; most large orgs just set fire to loads of money and make their software buggy. They already have product-market fit.
If they had product market fit, they wouldn't have major feature change requests that turn the product upside down and inside out. But either way, startups don't blame their customers for being unable to meet their customers' needs. I think it's poor practice to blame organizations for being unable to meet organizational needs, especially when we already know that organizations and users don't know how to conceptualize software requirements well, let alone create software.
I constantly see large companies trying to reinvent the wheel in really haphazard ways, what has been your experience? Mine is I have been contracting in roughly 20 large companies since 2010 and before that I worked at Yahoo! and others.
But I've had experience too. My career started in a national telecom where I was part of a skunkworks team to develop internal applications because the organization was fed up with the IT department delivering solutions that didn't fit their needs. We approached issues differently from the ground up. Software developers gathered requirements on their own by job shadowing employees, and then delivered MVPs within days, which were then constantly iterated to finally solve the real problems. The software developers had complete control over what was made and why with zero change management or approval processes. We also had complete control over what technologies we used to make our apps. We mostly used .NET, but we also did some Java Swing and Ruby on Rails, and of course everything also used Javascript.
Our relatively small skunkworks team developed apps that changed the end-to-end solution delivery processes for major business units, both consumer and business sectors, saved the company 8 digits in opex and capex each year, and won an international award for "Best Support Team" (the Stevies, sort of known as the Oscars of the business world). Our greatest feat that year, which enabled us to win the award, was keeping the company afloat during a four-month union labour dispute by improvising solutions that automated everything in sight. At the end of the labour dispute, the CEO sent a company-wide email about how important we were and awarded us a made-up award, "Holding the Fort". When the union came back to work, we trained them how to use the new tools, but we unfortunately were also enablers of heavy downsizing, which I always disliked. Some of these people were hardworking people who did nothing wrong and followed the rules. Many of them were elderly and had little chance to go back to school to get new skills (we're talking 50-year-old clerical workers, etc). It drastically changed how I thought about corporate software work. That being said, we were all young cowboys, and it was possibly the best team I've ever experienced in my life.
I experienced the absolute opposite in many ways when I worked overseas for IBM, managing projects that spanned the Asia Pacific. I was the go-to PM for many of their mission-critical infrastructure projects, including helping with a datacenter migration from Japan to Australia necessitated by the 2011 Fukushima earthquake and tsunami. I also experienced a middle ground as a venue technology manager for the Vancouver 2010 Olympics: lots of pressure and set processes, but a lot of extremely competent people too.
Look, I'm not doubting your experience, but I've had mine too, which shaped my views, just as I'm sure that your experiences have shaped yours. We can agree to disagree, nothing wrong with that.
TLDR
It would be really interesting to study The Post Office in particular. Something about this organisation attracts some very sour people. Or perhaps they weren’t always like this but have become so in my adult lifetime over the last few decades?
In the early 2000s there was a TV ad campaign for “The People’s Post Office” where the sub-postmaster role was played by John Henshaw, a character actor known for playing hard bastards and, in his most recent role on The Cops, an exploitative bent copper from Bradford. A strange but apt piece of casting.
A low salary coupled with a customer-facing job creates sour people.
Imagine you are talking every day to weird, bitter, arrogant, rude customers, for years - even with a high salary you will not be so positive.
Sounds relatable. I also think there may be some selection effect going on: if you didn't have a better alternative than a low-paying customer-facing job, you may already have a little sourness to start with.
Don't think like that. Medicine is a complicated field as well, but after evidence-based medicine was invented, death rates went down. It is certainly possible to find and enforce lists of best practices.
The flip side of that is that despite how awesome modern medicine is, we still can't get people to get vaccinated (for free!) and put a mask on their face during an infectious disease emergency.
You will find that people just suck no matter what field you work in.
Reminds me of a saying from a dear friend: „Humanity is a boat with a big hole and we constantly need to shovel out water to keep it from sinking.“
What is "IS" in this context? I did some Operations Research modules at uni and thoroughly enjoyed it, but it had nothing to say about why projects didn't work.
Information systems at a guess
Were there any particular themes that stood out to you in understanding the causes of IT incompetence? I'm hoping there's a less depressing answer than "some people will never care".
I'd say that the biggest thing that stood out to me was what people call communities of practice. Each professional community has their own knowledge and best practices, but these don't generalize well to other professions. So you have boundary spanners who can bridge two professional communities by being good at both, but those types of experts are rare. Also, the amount of experience and learning required to become an expert in two professional communities, rather than just one, requires such a large amount of time that most people can't be bothered to put in the effort.
It's enough work to do one's own job well already. The go getters can of course do it as a natural course of action, but they are outliers. There are a limited number of job opportunities that require developing this experience on the job, so there are limited opportunities to become a good boundary spanner in the first place. Furthermore, people aren't naturally interested in multiple disparate subjects. True renaissance folks like Leonardo da Vinci who are interested in becoming experts in both art and engineering are rare. Elon Musk types that will try to dive deep into multiple unrelated areas are rare. All of this adds up to boundary spanners being rare. As such, leaders who can develop cultures that handle multiple areas simultaneously (see the founders of Flexport who understand both tech and shipping logistics) are rare.
In short, expertise is hard to develop, expertise in multiple areas is rare, and coordination between two areas that understand different worlds is difficult without boundary spanners. As a result, you get failures. See any software engineer who creates a startup to try to revolutionize some old-school industry and then fails dramatically because they don't understand the problems that actually need to be solved. The outliers will figure it out, but not everyone can become an outlier due to reasons discussed, among other reasons.
I did a quick look around for some blogs about this early paper-to-digital transaction register migration; I didn't see much for such a major case.
Just a few basic things that weren't included: no audit/transaction logs, and transactions modified by tech support to keep the system running.
Operators couldn't prove they didn't steal funds, and the British legal presumption that computer systems are to be trusted as fact pretty much convicted them all.
This isn't a software error and it is pretty clear.
There are three problems here. First, the branch manager is responsible for calculated shortfalls, even if the software is broken. Second, there is no way to overturn broken software. Third, the prosecutors were overzealous in trying to shut these people up and convict them straight away.
The software itself was just a convenient medium for abuse of authority.
Most of the time issues like these are from companies that pay like $50k salary for senior positions.
So everything checks out.
Some people suck at management, some people suck at coding, and some people suck at self-awareness.
I can think of two things that I believe would make a difference in any LargeCorp: First, a standardized way to visualise and execute business logic that allows developers and management to reason together. (The no-code movement is on the right track in fostering a common way to interface with code.) And second, a responsible editor for each piece of code.
I think a key factor is that software historically hasn't enjoyed industrialisation to the degree of hardware (or construction for that matter). I can buy a standardized CPU of millions of transistors and integrate it into a standardized motherboard with just a snap. We have managed to standardize software up to the OS level, but after that it's up to the developer and her shortcomings.
https://www.codevalley.com/ does some interesting work.
Our current system of the world quite strongly disincentivises honesty and integrity - rather, being a bombastic charlatan with a flexible relationship with the truth will get you anywhere.
Because we’re not professionals. We don’t profess anything and do not have standards. There is no regulation for our industry and no IT association that can strike you off from practicing this craft. There is no accountability, and when there is no accountability, people naturally regress to either lazy or exciting behaviours.
People who are selfless, humble, intelligent, competent, AND people of integrity are never the people who win the contract for any information system, though.
Hence why we need to keep things simple. The human part will never change, or will change at a rate that takes many generations to improve, if you are an optimist. I actually prefer things to be hybrid rather than all-in digital.