My impression is that the quality of train firmware is generally not very good, and I hope that this scandal will lead to greater scrutiny. 3 years ago, Deutsche Bahn publicly complained of "grotesque" software problems with newly delivered Bombardier trains. For example, when train drivers changed the direction of travel, the train software would crash. It then took 1 hour to boot the train up again [0]. Switzerland had similar problems in 2018 [1].
As a computer scientist, I find this embarrassing. Just compare these modern trains to the old trains built in East Germany [2] during the '80s, which until recently were pulling old West German carriages [3] from the '50s here. Minimal or no use of digital electronics. No "boot times". They just worked. And if they didn't, the train driver usually knew where to hit the engine with a hammer to fix it. You cannot expect a train driver to hack into the train firmware and fire up gdb to find out why it doesn't move.
[0] https://www.sueddeutsche.de/wirtschaft/deutsche-bahn-ic-1.47...
[1] https://bahnblogstelle.com/33872/twindexx-swiss-express-soft...
Bad software is a symptom, not the cause.
What is the cause then?
Bad culture that views software as a necessary evil or afterthought rather than an important part of the product.
Same as industrial design then. You get the occasional Braun, Herman Miller or Apple, and a vast number of nondescript silver/beige/black boxes.
It's probably true of lots of aspects of product design - if it's not driven from the top, it's mediocre.
Yeah, unless the engineers are using the product themselves. People in general seem to take care of their own tools. Much harder to get them to look after a product they don’t use themselves.
Here it's more like the software - any software - is a problem. I agree with GP, and my experience confirms that adding software to something that used to work without it almost universally makes it worse in every aspect, understandability and repairability being just two major ones. On top of that, taking anything that runs on old-school industrial/embedded firmware and replacing that with software built using the modern practices and stacks of the software industry 100% makes the product go to shit.
That again is a problem of leadership and not of software.
SW has an input problem and a testability problem. On the one hand, the inputs to the SW are not limited (iMessage happily accepts any image file), while testing is limited to some known inputs. Software vulnerability assessment (worst-case analysis) is usually performed outside of the development process, at very high cost and with limited outcome.
I think the compensation offered by companies that view software as their product has drawn many skilled software developers away from jobs that would once have grabbed them because of the fun factor. Companies that make things that merely contain software are not in markets prepared to pay 2 or 3 times what they used to for software. What you are left with is people who are willing to accept that fun factor as the difference in TC, and people who couldn't get jobs that paid more. These are the people we have making most of our safety-critical systems. Go look at software developer compensation at X vs. SpaceX. That is the market at work.

Fun does count as TC, but you also end up with people who aren't good developers pivoting to engineer new processes and tools in these domains. They latch on to whatever fad full-stack is just getting over a case of, and try to apply it to train firmware. It wouldn't surprise me to find out they are all about scrumfall and have 10x more text in Jira than Git. And they have RESTful APIs, or service-oriented architecture, in a safety-critical embedded system.
Facebook, Amazon, Netflix, Google, and Twitter are not selling software, but they are able to attract a lot of skilled developers.
They are all selling software, with the SaaS model.
You can compare this situation with Boeing and the issues they had with the software of the 737 MAX.
Some of the problems here might have been logic problems by inept coders; however, the underlying theme of this scandal is corrupt management. Even the erroneous code was an explicit piece of fraud that almost certainly was done under order by someone in the management chain.
Speaking from experience in vehicle firmware. Typically, job requirements include hardware qualifications and web software isn't considered relevant experience. People with web backgrounds are in demand for adding telemetry and diagnostics data collection to products containing software, but those are separate components.
That is true. You can get a few times more money as a regular Spring Java developer making CRUD apps in some body shop than by writing industrial software for a local Polish company.
I think you are correct. Because the pay is so miserable, the talent pool is mostly VBA-developing engineers from inside the company. Because of that, they can't hire good technical leads who know or can enforce good practices or design good architecture. The result is a giant mess of software in trains, planes, and automobiles.
The quality of SW has nothing to do with the pay. Notice that FAANG SW developers do not deliver safety-critical SW.
There is more to SW development than writing code.
Because this software is not made by software engineers; it's made by PLC programmers, electrical circuit designers, and whoever drifted into the field.
Except for Beckhoff's TwinCAT 3 (TC3), they haven't made it to object orientation yet, so the field as a whole is stuck in the blue-screen mines of yore. Managing complexity with thin standards docs and no version control, while the machines grow ever more complex in sensors and actuators.
You cannot treat modern machines like small embedded hobby devices - but the industry does.
Some outside programmers make good money coming in and solving yesterday's problems with proper software architecture and good C development practices. But the industry doesn't learn from this. Making software will forever not be a profession for them.
I'm not sure if you've ever used modern software. It's sometimes amazing just how unreliable it is. Web browsers crash every few weeks, Windows is known for regularly needing a reboot, Evince regularly crashes on me, you can't call 911 with some cell phones, ... . This reminds me of https://danluu.com/everything-is-broken/ .
The clearest example of the difference in reliability is public digital signage (on transit and elsewhere). If it's based on LED segments or something similarly basic (with old-school embedded software development), it will basically always work. New LCD screens inside trains/buses and outside, run by a modern software setup (using an OS, often on a PC architecture, quite often just displaying a website), are broken ~10%-20% of the time. Looking at buses, for example, a large portion of the time the screen will either be blank or show old or just plain wrong information. Go inside fast-food restaurants with large LCD menus, and often something is broken, frozen, or otherwise wrong.
It is of course possible to make modern software more reliable. It's just much, much harder than making embedded software or PLC programming reliable. Software can be easily made more complex, but it's hard to make it non-complex or to wrap the complexity so it isn't an issue anymore. The ecosystem isn't set up for non-complexity.
I think to make software more reliable, you'd have to go back to the "waterfall" method of development.
If we went back to Dijkstra's notion of correctness by construction, then a specification for the program would be made, and then a programmer would prove their part of the code correct with respect to the specification. They would write the precondition and postcondition of every effectful statement, document the invariant of every loop, and prove by induction that each loop does what it's supposed to do. Basically, annotate your program with Hoare triples. (There are books about how to do this.) Then, extensive tests should be run for as much of the program as possible.
Nowadays, we have tools for this so that we don't actually have to write a proof by induction for every loop; instead, we have bounded model checkers. In theory, the manual proof writing could be isolated to the parts of the program whose properties a bounded model checker cannot verify.
However, it seems like this whole plan is infeasible unless regulations are written that enforce this onto the industry. It would make them a lot less productive, and therefore less profitable. The only benefit would be that software is more reliable. By necessity, it would have to become simpler, too. For instance, there's absolutely no way that web browsers like Chromium, with 38 million lines of code, will ever be verified, because they're too large and complex.
> ... they haven't made it to object orientation yet, ...
Not always a blessing, and I've actually recently been thinking (e.g. in the context of Lua) whether object orientation isn't better avoided in most situations.
100% agree with this. IMO there are a few efforts to modernize PLC programming, but I feel like they are still stuck in 1990s software development. Take a look at Codesys: it got Git support a few years ago, and it's in very bad shape. How do you test your code? In the field, or buy another Codesys testing plugin... which is also in rough shape.
The issue is that as machines get way more complex, this problem gets worse. Also, there are generations of PLC devs who still want to stick with ladder logic. Huge fragmentation.
There are more "engineers" writing software for your car or a train than "engineers" at Microsoft, Google, Apple or Facebook.
I don't think anyone will be happy if, while driving at 100 km/h on a highway, the car suddenly decides to restart itself. There are bugs everywhere profits are put before engineering, but calling those people names is not constructive, especially when they use SW created by "engineers" that crashes for no apparent reason while they are doing their work.
Haha wait until you find out how TVs worked in the 70s and how fast it was to change the channel *sob*
Even in the 90s, you could just power it on and it would show image near-instantly. Warm-up time and channel switch time were all firmly under one second. With the exception of cable TV set-top boxes, which were separate devices and first to include the ridiculous boot times and delays, that still would seem blazingly fast compared to what we have today...
Sometimes low-tech is just better. Here in Finland we got Sr1 electric trains from the Soviet Union in the 70's, and after some renovations the model is likely to stay in use at least until 2030.
The simplicity of old designs is often a blessing, as long as the drawings and documentation are readable and good. It can be hard to get replacement electronics for 1970s designs, so sometimes you have to design new components, but the functionality was relatively simple back then, so it's possible to build a 1:1 replacement.
That is my impression as well. The softwareization of trains has led to deep regressions in both basic reliability and interoperability/flexibility. Many modern trains suffer from software issues for basic driving [0] and delays in getting the software approved [1]. But the loss of compatibility is in my opinion the worst regression. Modern EMUs basically only work together with other EMUs of the same batch. Even the same model ordered by two different companies often won't work together, and basically forget about trying to use EMUs of different companies, or ordered over a decade apart, together. Meanwhile, before everything went digital, it was common to use e.g. trams of different generations together and rewire them to work with each other. Older train cars work together without issues; good luck trying to use an IC2 and a Railjet together (or a Railjet and an ICE-L). Even certain locomotives and train cars would often only work with each other.
It is way harder for different computerized systems to work together due to the higher complexity and more obfuscation (a traditional logic circuit board is often easily reverse engineered; reverse engineering software is a very specialized task). This is also very noticeable in other sectors, where interoperability has become much worse due to the move to proprietary digital protocols.
This is in part due to the difficulty of getting software approved compared to previous tech (software being so opaque), but also because of genuinely lacking quality. One of the reasons Bombardier was in such deep trouble was bad software, even leading to a contract for over 40 ordered trains being cancelled outright ([2]).
In my opinion building reliable (and understandable) software is way harder than building logic or even mechanical systems. I don't know what the solution is, but it's been a problem for a long time.
[0]: https://www.vrt.be/vrtnws/de/2013/02/12/belgische_bahn_storn... [1]: https://www.augsburger-allgemeine.de/augsburg/Neue-Zuege-auf... [2]: https://de.wikipedia.org/wiki/Bombardier_Talent_3#%C3%96BB
Except that this isn't really a story about poorly written software; it's a story about corrupt management. Further, if we look at Boeing's recent issues with the 737 MAX, it's the same thing. In both of these cases, the bad software was almost certainly ordered to be written by management acting fraudulently for profit. The one error that has been discussed in the article was a stupid mistake, quite possibly due to the logic conditions being made overly complicated in order to enable the fraud, but the recurrent theme of all of the real underlying issues found was intentional design malfeasance, not incompetence.
You could say that about most software where the fresher the framework the more glaring the holes - here's a recent post about it: "Software disenchantment" https://tonsky.me/blog/disenchantment/
I don't think fixing the software failure will improve DB's punctuality
In the case of Bombardier, I suspect contracting also contributes to the problem. The same for financial institutions.