I found it interesting that Boeing did proactively tell airlines to inspect 737 MAXs for a possible loose bolt in a different part of the plane (rudder section) at least 8 days before the January 5th event. Example story: https://www.reuters.com/business/aerospace-defense/boeing-ur...
Unfortunately, Boeing did not know they had other issues with the plug door bolts.
Imagine the quality of the manufacturing and QA / final inspection to have that kind of issue.
I expect even at its worst, software development could learn a lot from aircraft QA.
Especially since most shops have pretty much tossed professional career QA out the window.
I work in the defense industry; it’s very much like the aerospace industry in that we deal with human life as a consequence of our work. We have software QA departments that operate very much like manufacturing or aerospace QA.
Software QA provides nothing of value to software development; having it as a dedicated function works against its own overtly stated goals and counterintuitively degrades software quality by mandating strict top-down process and brittle end-to-end testing.
Although Software QA is intended to be an independent verification body that provides engineering organizations with tools and resources, in practice it functions as a moral crumple zone [1] within the complex socio-technical defense industrial system: one of the groups the finger gets pointed at when something goes wrong, absorbing shock to the business in the event of a failure. As a result, QA has a strong incentive to highly systematize its work with specific process steps that can be applied generically to all projects, to shield itself from liability.
Good software teams build quality into projects by introducing continuous integration and unit testing, creating feedback, and tightening those feedback loops. This finds problems quickly and resolves them quickly. Software QA's need for high-level, top-down, generic systemization requires it to work against these principles in practice. Bespoke, project-specific checks, such as unit tests, are not viewed as contributing to the final product and are discouraged by leadership who see them as waste.
To give an example of how these dynamics destroy quality in software: I once found a bug in software on a piece of test equipment where a logarithmic search function was operating on a list that was not strictly sorted (a sketch of the failure mode is below). When I pointed this out to my leadership, I was told that if we changed any part of the code it would require a new FQT, which would be too expensive to conduct and was not in the budget. Although the bug was clearly wrong, would have been trivial to fix, and provided no benefit by remaining in the test equipment software, the process required for changes prevented us from solving it.
[1] https://papers.ssrn.com/sol3/papers.cfm?abstract_id=2757236
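For illustration, here is a minimal sketch (in Python, with hypothetical names and data, not the actual test-equipment code) of that failure mode: a logarithmic search assumes its input is sorted and silently returns wrong answers when it isn't.

    import bisect

    def contains(values, target):
        # Logarithmic (binary) search; only correct if `values` is sorted.
        i = bisect.bisect_left(values, target)
        return i < len(values) and values[i] == target

    frequencies = [100, 300, 200, 400]         # not strictly sorted: the precondition is violated
    print(contains(frequencies, 200))          # False, even though 200 is in the list
    print(contains(sorted(frequencies), 200))  # True once the input is actually sorted

The fix (sort the input, or assert the precondition) is a one-line change, which is exactly the kind of trivially verifiable correction that a mandatory full FQT makes uneconomical to ship.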
No. Good software teams are led by competent, technical management. Managers who aren't afraid to get down into the dirty details. Managers who aren't afraid to roll up their sleeves and write code if they need to.
The process doesn't matter. The management of what is or is not important does. Agile is just one process out of many.
Imagine an accounting team led by someone who never did accounting in their life: "Just make the numbers work out! I don't care how you do it! My bonus is at stake!"
Sigh... This myth that the only people who can competently manage developers are other developers has been floating round for decades.
For some reason, developers seem remarkably blind to the skills other roles and disciplines require. Only a developer can do that; everyone else is basically useless fluff. Maybe it's a form of arrogance, or just a deep lack of self-awareness.
Let's apply your reasoning to medicine. I'm sure you would be completely fine with managers telling your surgeon what parts of the surgery can be 'optimized away'.
Haha, funny strawman. My reasoning is that non developers are capable of managing developers, notably people who have good management skills.
Your contention is that the surgeon should be running the hospital.
Hahah, indeed. So have you seen a law department in a company headed by someone who doesn't come from law background? How about a finance department headed by some schmuck who doesn't know anything about finance?
I disagree with you. You are stating it, but you are not giving reasons. Managers who weren’t developers tend to not be able to manage the team. They can’t help with or understand the technical decisions made. The non-technical managers tend to be project managers just focused on dates.
The head of surgery should be a surgeon, not an accounting manager.
The head of accounting should be an accountant, not a surgeon.
And even at the executive level of a hospital, you would want people who have spent their careers in healthcare, rather than, say, architecture.
But that doesn't contradict the parent, does it? I'd say you both make good points.
Well... here's a thought experiment.
Let's say I have a group of school children and a group of architects each create a skyscraper, and I've given both groups the same process for designing one.
So in both cases, I should end up with a safe building?
I’d bet the children would come out better simply because they have parents who are likely multi-disciplined as a group. A disparate group will (almost) always come up with better results than a homogeneous one (at least in my experience).
Why not both? Am I missing something? You can have feedback loops and CI and all that, "good craftsmanship" or "good practices" (not "best" practices because those often suck hah), where of course opinions vary on the details of that -- and then someone who is also good at the craft who spends more or most time on helping the rest work together, i.e. manage/lead them.
...or perhaps with no managers at all. I'm less and less convinced of the importance of management in engineering except to give investors an illusion of control.
I sort of agree, and I do think it’s possible depending on the team. But unfortunately developers can be too opinionated and get focused on low priority things.
Agreed, for good software teams.
I would contend that most software teams at most companies are not good.
Which is to ask, with an average to bad software team is it better to have integrated or separate QA?
Does it make sense to degrade the performance of good software teams because bad software teams exist?
Ideally we’d always have good software teams, but in the real world sometimes you have to build software with bad teams.
Leaders have options, they can do things like reduce scope, increase budget, increase schedule, or full on abandon or cancel the project. These are all options available to leaders, but they require tradeoffs and decisions to be made on a project by project basis.
It is scalable to have a strict process that everyone has to follow, then impose a watchdog to enforce it on a wide scale. It may not be better to have separate QA, but it is easier for those in charge.
Consider the classic statistic "most drivers think they are above average".
I posit that the same is true of software teams, almost every team will self-assess as above average, i.e. good. Those teams will then imagine that, being good, they build quality into the process and very little verification QA is done.
I have worked as a software consultant for 15 years now. I've worked with at least 40 separate software teams in that time. Every single team manager would pep talk with "this is the best team I've ever seen". Some of this is obviously blowing smoke to get people to work harder and feel good. But over the years I've had candid conversations with managers and realized that most of the time they genuinely think their team is really good, truly top 10-20%.
Here's the rub. Being a consultant, I'm almost always brought in by higher level management because something is going horribly wrong. The team can't deliver quickly. The software they deliver is bug ridden. They routinely deliver the wrong software (i.e. incorrect interpretation of requirements.)
Oftentimes these problems are not only the fault of the development team; management has issues too. But in every single case, the development team is in dire straits. They have continuous integration, sure, and unit tests, and nightly builds, and lots of green check marks. But the unit tests test that the test works. The stress tests have no realistic basis for the expected load. The continuous integration system builds software, but it can't be deployed in that form for x, y & z reasons, so production has a special build system, etc...
In 15 years I have never once encountered a team that would not benefit from a QA team doing boring, old school, black box manual testing. And the teams that most adamantly refuse to accept that reality are precisely those that think they are really top tier because they have 90+% unit test coverage, use agile and do nightly builds.
So, my question is, do you (I don't mean the specific "you" here, rather everyone should ask themselves this, all the time) think that most bad software teams know they are bad? Including the one you are part of? Would it really hurt to have some ye olde QA, just in case, you know, you are actually just average? :)
I'm curious: in your many years of being a consultant to these bad teams, where the manager really thought they were top 20%, did you get a chance to talk to the rank-and-file team members, and did they paint a very different picture of the team health and software quality than their manager?
Also, did you run across any orgs where they basically refused to use a process like Agile, and instead just did ad-hoc coding, insisting that this was the best way since it worked just fine for them back when they were a 5-person startup?
Yes, generally I join teams and work as an engineer or sometimes as a team lead, so I'm talking to all the team members.
Most start up teams are composed of junior developers, often pretty smart people. Usually 5 or fewer years of experience. Many times these are people who have already accomplished stuff they didn't think they could do. So that generally means that yes they think pretty highly of themselves. To a degree it is quite justifiable, they tend to be very accomplished but in a narrow domain. Unfortunately they don't realize that their technical accomplishments in a specific field do not mean that they are experts everywhere. Their managers understand that these are smart people and assume again that this is therefore a good team.
Non start ups that I join are usually just plain dysfunctional.
Usually more the opposite. In my experience I come across teams that are sure they must not need any help because they follow all the rules in Scrum and have great code coverage metrics.
It is really common to see this kind of thing. I call it "the proxy endpoint fallacy". It can crop up anywhere that there is something that can be measured. In that example, it would be confusing adherence to Scrum with having a working SDLC or perhaps confusing code coverage metrics with the objective of having bug-free releases.
This isn't a software only fallacy. In politics, GDP is often confused with societal well-being. Always be wary of your metrics and change them as required to keep you tracking your actual goals.
Not parent, but in my experience as a consultant working with bad teams, the rank and file were 'doing the job.'
You usually had a few personality archetypes:
- The most technical dev on the team, always with a chip on their shoulder and serious personality issues, who had decided to settle for this job for (reasons)
- The vastly undertrained dev who was trying to keep up with the rest of the team, but would eventually be found out and tossed, usually to blame for a major issue
- The earnest and surprisingly competent meek dev, who presumably didn't have enough confidence to apply to a better job, but easily could have made it on merit, work ethic, and skill
- The over-confident dev who read a bit of SDLC practice, and could see every tree while missing the forest
The key is that, aside from the incompetent person, they had all been working there for a while. Consequently, there wasn't good or bad health and quality: there was just "the system" (at that company) and dealing with it.
And none of these folks ever worked at 5-person startups. ;) I think it was definitely more an issue of SDLC "unknown unknowns" they should be doing, than willful decisions not to.
Here's a real, perhaps unexpected counterpoint. Say you have a good software team. How do they build good software with bad management?
They quit.
Exactly!
Quit and do it for someone else
It makes the most sense to me to match the org structure to the teams you have.
If I'm trying to build something with undertrained, demoralized, underpaid engineers... it's not optimal to use methods intended for self-motivated, high-performance teams.
And nothing says there must be company-wide mandates. Maybe this area gets a formal, independent QA team, but this other area doesn't.
My experience just doesn't bear out that collapsing the QA function into development always leads to better outcomes.
I've seen the opposite happen too often, and QA be the sole bulwark between idiocy and customers.
If your devs aren't good, what are the chances of your QA team being good enough to make up for their shortcomings? The dynamics laid out by the parent comment will just hit even harder. Your best bet is to enforce basic practices like continuous integration, coverage goals, and maybe a coverage ratchet as a merge gate (a sketch follows below). Training and education in areas where the team is weak are also a must.
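For what it's worth, a coverage ratchet can be a very small script in CI. Here is a minimal sketch; the file name, threshold handling, and the way the coverage number is obtained are illustrative assumptions, not any particular tool's API. It fails the merge if coverage drops below the best value recorded so far, and raises the baseline whenever coverage improves.

    # coverage_ratchet.py -- hypothetical CI merge gate (names and paths are illustrative)
    import json, sys

    BASELINE_FILE = "coverage_baseline.json"  # committed to the repo

    def check(current: float) -> int:
        try:
            with open(BASELINE_FILE) as f:
                baseline = json.load(f)["coverage"]
        except FileNotFoundError:
            baseline = 0.0
        if current < baseline:
            print(f"Coverage dropped: {current:.1f}% < baseline {baseline:.1f}%")
            return 1  # non-zero exit blocks the merge
        if current > baseline:
            with open(BASELINE_FILE, "w") as f:
                json.dump({"coverage": current}, f)  # ratchet the floor upward
        return 0

    if __name__ == "__main__":
        sys.exit(check(float(sys.argv[1])))  # e.g. python coverage_ratchet.py 87.4

The point is that the gate is owned by the team and runs on every merge, rather than being a separate body's sign-off step.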
I don’t work in this industry, but this seems fairly ridiculous on its face: software is not at all like manufacturing.
In manufacturing, there’s a design and a manufacturing process, and a critical function of QA is ensuring that the manufactured product is manufactured to spec.
With software, the software is written, compiled, and then repeatedly copied. And something should verify that it’s copied correctly, but this is straightforward and boring.
So software QA ought to be much more like the kind of validation that happens when designing hardware, not like the kind of testing and validation that happens as products are manufactured.
Ideally there should be a solid spec written, and then QA can test against the spec. Maybe there is somewhere that does write solid specs, including accounting for corner cases, but in my 25 years working professionally in the industry, I’ve never seen it.
The only complete and precise specification of software is the code itself. If some other form of specification was complete, we would be able to auto-generate the code.
This is beside the point. The code specifies what the product _is_, not what it "should" be. If you ask for a word processor and I deliver a perfectly bug-free and feature-complete calculator would you really believe it lived up to spec?
This is also beside the point. I think both of you are trying to warn against the dangers that lie on both sides of this coin: people can invest too heavily in a specification and waste an enormous amount of time, and people can immediately jump into coding and code something that does not do what it was intended to do. Like with all things in life, there’s a balance between these two extremes that’s correct.
You need some level of specification so you know what you’re building, but you have to keep in mind that the final code defines what the behavior truly is. Sometimes, that behavior unintentionally becomes part of the specification because users begin to rely on it.
I do like the fact that you both used hyperbole to succinctly illustrate the dangers of veering too far in either direction though :)
A (human language) specification is simply _enough_ information about a system that a human can figure out the intention of the author. The smarter and more context-rich the human, the simpler the specification can be. The dumber and less context-rich the human, the closer the specification needs to be to code.
It's asymptotic. By the time you reach a human who is as dumb as an actual computer, the specification _is_ the code.
I work in software for CPU design/verification. Even here, where in theory there should be a rock-solid spec, there's not. There's a 12,000 page architectural specification, which is very helpful for specifying all the end-user visible state. But the microarchitectural specification is scattered all over different PDFs, visio docs, excel sheets, and sometimes the only spec is the RTL code itself.
I work in MedTech. We do this. A design has to be reviewed by QA, and is then tested, and the test is reviewed again. So just to counter the narrative, there are companies that do that, and it is working. In other jobs I also saw the cargo cult of QA. But in some industries it is just crucial, otherwise the pressure is too high to cut corners to implement something. It is a good mechanism to counter the need to move fast and break things.
I think that's why people always tell each other to not take things at face value.
Of course there is a big difference between software and hardware QA, in the things that they test and how they test them.
But they are also very similar. Any QA department has to think about ways that things can go wrong, and what things to test for, how to test, which testing methods, which standards to handle, keeping certifications, etc. During testing you also need to keep reevaluating whether you actually are catching each problem/bug, and how to implement changes in your company that decrease the number of problems or increase the number that you catch.
I think in that way there's a lot of overlap in thinking about business processes and how to identify problems with them.
Of course, once a specific binary gets tested and approved by QA it shouldn't matter if it gets copied or whatever, as long as you make sure it's the same binary (by a checksum, for example).
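As a minimal illustration (the path and the recorded digest are hypothetical placeholders), verifying that a deployed artifact is byte-for-byte the binary QA approved is only a few lines:

    import hashlib

    def sha256_of(path):
        h = hashlib.sha256()
        with open(path, "rb") as f:
            for chunk in iter(lambda: f.read(1 << 20), b""):
                h.update(chunk)
        return h.hexdigest()

    approved_digest = "<digest recorded when QA signed off on the build>"  # placeholder
    if sha256_of("release/app.bin") != approved_digest:
        raise SystemExit("binary differs from the QA-approved build")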
But still, making sure that errors don't reach the customer is vital in any QA. If errors do happen, QA is the department that can make sure they don't happen again. And of course it lets you prove in court that you did your due diligence if something does happen.
It sounds like they are calling something QA but using it as a liability shield. It makes sense that you are upset about that, but naming something QA and having it do something else doesn't mean that QA as an effort is bad. It means that the people doing that are being deceptive.
Fair point, you are correct in your inference that there are some bad actors in my workplace. However, I’ll argue that the fundamental dynamics of bifurcating the responsibility of quality from software leads to a steady state where all QA departments end up as a liability shield given enough time.
This is driven by Pournelle's iron law of bureaucracy [1], which says that people who promote the bureaucracy rather than the mission of the bureaucracy will get promoted within the organization and come to dominate its decision making.
For example, in schools, administrators make more money than teachers. This is despite both groups having similar levels of education and intelligence. The reason for this is that administrators know the laws and regulations of the environment they’re working in and ensure the continuity of the organization. Despite not directly contributing to the organization’s stated mission of education, they are in charge of the organization and take more benefits from it.
Software QA has similar dynamics. A QA department may start out making good faith contributions to the organization. Eventually there are product failures, eventually leadership needs a scapegoat to show they’re doing something, and eventually QA takes the blame. People get moved, demoted, or fired. QA realizes its risk, and takes steps to mitigate it. They create a highly systematized workflow and process, adopt or introduce standards. Then assert that following process equates to good outcomes. When bad outcomes occur, they point to their strict adherence to following process as evidence of innocence.
If the process does not support the work or mission, that is a cost they are happy to impose on other functions to deal with. This is the final state until a system disruption happens.
[1] https://en.m.wikipedia.org/wiki/Jerry_Pournelle
But what makes 'software QA' fundamentally different than 'non-software QA' to give it the problems you foresee?
Because, barring regression testing, everything software QA checks is new development.
For software QA to be equivalent to QA in building planes, it would have to be verifying a known process with existing tooling.
I have seen a case of Software QA taking a very different shape, so I'd like to argue that the outcome you describe is not intrinsic to software QA, but rather to company culture.
The case I'm talking about does not have a separate QA department, but QA people as part of every software team. If a product fails, that team is responsible, so software devs are in the same boat as QA. They focus on learning from these failures, so no scapegoat is needed. Process does get followed, but not as a defense mechanism, but because not doing so introduces noise that is an obstacle to improvement. In case of bad outcomes, people do point out that they followed process because then it is clear that the process is involved in the failure and should be improved.
Unfortunately, companies with that kind of culture are rare.
It's like saying communism isn't the problem, but that it's how every single group attempted to implement it that should be blamed.
Sure, maybe, but if nobody ever can implement the theoretical utopia, maybe we should talk of things humans can do instead and ditch the unimplementable idea.
QA cannot be done by a separate team the way you dream: it will always be a political buffer zone staffed by the cheapest half-competent people you can find, expelling good people into dev or management. Or you merge it into dev/solution design.
The reason is simple: just like contract law, you only care about quality once you are in trouble and need to trace the issue back to its source to give the client a post mortem. Otherwise, you care first about velocity, or $ input/hr of effort.
Other fields do QA just fine.
About 2000, Software QA (and almost all traditional QA activities) were changed. The focus was on process over inspection. "Design in quality, do not inspect it into the product"
Suppliers (to include software) were expected to manage the quality of the product they provided; the purchaser would focus on how they managed the process, not in the compliance of every part.
This had a chance until software process was tossed in the name of "agile".
Here's a stupid question: How do you know your process is good unless you inspect it?
"Hey Bob I know you're a competent engineer, but don't worry about specifying a certain type of bolt or loctite, the untrained assembly personnel will figure it out. I'm sure they won't let 200 people die in a plane crash."
Make it impossible to use a wrong bolt, and train the assembly workers.
I recall a bug I was involved with at a telecoms equipment maker in the early 2000s. The bug only showed up in our biggest base stations in high-load situations. We diagnosed the bug, and there were a couple of parts to it. Sloppy software design in an optional hardware module (no state machine) was one part, and was fixed. But there was another underlying issue in the way message queues were handled.
Anyhow, the fix for this was created and written. But we never got to put it into production. The reason: the company didn't have a lab test facility that could put a sufficient load on the software to prove it. Even though we were getting field failures because of this issue, and they were earning us a bad rep, we couldn't fix it: the old code was known to be buggy, but we couldn't prove the new code. So the process said we couldn't ship it.
I’m going to be using “moral crumple zone” in every conversation I possibly can from now on.
"Hey, wanna hang out with a beer this evening?"
"Nah, I don't need to slouch anymore in this moral crumple zone!"
"We need this new feature in our program!"
"If we implement this it means the management fell in a moral crumple zone"
"What seat would you have to have for your flight?"
"Anywhere, but not in the moral crumple zone, please"
An example from my actual work life, lightly fictionalized—
Them: “Please review this design.”
Me: “Ok, sure, when do you plan to start coding?”
Them: “Oh, it’s already in beta.”
Me: “So you can’t do anything with my feedback, but you’ll say I reviewed it?”
Them: “Well…”
Me: “You’re putting me in a moral crumple zone here!”
I think you're sort of misunderstanding the role of QA.
You think that QA is a liability shield, but that is only a side effect of the work that they actually do.
The task of QA is exactly that: an entity that tries to assure that quality is up to some standard. Even in favourable conditions mistakes happen, so how do you make sure, as a company, that you don't ship 1 faulty product in every 100 and tarnish the good reputation your company has spent so much time and money building? You hire QA to make sure problems get caught before delivery.
But if all humans make mistakes, and QA is human, how do you make sure that the QA doesn't make a mistake? A never-ending chain of QAs checking each other?
No, of course not. One thing that helps with reducing errors is to have a rigid protocol that is followed to the letter every time. Pilots, for example, have a preflight checklist that they have to run every time they operate the plane.
The rigid protocol of QA teams is therefore an essential part of their jobs.
Although from your standpoint as a developer it might seem strange that QA is 'preventing' you from fixing a bug, it is actually very reasonable.
Especially since you work in the defence industry, I hope you understand that it is very important that the software that operates radars, planes, missiles, bombs, etc is working exactly as expected. Understandably there is a great deal of effort made to assure that when those things are needed they work exactly to spec.
So in your example it is probably very reasonable that any change you make needs to go through a rigorous process. The fact that it was 'only' test equipment doesn't matter, because test equipment is just as important as, if not more important than, the stuff it tests.
The reason why QA has the side effect of being a 'liability shield' is that it gives companies the ability to argue (and prove) after the fact that they did their due diligence in making sure that the product was to spec.
Certification, especially, is basically getting an external organisation to approve your QA. In that case, if you get sued you can rightfully claim that you did everything that was legally asked of you, and if there is blame, it lies with the certifying company for using insufficient standards.
I'm not going to argue with the general thrust of your comment, which I think is insightful as to how incentives can compromise objectives. But...
I've seen this happen where it was a bad thing, but also where it was a good thing.
It's all about risk.
What risk does the software defect pose to the mission? What risk is inherent in making any change to the software? Noting that even trivial changes can be fat-fingered and thus are a source of risk. I've seen it go wrong this way: a seemingly trivial change was made, but the developer accidentally checked an extra file into source control, causing a further defect.
And then: what is the cost of mitigating these risks? Maybe the software defect is as trivial as its fix. Maybe an acceptable fix would be to write up a workaround in the documentation.
I don't think it's always wrong to say no to fixing issues. I also don't think it's always right that a separate QA department contributes nothing to the organization, even if they act as a handbrake on the software developers (sometimes, precisely because they do that). Human factors are real.
High quality comment
Except, of course, from Boeing's aircraft-software QA... which killed hundreds of people already.
The problem was not really the software in isolation, but that pilots expected the 737 MAX to behave exactly like the older NG version - because Boeing decided it was too expensive to retrain pilots.
The problem was software that prioritized input from a faulty external sensor over pilot control, and literally crashed planes directly into the ground. At a certain step in the sequence it was not physically possible for a pilot to pull hard enough on the controls to counteract the software. Could they have disabled the system? Only if they could figure out which specific piece of software was trying to crash the plane.
Is that what you meant by "the problem wasn't the software?" Because the pilots should have been trained to unplug the computer to stop it from crashing the plane?
Pilots are definitely trained how to disable the autopilot, if needed.
Afaic, the fault apportionment was Boeing documentation > airlines >> pilots > Boeing technical design.
Not sure why you’re bringing up autopilot— the MCAS system runs even when the autopilot is disabled.
Edit: Also, how does the fault lie with the airlines? Boeing didn’t document the existence of MCAS in the flight manual or training materials.
Because of the comment I was replying to.
Yes.
The fault lies with the airlines because I don't for a second believe they didn't put pressure on Boeing to get the MAX certified without mandating retraining.
And then once that was done, didn't dig into the details too hard about what changes were made.
I have a low tolerance for 'I set up all the conditions and incentives to encourage you to break the law... but you should take all the blame when it explodes.'
At some point, the customer has to take some responsibility for what they asked for.
It’s easier to blame Boeing because they made the damn thing and its documentation. We know for a fact they are at fault. Some or all of the airlines may or may not have put pressure on Boeing.
Wasn't MCAS designed to activate when A/P is disconnected, also?
This wasn’t related to autopilot and they removed mention of the MCAS system from the documentation to support the main selling point of the 737 MAX, which was that existing 737 pilots would be able to switch easily without recertification. They knew that they’d lose most sales to Airbus if the aircraft were compared on their merits so they were banking hard on their huge pool of certified pilots as the competitive edge.
If you listen to podcasts, these two episodes of Causality are excellent:
https://engineered.network/causality/episode-33-737-max/
https://engineered.network/causality/episode-50-737-max-ethi...
You might enjoy this. I have a pin that blinks "AOA Disagree".
Back when I flew regularly before covid, I was tempted to create a bunch of these and hand them out to the flight crew for the flights I flew on.
Ha, playing hardball! I wonder whether you’d find pilots who are Boeing loyalists who’d take offense, or if those guys are even madder at the current management for letting them down.
Pilots should (are supposed to) disable the auto-trim if it's doing something uncommanded/unexpected. Runaway trim can happen for reasons other than faulty software. MCAS was a new factor and they should have been told about it, I don't dispute that at all.
Here we are again, this misconception just won't die.
In the 737 MAX, the only way to disable auto-trim also disables powered trim (the thumb buttons). As the grandparent says, at a certain step in the sequence it was not physically possible for a pilot to trim the plane back to stability manually. It simply can't be done.
In the 737 NG, there was a button to do just that. That would have been useful.
And that's even ignoring the fact that all symptoms were very different from those present in a runaway trim situation as described in the manual and learned by the pilots.
The manufacturer put larger engines on the aircraft than it was designed for. And they did it to avoid all the homologation licences and design costs involved in bringing a new aircraft to market with the appropriate tolerances, and to compete with another company's aircraft in time (to avoid losing sales).
They introduced MCAS into the aircraft to compensate, in software, for a hardware issue: a big, negligent design problem that can lead to stalling. That goes well beyond trimming an aircraft, and because of it there is a big difference between the scale of the values the algorithm manages and those of ordinary trimming.
It is not my field, but I think this is not a simple factor, and that it should not have been put on the pilots as if it were a normal aircraft that had received a simple update. Every pilot flying that plane should have been warned that it was not a classic plane with a classic update.
If this type of behaviour by aircraft manufacturers becomes the norm, costs over safety, we as passengers will suffer it, as other passengers unfortunately already have, while the pilots get the blame. In addition, nowadays China's aircraft manufacturing industry wants to enter the global market; some days ago I read they are seeking permission (homologation approvals) to enter the European Union.
PS: They also cut costs by removing backup sensors, delegating responsibility for a system made vital by MCAS to the buyer as if it were an unimportant feature; disaster was the order of the day. And the spending cuts were not limited to that, as we have seen in recent days.
Even more ridiculous, Boeing offered a second source of truth option, but marked it as an upcharge, which the airlines in question rejected. "No thanks, no need for a second AoA sensor, one is none is probably fine!"
Additionally, two feels like a really strange number. I would think three for a tiebreaker would be standard for any sensor with that much impact (no pun intended).
The expense for retraining pilots falls on the airline.
Retraining has its own problems. No matter how well retraining is done, pilots still make mistakes from doing the right thing for the previous plane that is the wrong thing for the one they are currently flying.
Adjusting airplanes to fly the same way is a major safety advantage.
Arguably, Boeing hit the uncanny safety valley -- similar enough so that pilots and airlines relaxed, but different enough so that relaxation ultimately killed people.
The emergency procedure for runaway trim was the same for both aircraft types, and was not followed. After the first crash, an Emergency Airworthiness Directive was issued to all MAX pilots reiterating the procedure, which was not followed in the second crash, as well as not reacting to an overspeed warning.
Unreported by the media, there was another MAX incident before the first crash. The crew had no knowledge of MCAS, but did follow the emergency runaway trim procedure, and continued the flight and landed safely.
"Runaway stab trim". It is a memory item, every pilot should be able to perform it from memory.
Turn off the motor, and the trim is manual. There is a crank right there in the cockpit. If it is too hard to turn, change the aircraft configuration to reduce the forces required to turn it. Pilots know how to do this. This is pilot stuff; they understand the forces on the flight controls and what impacts them.
Boeing made an engineering mistake. The pilots also made an operational mistake. Unfortunately, both mistakes at the same time were fatal.
I pray that pilot training has improved. And that Boeing has made systems level changes to the aircraft that will preclude it happening in the future.
And that is how aviation becomes safer every year; at a significant cost in customers' lives.
"Significant" might be inaccurate.
It looks like FAA Part 121 accidents over the last 10 years with fatalities have been... 4. [0]
For a total of 6 fatalities.
[0] https://www.ntsb.gov/Pages/AviationQueryV2.aspx; 2018 (1 passenger fatality) https://www.ntsb.gov/investigations/Pages/DCA18MA142.aspx ; 2019 (3 crew fatalities, cargo flight) https://www.ntsb.gov/investigations/Pages/DCA19MA086.aspx and (1 passenger fatality) https://www.ntsb.gov/investigations/Pages/DCA20MA002.aspx ; 2022 (1 ramp fatality) https://data.ntsb.gov/carol-repgen/api/Aviation/ReportMain/G...
That low accident rate is nigh inconceivable. It's an incredible achievement.
The fatal accident count is higher for GA, but I didn't normalize against flight hours or flights, just glanced at it.
I'm sure there's been a study somewhere that attempts to untangle all the factors that differ between commercial carriers and GA, to see which safety is most sensitive to -- continuous highly professional maintenance, highly trained and experienced crew, rigorous airliner certification regime, etc.
One of which (Atlas Flight 3591) was pilot error.
Boeing also reduced the size of the manual trim wheels, which let them become impossible to turn sooner than on previous 737s.
The electric trim switches override MCAS. This was explained in the Emergency Airworthiness Directive sent to all MAX pilots after the first crash.
Also, overspeeding the airplane makes it much harder to turn the manual trim wheel. The cockpit voice recorder on the EA flight recorded the overspeed warning horn, which the crew did nothing about (they were at full power, should have pulled the throttles back).
The LA crew restored normal trim twenty-five times before crashing. What they never did was turn it off after restoring normal trim.
If a pilot can't be expected to maintain the pitch of a plane on takeoff, he has no business flying ANYTHING.
What Boeing did (and is STILL doing) is expect pilots to know or remember obscure NON-PILOTAGE (and in the case of MCAS, BURIED) trivia to prevent disaster.
Now... what's the more-responsible approach? Expect pilots to pilot, or expect them to recall an ever-growing list of workarounds to incompetent system design?
The whole MCAS was just an unnecessary feature (bug fix). Without it the plane would have worked just fine. The pilots would just have had to go through some training scenarios to get certified on how the MAX plane flies.
Wrong.
I agree.
I’m regularly critical of Boeing Defense (particularly space contracts where I’m a huge Boeing skeptic), but I think people are pretty off base if they think Boeing is just completely incompetent.
Airliner safety is insanely good. Just vast seas of competence, but when there’s a super rare failure, the incorrect impression people get is that Boeing (or Airbus) is just full of incompetency. Almost nothing that humans do is held to the same standard. Not spaceflight, not software, not healthcare, and certainly not automotive.
Flying a 737 Max with a bad door and without the fix to the angle of attack sensor is probably still better per mile than driving. In spite of going at 10 times the speed and miles above the Earth.
You can almost argue it’s held to a higher standard than it should, slowing development of cleaner aviation (and therefore killing more people in the future due to tertiary effects of climate change, etc).
It kind of annoys me when comment sections are filled with people talking about how incompetent Boeing is. It feels like out-of-shape slobs on their La-Z-Boy chairs talking about how incompetent or slow some professional sports players are. Like, airliner safety is just a totally different league than almost anyone else plays in. On their worst day, they're better than almost anyone else is on their best.
Because I dug it up for another comment, commercial carriers operating under Part 121 (roughly: scheduled passenger and cargo operation) had 4 fatal incidents in the last 10 years. [0]
Totalling 6 deaths.
In 10 years of US commercial carrier aviation.
One of those was literally 'the engine exploded and threw part of the turbine into the cabin (and also shredded some of the wing)'!!
Which resulted in 1 person dying and a successful landing.
[0] https://news.ycombinator.com/item?id=38921664
Ya but your sample size is way too small to measure the death rate. Aircraft deaths are rare, but flying is too.
The two MAX 8s that fell from the sky were 100% Boeing's fault and could have happened in the US. If 5% of airline traffic is in the US you can renormalize those hundreds of dead and you get dozens dead.
We know US pilots had been warning about the same issues that later led to the deadly crashes, but they were ignored. The thing is, one reason US commercial aviation is so safe is that a lot of the pilots responsible for jet airliners are ex-military. Someone mentioned Southwest Airlines Flight 1380: yup, captain Tammie Jo Shults was one of the first female Navy fighter pilots. Miracle on the Hudson? Sully Sullenberger was an Air Force captain and training officer. Civilian training, no matter how good, is just no replacement for military training and experience.
I can't find specific numbers but estimates say about one in three has a military background. That's an awful lot.
Let's assume American pilots are gods. They were shouting that their aircraft were unsafe.
No matter how good they are and how prescient, that doesn't help them if the aircraft computer decides it's stalling, forces the nose down, and they can't fight the controls.
But, even if we assume omnipotence from these American pilot gods, and assume they can fly outside the bird and Superman-style catch it, they are still only 30% of American pilots. Just another population to normalize out.
Not surprising given that pilot training is really really expensive. Airlines love former military pilots because they are a significantly lower financial risk for them. Put them into type rating and off they go, it's rare that one ends up as a dud.
Okay, awesome. But how much of that was luck with the 737 Max that they didn't crash on US soil by US airlines?
How much was a rigorous safety regime and high quality training?
Military training. See my comment above: https://news.ycombinator.com/item?id=38925089
Most of it. Statistically. Its not hard to assign part of the deaths from the MAXes to the US.
Being better than driving shouldn't be the standard. Especially driving in the US.
Flying isn't safer than trains, I would assume.
Flying has the advantage of being separated from almost everything else. Most accidents happen when there is mixed traffic, especially cars operated by people with minimal training.
https://turbli.com/blog/the-safest-transport-modes-ranked-by...
And there's good reasons for that. Spaceflight actually is regulated pretty strictly (partially, because any spaceworthy rocket is effectively a missile), and space pilots and tourists both sign up for such missions fully knowing that they will have a very significant chance of dying one way or another - there simply hasn't been enough human spaceflight activity to work out and understand all the failure modes, unlike with other forms of transportation.
Humans, unlike birds, aren't naturally wired to travel by air... they need to be able to trust their lives to a significantly higher degree to someone else behaving like they should, because unlike in a car they have zero control (or the illusion of control) in an aircraft.
Additionally, the inherent security risk of an airliner is very high: what is a widebody airplane at its core? Hundreds of tons of weight, a decent portion of which is fuel, propelled at near-supersonic speed, and only two people in control of it. Anything goes bonkers and you can get thousands of people killed and injured (see 9/11).
In contrast, cars, even trucks, have way less capability to cause damage simply because they weigh so much less. The only thing that comes close is railways, and hell I don't get what the US is doing there, there's barely any regulation compared to European standards (see the videos I linked at https://news.ycombinator.com/item?id=38725988).
So much THIS
People do this with everything though, and air travel induces a large amount of fear in the populace. Not only are we not generally comfortable flying in the air for obvious reasons, but when it happens almost everyone has to concede control to a few people in the cockpit and on the ground. Driving, even if exponentially more dangerous, affords the illusion of control of one's outcome, given driving or having someone you know driving, and control over the vehicle maintenance, etc, as well as familiarity with the control and mechanism of the vehicle. These things don't exist with airplanes for the vast majority of people.
So, you can see why there is a need to find a human component to air travel problems, because that is something one can fix (fire the incompetent people, fine them, whatever), as opposed to all of the other things which must be accepted or rejected entirely.
It is entirely in line with human nature to do this, regardless of its accuracy or effectiveness.
Is it plausible that Boeing has "learned" from software/startup/venture-capital culture with regards to tolerating higher risk to minimize costs?
I suspect it's rather a case of parallel evolution between McDonnell Douglas brass and software startup culture, since cost-cutting culture goes back many decades (remember "Chainsaw" Al Dunlap[1] ?) — but I wonder if there's a more direct influence.
[1] https://en.wikipedia.org/wiki/Albert_J._Dunlap
In lots of ways, the "learning" there would just be "capitalism".
It's inherently short-sighted unless forced to do otherwise by legislation. Cutting small corners pays off A LOT until the hammer falls, so there's a massive advantage to doing it / you need to do it if competition is doing it, or you eventually shut down as they take all your business.
It's inherently a race to the bottom. Sometimes that's a net gain, sometimes it isn't.
But all those communist airlines had no problems at all - exceptional build quality and operational efficiency! /S
Yep, Chernobyl being a prime example. Or Komarov's failed re-entry after complaining about the design faults of the vehicle long before launch. Then there was the more uhhm run of the mill backyard blast furnace campaign which contributed to misallocation of workforce which then led to mass starvation.
There are many many more examples. I find it so tiresome to see young people just use capitalism as a catch all for the failure of something. It's such a lazy and uninformed argument.
I'm not carte blanche defending capitalism - its a mixed bag but it sure outpaces the competing systems put forward to date. It does need some stronger safeguards against industry self regulation - that has a bad track record.
I think we're on the same page. Economic systems need failsafes so that they don't suffer from positive feedback loops.
What anti-capitalist sympathizers, in my view, don't realize is that this is due to people being in the loop. These economic systems are merely vehicles, some better than others, but the conductors are people, be they communists or capitalists. At least with capitalism there is a delayed regulator (negative feedback); in communism it's up to the system to decide if it needs to modify itself.
All major economic systems of all major national economies over the last century have perverse incentives. It’s not a capitalist thing.
Other systems had incentives such as, get it running by such and such date or have yourself and relatives sent to inhospitable place. So people rushed flawed designs into production.
That said, upper management at Boeing needs a shake-up. People need to get fired. They need to do what Intel is trying and that is to get more engineers in charge, or at least grant them veto power on designs.
They have, but post-Thatcher neoliberal capitalism has taken the existing perverse incentives and made them exponentially worse. We're on a course heading straight to feudalism, just with fancy titles with legal rights replaced by economic might.
It should be a lesson against dogmatic pursuit of absolutes: capitalism comes in a wide range of flavors, and the worst is if it’s completely unrestrained. Communism produced worse and worse results the further it got from any sort of public accountability, etc.
The two problems that I see is that the concept of nuance is somewhat at odds with having a simple concept to teach kids at school, and there’s always a group which is more motivated to game the system than the average person who really just wants to hang out with their friends, raise a family, etc. rather than play political games. Boeing didn’t start it by any means but they’ve benefited enormously from decades of reduced oversight and elevated pay driven by a sort of cartoon libertarianism where letting people get enormously rich will motivate them to build great things unfettered by “red tape”.
"Better to ask for forgiveness than for permission"
Ironically, I believe it was Grace Hopper who said it... Whoops.
That adage is okay, but for it to work not everything can be forgiven — there actually has to be an expectation to be held responsible towards acting on good faith.
Boeing airliners are much safer now than before they merged with McDonnell Douglas. (Because basically all airliners are.) And I say that as a regular Boeing critic.
The 737 NG from before the merger has a much better safety track record than the 737 MAX.
Here's a Netflix documentary (in the wake of the MCAS crashes) that alleges that after the merger with McDonnell Douglas, the culture of the firm changed. Previously dominated by engineers, it was now dominated by MBAs with a focus on profit and shareholder value.
"With impressive clarity, Downfall: The Case Against Boeing reveals corporate corruption that's enraging in its callousness and frightening in its scope."
https://en.wikipedia.org/wiki/Downfall:_The_Case_Against_Boe... https://www.netflix.com/hk-en/title/81272421
QA was literally invented for the airline industry.
Software QA when actually practiced is more advanced now than airline QA.
what does advanced mean when comparing things so unlike from each other?
also software is the least likely comparison I would have made; software quality is a shit-show on a general level, and the vast public is quite aware of this every time a subway timeboard blue-screens or gets frozen on an AMI screen, or the POS machine that they're forced to interact with at work does something equally as stupid.
...eh, I think "when actually practiced" is doing a lot of carrying there.
What do you mean by "actually practiced".
Outside of the aerospace and healthcare industries, I'm not sure there are many software shops that are doing QA to a level I would like to trust anyone's life with.
Nah, in the software world, the truth is QA is where the people who can't get jobs as programmers end up. I've seen testers go on to become programmers, but I've never seen a programmer become a tester. Maybe it's different for real-time or life-critical systems, sure, but I can confidently say this is how it is in web development.
Would you pay at least 2x for your software to have couple more nines of reliability? I’m gonna guess that “no”. At places where it costs $$$ to have bugs shipped to the end customer (e.g phones) or where there’re regulatory requirements they still have dedicated qa.
It depends.
1% of 10000 is 100.
.01% is 1.
If someone came up to me and said, "Hey I can save you 99% of expected costs with 1% of your profit.", I might go for it.
Which is what I said in the second part of my comment. For most software businesses the cost of shipping a bug is trivial and/or poorly measured, so, per the McNamara fallacy, it is readily exchanged for the well-measured cost of having a functioning QA team.
software development: what is QA?
That's where failed software devs / management go to.
Of course, but most of us aren't working on products where a quality problem would kill hundreds of people. Having aircraft-level QA would be plain silly; you don't expect that level of quality from most other industries, e.g. guitar manufacturing, do you?
Cockpit resource management is also something a lot of industries can learn from. As well as human error analysis. How an error came to be is often much more interesting than the personal shortcomings of the person who caused it.
At its best, software QA and related methods should be equal to those of aircraft manufacturing.
Think of railway signalling systems, control-by-wire bits of modern cars, medical equipment, etc. Where the design of the software is formally proven, and the implementation verified to ensure it fits the design.
Bolts are most likely tightened with a torque wrench or a gun that is set to a torque spec. Over-tightening a bolt is as bad as a loose bolt. I speculate these passed Boeing's QA because they might have been correctly torqued to spec. What happens in the field is hard to understand. One possibility is that proximity to the engine causes extreme vibrations, which can loosen them. Another possibility is the maintenance side of things: maybe a badly calibrated torque wrench could be the reason. Mechanical systems are not inherently immutable.
When it's happening on a two month old plane it's a production problem.
I would expect lock wire or some other method of ensuring the bolt does not un-torque itself. Especially for bolts that are not required to be removed past final assembly...
Nope. It is most probably caused by operational stress: the rudder assembly is moving, and the fuselage is also working (compression and decompression cycles on takeoff and landing, thermal expansion and contraction). I bet they don't just put red Loctite on it to keep it from getting loose. My bet is design flaws, not manufacturing or QA.
EDIT: I saw the pictures of bolts with pins and bolts without pins. The ones with pins cannot get loose, the others can. Let's see what happened.
The 4 restraining nuts and bolts on the door have a cotter pin like mechanism to prevent them from loosening. If assembled correctly they cannot loosen unless the pin fails.
I don't know about you, but in my industry, "QA" also means extensive testing to ensure that part/assembly/etc doesn't break with expected operations. So, yeah, from where I'm standing, this was a QA problem. Something did not get checked or tested as it probably should have.
It's a QA process better than almost every other industry in the world...
Sure, it failed, and it isn't perfect.
But planes have had a long track record of being absurdly safe.
Maybe an inadequate torque specification for the bolt tightening?
Given everything I've seen so far, I'd bet good money that what happened here was miscommunication between Spirit and Boeing. Spirit started out locking down the plug, then Boeing asked them to just loosely attach it[1] so Boeing could yank the plug for interior/wiring/AC/paint, then someone at Boeing forgot about the "loosely". So now, they get in a hurry (maybe the AC/interior didn't need any access to work on, which makes sense for this MAX variant, it wouldn't need as many hatches to pull wire) and it went down the Renton line as if the plug was fully installed. It's enough to pass high blow inspection and other inspections, but then over time that "shipment config" attachment vibrated out, and pop goes the plug.
Almost certainly systemic issue though, so that sucks. Sucks real bad.
They need to get a Tiger Team or whatever together to look at everything with a shipment config, and make sure those "ship kits" don't leak into the real actual airplane configuration. This is . . ok, this is really manufacturing 101 stuff, but well, things happen.
I'm in the industry, but haven't touched the MAX, so take this with a grain of salt.
[1] Basically a "shipping" or train configuration
They are not related. Probably different types of bolts, for sure different stress types. Rudder assembly is a moving part, these false door panels are not.
It's not a control surface, but it is a "moving part." That's what's baffling to me, that they spent a lot of effort building this hinge and pin roller system, and designed the door to hinge open up to 15 degrees.
It makes me wonder if there's maintenance procedures that at some point would require the operation of that door to successfully complete. Otherwise, the mechanism itself seems so incredibly overwrought, with lots of additional bolts, castle nuts, retaining pins, and even sprung hinges at the bottom.
Does anyone know why this "plug-type non-plug door" is built this way?
It's built to be an actual emergency exit.
It needs to be usable depending on how many passengers the interior is configured for.
So it has all of the door bits there. Maybe some parts like the emergency escape slide are not installed.
e: I should be clear that it's not usable as an emergency exit, as configured by Alaska. However the operator could choose to activate it later and install a usable exit.
Where did you see that? My understanding is that it's an optional plug door that's used to assist with interior installation. Once the interior is done, it's bolted shut and interior paneling is installed over top. From the inside, you can't tell it's there.
My source is [1], specifically at about 5:05.
Alaska Airlines chose a 178-passenger configuration for their 737-9, and so are not required to have a mid-cabin exit door.
Lion Air chose to go with a 221-passenger configuration, and so are required to have an operating door.
Obviously changing up the number of seats isn't done on-demand, you'd need to go for a refit/maintenance cycles.
But if Alaskan decided to change density, or sold the aircraft to someone else who decided to change density - then they could go and do this.
[1] https://youtu.be/nw4eQGAmXQ0?t=305 "The Boeing 737 Technical Channel"
Ahh. Well, in the case of the Alaska flight, it's a plug door and not used as an exit. It's pinned in place with a large pin that has a bolt, a castle nut, and a cotter pin which lock the pin.
This is not correct. To the passengers, this just looks like another seat next to a window with a plug installed. It's not a door.
If there was a reconfiguration to a seating standard that required the extra exit, the plug would be removed and a proper door would be installed, with the associated interior pieces.
This is true.
However there's still common hardware in there to allow the plug to be installed and maintained. This is why it's a complicated set of kit vs just bolting in a permanent fixture.
This is not true. It's designed to be opened when inspecting the fuselage for corrosion or stress cracks at the opening. To open it you have to remove the interior plastic panels and undo the 4 bolts that this accident is about
If you are correct, then the implication is that the concern extends beyond door plugs for MAX-9 737s to all emergency exit doors on all models of aircraft sharing this design. This is somewhat reminiscent of the huge problem with the 688 (Los Angeles) class submarines, where the discovery of a faulty weld that had passed inspections raised doubts about all welds.
my guess is: to replace gaskets/seals
I think the point is there were at least two sets of loose bolts at the same time.
That’s “testing in production”, but beast-sized.
I recall a running joke from my childhood - from a former communist Eastern European country - about a certain car, saying you should finish the assembly at home after purchase, tightening the screws before first use. Despite it being a famously poor-quality car - even by the sloppy Eastern European standards of the time - that was supposed to be a joke, not advice to follow!