The deficiencies found in the report were in Just Culture and Reporting Culture.
The five Key Elements of Safety Culture are:
1) Informed Culture- the organization collects and analyses relevant data, and actively disseminates safety information.
2) Reporting Culture- cultivating an atmosphere where people have confidence to report safety concerns without fear of blame. Employees must know that confidentiality will be maintained and that the information they submit will be acted upon, otherwise they will decide that there is no benefit in their reporting.
3) Learning Culture- an organization is able to learn from its mistakes and make changes. It will also ensure that people understand the SMS (Safety Management System) processes at a personal level.
4) Just Culture- errors and unsafe acts will not be punished if the error was unintentional. However, those who act recklessly or take deliberate and unjustifiable risks will still be subject to disciplinary action.
5) Flexible Culture- the organization and the people in it are capable of adapting effectively to changing demands.
Sources:
https://www.faa.gov/newsroom/Sec103_ExpertPanelReview_Report...
https://www.airsafety.aero/safety-information-and-reporting/...
No sane organization would ever implement this. If someone repeatedly makes mistakes, they're going to get fired even if the mistakes are unintentional. Anything else is going to cause more safety issues in the long-term as inadequate employees are allowed to proliferate.
This is just blameless post-mortems, and many, many, many places implement this.
There is always going to be some level of "inadequate" employees in any organization, as well as perfectly adequate employees who sometimes make mistakes. If your organization requires that no employee ever makes a mistake in order to operate safely, then you have serious problems.
The purpose of a statement like that is that you don't just have a post-mortem that says: "Our company went off the internet because an employee had a typo in a host name. We fired the employee and the problem is solved," when in reality the problem is that you had a system that allowed a typo to go all the way into production.
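To make that concrete, the systemic fix is usually a boring mechanical gate rather than a better-behaved human. Here's a minimal sketch of the idea, assuming a CI step can run before rollout; the allow-list and function names are invented for illustration, not anyone's real pipeline:

```python
# Hypothetical pre-deployment guard: reject a config whose hostnames are
# unknown or don't resolve, so a typo is caught in CI instead of in prod.
import socket
import sys

KNOWN_HOSTS = {"api.example.com", "db.example.com"}  # assumed source of truth

def validate_hosts(hosts: list[str]) -> list[str]:
    """Return human-readable errors; an empty list means the config passes."""
    errors = []
    for host in hosts:
        if host not in KNOWN_HOSTS:
            errors.append(f"{host!r} is not in the known-host list (typo?)")
            continue
        try:
            socket.getaddrinfo(host, None)  # does it actually resolve?
        except socket.gaierror:
            errors.append(f"{host!r} does not resolve")
    return errors

if __name__ == "__main__":
    problems = validate_hosts(sys.argv[1:])
    if problems:
        print("\n".join(problems))
        sys.exit(1)  # non-zero exit fails the pipeline before rollout
```

Wire something like that into the pipeline and the typo becomes a failed build instead of an outage, regardless of which employee made it.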
It's like that story of the pilot who, after his refueling technician almost caused a crash by using the wrong fuel, insisted that he always have that technician because they'd never make that mistake again.
The question is what you do with the technician after the 2nd mistake. That is to say, when does this logic break down?
Redesign the system again if it's unintentional. It is almost impossible to control humans to the degree that they never make mistakes. It's far better to design a system in which mistakes are categorically impossible.
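A common software version of "categorically impossible" is making the wrong combination unrepresentable - the same idea as the mutually incompatible fuel nozzles that come up later in the thread. A toy sketch of that idea; every class and name here is invented for illustration:

```python
# Toy example: encode the fuel type in the type system so that misfueling
# is a type error caught by a checker (mypy/pyright) before anything runs.
from dataclasses import dataclass

@dataclass(frozen=True)
class JetA:
    liters: float

@dataclass(frozen=True)
class Avgas:
    liters: float

class PistonAircraft:
    def refuel(self, fuel: Avgas) -> None:
        # A piston engine only accepts Avgas; passing JetA is flagged
        # statically instead of being discovered in flight.
        print(f"Refueled with {fuel.liters} L of avgas")

shrike = PistonAircraft()
shrike.refuel(Avgas(200.0))    # fine
# shrike.refuel(JetA(200.0))   # rejected by static type checking
```

The point isn't these particular types; it's that the check moves from a human's memory into a mechanism that never gets tired.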
I'm trying to push back on the knee jerk sentiment that there are no bad employees, only bad systems.
There are no systems that are human proof, and what kind of human behavior is tolerated is a characteristic of the system.
In fact, there are humans who lie, cheat, are apathetic, or are incompetent. Part of a good system is not only to mitigate this, but to actively weed these people out.
For example, if someone falsifies the inspection checklist for your plane, you don't just give them a PIP.
Why is it important to you?
Because I'm an engineer in a quality-controlled field (medicine), and my personal experience is that firms place too much faith in quality systems and not enough emphasis on quality employees.
I see lots of engineers and QA following elaborate procedures with hundreds of checks, but not bothering to even read what they sign off on, so they can go golf all day.
People seem to think that you can engineer some process flow to prevent every error, but every process is garbage if the humans don't care or don't know what they are doing.
Every process is garbage if you don't hire workers with the right skills demanded by that process. In an effort to drive down costs, lots of companies try to make up for talent with process, with poor results for both the companies and patients. You can't replace a brain surgeon with 2 plumbers and twice the instructions.
Very interesting, I'm glad I asked.
Similarly, I recall the head of a leading engineering organization (I think a NASA head, or maybe Admiral Rickover) saying, essentially, 'you can't replace ability with process'. All the process in the world, they said, will not substitute for highly able personnel.
But perhaps safety, not usually dependent on ability, is a different matter. Possibly the problems you describe are a matter of leadership and management - which doesn't undermine your point; those also are things that can't, past a certain irreducible point, be replaced with process.
I wholeheartedly agree that leadership/management is part of the problem. My main objection is the "no bad employees" rhetoric. Sometimes the problem with management is that they aren't getting rid of bad employees. Rot can start anywhere in an organization, and the rest of the org really needs to push back, not just management.
It actually reminds me a lot of the culture/discipline problems with some police departments in the US. It is hard to enforce and cultivate organizational culture top-down. Most of it is maintained peer-to-peer.
I guess it seems like that argument takes the discussion to an extreme. Does anyone actually advocate never firing employees? That there are literally no bad employees?
I think it's a combination. The leader has a large influence; they set the standards and the norms. At the same time, I agree with what you say about peers - perhaps peers spread and 'enforce' those norms. It may also depend on the size and age of the organization.
Yes, there are obviously bad employees, but the line for an actually incompetent/malicious employee is a lot further away than most people understand.
A lot of bad management is hand-waved as crappy employees (by management - shocking!)
Falsifying the inspection checklist is not an honest mistake.
I think that this anecdote [0] is appropriate for showing the glaring disconnects that can exist in the human<-->system symbiosis.
[0]: https://www.controlinmotion.com/news/news-archive/a-little-h...
That's not really the question:
Punishment culture assumes people naturally do bad, lazy things unless they are deterred by punishment and fear. Therefore we must punish mistakes.
That perspective has long been debunked. You don't see competent, skilled leaders using it. It turns out that generally people want to do well (just like you do), and they don't when they are scared / activated (in fight/flight/freeze mode), poorly trained, poorly supported, or poorly led. They excel when they feel safe and supported.
If you are the manager and the technician makes the same mistake the 2nd or 3rd time, you will find the problem the next morning in your bathroom mirror. :) At best, you have put them in a position to fail without the proper training or support. Leadership might also be an issue.
I would say that every skilled leader must use punishments and consequences to some degree.
If your tech gets drunk every day and doesn't do their job, you need to cut them loose. This isn't a management problem.
Sometimes people end up in positions where they are not suited and will continue to fail. If you hired a plumber and you need a doctor, that isn't an on-the-job training, support, or leadership issue.
That is 100% a management problem.
I wonder how they got in those positions? That sounds like a management problem too.
It isn't always management's job to make the person work out in the role. Sometimes it is management's job to fire that person and find someone better.
Some people are bad fits for positions. They might look good on paper, they might be trying something new, they might lie to get hired, they might change after starting, they might have been a risky hire, or any number of reasons.
I think you're envisioning people all being absolutists who follow an exacting rule book and can't consider context. (That's covered by the *flexibility* tentpole.)
As N approaches infinity, there's definitely a value of N at which we discover the root cause is the airman and have to move on from him. I don't think it's particularly interesting to try to identify a constant value for N because it's highly situational, and we know we have to do *just* and *reporting* as well; the reporting falls out when the just does.
You hit the nail on the head. I do perceive a lot of people being "no bad employee" absolutists.
All I am looking for is recognition that the content of N matters.
It is part of what I see as a broader phenomenon where people emphasize systems and ignore agents. In reality, agents shape systems and systems shape agents in continuous feedback.
If you implemented some changes so the mistake is caught before disastrous consequences, you're already doing better. Well enough to let the 2nd one slide. Even the 3rd. After that, action seems reasonable. It's no longer a mistake, it's a pattern of faulty behavior.
That is a big IF. At some point it comes down to the error type, and whether it is a reasonable/honest mistake.
The situation is very different if the fuel cans are hard to distinguish vs if the tech is lazy and falsifying their checklist.
Underlying any safety culture is a culture of integrity. No safety culture can tolerate a culture of apathy and indifference.
I expect there's precisely 1 safety culture that can tolerate a culture of apathy and indifference -- one in which no work is ever completed (without infinite headcount).
You apply risk mitigation and work verification to resolve safety issues.
Then you recursively repeat that to account for ineffective performance of the previous level of verification.
Ergo, end productivity per employee is directly proportional to integrity, as it allows you to relax that inefficient infinite (re-)verification.
Exactly! All this talk about man vs system misses the point that man is the system designer, operator, and component.
This is why Boeing can't just solve their situation with more process checks. From the reporting, they are already drowning in redundant quality systems and complexity. What failed was the human element.
Someone was gaming the system by saying that the doors weren't "technically" removed because there was a shoelace (or whatever) holding them in place, quality assurance was asleep at the wheel, and management was rewarding those behaviors.
Plenty of blame to go around.
Well certainly not after the first time at least
Imo it's a function of time, company and team culture, severity, and role guidelines.
If an employee makes a mistake but followed process, and no process change occurred, that's just acknowledging the cost of doing business imo, and it could happen an unbounded number of times so long as it's in good faith from the employee.
Not severity; that sort of thinking is actually part of low-safety cultures. A highly safe culture requires the insight that people don't behave differently based on outcome. In fact, most people can't assess the severity of their work (this is by design; for example someone with access to the full picture makes the decisions so that technicians don't have to). So they couldn't behave differently even if they did somehow make better decisions when it matters.
But, and I'll reiterate the point for emphasis, people make all their decisions using the same brain. It is like bugs; any code can be buggy. Code doesn't get less buggy because it is important code. It gets less buggy because it is tested, formally verified, battle scarred, well specified and doesn't change often.
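A small illustration of "less buggy because it is tested, not because it is important": the same checks exercise the code no matter how critical the caller is. Sketch only; the parsing function and its cases are made up:

```python
# Sketch: the code below isn't safer because a plane depends on it; it is
# safer because its invariants are written down and checked automatically.
def parse_fuel_quantity(text: str) -> float:
    """Parse '200 L' or '200' into liters; raise ValueError on anything else."""
    cleaned = text.strip().upper().removesuffix("L").strip()
    value = float(cleaned)          # raises ValueError on garbage like '2OO'
    if value < 0:
        raise ValueError("negative quantity")
    return value

def test_parse_fuel_quantity() -> None:
    assert parse_fuel_quantity("200 L") == 200.0
    assert parse_fuel_quantity(" 15.5") == 15.5
    for bad in ("", "-3 L", "2OO L"):    # includes the classic O-for-0 typo
        try:
            parse_fuel_quantity(bad)
        except ValueError:
            pass
        else:
            raise AssertionError(f"{bad!r} should have been rejected")

if __name__ == "__main__":
    test_parse_fuel_quantity()
    print("all checks passed")
```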
Would s/severity/impact/g also be counterproductive to safety culture? Genuinely trying to learn here; gotta be responsible/accountable and all.
Maybe impact relative to carelessness/aloof-ity?
I agree that an engineer/person will not behave differently based on outcomes, but if they know in advance that something can have a wide, destructive blast radius if some procedure is not followed, I feel there's a bit more culpability on the part of the engineer. Regardless, I don't think I have a sufficient grasp on the concept I'm trying to define, so definitely agreed I shouldn't have included 'severity' in the function definition, nor any alternative candidate.
My point is that good faith and sufficient competence are crucial. If the employee didn't care if the plane crashed, they are a bad fit.
If they can't read the refueling checklist, they are a bad fit.
Ideally you have system controls to screen and weed these people out too.
You pair him with another employee for quality and put him on an improvement plan?
Maybe. Or maybe you turn them over to the authorities, because the 2nd time their lazy and reckless disregard killed several people.
It's seemingly simple: "oh, the technician keeps messing up."
Did the technician mess up (sometimes true), or were they doing their job in good faith - was it the system/protocol/organization that made the task mistake prone? Did someone else actually mess up but the situation made it look like it's the technician's fault? Does this technician do a task/service that is failure prone? Are there other technicians on other tasks that are far less failure prone? Here the former technician would seem poor, the latter, excellent, but it's a function of the task/role and not the person.
I've been "the technician" - I catch a lot of blame because people know I'm anti-blame culture, so I'd rather take the blame on myself that point my finger to the next guy in line. I'm also willing to take on high risk tasks for the greater good even if they suck and are blame prone / risky. I believe in team culture in this way. If the organization doesn't respect that belief and throws me under the bus, I leave - which is quite punishing for them since they remain completely unaware of a major internal problem. If an organization "sees me" and my philosophy, then together we get very very good at optimizing the system to minimize the likelihood of failure / mistakes.
That was the late, and definitely great, R.A. "Bob" Hoover; I am proud to have shared a beer with him at Oshkosh. His Shrike Commander was misfueled with jet fuel instead of avgas because it was mistaken for the larger turboprop model. Rather than blaming the individual refueler, he recognized that there was a systemic problem and developed an engineering solution: he proposed, and the industry adopted, a mutually incompatible standard of fuel nozzles/receptacles for jet fuel and avgas as a result. You can find some great YouTube material on him, or the film "Flying the Feathered Edge".
https://sierrahotel.net/blogs/news/a-life-lesson
https://en.wikipedia.org/wiki/Bob_Hoover#Hoover_nozzle_and_H...
https://www.imdb.com/title/tt2334694/
Here's an old-timey video of Bob in his prime. At 8:55 he flies a barrel roll with one hand while pouring himself a glass of iced tea with the other. The hardest part was pouring the tea backhanded so the camera had a good view. Then he finishes with his trademark no-engine loop, roll, and landing.
https://www.youtube.com/watch?v=PT1kVmqmvHU&t=510s
Exactly. https://asteriskmag.com/issues/05/why-you-ve-never-been-in-a... is a great article illustrating this in the airline industry itself.
> When in reality the problem is that you had a system that allowed a typo to go all the way into production.
That's a typical root cause, and is exactly what should come out of good post-mortems.
But human nature is human nature...
Furthering the insinuation that everyone has the right to work every job. Sometimes people suck at their job.
As your sibling comments mentioned, there's a difference between giving a chance for someone to learn from a single mistake without punishment, and allowing them to make the same mistake twice without taking matters out of their hands after.
If it's a really critical role, the training will have realistic enough simulation for them to make countless mistakes before they leave the training environment. Then you can assess their level of risk safely.
I think there is more nuance to it than that. Not everything is a mistake, not every mistake is recoverable, and not all skills are trainable.
The fundamental goal is to distinguish between recoverable errors and those that are indicative of poor employee-role fit.
Mistakes are the problem, as they will always happen.
The point is to build a culture where you value teamwork and adjust and learn from failures.
This isn't an individual team problem, this is an organization problem.
It is impossible to hire infallible, all knowing employees.
But it is quite possible to enable communication and to learn from past mistakes.
When you silence employees due to a fear of retribution, bad things happen.
People need to feel safe calling out the systemic problems that led to a failure. If that ends up being the wrong mixture of skills on a team or bad communication within a team, that is different.
Everything in this report was a mistake, and not due to gross incompetence from a single person.
The E door bolts, as an example, were directly attributed to metrics that punished people if they didn't bypass review. Delivery timelines and defect rates were what management placed value on, over quality and safety.
Consider the prisoner's dilemma, which is resolved by communication, not by choosing a better partner.
I don't disagree with what you said about this instance, but I'm trying to push back on the knee-jerk sentiment that there are no bad employees, only bad systems - there are both. Cultures that are too permissive of bad actors degrade the system.
Part of maintaining quality culture is maintaining red lines around integrity.
Like I said above, not all errors are recoverable or honest mistakes.
I work in medicine, and a classic example would be falsifying data. That should always be a red line, not a learning opportunity. You can add QA and systemic controls, but without integrity, they are meaningless. I have seen places with a culture of indifference, where QA is checked out and doesn't do its job either.
Certainly nobody has ever thought about that before. In fact, there definitely isn't a second sentence in the definition of aviation's just culture that is being completely ignored in favour of weird devil's advocacy.
Oh wait.
I have no problem with the stated safety culture.
I simply agree that "everyone has the right to work every job" is not a reasonable interpretation of them.
As stated above, a reasonable reader should understand:
Who is claiming that "everyone has the right to work every job", though? The only person to even bring up the sentence is someone who's handwringing about an interpretation that nobody was making to begin with.
This is why I called it weird devil's advocacy, because what exactly is the point of jumping to caution people about something they aren't doing?
That's the parent in the thread we are posting in. User Error-Logic replied, and I built upon their reply, adding that:
You and others wanted to dive further.
This whole thread is missing the fact that the NTSB had a theory that transparency leads to safer airplanes, they tried it, and it works. People hesitate to self-report when it comes with punishment (fines, demotions, or just loss of face among peers). You need a formal “safe space” where early reporting is rewarded and late reporting is discouraged.
Safety is a lot about trust, and there is more than one kind of trust. At a minimum: are you capable of doing this thing I need you to do? Will you do this thing I need you to do?
It's not just the NTSB, it's part of things like the Toyota Production System. There's ample evidence to show both that punishment discourages safety and that lack of punishment encourages safety, across multiple industries.
Yes, these are cross-industry best practices.
Goodhart's law also applies: in the case of the E door bolts, Spirit intentionally bypassed safety controls to meet performance metrics.
The Mars Climate Orbiter is another example. While the unit conversion was the scapegoat, the real cause of the crash is that when people noticed there was a problem, they were dismissed.
The Andon cord from the Toyota Production System wasn't present due to culture problems.
Same thing with impact scores in software reducing quality and customer value.
If you intentionally, or through metrics, incentivize cutting corners, it will come at the cost of quality and safety.
I am glad they called out the culture problem here. This is not something that is fixable with more controls; it requires cultural changes.
Challenger too. Multiple engineers warned them about the O-rings. They weren't just ignored, but were openly mocked by the NASA leadership. (https://allthatsinteresting.com/space-shuttle-challenger-dis...)
A decade later a senior engineer at NASA warned about a piece of foam striking Space Shuttle Columbia and requested they use existing military satellites to check for damage. She was ignored by NASA leadership, and following (coincidentally) a report by Boeing concluding nothing was wrong, another 7 people were killed by a piss-poor safety culture. (https://abcnews.go.com/Technology/story?id=97600&page=1)
But but but what about my intuition and gotcha questions about how this could never work in practice?
Just culture doesn't prevent you from firing someone who makes repeated mistakes.
In fact, Just Culture in itself provides the justification for this. As the next line says, "However, those who act recklessly or take deliberate and unjustifiable risks will still be subject to disciplinary action". A person who repeatedly makes mistakes is an unjustifiable risk.
When a punishment is applied with more deliberation, it can also be more severe.
Why is severity desirable? Or if it's not desirable, so what?
Severity is desirable iff it's justified. I wouldn't ever sign off on a policy that says "you'll be fired for a single mistake" (that would be a severity of punishment out of proportion to the risk/underperformance).
But a policy that never provided for the possibility of termination (insufficient maximum severity) is also not desirable.
It's necessary if it's (necessary & efficient & justified); it's never desirable IMHO.
Doing severe things because they are justified is just acting out on a desire or drive - internal anger - but now we can 'justify' the target and feel ok about it. Lynch mobs think they are justified.
Designing severe things to be included as part of a process is a desirable property of that system if the severe thing is sometimes required.
No one is designing a formal system that includes lynch mobs. But a formal system of repercussions for employee behavior that does not include firing is an incomplete system.
It’s not that firing itself is ever desirable, but rather that its inclusion in a disciplinary progression is desirable.
You can really dumb it down to: why didn't you follow the checklist? If someone makes the same mistake after being corrected three times, and the proper procedures exist for the worker to follow, then the safety culture provides the structure and justification for their dismissal.
No, you really need to smarten it up, and start off by making sure that your checklist is correct. Is it the correct checklist for the airplane model that you are building? Are all the right items on the checklist? Are they being done in the correct order? Do you have the correct validation/verification steps in your checklist? Does your checklist include all the parts that will need to be replaced? If the mechanic finds a quality issue while working the checklist and a job needs to be re-done, which checklists then need to be re-done? What other jobs are impacted by the rework?
All indications here (from the NTSB prelim and the widely reported whistleblower account) are that during rework for a minor manufacturing discrepancy, the mechanics on the shop floor followed bad manufacturing planning / engineering instructions to the letter, and then the ball was dropped in error handling when the engineering instructions did not match the airplane configuration, because Boeing was using two different systems of record for error handling that did not communicate with each other except through manual coordination.
That's not the fault of the front-line assembly worker not following a checklist.
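Those rework questions above ("which checklists then need to be re-done? what other jobs are impacted?") are essentially a dependency-graph question, which is exactly what two systems of record that only talk through manual coordination cannot answer. A hedged sketch of the idea; the tasks and edges below are invented, not Boeing's actual process:

```python
# Sketch: model checklists as a dependency graph so that reworking one job
# mechanically flags every sign-off that is now stale.
from collections import deque

# "X depends on Y" means X was signed off assuming Y's state, so rework of Y
# invalidates X. Task names are illustrative only.
DEPENDS_ON = {
    "interior_panels": ["door_plug_install"],
    "pressurization_test": ["door_plug_install", "interior_panels"],
    "final_inspection": ["pressurization_test"],
}

def stale_after_rework(reworked: str) -> set[str]:
    """Return every checklist whose sign-off is invalidated by the rework."""
    dependents: dict[str, list[str]] = {}
    for task, deps in DEPENDS_ON.items():
        for dep in deps:
            dependents.setdefault(dep, []).append(task)

    stale, queue = set(), deque([reworked])
    while queue:
        current = queue.popleft()
        for task in dependents.get(current, []):
            if task not in stale:
                stale.add(task)
                queue.append(task)
    return stale

print(stale_after_rework("door_plug_install"))
# e.g. {'interior_panels', 'pressurization_test', 'final_inspection'}
```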
I agree with you. If the systems/procedures/checklists are bad it is not the fault of a front line worker.
I thought I was replying more to a parent comment addressing the inability to let people go who repeatedly make mistakes, which is acceptable unless they are not following procedures.
I once destroyed $10k worth of aerospace equipment. I admitted it immediately and my only reprimand was that my boss asked me if I learned my lesson. (I did)
Once destroyed an industrial manufacturing site with an unfinished robot program that ran because I allowed myself to be distracted mid-alteration.
And what happened?
I think the wording is clumsy, but this is analogous to no-blame processes. The wording is just accounting for the possibility of wantonly malicious or recklessly negligent work quality. Think someone either sabotaging the product, or showing up to work very high or drunk.
This.
A mistake like "accidentally turning the machine off when it shouldn't be" is a fixable problem.
If someone has an attitude like "fuck the checklist, I know better", it is not really a mistake, and that person should rightfully be fired or at least moved to a position where they cannot do any harm.
That's quite a leap from "unintentional" to "repeatedly."
Not at all: Systemic problems will result in repeated errors until the system is changed.
Who do you think came up with this rule, bleeding-heart liberals? Stop and think for a second: why does that rule exist?
You described a fantasy world. In the real world everyone makes mistakes, and if the mistakes are punished, then there are no mistakes, because no one reports them. That is, until a mistake is so catastrophic it cannot be covered up - that's how you get Chernobyl or the Boeing MAX.
The Boeing MAX (if you mean the crashes caused by MCAS) wasn't due to a "mistake" not being reported; it was deliberate and intentional on the part of company management. The system was designed badly and without redundancy, and without any information available to the pilots about its very existence, specifically because management wanted it that way. It wasn't caused by some kind of accident.
Every sane organization implements this. Failure to do so leads to fear of reporting mistakes, and you get Boeing. This isn't news.
If it's possible for an employee to unintentionally make the same mistake twice, that's purely management's failure. It's impossible to make systems completely fool proof, but once you know of a specific deficiency in your process you fix it. If you've corrected the issue, it should take deliberate effort for someone to do it again. An organization that knows its processes are deficient but makes no changes and expects a different result is insane.
Ideally, as a result of the post-mortem, the same mistake shouldn't even be repeatable, because mechanisms should be introduced to prevent it.
And if someone keeps making new original mistakes, revealing vulnerabilities in your processes, I would say that it is a very valuable employee, a lucky pen-tester of sorts.
Wowwww never become a manager please.
I'd note that financial-markets-driven reorganizations are antithetical to elements 1-4, and this explains how Boeing managed to have a culture of safety but lose it (it's often put down to MD management taking over, but an article a while back showed that this was part of the Boeing CEO seeing the financial writing on the wall). Uh, and that happened "under the watchful eyes" of the FAA.
The opposite of 1-4 could be described as the "culture of lies, ignorance and fear". Fear is a good strategy for getting people working hard (if not always well), and lies make fear universal. Compartmentalizing information is needed to allow more and more functions to be subcontracted. If the company is extracting maximum value from its assets this year, it has no incentive to report problems that will only appear in the future - by the time the future rolls around, the shareholders have theirs and the shell of the remaining company can be tossed away. Etc.
Also, another HN commentator mentioned how hard eliminating a culture of lies and retaliation is once it's in place. There's never a guarantee that those revealing a problem won't be punished once regulators turn their backs.
And 5 is only useful once 1-4 are in place. Otherwise, it's a culture of flexibly hiding your shit in different places.
Edit: This article was on HN a while back. https://qz.com/1776080/how-the-mcdonnell-douglas-boeing-merg... Key quote: These decisions, made by Boeing CEO Phil Condit, were made with a close eye on the company’s bottom line ahead of a hotly anticipated commercial-jet boom. An ambitious program of cost-cutting, outsourcing, and digitalization had already begun.
If this idea could be explored in depth, and more-or-less codified as received wisdom about market players, it would be a great contribution to management "science" and economics. My 0,02€.
I'd say Learning Culture is also a problem.
Boeing has made numerous missteps in the last 15 years after being the world leader in airliners for around half a century. This only happens when knowledge about how to make a safe product is purposefully discarded and attempts to bring that knowledge back are intentionally ignored. In Boeing's case, it's due to desires for increased profits. They are unwilling to learn these lessons because it costs money that _may_ be there at quarter's end.
I thought this was critical:
As was noted by the purported insider, re: multiple overlapping systems of record/not-record, Boeing's actual processes themselves are badly in need of overhaul.
This feels like a clear example of where top-down + bottom-up independent read-back verification would have been useful.
I.e. management decides they're going to create Safety Process X using Systems A, B, and C. They do so, then circulate training (top-down). THEN you conduct independent interviews with employees at the bottom, to measure whether the new processes are understood at that level (bottom-up). If results aren't satisfactory, then add additional training or reengineer the processes.
Too often, it seems like this shit gets done at the VP PowerPoint level, and ground reality diverges without anyone noticing.
The map is not the world: interviews with a representative random sampling aren't hard.
I'm really at a loss on this news. All the employees at airlines in the US I know of have this drilled into them on a regular basis and it's just taken for granted that you report incidents when they happen (even when someone falls: report it!) and the incident will get investigated.
It just confounds me (but explains a lot) that the manufacturer of the aircraft the airlines operate does not share a similar safety culture, given that they are in a similar ecosystem (airlines report issues to the manufacturer and the FAA/NTSB all the time).