An understanding of decision-making processes is critical to ensuring project success and safety. Project failures, and even disasters, can result when sound principles are not understood or implemented.
Various decision-making processes have been presented previously in this column. In this discussion, I review two books related to group decision making: Disastrous Decisions by Andrew Hopkins and Managing the Unexpected by Karl Weick and Kathleen Sutcliffe.
Macondo—Why Were Poor Decisions Made?
Hopkins presents a compelling analysis of why the Macondo blowout in the Gulf of Mexico in 2010 happened. Unlike the usual approach, in which an investigation ends with the identification of decision errors, Hopkins, a sociologist, takes it to the next level. He evaluates why the people involved made poor decisions.
Among his conclusions are the following:
- People were operating within a decentralized organizational structure (e.g., engineers reported to assets rather than to technical leadership, and their attention was focused on commercial performance).
- A one-sided understanding of safety existed (e.g., there was a greater emphasis on personnel safety [slips, trips, and falls] than on process safety).
- A limited view of risk focused primarily on economic risk and discounted safety risk.
- A lack of understanding of the defense-in-depth system (described below) led to the failure of multiple safeguards.
Hopkins analyzes the failure of the defense-in-depth system by looking at the failure of the system itself, and not merely the failure of the barriers constituting the system.
The defense-in-depth system is better known as the “Swiss cheese” model. Given a hazard, we put barriers in place to prevent the hazardous event and/or mitigate its consequences. Each barrier has some probability of failure. The analogy is that each barrier is like a slice of Swiss cheese, with the holes representing the ways it can fail. The hazardous event is prevented or mitigated as long as the holes do not all line up.
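To make the arithmetic behind the model concrete, the sketch below uses purely illustrative failure probabilities (my own assumed numbers, not figures from Hopkins or the Macondo investigation): when barriers fail independently, the chance that a hazard defeats all of them is the product of the individual failure probabilities.

```python
# A minimal sketch of the "Swiss cheese" arithmetic.
# Barrier names and failure probabilities are assumptions for illustration
# only; they are not figures from the Macondo investigation.
barriers = {
    "cement plug": 0.10,
    "bond log": 0.10,
    "well integrity test": 0.10,
}

# If the barriers fail independently, the hazardous event gets through only
# when every "hole" lines up, i.e., when all barriers fail at once.
p_all_fail = 1.0
for p_fail in barriers.values():
    p_all_fail *= p_fail

print(f"P(all barriers fail, independent case) = {p_all_fail:.3%}")  # 0.100%
```

With three reasonably reliable, truly independent barriers, the joint failure probability is small; the rest of Hopkins’ analysis shows why that independence assumption did not hold at Macondo.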
In the Macondo case, barriers to the blowout included:
- Cement plug
- Bond log, which verifies the plug
- Well integrity test, which verifies that the well is sealed (not in communication with the reservoir)
- Mud returns measurement to detect well flow
- Blowout preventer (BOP) to seal the wellbore
In addition, there were barriers in place to mitigate the consequences of a blowout:
- Diverter to divert production overboard
- Dampers to block gas ingress to the engine room
The Macondo personnel unwittingly, but systematically, defeated these barriers in a way that invalidated the defense-in-depth system.
Hopkins writes that the well design made the cement job more difficult than typical. At least four things could have gone wrong:
- Improper placement of the cement
- Instability of the foam cement, leading to voids in the set cement
- Channeling
- Contamination of the cement by mixing with mud at the interface
The Macondo engineers convinced themselves either that only the first risk was significant, or that all four were effectively addressed by the fact that “returned mud equaled pumped mud.”
Seeing equal mud returns, the engineers declared the cement placement a success, a judgment that biased all subsequent tests and decisions. For example, because the cement placement was judged a success, no bond log was conducted.
Hopkins writes that the well unambiguously failed the initial well integrity test. When the water column was established between the BOP and the surface and the surface pressure was released, the pressure should have remained at zero. It did not, which was a clear indication that the well was not sealed.
Failure of Defense-in-Depth
Three barriers (cement plug, bond log, and well integrity test) failed. It is highly unlikely that three independent barriers will fail, and some may attribute the failures to “really bad luck.” But it was not bad luck. The three barriers were not independent, nor were the other barriers on the list that also failed. And yet, the engineers who designed the system had every reason to believe that they were independent.
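A second sketch, again with purely illustrative numbers rather than data from the investigation, shows why the independence assumption matters so much: a single common cause, such as a shared belief that the cement job was good, can raise the joint failure probability by orders of magnitude.

```python
# Illustrative numbers only; not data from the Macondo investigation.
p = 0.10  # assumed stand-alone failure probability of each barrier

# Independent case: all three barriers must fail on their own.
p_independent = p ** 3

# Common-cause case: once the first barrier fails but the team still believes
# the cement job was good, assume each later barrier is far more likely to be
# waived or misread (the conditional probabilities below are assumptions).
p_second_given_first = 0.80
p_third_given_first_two = 0.80
p_common_cause = p * p_second_given_first * p_third_given_first_two

print(f"independent barriers : {p_independent:.3%}")   # 0.100%
print(f"shared-bias barriers : {p_common_cause:.3%}")  # 6.400%
```

The point of the comparison is not the particular numbers but the structure: a shared belief acts like one large hole running through every slice of the cheese.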
The engineers who designed the well made it difficult to place the cement plug. They did not believe that they were taking a safety risk; they viewed it as an economic risk. They believed that a failed plug would be discovered and repaired by means of an intervention, which would be expensive but not unsafe. They probably anticipated that the bond log would detect any problems. We might say that they intentionally weakened one barrier but had confidence that other barriers would protect them.
The decision to dispense with the bond log on an admittedly tricky cement job must be explained. Hopkins’ explanation is that decision making was by consensus. On the face of it, consensus decision making seems like a good idea. The problem is that if everyone is responsible, then no one is responsible. It is possible that no one really thought it through. The team that claimed that the cement job was a success also claimed that the bond log was unnecessary.
The well integrity test was not an independent barrier. The team that conducted it was biased by the belief that the cement job was good, and the workers appeared to be incapable of believing otherwise. When the initial test failed unambiguously, they searched for and found a way to confirm their bias about the cement job.
Three barriers fell because of a team’s consensus decision and the bias that followed from it. Other barriers were also bypassed. The “passed” integrity test created further bias, which likely led to the failure to monitor mud returns.
Hopkins offers other valuable insights. In my opinion, everyone in the industry could benefit from reading his book.
High-Reliability Organizations
I turn now to Weick and Sutcliffe’s book. In the 1980s, a team of researchers at the University of California, Berkeley, conducted a study of high-reliability organizations (HROs): organizations that operated hazardous technologies yet seemed to have few accidents.
Using the results of that research, Weick and Sutcliffe identified five features that set HROs apart:
- Preoccupation with failure rather than success. It is easy to become complacent because serious accidents are rare. HROs guard against complacency by focusing on potential failure and always being on the lookout for cues that something is going wrong.
- Reluctance to simplify. Cues that something is going wrong can be hard to distinguish from “noise.” HROs are mindful of this and do not readily dismiss cues of potential problems. Everyone is encouraged to identify and report potential problems.
- Sensitivity to operations. Managers at HROs keep abreast of what is happening “on the ground.” The best and most timely information comes from the front line.
- Commitment to resilience. HROs plan for the unexpected. Minor problems do not get out of control because there are plans in place to deal with them. People on the front line know that they have the authority to do whatever is necessary to deal with an urgent problem.
- Deference to expertise rather than bureaucracy. Emergencies must be managed by the people with the expertise to understand and deal with the problem. Management must yield to the experts during an emergency.
Comparisons
How does the Macondo team measure up against these criteria? The first two HRO features stand out as strikingly absent at Macondo. First, far from being preoccupied with failure, the Macondo team was preoccupied with success, looking for arguments to rescue success from the jaws of failure. Second, the team sought simple answers to complex problems.
As in most major accidents, the story of Macondo is a story of group error rather than individual error. No single decision caused the blowout, and no single person would have made all the decisions that led to the disaster. Individuals and groups made a series of decisions that made sense to them based on what they knew at the time. Many of the errors we can identify today are the result of hindsight and sensemaking.
The particular series of errors made in the Macondo case will likely never be repeated because of what has been learned.
The more important lessons to be learned are how to avoid similar group decision errors in dissimilar situations in the future. In that respect, there are many things we can learn from Hopkins, and Weick and Sutcliffe.
I conclude with two recommendations. We should be wary of consensus decision making. If everyone is responsible, then no one is responsible. When a group makes a decision, it is possible that no one has thought it through.
We should assume failure and insist that the tests prove success. If we assume success instead, our natural confirmation bias may lead us to accept ambiguous test results.
For Further Reading
Hopkins, A. 2012. Disastrous Decisions: The Human and Organisational Causes of the Gulf of Mexico Blowout. CCH Australia.
Weick, K.E. and Sutcliffe, K.M. 2007. Managing the Unexpected: Resilient Performance in an Age of Uncertainty, second edition. Jossey-Bass.
Howard Duhon is the systems engineering manager and a principal with Gibson Applied Technology and Engineering (GATE). Throughout his career, he has had an interest in the study of decision theory and in the application of that knowledge to improve project execution. He can be reached at hduhon@gateinc.com.