| The Investigation Process Research Resource Site |
A pro bono site with hundreds of resources for investigators
Launched Aug 26 1996.
By Ludwig Benner, Jr. and Robert Sweginnis
How would you like to compete in a bowling match where a curtain quickly drops down just in front of the foul line after you deliver your ball down the alley? You never know whether you scored a strike with the first ball, or whether you missed the pins completely. If you don’t find out how you score, what’s the sense of starting the ball rolling? That’s not the kind of game most of us would like to play. We much prefer a game in which we can follow our scores with each roll.
Compelling evidence suggests that, in too many instances, system safety — as currently practiced — has one attribute comparable to bowling behind such a curtain. System safety analyses are prepared, quantified and analyzed with increasing frequency. But, how often is a long-term feedback loop established to convey the “system safety score” achieved by the analyst?
THE OPEN LOOP FLAW
The notion of what constitutes a system safety “closed loop” illuminates this problem area. In system theory, a feedback loop is conceived as the information flow from the system output back to the system. This “feedback” is used to change the operation of the system, to correct undesired output.
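The mechanics of such a loop can be illustrated with a toy sketch (a hypothetical example, not from the paper): the system’s output is measured, compared with the desired output, and the difference is fed back to correct the system’s operation.

```python
# Toy illustration of a closed feedback loop (hypothetical example):
# the output is measured, compared with a setpoint, and the error is
# fed back to correct the system's operation.

def run_closed_loop(setpoint, gain=0.5, steps=20):
    """Drive a simple system toward `setpoint` using proportional feedback."""
    output = 0.0
    for _ in range(steps):
        error = setpoint - output   # feedback: compare output with the goal
        output += gain * error      # corrective action based on the feedback
    return output

# With feedback closed, the output converges on the target.
print(round(run_closed_loop(10.0), 3))
```

Cut the feedback line and the output never moves toward the target, which is the essence of the “open loop” problem discussed below.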
A completed feedback loop is often termed a “closed loop” by system safety specialists. Two widely held views of what constitutes a “closed loop” can be observed in the system safety field. One is the notion that when a safety recommendation to control a hazard is reported to have been implemented, the recommendation file is closed and the safety loop is closed. A second view is that if the design performs as expected during prototype or preproduction or component tests, this information confirms the analysis and closes the loop for the safety analyst.
Both views reflect inadequate feedback loops.
If we consider the bowling pins as hazards, how many hazard pins did our system safety analysis ball really knock off our workplace alley? What contribution, in the real world, did our system safety analysis efforts make to our team’s score and season record at the alley? Evidence suggests that many system safety practitioners, after providing their safety analyses and opinions, don’t know how the game or the season ended. Furthermore, they may not be looking to find out. Unfortunately, few managers seem to be asking them to do so.
What is the evidence supporting these conclusions? Among other things, the evidence includes:
Among recent system safety textbooks, Brown’s 1976 book is silent on the matter. Hammer’s widely used 1980 book uses the term “closing the loop” in the sense of ensuring that corrective action is taken after hazard analyses find hazards that are not properly controlled. He devotes 3 lines in a 320-page textbook to using accident investigations to update hazard analyses. The 1981 NUREG-0492 Fault Tree Handbook does not address this issue. Malasky’s 1982 book does not mention the hazard analysis review function in connection with accident investigations. Clemens’ 1982 Hazard Prevention article about system safety methods mentions no accident investigation methods or linkages for closing the loop. These popular publications are representative of the system safety literature. Of the system safety publications we have surveyed, only Roland and Moriarty’s 1983 system safety engineering book speaks directly to this issue.
From the accident investigation perspective, a review of 39 accident investigation manuals disclosed that NONE of the manuals required or even suggested that predictive system safety analyses be compared with what occurred. No system safety plan we found required that analyses be tested during accident investigations, or updated by new information from an accident. When this has been done during investigations, however, the results have been significant, as will be discussed shortly.
Department of Defense MIL-STD-882A clearly illustrates this missing linkage. When one compares 882A and accident investigation manuals used in the Department, one finds that no common model of the accident phenomenon is used. Further, the definitions in the investigation manuals are used in a different context from those in 882A. The bottom line is that accident data and 882A outputs are in different terms, so the two systems do not routinely help each other.
The system safety loop must not be considered closed until the predicted safety risks and safety control system performance levels, as determined by system safety analyses, have been validated or upgraded with ongoing system performance information. Did these predictive analyses accurately predict operating disruptions, mishap scenarios and losses resulting from the hazards identified by accident investigators? Was the assessment of safety risk presented to appropriate decision makers during early safety reviews accurate, and is the system now performing as it was originally advertised? Although the vast majority of the information used to validate system safety will be generated after a system is operational, few such direct comparisons are reported in the literature.
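One concrete way to make such a comparison is sketched below (a hypothetical illustration with made-up numbers, assuming mishaps follow a Poisson process): given the mishap rate predicted by the analysis and the fleet’s operating exposure, compute how surprising the observed mishap count is. A very improbable count signals that the original analysis needs revisiting.

```python
import math

def prob_at_least(observed, predicted_rate, exposure_hours):
    """P(seeing >= `observed` mishaps) if the predicted rate is correct,
    assuming mishaps follow a Poisson process."""
    expected = predicted_rate * exposure_hours
    p_less = sum(math.exp(-expected) * expected**k / math.factorial(k)
                 for k in range(observed))
    return 1.0 - p_less

# Illustrative numbers (not from the paper): the analysis predicted
# 1 mishap per 100,000 flight hours; the fleet then flew 200,000 hours
# and experienced 4 mishaps.
p = prob_at_least(4, 1 / 100_000, 200_000)

# A small p means the operating record contradicts the prediction,
# so the original analysis should be re-examined.
print(f"{p:.3f}")
```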
Prototype and pre-production tests do expose their fair share of dings, bangs and glitches, but what happens after that phase? By “closing” the loop too soon in a system’s life cycle, system safety practitioners cannot determine if their analytical “models” successfully predicted real world safety and accident performance, or how their models might be upgraded promptly and efficiently to do so.
For each mishap after a system is operational, significant questions like the following should be raised by investigators and system safety analysts:
If you answer these questions "no" in an investigation, your system safety program probably is afflicted with the "open loop flaw."
In the accident at hand, proper methods often disclose potential SYSTEM flaws. But did the investigation discover the SYSTEM SAFETY program’s “open loop” flaw? Has the risk inherent in this “open loop” flaw remained overlooked, unevaluated and uncorrected, and if so, how often have accident problems been repeated because the system safety program flaw exists? This open loop has cascading effects on a system’s performance as risks continue to be overlooked, escape evaluation and produce serious consequences.
HOW CAN YOU TELL IF YOU HAVE A PROBLEM?
The following set of questions can help safety managers and their organization’s senior executives determine if their safety programs contain the "open loop flaw."
IS THIS FLAW WORTH FIXING?
The answer is a resounding YES! Closing the loop has been very worthwhile when it has been done. Several examples illustrate the benefits.
Several years ago, the United States Air Force noted that a new aircraft was crashing at a rate which exceeded that predicted in system safety analyses. A special independent review team consisting of both Air Force (operational, design/acquisition and logistic) members and airframe and engine contractor members was formed. Its purpose was to ensure that all reasonable corrective actions were being accomplished. One of the areas examined by the review team was the airframe/engine system safety program. The team considered it both active and aggressive. It had all the basic elements necessary for effective identification, evaluation and elimination or control of hazards. But, the team’s report also noted the system safety analyses had not identified the causes for four engine-related crashes. Although the four accidents were admittedly difficult to predict, comparing the original system safety analyses with the real world accidents showed that the system safety program needed a mid-course correction. The first hazard analysis ball had not knocked down all the hazard pins and another ball was necessary.
One of the engine failures involved a small flaw in an engine compressor disc which grew through fatigue until the disc failed catastrophically. Although the fault tree had identified this failure mode, the predicted failure rate was so low that, if it were valid, the failure was not expected. Another engine quit when a feedback/control cable subsystem experienced a fatigue failure induced by faults during assembly/maintenance. Without the information provided by the cable, the compressor variable vanes were improperly positioned and the engine stagnated. The fault tree model did not address the assembly/maintenance fault modes which led to failure. In addition, although similar engines in other types of aircraft had experienced this failure, responsible management organizations were not alerted.
The third loss occurred after a cylindrical pin was left out during the manufacture of an engine component which was unique to the new aircraft. With the pin missing, a valve component was free to rotate and cause the engine to stagnate. The fault tree had not been carried down to a level which could have detected this fault. It had been assumed (erroneously) that normal and back-up systems were totally redundant below a certain level. In addition, the team noted that Failure Mode and Effects analyses did not address missing parts.
The fourth loss involved an internal engine bolt backing out of its hole and starting a reaction which resulted in catastrophic engine failure. When the team reviewed the fault tree analysis, it discovered the analysis did not go to the piece part level for components such as bolts.
Note that this was considered a good safety program, well funded, staffed and directed. But, like good bowlers, it sometimes failed to knock down all the pins with the first ball. Armed with the information from the real world, however, the program was able to zero in on the pins still standing. In all, six new system safety program initiatives were implemented.
The logistic managers of another, older aircraft also took advantage of real world performance feedback to improve their system safety program. The aircraft involved was designed in the late ’50s and produced in the ’60s. The original development program did not include a system safety program. Although the aircraft’s safety record was relatively good, program managers felt that many of the causes of accidents which continued to haunt the program should be the subject of system safety analyses. Fault trees were developed for the main problem area, the flight control system, and for a secondary problem area, the landing gear. As a result, maintenance repair and overhaul procedures and critical parts redesign changes were implemented. These fault trees have also proven useful as an accident investigation tool.
In a different field, a new design of certain types of railroad cars was subjected to special safety analyses and safety tests during design and prototype development. It passed all tests. After the equipment was put into service, indications of new types of problems arose in accidents. The earlier tests, however, had satisfied the designers. The new information from accident investigations about the adverse safety performance of the cars did not produce a reconsideration of the original safety analyses and tests. Continuing accidents and outside pressures on the industry eventually forced implementation of a safety retrofitting program at a cost estimated to be over $200,000,000.
A related issue concerned emergency response actions in the accidents involving these cars. The guidelines for these actions, developed by thoughtful individuals with extensive experience and implemented through the resultant training, were followed faithfully by response officials. The results were disastrous. Numerous responders were fatally or seriously injured. As the toll mounted, the guidelines were reviewed, and with the benefit of re-analysis, were substantially improved. The result: no known fatalities or disabling injuries since the old guidelines were changed to incorporate the feedback from good accident investigations, and the improved guidelines were implemented.
All the above cases have one common factor. The system safety plan lacked routine updates of the original safety analyses based on accident investigations. In some cases, lives and resources were unnecessarily lost in subsequent accidents.
One of the problems that surfaces when this is attempted is the incompatibility between predictive safety analysis methods and the methods needed to investigate and analyze accidents. Fault tree methods dominate the former activity, while other events sequencing methods dominate the latter. Fault tree methods have been found to have narrow applications in accident investigations, and advanced investigative analysis methods have not been widely tried for predictive analyses. Compatible methods will be needed in the interests of efficiency, if for no other reason.
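For readers unfamiliar with the mechanics, a fault tree reduces to AND/OR gates over basic events. The sketch below (a hypothetical illustration, assuming independent basic events and made-up probabilities) shows the kind of top-event prediction such a tree produces — the output that would have to be reconciled with event-sequence data from investigations.

```python
# Minimal fault-tree gate evaluation (hypothetical sketch, assuming
# independent basic events). An OR gate fails if any input fails;
# an AND gate fails only if all inputs fail.

def or_gate(*probs):
    """Probability that at least one input event occurs."""
    q = 1.0
    for p in probs:
        q *= (1.0 - p)      # probability that no input fails
    return 1.0 - q

def and_gate(*probs):
    """Probability that all input events occur together."""
    q = 1.0
    for p in probs:
        q *= p
    return q

# Illustrative tree (invented numbers): the top event occurs if a
# control cable fails OR both a primary and a backup actuator fail.
cable_failure = 1e-4
primary_actuator = 1e-3
backup_actuator = 1e-3

top_event = or_gate(cable_failure,
                    and_gate(primary_actuator, backup_actuator))
print(f"{top_event:.2e}")
```

Note what the Air Force cases above exposed: such a tree is only as good as its event list, so faults omitted from the model (missing parts, assembly errors) simply never appear in the computed top-event probability.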
WHAT SHOULD BE DONE?
The examples suggest several specific actions by the system safety community. These actions effect changes to safety program requirements and to system safety analysis and accident investigation practices.
The first step that must be taken is to acknowledge that the “open loop flaw” exists and merits action.
Other steps that should then be taken include:
ABOUT THE AUTHORS:
Ludwig Benner Jr. is an adjunct faculty member and Field Instructor for the University of Southern California’s Institute of Safety and Systems Management. He was with the National Transportation Safety Board, as Chief of its Hazardous Materials Division. He directed numerous accident investigations, studies, evaluations of safeguards and procedures in hazardous materials transportation safety. He received his Chemical Engineering degree from Carnegie Institute of Technology. He is a registered Professional Safety Engineer. He has testified on safety matters before the U.S. Congress, and served on two Virginia Legislative Study Commissions, National Academy of Sciences committees and panels, and on several Federal agency safety projects and advisory groups. He is a Fellow of the System Safety Society.
REFERENCES:
1. U.S. Department of Defense, Military Standard System Safety Program Requirements, MIL-STD-882A, June 28, 1977.
2. Brown, D.B., Systems Analysis and Design for Safety, Prentice-Hall, Englewood Cliffs, N.J., 1976.
3. Hammer, W., Product Safety Management and Engineering, Prentice-Hall, Englewood Cliffs, N.J., 1980.
4. Vesely, W.E., et al., Fault Tree Handbook, NUREG-0492, U.S. Nuclear Regulatory Commission, Washington, D.C., 1981.
5. Malasky, S., System Safety: Technology and Application, Garland STPM Press, New York, 1982.
6. Clemens, P.L., A Compendium of Hazard Identification & Evaluation Techniques for System Safety Application, Sverdrup Technology, Inc., AEDC Group, Arnold Air Force Station, TN, 1981.
7. Roland, H.E., and Moriarty, B., System Safety Engineering and Management, John Wiley, New York, 1983.