The Investigation Process Research Resource Site
A Pro Bono site with hundreds of resources for Investigation Investigators
Last updated 8/8/09

INVESTIGATING INVESTIGATIONS
to advance the State-of-the-Art of investigations, through investigation process research.





Launched Aug 26 1996.

 

Foreword:

This is the second of two papers from this publication dealing with investigations posted at this internet site. This paper describes the MORT safety assurance program, including investigation-related research findings, which can serve as a model for investigating management practices during investigations. The other paper by L Benner addresses hypothesis generation during investigations, and how that can be done. It also has the table of contents for the source publication.

Though forty years old, the papers provide useful insights into the thinking of the times for investigation researchers.

Source: National Bureau of Standards SPECIAL PUBLICATION 482


Rare Event/Accident Research Methodology

Proceedings of a Workshop held at the National Bureau of Standards

Gaithersburg, Maryland, May 26-28, 1976

Edited by V. J. Pezoldt

Institute for Applied Technology
National Bureau of Standards
Washington, D.C. 20234

U.S. DEPARTMENT OF COMMERCE
Juanita M. Kreps, Secretary
Dr. Sidney Harman, Under Secretary
Jordan J. Baruch, Assistant Secretary for Science and Technology

NATIONAL BUREAU OF STANDARDS,
Ernest Ambler, Acting Director

Issued July 1977


INVESTIGATIVE METHODS USEFUL IN SAFETY

William G. Johnson

Energy Research and Development Administration

This paper was prepared by WILLIAM G. JOHNSON and discussed at the workshop by ROBERT EICHER. Mr. Eicher has worked closely with Johnson at the Energy Research and Development Administration (ERDA) (formerly AEC) where he is Special Hazards Engineer in the Division of Safety, Standards, and Compliance. Mr. Johnson, retired general manager of the National Safety Council, is the author of MORT-The Management Oversight and Risk Tree, which he prepared for the Atomic Energy Commission, Division of Occupational Safety.

MORT is a major output of ERDA's continuing development of a safety management methodology for reducing accident rates. Application of MORT has been primarily in occupational safety; however, many of the techniques and methods employed can be useful in consumer product technology as well. The paper presented here serves, in large part, to introduce and annotate MORT for the workshop participants.

For the past six years ERDA (formerly AEC) has been developing a safety management methodology to augment its basic physical research goals. AEC had had safety programs and records equal to the best practices of private industry and government. Nevertheless, the goal of the development work was stated as "an order of magnitude reduction in rates and risks."

The investigative method used in developing "superlative safety assurance systems" for ERDA was as follows:

  1. Conceptual framework -- development of a trial synthesis from the best organizational practices, aerospace system safety methods, safety related research, and behavioral, organizational and managerial sciences;
  2. Further development from analysis or reinvestigation of accidents;
  3. Trial in an organization (Aerojet Nuclear), and then restatement of the synthesis;
  4. Application, training, and technical assistance agency-wide;
  5. Ongoing development, especially at Aerojet, but also at numerous laboratories and production facilities.

The primary outputs have been two:

  • MORT - The Management Oversight and Risk Tree, U.S. Atomic Energy Commission, February 12, 1973, SAN 821-2.
  • Accident/Incident Investigation, Energy Research and Development Administration, August 1, 1975, ERDA-76-20.

    In progress is a Workbook on Measurement of Safety Assurance Programs. A review draft is scheduled for May and a pilot training draft for the Fall, 1976. The Workbook will offer numerous examples of the measurement data believed needed, augmenting the hundreds of specific questions already posed in MORT. A method for cumulating many specific measurements into an assessment of eight broad system criteria will be proposed.

    A growing number of supportive publications are coming from System Safety Development Center at Aerojet:

    Occupancy-Use Readiness Manual,

    Human Factors in Design,

    A Contractor Guide to Advance Preparation for Accident Investigation,

    MORT User's Manual (to be published shortly)

    In addition, a variety of training aids and experimental forms are of interest as to methodology.

    Organizational research in safety faces the following obstacles:

    1. The rarity of major events,

    2. Difficulty of cross-context comparisons of complex organizations,

    3. Paucity of basic research or relevant data,

    4. Lack of research orientation among safety practitioners.

    The synthesis has supplied a conceptual framework wherein investigation becomes more searching and testing of control programs is more nearly possible.

    In the course of the six-year project a wide variety of research and investigation methods have been assimilated, and some new methods have been developed.

    The three years subsequent to publication of MORT have included widespread examination and use of the systems by managers and scientists of many disciplines. The usage has confirmed the basic 1973 report, and resulted in extension and development rather than drastic change.

    The relevance of the "superlative safety systems" for consumer product technology lies in two areas:

    1. Product safety up to the point of consumer use is amenable to the same control processes (given the substitution of appropriate use and user data, consumer rather than employee);

    2. Some elements of the system may have relevance or usefulness in the consumer use phase, but because of the obvious differences between controllable employee behavior and typical consumer behavior, the use phase requires separate evaluation of any given study method for a control or technique.

    * * *

    For the convenience of Workshop participants, a reprint on MORT from the Journal of Safety Research, March 1975, is attached as Appendix A. Also, ERDA has made available copies of the above publications to Workshop participants.

    This paper could be called a "Smorgasbord of Investigative Methods." The intent is to call out the developments within the project briefly. Then, if the reader is interested, the texts or references can be consulted. Many specific measurements and experiments are described in the text.

    The order of listing, with a few exceptions, follows the order of the MORT text.

    I. Introduction.

    Among other concepts, the significance of codes, standards and regulations is presented. They are minimal: useful and necessary to put a floor under performance, but not the route to the optimum performance desired by all.


    II. What Produces Hazards

    1. Accident/Incident Definition. An essentially new definition is presented (p. 25). It was developed for functional use in the design of safety measures. (Short definitions may be superior for tabulating events, but do not lead into essential preventive aspects.) Hazard and Risk are also defined.

    2. Energy and Barriers. The Gibson-Haddon concepts are simple and objective, and have many advantages for prevention analysis. Systematic, sequential analysis of possible barriers has repeatedly produced innovative safety devices of superior efficacy.

    3. Frequency-Severity Matrices. Different accident sources typically have different slope lines. A slope less than the 45° line of balance is a danger sign, and appears to have catastrophe prediction value.
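The slope comparison can be sketched numerically. The sketch below is illustrative only and rests on assumptions that are not stated in the paper: that the matrix is plotted on log-log axes, so that the 45° line of balance corresponds to a slope of magnitude 1, and that a least-squares fit is an acceptable way to estimate the slope. The function names and all data values are hypothetical.

```python
import math

def loglog_slope(severities, frequencies):
    """Least-squares slope of log10(frequency) against log10(severity)."""
    xs = [math.log10(s) for s in severities]
    ys = [math.log10(f) for f in frequencies]
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    return sxy / sxx

def flatter_than_balance(severities, frequencies):
    """True when frequency falls off more slowly than the assumed
    45-degree line of balance (slope magnitude below 1)."""
    return abs(loglog_slope(severities, frequencies)) < 1.0

# Hypothetical data: severity class (loss units) and events per year.
severity = [1, 10, 100, 1000]
source_a = [1000, 10, 0.1, 0.001]  # frequency drops quickly with severity
source_b = [100, 30, 9, 2.7]       # frequency drops slowly with severity

for name, freq in (("A", source_a), ("B", source_b)):
    slope = loglog_slope(severity, freq)
    flag = "danger sign" if flatter_than_balance(severity, freq) else "ok"
    print(f"source {name}: slope {slope:.2f} ({flag})")
```

Under these assumptions, source B would be flagged, because its severe events remain relatively frequent.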

    4. Error as Accident Cause. The substantial literature on error measurement and on error reduction methods can be brought to bear on safety.

      1. Rigby's "error tolerance limits" (p. 52), useful in studying degrees of control, provide macabre humor when a limit turns out to be "forensic."
      2. The Surry decision model (p. 54) has been under careful observation. While it seems conceptually correct, its practical significance seems limited under circumstances of typically subtle danger buildup, often conditioned by changes remote in time, and the blinding speeds of accident occurrence. If non-injury errors or unreviewed changes were seen as danger signals, the Surry scheme could be useful. (See McKie, p. 61, for a natural history note.)
      3. The extensive reliability and error prevention work of ERDA's weapon production program is essentially product safety. Much in the way of useful concepts and data on error-provocative situations has emerged. A key point in the work of Sandia Lab is the need for error rates of the general form errors/opportunities, i.e., the need for user data as well as accident data.
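The errors/opportunities form can be illustrated with a minimal sketch. The product names and counts below are invented for illustration; the point is only that a raw error or accident count cannot be compared across products until it is divided by the number of opportunities (the user data).

```python
def error_rate(errors, opportunities):
    """Error rate of the general form errors / opportunities."""
    if opportunities <= 0:
        raise ValueError("opportunities must be positive")
    if errors < 0:
        raise ValueError("errors cannot be negative")
    return errors / opportunities

# Hypothetical figures: reported errors and estimated number of uses.
products = {
    "product A": (12, 4_000),
    "product B": (30, 60_000),
}

for name, (errors, uses) in products.items():
    print(f"{name}: {errors} errors in {uses} uses -> rate {error_rate(errors, uses):.4f}")
```

Product B has more reported errors, but product A has the higher rate once use data are taken into account.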
    5. The Role of Change. This is one of the most important MORT developments. The basic concept of Kepner-Tregoe -- in a stable system, change is the cause of the trouble -- was initially studied by NSC and has been under intensive study by ERDA. Change analysis is a most powerful technique for identifying obscure causes. It is particularly useful in product quality assurance.

    (The form on page 67 is not a form to be filled out. The left hand tabs are intended only to be indicative of the event-related factors to be inserted.)

    Most (perhaps all) serious accidents have one or more changes, usually detectable.

    Equally powerful as a preventive medium is "Change Based Potential Problem Analysis." (The form on page 69, as now used, has a column "Effects of Change" inserted before "Preventive Counter-Change.")

    This analysis could be significant in design of a revised model of a product. Call out all differences (e.g., as irrelevant as color), then analyze Effects of Change. It often turns out that so-called irrelevant changes have significance. This inexpensive, perceptive form of analysis should be a requirement on every project and for every significant change.
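The "call out all differences" step can be sketched as a simple comparison of two product descriptions. This is a hedged illustration, not the MORT form itself: the attribute names and values are hypothetical, and only the column names "Effects of Change" and "Preventive Counter-Change" are taken from the text above. Those columns are left empty for the analyst to complete.

```python
ABSENT = "(absent)"  # marker for a factor missing from one of the models

def call_out_changes(baseline, revised):
    """List every difference between two models, however minor it looks."""
    changes = []
    for factor in sorted(set(baseline) | set(revised)):
        before = baseline.get(factor, ABSENT)
        after = revised.get(factor, ABSENT)
        if before != after:
            changes.append({
                "factor": factor,
                "before": before,
                "after": after,
                "effects_of_change": None,          # to be filled in by the analyst
                "preventive_counter_change": None,  # to be filled in by the analyst
            })
    return changes

# Hypothetical product descriptions.
old_model = {"color": "white", "handle": "steel", "cord_length_m": 1.5}
new_model = {"color": "black", "handle": "steel", "cord_length_m": 1.5,
             "guard": "snap-on"}

changes = call_out_changes(old_model, new_model)
for change in changes:
    print(f"{change['factor']}: {change['before']} -> {change['after']}")
```

The sketch deliberately does not filter out "irrelevant" differences such as color, since the text argues that these often turn out to matter.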

    The effects of changes are directional and exponential -- quite a challenge.

    6. Sequences in Accident Causation. MORT (1973) is essentially just descriptive of a phenomenon -- lengthy sequences.

    The use of sequence as an analytic device was developed by Benner and Wakeland for NTSB. Following their leadership the ERDA Accident/Incident Investigation Manual (AIM), pages 4-3 to 4-8, and Appendix I, discuss the method, "Events and Causal Factors Sequence Diagram," and show illustrative cases. The sequence diagram is the usual focal point of analysis.

    Sequence diagrams, coded by MORT codes, now seem a realistic possibility for causal information coding and retrieval.

    7. The Role of Risk Management. The scientific literature is summarized, but provides little practical material. The concept which associates risk with the profitable, creative, and "fun" side of a line of balance is articulated. The simplest useful model of risk assessment is described.


    III. How to Reduce Hazards

    8. Integrating System Safety with Present Best Practices. System safety, as [...], is project related. Insofar as product safety is a project by project effort, system safety can be a recommended approach. The text provided by Willie Hammer[1] (a Workshop participant) is a valuable guide.

    The need to integrate system safety with the best organizational practices arises from the ongoing, continuous operation of the organization, which is not usually seen as merely a series of projects.

    The MORT synthesis incorporates system safety with numerous references (Hammer's text was not then available). However, by now MORT also incorporates many methods and criteria not customarily found in system safety, e.g., change analysis, independent review, procedure criteria, the full spectrum of human factors concerns, ongoing monitoring and audit systems, and the basic management policy and implementation factor.

    The basic position is that MORT and project system safety are noncompetitive. Start with whichever one seems appropriate for an ongoing organization or a project and then add the other.

    9. Method vs. Content. Analytic or observational method (with good people) has repeatedly shown that it is fast and cheap, and that it can find some deficiencies not usually revealed by hardware, technical or process specialists. Examples again are error, change, and sequence analysis.

    10. Safety, Efficiency and Performance Are Congruous. This tenet can be illustrated with [...] beliefs in some business circles. It is not scientifically proven. This carries over to consumer products, which commonly have energy use or conservation methods of performing work.

    If the organizational safety program is redefined as those elements likely to improve safety and performance, the mutual reinforcement is enhanced. Increased positive emphasis on safety is then supported by management. Emphasis solely on codes, standards and regulations will not suffice, nor does it tend to build management support.

    An inherent relation between energy, control, and performance (p. 109) underlies much of what we undertake in modern society. When control is not in scale, performance suffers and accidents result.

    11. A Safety System as a General Management System. We reinvent nature's biological safety mechanism with essential feedback as a "safety system," or a wide variety of managerial systems.

    The beauty of the simple, six element system (p. 113) is that under it we can tree (successive elaborations of essential detail) everything that must be said about safety methodology. For example:

    Simple - six elements - p. 113

    Next level of detail - p. 114

    More detail - MORT



    Congruous with general management - p. 128
    More detail - MORT

    Note that a tree is a sequence - left to right and top to bottom.

    A tier in a tree is a process - p. 194.

    Additional trees can give more detail, e.g., Independent Review tree, Exhibits 8 plus 4-7 and 9-12, or for other subjects, Exhibits 3 and 12.

    12. General Safety Program Theses. A simple set of propositions reflects what has gone before and what will follow.

    IV. MORT. The appendix to this paper provides the necessary discussion. If it seems complex, remember that it must be "necessary and sufficient," and also provide redundant controls.


    V. Management Implementation of the Safety System. Ten elements susceptible to measurement and evaluation are listed and described. The elements are correct and basic if the product is a reactor, a process facility or a product used by others (industry or government) with a strong concern for safety.

    The management criteria have not been tested on a consumer product organization. Product safety specialists have opinions that the management elements for product safety are similar, if not identical. A study of product-related management systems could bring about major improvements in product safety. This thesis will be repeated in the Hazard Analysis part.

    21. Risk Assessment System. Models varying from simple to complex are presented. Probability goals for safety are now feasible and practical.

    ERDA now has several active investigations in analyzing risks in transportation of hazardous materials, using the NTSB model. This model, converted to general organizational problems, is shown (p. 219).


    VI. Hazard Analysis Process. The Hazard Analysis Process must be conceptualized and defined. The failure to do so is probably the most glaring single weakness in present-day professional safety work.

    22. System Safety and Hazard Analysis. Two concepts, Life Cycle and Safety Precedence Sequence, were well articulated by NASA (but not AEC at the time).

    System safety costs (perhaps 5% of engineering costs, and a tiny fraction of total production costs) are essentially small. Managers and engineers commonly see them as expensive; this has never been shown. What can be expensive are the hardware or control systems shown necessary by analyses; then [...] may be necessary and well based.

    23. Hazard Analysis Process Defined. The listing referred to on page 235 is "MORT IV," an improvement over the MORT charts in the text, but similar to the large chart which accompanied the Journal of Safety Research reprint (MORT V).

    The need for the elements in this process has been confirmed in spilled blood and piles of rubbish and ashes. Lack of articulation of such a process is the grave weakness in product and other design.

    To facilitate initiation of an improved process, the simplification shown in Figure 1 was developed. It shows the "big six" of hazard analysis which will merit brief comment here:

    a. Information search
    b. Codes, standards and regulations;
    c. Accident/incident data. Without any desire to relieve manufacturers, the difficult matter of nationwide product data collection and retrieval may be a valuable service for government action.
    d. Change Analysis, already discussed.
    e. Failure Mode and Probability/Consequences analysis of residual priority problem lists.
    f. Human Factors Review, covered in MORT chapter 26. The MORT tree analysis shown follows common practices, but has demonstrated no efficacy in upgrading investigations or reviews by untrained personnel. We fell back on a classification of skill levels, a low threshold of a "cookbook" or a sensitizing course such as the Aerojet manual cited at the outset. The skill applied to product design is easily classifiable, and the threshold should be a minimum requirement in modern society. Here standards for design will not suffice.
    g. MORT analysis.
    h. Nonnormal operating modes--start up, shut down, repair, failure, anomalies, etc.--search is a test of imagination.

    The final level of required system safety analysis is negotiable from a scaling mechanism--big problems, big analysis--and, for a major system safety effort, Hammer is, again, the best guide.

    We can now return to some of the other major elements in Figure 1.

    Good Design Organization (Chapter 27). Strangely, a general format of a design process, onto which a safety process can be easily fastened, is apparently lacking. The points of interaction, particularly early safety input, should be defined.

    The roles of reliability and quality assurance are understated in MORT (1973), especially for product safety.

    What is called for is a three-level investigation or audit of a product design function:

    a. Basic design organization.

    b. Quality assurance aspects (p. 281).

    c. Hazard analysis process (p. 235).

    Only the three in combination could give assurance of an error-free design process.

    Independent Review. This concept was apparently invented by the AEC, and it is a powerful factor in detecting oversights and omissions.

    Trade-offs are not a functional point of design clearly called out in MORT. Safety is a frequent loser in trade-off sessions because the predictive safety data for cost/benefit comparisons are weak relative to other concerns. Therefore, the injunction to always put in the values is made, which will be clear when serious accidents occur (p. 256).

    Historical Note. The automobile industry came under excruciating pain in Senate hearings in 1963 and 1964. Its management policies and implementation, its safety research, its hazard analysis process, and its trade-offs (style for safety) were found less than adequate (LTA in MORT). One company president has said he wishes he had known prior to 1963 what he now knows. The lesson for other manufacturers seems clear -- audit and measure processes against ideals, or public agencies will.


    VII. The Work Flow Process

    Operational Readiness is a test not covered in MORT, but now has the Aerojet guidelines listed above. The process therein shows analytic trees which, with slight adaptation, could be used to determine that a manufacturer was ready to produce a safe, trouble-free product.

    However, a simple guideline for universal use is the Nertney Wheel (p. 254).

    With one exception, Procedures, the factors of Supervision, Employee Training and Performance Motivation seem inappropriate for consumer product technology.

    Procedures (chapter 32). Tested criteria for evaluating procedures are listed. (Seven of ten procedures in a well-run organization flunked the test.) It seems likely most manufacturers' procedures would show at least as high a failure rate.

    New View on a Human Factors Process. MORT divides the human aspects into several [...] as relevant to a MORT process. To tie it all together, Figure 2 was developed. The following numbered notes explain the steps in the process. (Other steps are clear, or see MORT index.)

    1. Any number of tests and guides. See special bibliography in MORT.
    2. See Procedures above and MORT, Chapter 32.
    3. Behavior change - Appendix H, Innovation Diffusion, and Appendix I, Acceptance of Proceduralized Systems.
    4. Mager and Pipe, Analyzing Performance Problems, Lear Siegler, Inc./Fearon Publishers, Belmont, California, 1970.

    It should be most fruitful to measure the Human Factors Process as it would bear on designers in an organization. They are people, and need support and assistance.


    VIII. Information Systems.

    The specific criteria listed (p. 343) would not likely be filled by a product data system. Again, government may have a role to fill. The manufacturer does have certain inescapable obligations.

    36. Technical Information. Can be well organized and approach comprehensive change

    37. Monitoring Systems. Some can be developed.

    1. Error sampling is possible by field study.
    2. Failure reporting, especially from controlled groups, such as customer service.
    3. Critical Incident Studies (called RSO's in MORT because they are a nuclear industry!). This is a powerful and perceptive tool for product design.
    4. Accident/incident reporting systems.

    Data reduction for management or design use is required.

    38. Accident Investigation. See the new Manual. A few thorough, in-depth investigations, including a manufacturer's investigations of the role of his systems, will provide more useful data than superficial investigations of many hundreds or thousands of accidents. ERDA investigations of serious events reveal on the order of forty organizational improvement possibilities per case, about half being system improvements.

    Audits of design, production, quality assurance and other relevant programs by high level, independent groups (internal or external) have been shown to be powerful and searching methods of discovering needed improvements, particularly when good audit and process criteria are used.

    39. The Organization's Information System. This section extends earlier comments and findings. Primary is the finding that EDP coded systems in industry have produced information of modest diagnostic value, but almost no information for action decisions.

    40. National Information System. Many pieces and parts are available but not [...] a designer is placed under an unwarranted handicap. Governmental initiative seems necessary to make diverse information stores easily accessible to smaller organizations.

    41. Measurement Techniques.

    1. The Frequency-Severity matrix is further developed (p. 420).
    2. Extreme Value Predictions (p. 426) are proving to be a useful means of estimating the probability of more serious events from past "worst event" data.[2]
    3. Simplistic rates commonly used in industry to measure industrial safety performance are of slight value (p. 432). Also see Appendix K for results of an NSC Symposium on Measurement, pertinent for many safety-related problems.
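As an illustration of the extreme value idea in item 2, the following sketch fits a Gumbel (extreme value type I) distribution to a series of "worst event" values and estimates the chance of a still larger event. It is a minimal sketch under stated assumptions, not the procedure given in MORT: the fit uses the method of moments, the data are invented, and the function names are hypothetical.

```python
import math
import statistics

EULER_GAMMA = 0.5772156649015329  # Euler-Mascheroni constant

def fit_gumbel(maxima):
    """Method-of-moments estimates of Gumbel location and scale."""
    scale = statistics.stdev(maxima) * math.sqrt(6) / math.pi
    location = statistics.mean(maxima) - EULER_GAMMA * scale
    return location, scale

def exceedance_probability(x, location, scale):
    """Probability that the worst event in one period exceeds x."""
    return 1.0 - math.exp(-math.exp(-(x - location) / scale))

# Hypothetical worst loss recorded in each of ten past years (arbitrary units).
worst_losses = [12, 18, 9, 25, 14, 30, 11, 20, 16, 22]

location, scale = fit_gumbel(worst_losses)
for level in (30, 40, 60):
    p = exceedance_probability(level, location, scale)
    print(f"P(worst event in a year > {level}) ~ {p:.3f}")
```

With so few observations the estimates are rough; the sketch shows only how past "worst event" data can be extrapolated to more serious events.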


    In summary, the experiences of the last six years suggest the following needs:

    1. A compilation or synthesis of the best available practices to form an ideal measurement yardstick.

    2. Systematic investigation of accidents in sufficient depth to modify and improve the ideal system. (A "state of the art" for the next three.)

    3. Pilot tests and evaluations of the system.

    4. Research and evaluation under conditions where variables are at least known, even if not quantified.

    5. Implementation is a long-term project, five to ten years. However, short-term gains result from each improvement; differences can be seen within a year.

    These needs apply directly to product safety and commercial transportation, and perhaps other fields as well.

    [1] Hammer, Willie, Handbook of System and Product Safety, Prentice Hall, 1972.

    [2] Gumbel, Emil J., "Statistical Theory of Extreme Values and Some Practical Applications," National Bureau of Standards, Applied Mathematics Series, 1954.
