OECD Guiding Principles for Chemical Accident Prevention, Preparedness and Response - Third Edition
10. Incident investigations
Abstract
This chapter provides principles on how to conduct incident investigations. Incident investigations consist of identifying the underlying causes (sometimes called the “root” causes) in a chain of events leading to an accident, including initiating events and failure in the mitigation, the lessons to be learnt and ways to prevent similar accidents in the future. The investigation should not be limited to determining the immediate or apparent cause(s). Investigations can be conducted both by industry and by public authorities.
“Incident investigation is a process for reporting, tracking, and investigating incidents that includes a formal process for investigating incidents, including staffing, performing, documenting, and tracking investigations of process safety incidents and the trending of incident and incident investigation data to identify recurring incidents. This process also manages the resolution and documentation of recommendations generated by the investigations.” (CCPS, 2022[1])
General principles
These general principles apply to investigations by both industry and public authorities. Investigations by different parties may have different objectives (for example, public authorities might be doing an investigation for purposes of enforcement). Nevertheless, investigations by industry and public authorities have a number of common elements, in particular with respect to methodologies to be used. Generally, industry-initiated investigations will be conducted separately from those initiated by public authorities, although joint investigations may be possible.
Investigate all incidents
All incidents involving hazardous substances should be investigated (Box 10.1).
Identify the root causes, develop recommendations to prevent recurrence and ensure implementation of the recommendations
The emphasis in conducting investigations should be on identifying the “root” causes in a chain of events leading to an accident, including initiating events and failure in the mitigation.
Finding the “root cause” of an incident means pursuing the investigation, as far as this is possible, to the cause(s) which, if corrected, will prevent the recurrence of events that could lead to the same or a similar accident or near miss.
The objectives of root cause investigations should be to:
Determine why the incident(s) happened – what were the underlying cause(s), contributing cause(s) and chain of events.
Develop plans for corrective action to be taken by management in order to prevent related or similar incidents. The recommendations from investigations should be specific so that they can lead to corrections of technology, procedures or management systems. Generally, an investigation will lead to multiple recommendations for actions (i.e. no individual action will usually be sufficient): these should be prioritised and balanced to achieve the best level of safety possible.
Implement the plans. There should be an adequate follow-up to an investigation in order to verify that corrective actions have been taken and that they were implemented as intended.
Establish protocols for conducting investigations
Protocols should be established for conducting root cause investigations. The protocols should:
Specify the steps in the investigation process.
Identify the roles and responsibilities of the individuals involved in the investigation and how organisations will interact.
A team should be established for the investigation:
All members of the investigation team should have the appropriate knowledge, competency and experience to carry out investigations and to fulfil their identified roles and responsibilities.
The team should have a diverse membership with participants from different disciplines, with different skills, including members with human factors expertise and those with knowledge of the specific installation subject to the investigation. These could be employees involved with the operation and maintenance of the installation and their representatives.
The leadership of the team should as far as possible be independent of the operational unit under investigation.
Consideration should be given to the use of third parties, such as consultants, to manage or carry out the investigation or parts of the investigation, to evaluate the findings and help ensure the quality of the results as well as the recommendations set out in the report.
The appropriate point for stopping the investigation should be identified to help ensure that it is not stopped prematurely or unnecessarily lengthened.
Prepare an investigation report
Investigation reports should be prepared and should include, as a minimum, a factual chronology of the events leading up to the accident/near miss, a statement of the underlying (or root) causes and contributing causes, and recommendations for follow-up actions. The report should also document which theoretical causes of the accident have been discounted and why.
A basic agreed framework and use of common terminology for preparing investigation reports should be developed in order to facilitate sharing of information related to investigations. As far as practicable, terminology across sectors should be harmonised at an international level to allow improvements in data sharing, accident investigation techniques and communication of lessons learnt.
Review and improve the investigation process regularly
Following an investigation, there should be a review of the investigation process.
Methods and approaches used in investigations of incidents should be developed, improved and shared. This should include training in their application.
Box 10.1. Incident investigation
An investigation should be a fact-finding activity to learn from experience, not an exercise designed to allocate blame or liability. However, the conclusions of an investigation may lead to enforcement activities on the part of public authorities. Those involved should be made aware of this. There should be full co-operation between the operational staff at the installation and those involved in the investigation.
The emphasis when conducting investigations should be on identifying the underlying causes (sometimes called the “root” causes) in a chain of events leading to an accident, including initiating events and failure in the mitigation, the lessons to be learnt and ways to prevent similar accidents in the future. The investigation should not be limited to determining the immediate or apparent cause(s).
It should be recognised that accidents are generally the final stage of a long sequence of events in which there is a complex interplay between failures in technical, human and organisational systems.
Where “human factors” are involved, the cause should not simply be recorded as such. Rather, investigators should determine exactly what elements contributed to any human error. Such elements could include, e.g. boredom, stress, overwork or insufficient training. Other root causes could be: the system was not sufficiently error-tolerant; the operating procedures were not made available in written form or were not kept up-to-date; the procedures were not realistic, created difficult circumstances or called for illogical actions by the operator; there was poor ergonomic or system/technology design; the process design did not provide the operator with enough data or provided too much data to expect an appropriate response; staffing was insufficient; there was undue pressure on the operator or manager to sacrifice safety to higher productivity; or a reorganisation or a change in staff was not properly managed. Human factors are not limited to operator errors but may occur at different points in the hierarchy of the enterprise including, for example, at the level of those responsible for maintenance, management of change or permit-to-work systems, or at the level of supervisors and management. Examples of human factors, in addition to operator errors, can involve: problems with the transmission of knowledge, especially when experienced specialists retire; the complexity of the system, including process design and engineering; the ageing of plants and related repairs, without adequate maintenance and inspection; and the need to cope with changes in organisation or technology, including automation.
The procedure for root cause investigations of accidents should be systematic, thorough and fair. The procedure should consist of five main phases:
The first phase takes place before there is access to the accident site, when a number of steps can be taken to further the investigation including: organising the investigation team; interviewing eyewitnesses; organising an information and tracking system; organising lists of factors which might have influenced the event; developing the preliminary list of scenarios; co-ordinating with the emergency response team to ensure the preservation of evidence; undertaking investigations outside the restricted areas; preparing for large volumes of information; taking aerial photographs; and recovering and preserving evidence including electronic data and paper documents.
The second phase consists of the initial site visit, when it is important to document the condition of the site, revise investigation plans and identify time-sensitive evidence.
The third phase is the ongoing investigation, when the focus will be on recovery of evidence, reconstruction, analysis, testing and simulation of scenarios, and systematically affirming or denying scenarios.
The fourth phase involves the preparation of the investigation report and recommendations, which should be completed in a timely manner to avoid delays in the application of improvements.
The fifth phase is about the dissemination and communication of results.
In designing and implementing investigations, efforts should be made to address possible constraints on, or challenges to, conducting effective investigations, such as:
The destruction or deterioration of evidence over time due to climatic and atmospheric conditions, the memory distortion of witnesses due to time, perception and other psychological factors, and the fact that the investigation occurs under stressful circumstances and may last for a number of months.
Limiting the possible scenarios examined and thereby biasing the collection of evidence to try to match the chosen scenarios.
Laws designed to promote public access to information, as well as laws to protect confidential business information, that can present hurdles to the collection and sharing of relevant evidence.
Constraints due to limited financial or human resources available, relative to the complexity of the investigation.
Insufficient trust among parties involved.
Insurance and liability issues.
Taking actions to make the site safe.
Principles to industry
Management of a hazardous installation should ensure that there is a prompt investigation and thorough analysis of all accidents and near misses involving hazardous substances.
Management of hazardous installations should adopt internal standards establishing clear guidance concerning the nature of the investigations that should be carried out, the individuals who should be involved and the criteria to be used to determine the extent of investigations for different types of incidents.
Management should encourage the identification and disclosure of near misses by establishing an atmosphere of trust, where employees do not fear being blamed, and by sending consistent messages to all employees regarding the importance of such disclosures. Management should establish a simple procedure for reporting near misses when identified.
The investigation and reporting process (either internal or third-party) should make recommendations to those individuals who have the authority and the resources to take any corrective actions.
Management should ensure that investigations are documented and the reports published.
The results of investigations of accidents and near misses (including recommendations and lessons learnt) should be shared throughout the enterprise, with other enterprises and with other relevant stakeholders, with due regard for the protection of confidential business information, in order to help avoid the same or similar problems in the future. Such reports can also be used in support of education and training activities.
Management should share relevant aspects of the investigation reports with public authorities. It is in the best interest of all parties to make the relevant aspects of the investigation reports publicly available, to the extent possible. Enterprises should seek to share key information about lessons learnt through available national and international databases or clearinghouses.
To help maintain a corporate memory, investigation reports and lessons learnt from incidents should be appropriately stored and easily available.
Management should seek out and use relevant experience of other enterprises with respect to investigations, drawing on sources such as accident reports, the websites of enterprises, national and international databases and other accessible sources of information.
Principles to public authorities
Public authorities should ensure that accidents are investigated. The investigation may be carried out by different authorities depending on the legal regime.
Accidents with significant adverse effects on health, the environment or property, as well as other accidents that have the potential to provide significant insights for reducing risks, should be investigated.
Investigations may also be carried out if it is suspected that a law or a regulation has been violated.
Investigations carried out by public authorities should be unbiased and trustworthy so that the public can have confidence in the outcomes.
Where more than one agency (national, regional and/or local) is involved in investigations, it is important that the activities of these agencies are co‑ordinated as far as possible with a clear definition of responsibilities.
Public authorities should consider which stakeholders should be involved in incident investigations and reviews of investigation reports.
Where appropriate, particularly following significant accidents, the investigation may be conducted by a group of experts made up of individuals other than those responsible for inspection of the installation and enforcement of the control framework (for example, a specially designated commission).
Public authorities should establish the criteria by which they will determine priorities for investigations (i.e. which accidents should be investigated and to what extent), taking into account resource constraints.
The selection criteria should be chosen to make the most effective use of resources and allow for timely action and results.
In this regard, public authorities should consider such factors as the history of similar accidents, the extent of damage to health, the environment and property, the number of facilities that use the process(es) involved in the accident and the likelihood that new information will result in improvements in safety, as well as the level of public concern.
Investigations should be documented and relevant information from the reports should be published in a form that will protect confidential and legal information, to inform other relevant stakeholders of the lessons learnt so that the safety of hazardous installations can be improved.
The reports should include sufficient background information to enable the investigation results to be useful in other situations.
The reports should include conclusions resulting from the analysis of accident data.
Public authorities should disseminate such reports to the industrial organisations within their country that might benefit from the lessons learnt from the investigation.
Public authorities should facilitate the sharing of investigation reports with industry and in an international context, using for example chemical accidents databases (Box 10.2), in particular to improve information sharing concerning the causes of accidents.
Public authorities should actively communicate the results and lessons learnt with the affected local population and community representatives.
Public authorities should be responsible for ensuring that appropriate action is taken in light of the recommendations set out in investigation reports.
Adequate resources should be provided to public authorities to carry out their responsibilities with respect to accident investigations and the dissemination of related information.
Box 10.2. Examples of chemical accidents databases
eMARS (https://emars.jrc.ec.europa.eu/)
The European Major Accident Reporting System (eMARS) database is maintained by the European Commission at the European Union Joint Research Centre (JRC), Major Accident Hazards Bureau. The database is the implementation of Article 21, No. 4, of the Seveso III Directive.
eNatech (https://enatech.jrc.ec.europa.eu/)
The aim of this database is to systematically collect information on Natural Hazard Triggered Technological Accidents (Natech) that occur worldwide and allow the searching and analysis of Natech accident reports for lesson-learning purposes.
The ZEMA database is the central reporting system for the German Major Accidents Ordinance.
The ARIA database is maintained by BARPI – an office of the French Ministry for Ecological Transition and Solidarity (i.e. Environment Ministry). It contains more than 50 000 entries. These cover a wide range of technological accidents from within France and elsewhere.
TUKES VARO (http://varo.tukes.fi/)
The current web version of the VARO register was published in 2013. The VARO register contains information collected by the Finnish Safety and Chemicals Agency (Tukes) from various sources on accidents that have occurred in Finland.
Failure Knowledge Database (http://www.shippai.org/fkd/en/index.html)
The Failure Knowledge Database has been developed by the Japan Science and Technology Agency (JST) and is available in Japanese and English. It covers a range of technologies and the accident reports are classified according to these technologies.
Relational Information System for Chemical Accidents Database (RISCAD) (https://riscad.aist-riss.jp/)
RISCAD is a Japanese database developed by the National Institute of Advanced Industrial Science and Technology (AIST) and the JST. The interface and content are provided in Japanese and English. The reports cover not only the chemical industry but also other technologies such as coal mining.
United States Chemical Safety Board (CSB) (http://www.csb.gov)
The U.S. CSB does not run a database in the strict sense of the word. However, from its website, it is possible to find information including investigation reports, videos and animations of accidents investigated by the CSB.
Source: IChemE (2020[2]), “Accidents Databases – A review”, Loss Prevention Bulletin, No. 275.
References
[1] CCPS (2022), Introduction to Incident Investigation, Center for Chemical Process Safety, https://www.aiche.org/ccps/introduction-incident-investigation.
[2] IChemE (2020), “Accidents Databases – A review”, Loss Prevention Bulletin, No. 275.