The FMEA method involves a task force which is designed to identify business risks and to search for means of prevention and correction. This method of assessing failures was developed for the industrial sector but sometimes applies to information systems. Indeed, through FMEA, you can rank cyber risks from insignificant to unacceptable and then take all the necessary preventive measures. Yet, is this collaborative risk assessment method enough to build an IT security strategy?
FMEA was first created by the U.S. Army in the 1940s. It was later theorised, in the 1960s, by the American company McDonnell Douglas. FMEA works from the list of components of an item in order to collect data on its failures, as well as on the frequency and consequences of those failures. It has been used by NASA, by the US arms industry, and by car manufacturers such as Toyota, Ford, Nissan, Peugeot, and BMW.
FMEA stands for “Failure Mode and Effects Analysis” (the term FMECA, “Failure Mode, Effects and Criticality Analysis”, is sometimes used when criticality analysis is included). This process is used to obtain a predictive analysis of the reliability of a system. It is based on:
Like the HAZOP analysis, FMEA advantageously offers an exhaustive functional analysis as part of a comprehensive quality approach aimed at reaching maximum operating safety. It is also carried out by task forces which bring together different skills.
FMEA easily applies to IT systems as part of the risk management of cybersecurity breaches. Its main objective is indeed to detect security or reliability deficiencies of a system.
The theory generally differentiates between two types of FMEA:
To these two traditional types, some analysts add Machinery FMEA, which focuses on the production chain, and FMEA-MSR, whose purpose is to analyse failures that occur while the product is being used by the customer.
More generally, FMEA applies to systems and, in some instances, to information systems.
In general, FMEA meets the expectations of companies that want to ensure the reliability, maintainability and security of a system or product. It is also a process that can help your organisation qualify for certifications and comply with certain reference documents. Here are a few fields of application:
The main objective of FMEA is to design preventive or corrective actions. This is an approach based on deduction. It systematises failure modes in the operation of a product or a system, by analysing the causes and effects of those failures. It helps reduce the potential risks linked to a system – cyber risks related to your information systems, for instance.
Companies that use FMEA to ensure their computer security aim to continuously improve their information system in order to limit failure occurrences. They examine the consequences of cyber security failures by performing tests. Then, they rank the various cyber risks they identified, by examining their frequency, severity and detectability. This is why it works well with cyber risk mapping methods.
When applied to cybersecurity, FMEA can be broken down into three stages: preparation, analysis and follow-up. This non-prescriptive approach gives the task force latitude in how it operates. For the same reason, it tends to yield rather subjective results, which are more useful for managing risks than for preventing them. Indeed, FMEA relies on a qualitative “expert” assessment that uses ordinal and nominal scales to estimate the frequency and severity of failures, which is why it is often difficult to rerun the analysis and obtain the same results.
The first steps of FMEA involve preparing the groundwork for analysis:
1 / Assembling the task force around 5 to 10 multidisciplinary expert profiles
This group must in particular include a manager, for example the Director of Information Systems, capable of making decisions and initiating the proposed actions. It includes developers, but also participants from other departments or parties likely to be affected by cybersecurity breaches: communication, quality, maintenance, suppliers, clients, etc.
2 / Launch of FMEA
The task force meets to define the cybersecurity issues that FMEA is meant to address. It details the objectives of the method, the documentary resources available and the various actors of the analysis. This step also provides an opportunity to establish a methodology for rating the risk criticality indices detailed below, as well as the criticality threshold the task force deems unacceptable. As with other qualitative approaches, establishing those scales can end up being the hardest part, and the most difficult to replicate from one analysis to another.
Once the action of the task force has been set up, it is time to analyse the risks:
3 / Analysis of failure modes and their effects
The task force identifies past and potential cybersecurity breaches by schematising the computer system. This graphic can take the form of a tree structure. It is used to detail the risks, according to the degree of precision set by the company. The logic of the analysis should be orientated towards the identification of failure modes related to how the information system is expected to operate:
4 / Risk assessment of identified failures.
Severity, frequency of occurrence and detection indices are used to determine the criticality of risks:
Criticality (C) = Occurrence (O) × Severity (S) × Detection (D).
Criticality is also often referred to as RPN, for Risk Priority Number.
Each company establishes its own rating grid. A simple method consists in assigning an index number from 1 to 4 to each criterion:
When the cyber risks criticality threshold is reached, the task force needs to come up with corrective actions.
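The calculation and threshold check can be sketched in a few lines of Python. This is a minimal illustration rather than a prescribed implementation: only the formula C = O × S × D and the 1-to-4 scale come from the text, while the `FailureMode` class, the helper functions, the example failure modes with their scores, and the threshold value of 24 are hypothetical. The sketch also assumes that a higher Detection index means the failure is harder to detect.

```python
from dataclasses import dataclass

SCALE_MIN, SCALE_MAX = 1, 4   # the simple 1-to-4 rating grid
CRITICALITY_THRESHOLD = 24    # hypothetical value; each task force sets its own


@dataclass(frozen=True)
class FailureMode:
    name: str
    occurrence: int   # O: how often the failure is expected
    severity: int     # S: how serious its effects are
    detection: int    # D: assumed here to grow as detection gets harder

    def __post_init__(self):
        # Reject scores that fall outside the agreed rating grid.
        for label in ("occurrence", "severity", "detection"):
            value = getattr(self, label)
            if not SCALE_MIN <= value <= SCALE_MAX:
                raise ValueError(
                    f"{label} must be between {SCALE_MIN} and {SCALE_MAX}, got {value}"
                )

    @property
    def criticality(self) -> int:
        # C = O x S x D, also known as the Risk Priority Number (RPN).
        return self.occurrence * self.severity * self.detection


def rank_by_criticality(modes):
    """Return the failure modes sorted from most to least critical."""
    return sorted(modes, key=lambda m: m.criticality, reverse=True)


def needing_action(modes, threshold=CRITICALITY_THRESHOLD):
    """Return the ranked failure modes whose criticality reaches the threshold."""
    return [m for m in rank_by_criticality(modes) if m.criticality >= threshold]


# Hypothetical scores a task force might agree on.
modes = [
    FailureMode("phishing credential theft", occurrence=4, severity=3, detection=2),
    FailureMode("unpatched server exploit", occurrence=2, severity=4, detection=4),
    FailureMode("lost laptop", occurrence=3, severity=2, detection=1),
]

for mode in needing_action(modes):
    print(f"{mode.name}: C = {mode.criticality}")
```

With a three-factor product on a 1-to-4 scale, criticality ranges from 1 to 64, so a threshold only carries meaning relative to the grid the task force has chosen.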
However, this rating grid is based on the subjective perceptions of the task force members, so it would not transfer well to another company. Indeed, FMEA is a qualitative rather than quantitative cybersecurity approach. Because its results are based on nominal or ordinal scales, they are estimates that cannot systematically be verified and do not rest on a mathematical probability.
For a quantitative and projective approach, the VaR (Value at Risk) method is more appropriate. It consists in first calculating expected financial loss amounts in the event of a cyberattack, then linking those numbers to the probability of occurrence of a cyber risk in a given timeframe.
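To make the contrast with FMEA concrete, here is a rough Monte Carlo sketch of a VaR-style calculation in Python. It is one possible way to link loss amounts to a probability of occurrence, not the method of any particular vendor or standard. Everything in it is an assumption made for illustration: the single-incident-per-period model, the log-normal loss distribution, the function names, and the numeric parameters.

```python
import math
import random


def simulate_losses(p_incident, loss_median, loss_sigma, n_trials=100_000, seed=0):
    """Simulate the loss over one period (e.g. a year) for n_trials scenarios.

    Assumed model: at most one incident per period, occurring with
    probability p_incident; its cost follows a log-normal distribution
    with the given median and shape parameter loss_sigma (> 0).
    """
    rng = random.Random(seed)
    mu = math.log(loss_median)  # the median of a log-normal is exp(mu)
    losses = []
    for _ in range(n_trials):
        if rng.random() < p_incident:
            losses.append(rng.lognormvariate(mu, loss_sigma))
        else:
            losses.append(0.0)
    return losses


def value_at_risk(losses, confidence=0.95):
    """Return the loss that is not exceeded in `confidence` of the scenarios."""
    ordered = sorted(losses)
    index = math.ceil(confidence * len(ordered)) - 1
    index = max(0, min(len(ordered) - 1, index))
    return ordered[index]


# Hypothetical inputs: 20% chance of an incident per year, median cost 50,000.
losses = simulate_losses(p_incident=0.20, loss_median=50_000, loss_sigma=1.0)
print(f"Mean simulated loss: {sum(losses) / len(losses):,.0f}")
print(f"95% VaR: {value_at_risk(losses, 0.95):,.0f}")
```

Unlike a criticality index, the result is expressed in monetary terms, which is what makes it usable for comparing the cost of a control with the loss it is expected to avoid.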
Corrective actions go hand in hand with regular monitoring of changes in the criticality of cyber risks:
5 / Corrective actions must reduce the criticality of the failures. As part of a strategic analysis of cyber risks, this may involve training employees, resorting to a firewall or antivirus supplier, or making changes to internal regulations concerning cybersecurity.
6 / The task force lists the critical risks which should be monitored and tested on a regular basis. Each corrective measure also calls for a person in charge, someone responsible for implementing the action plan and regularly assessing failure modes - causes - effects - criticality of risks.
Corrective actions should also be enforced until all criticality indices are below the established thresholds. Criticality should therefore be regularly recalculated with the new values of Occurrence, Severity and Detection. Ideally, all high Severity indices should also be combined with low Occurrence and Detection indices.
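As an illustration of this follow-up loop, the short Python sketch below recalculates criticality after a corrective action and lists the risks that still need work. The risk names, the index values before and after, the `review` helper and the threshold of 24 are all hypothetical; only the product of Occurrence, Severity and Detection comes from the method itself.

```python
def criticality(occurrence: int, severity: int, detection: int) -> int:
    """C = O x S x D."""
    return occurrence * severity * detection


def review(register, threshold):
    """Return {risk name: criticality} for risks still at or above the threshold.

    `register` maps each risk name to its current (O, S, D) indices.
    """
    open_risks = {}
    for name, (o, s, d) in register.items():
        c = criticality(o, s, d)
        if c >= threshold:
            open_risks[name] = c
    return open_risks


THRESHOLD = 24  # hypothetical threshold agreed by the task force

# Indices before any corrective action.
before = {
    "phishing credential theft": (4, 3, 2),
    "unpatched server exploit": (2, 4, 4),
}

# Hypothetical re-rating after employee training lowered Occurrence for phishing.
after_training = {
    "phishing credential theft": (2, 3, 2),
    "unpatched server exploit": (2, 4, 4),
}

print(review(before, THRESHOLD))
print(review(after_training, THRESHOLD))
```

In this made-up example, the training brings the phishing risk below the threshold, while the unpatched server remains open and keeps its owner on the action plan.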
FMEA is either used before the launch of a system so as to avoid failures or after having identified real failures in order to consider corrective measures.
FMEA makes it possible to detect risks of failure and, by extension, to detect and qualitatively assess given malicious threats. It can then be used to formulate a first remedial action plan. Justifying and prioritising investments will however call for a quantitative approach.
FMEA was originally a support method for quality approaches in the industrial sector. It is based on the detection of failure modes, each of which is assigned a level of criticality calculated from the occurrence, the possibility of detection and the severity of the risk. However, it is not a suitable tool for a quantitative forecast of the potential financial losses associated with a given cyber risk.