Is FMEA suitable for cyber risk analysis?

Definition of the FMEA method

FMEA was first created by the U.S. Army in the 1940s. It was later formalised, in the 1960s, by the American company McDonnell Douglas. FMEA works through the list of components of an item in order to collect data on its failures, as well as on the frequency and consequences of those failures. It has been used by NASA, by the US arms industry, and by car manufacturers such as Toyota, Ford, Nissan, Peugeot, and BMW.

What is FMEA?

FMEA stands for “Failure Mode and Effect Analysis” (the variant FMECA, “Failure Mode, Effects and Criticality Analysis”, is sometimes used when an explicit criticality analysis is included). This process is used to obtain a predictive analysis of the reliability of a system. It is based on:

Identifying the potential “failure modes” of a product / system / process, the consequences of which are likely to affect its proper functioning;
Assessing the risks associated with the appearance of failures, according to a criticality index;
Conceptualising preventive measures and corrective actions to be carried out either during the design of the system or during its operation.

Like HAZOP analysis, FMEA offers an exhaustive functional analysis as part of a comprehensive quality approach aimed at maximum operating safety. It is likewise carried out by task forces that bring together different skills.

FMEA applies readily to IT systems as part of managing the risk of cybersecurity breaches, since its main objective is to detect security or reliability deficiencies in a system.

Different types of Failure Mode and Effect Analysis

The theory generally differentiates between two types of FMEA:

Design FMEA seeks to measure the reliability and safety of a product upstream, at the design stage;
This same analysis, when applied to processes, is called Process FMEA. It must ensure the quality of the product during its production.

Some analysts add to these two traditional types Machinery FMEA, which focuses on the production chain, and FMEA-MSR, whose purpose is to analyse failures that occur while the product is being used by the customer.

More generally, FMEA applies to systems and, in some instances, to information systems.

Who is FMEA for?

In general, FMEA meets the expectations of companies that want to ensure the reliability, maintainability and security of a system or product. It is also a process that helps your organisation qualify for certifications and demonstrate compliance with certain normative documents. Here are a few fields of application:

Design FMEA is used in the manufacturing industry to create construction plans and schematics for the purpose of obtaining patents;
Process FMEA helps to calibrate quality control;
Machinery FMEA is useful for establishing production line maintenance guides;
Analysing the risks associated with flows helps to design inventory management plans.

What are the benefits of Failure Mode and Effect Analysis?

The main objective of FMEA is to design preventive or corrective actions. It is an inductive, bottom-up approach: it systematically lists the failure modes in the operation of a product or a system and analyses the causes and effects of those failures. It helps reduce the potential risks linked to a system – cyber risks related to your information systems, for instance.

Companies that use FMEA to ensure their computer security aim to continuously improve their information system in order to limit failure occurrences. They examine the consequences of cybersecurity failures by performing tests. Then, they rank the various cyber risks they identified by examining their frequency, severity and detectability. This is why FMEA works well with cyber risk mapping methods.

How does FMEA apply to cybersecurity?

When applied to cybersecurity, FMEA can be broken down into three stages: preparation, analysis and follow-up. This non-prescriptive approach gives the task force latitude in how it operates. It also tends to yield rather subjective results, which are more useful for managing risks than for preventing them. FMEA is a method based on qualitative “expert” assessment: it uses ordinal and nominal scales to estimate the frequency and severity of failures, which is why it is often difficult to rerun the analysis and obtain the same results.

Preparing the scope of failure analysis

The first steps of FMEA involve preparing the groundwork for analysis:

1 / Assembling the task force around 5 to 10 multidisciplinary expert profiles

This group must include, in particular, a manager, for example the Director of Information Systems, who is able to make decisions and initiate the proposed actions. It includes developers, but also participants from other areas likely to be affected by cybersecurity breaches: communication, quality, maintenance, suppliers, clients, etc.

2 / Launch of FMEA

The task force meets to define the cybersecurity issues that FMEA is to address. It details the objectives of the method, the documentary resources available and the various participants in the analysis. This step is also an opportunity to establish a methodology for rating the risk criticality indices detailed below, as well as the criticality threshold that the task force deems unacceptable. As with other qualitative approaches, establishing those scales can end up being the hardest part, and the most difficult to replicate from one analysis to another.

Analysing cyber risks and their criticality

Once the task force has been set up, it is time to analyse the risks:

3 / Analysis of failure modes and their effects

The task force identifies past and potential cybersecurity breaches by drawing up a diagram of the computer system. This diagram can take the form of a tree structure. It is used to detail the risks, to the degree of precision set by the company. The analysis should be oriented towards identifying failure modes relative to how the information system is expected to operate:

For each identified failure mode, possible causes are to be sought;
The major impacts on IT users of each cause/failure combination should also be researched;
Finally, the task force needs to identify the most likely detection signals for each cause/failure combination.
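
The tree structure and the search for causes, effects and detection signals can be sketched as a small data model. This is only an illustration in Python, not a prescribed FMEA format: the class names, the components and the failure modes are all hypothetical examples.

```python
from dataclasses import dataclass, field


@dataclass
class FailureMode:
    """One cause/failure combination with its effect and detection signals."""
    description: str
    cause: str
    effect_on_users: str
    detection_signals: list[str] = field(default_factory=list)


@dataclass
class Component:
    """A node of the information-system tree (hypothetical structure)."""
    name: str
    failure_modes: list[FailureMode] = field(default_factory=list)
    children: list["Component"] = field(default_factory=list)


def walk(component, path=()):
    """Yield (path, failure_mode) pairs for every node of the tree, depth first."""
    here = path + (component.name,)
    for mode in component.failure_modes:
        yield here, mode
    for child in component.children:
        yield from walk(child, here)


# Hypothetical example system, for illustration only.
system = Component(
    name="Information system",
    children=[
        Component(
            name="Mail server",
            failure_modes=[
                FailureMode(
                    description="Phishing e-mail reaches a user",
                    cause="Spam filter rules out of date",
                    effect_on_users="Credentials may be stolen",
                    detection_signals=["User report", "Unusual login location"],
                )
            ],
        ),
        Component(
            name="File server",
            failure_modes=[
                FailureMode(
                    description="Files encrypted by ransomware",
                    cause="Unpatched operating system",
                    effect_on_users="Shared documents unavailable",
                    detection_signals=["Mass file renames"],
                )
            ],
        ),
    ],
)

# Flatten the tree into worksheet-style rows: where the failure sits, and what it is.
rows = [(" > ".join(path), mode.description) for path, mode in walk(system)]
for location, description in rows:
    print(f"{location}: {description}")
```

The depth of the tree plays the role of the "degree of precision" chosen by the company: adding child components refines the analysis without changing how the rows are collected.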

4 / Risk assessment of identified failures

Severity, frequency of occurrence and detection indices are used to determine the criticality of risks:

Criticality (C) = Occurrence (O) × Severity (S) × Detection (D).

Criticality is also often referred to as the RPN, for Risk Priority Number.

Each company establishes its own rating grid. A simple method consists in assigning an index number from 1 to 4 to each criterion:

Criterion D measures how easily the failure can be detected: 1 for elementary, 2 for easy, 3 for average and 4 for difficult;
Criterion O indicates the probability of the failure occurring, from lowest to highest;
Criterion S rates the severity of the failure’s impact on the IT system, from minor to severe.
The higher the criticality C, the more serious the failure mode. With indices rated from 1 to 4, C ranges from 1 to 64, and a value of 25 or above represents a serious risk.
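
As a worked illustration of the formula C = O × S × D on a 1 to 4 grid, here is a minimal Python sketch. The failure modes and their ratings are hypothetical, and the threshold of 25 is simply the value quoted above; a real task force would set its own grid and threshold.

```python
from dataclasses import dataclass

SCALE = range(1, 5)          # each index is rated from 1 to 4
CRITICALITY_THRESHOLD = 25   # "serious risk" level quoted in the text


@dataclass(frozen=True)
class Rating:
    name: str
    occurrence: int  # O: 1 = least likely, 4 = most likely
    severity: int    # S: 1 = minor, 4 = severe
    detection: int   # D: 1 = elementary to detect, 4 = difficult to detect

    def __post_init__(self):
        # Reject any index that falls outside the agreed scale.
        for value in (self.occurrence, self.severity, self.detection):
            if value not in SCALE:
                raise ValueError(f"index {value} is outside the 1-4 scale")

    @property
    def criticality(self) -> int:
        """C = O x S x D, also called the Risk Priority Number (RPN)."""
        return self.occurrence * self.severity * self.detection


# Hypothetical ratings, for illustration only.
ratings = [
    Rating("Phishing e-mail reaches a user", occurrence=4, severity=3, detection=3),
    Rating("Ransomware on file server", occurrence=2, severity=4, detection=2),
    Rating("Lost unencrypted laptop", occurrence=2, severity=3, detection=1),
]

# Rank failure modes from most to least critical and flag those at or above the threshold.
ranked = sorted(ratings, key=lambda r: r.criticality, reverse=True)
for r in ranked:
    flag = "corrective action needed" if r.criticality >= CRITICALITY_THRESHOLD else "monitor"
    print(f"{r.name}: C = {r.criticality} ({flag})")
```

With these made-up ratings, only the phishing failure mode (4 × 3 × 3 = 36) crosses the threshold; the other two score 16 and 6.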

When the criticality threshold is reached for a cyber risk, the task force needs to come up with corrective actions.

However, this rating grid is based on the subjective perceptions of the task force members, so it would not transfer well to another company. FMEA is a qualitative rather than quantitative approach to cybersecurity. Because its results are based on nominal or ordinal scales, they are estimates that cannot be systematically verified and do not rest on a mathematical probability.
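
The sensitivity to subjective ratings can be seen with a small sketch. The two sets of (O, S, D) ratings below are hypothetical: they stand for two task forces scoring the same two failure modes on a 1 to 4 grid, and they end up with opposite priorities.

```python
def criticality(o, s, d):
    """C = O x S x D."""
    return o * s * d


def ranking(ratings):
    """Failure modes ordered from most to least critical."""
    return sorted(ratings, key=lambda name: criticality(*ratings[name]), reverse=True)


# Hypothetical (O, S, D) ratings given by two task forces to the same failure modes.
team_a = {"Phishing": (4, 3, 3), "Ransomware": (2, 4, 2)}
team_b = {"Phishing": (3, 2, 3), "Ransomware": (3, 4, 3)}

print(ranking(team_a))
print(ranking(team_b))
```

Team A scores phishing at 36 and ransomware at 16, while team B scores them at 18 and 36, so a shift of one point on a few indices is enough to reverse the order of priorities.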

For a quantitative and projective approach, the VaR (Value at Risk) method is more appropriate. It consists in first calculating expected financial loss amounts in the event of a cyberattack, then linking those numbers to the probability of occurrence of a cyber risk in a given timeframe.
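
By way of contrast with ordinal scoring, here is a minimal sketch of a VaR-style calculation. It is not the method of any particular vendor or standard: it assumes a single hypothetical attack scenario, a yearly probability of occurrence and a triangular distribution for the size of the loss, all with made-up figures, and it estimates the loss that is not exceeded in 95% of simulated years.

```python
import random


def simulate_annual_losses(probability, low, mode, high, years=10_000, seed=42):
    """Simulate yearly losses for one scenario (assumed triangular loss size)."""
    rng = random.Random(seed)  # fixed seed so the sketch is reproducible
    losses = []
    for _ in range(years):
        if rng.random() < probability:
            # The attack occurs this year: draw a loss amount.
            losses.append(rng.triangular(low, high, mode))
        else:
            losses.append(0.0)
    return losses


def value_at_risk(losses, confidence=0.95):
    """Empirical quantile: the loss not exceeded in `confidence` of the simulated years."""
    ordered = sorted(losses)
    index = min(int(confidence * len(ordered)), len(ordered) - 1)
    return ordered[index]


# Hypothetical figures, for illustration only.
losses = simulate_annual_losses(probability=0.10, low=50_000, mode=200_000, high=1_000_000)
var_95 = value_at_risk(losses, confidence=0.95)
expected_loss = sum(losses) / len(losses)
print(f"Expected annual loss: {expected_loss:,.0f}")
print(f"95% VaR: {var_95:,.0f}")
```

Unlike a criticality index, both outputs are expressed in monetary units over a stated timeframe, which is what makes them comparable from one analysis to another, provided the input assumptions are documented.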

Planning corrective actions and ensuring follow-up

Corrective actions go hand in hand with regular monitoring of changes in the criticality of cyber risks:

5 / Corrective actions must reduce the criticality of the failures. As part of a strategic analysis of cyber risks, this may involve training employees, resorting to a firewall or antivirus supplier, or making changes to internal regulations concerning cybersecurity.

6 / The task force lists the critical risks which should be monitored and tested on a regular basis. Each corrective measure also calls for a person in charge, someone responsible for implementing the action plan and regularly reassessing failure modes, causes, effects and the criticality of risks.

Corrective actions should also remain in force until all criticality indices are below the established thresholds. Criticality should therefore be regularly recalculated with the new values of Occurrence, Severity and Detection. Ideally, all high Severity indices should also be combined with low Occurrence and Detection indices.
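
This follow-up check can be sketched as a before/after comparison. The sketch is illustrative only: the failure modes and their (O, S, D) values are hypothetical, the threshold of 25 is the one quoted for the 1 to 4 grid, and treating an index of 3 or 4 as "high" is an assumption.

```python
THRESHOLD = 25  # criticality level treated as unacceptable (from the 1-4 grid)
HIGH = 3        # assumption: an index of 3 or 4 counts as "high"


def criticality(o, s, d):
    """C = O x S x D."""
    return o * s * d


def review(register):
    """Return the failure modes that still need action, with the reason."""
    findings = {}
    for name, (o, s, d) in register.items():
        c = criticality(o, s, d)
        if c >= THRESHOLD:
            findings[name] = f"C = {c} is at or above the threshold"
        elif s >= HIGH and (o >= HIGH or d >= HIGH):
            findings[name] = "high severity not offset by low occurrence and detection"
    return findings


# Hypothetical (O, S, D) ratings before and after corrective actions.
before = {
    "Phishing e-mail reaches a user": (4, 3, 3),
    "Ransomware on file server": (2, 4, 3),
}
after = {
    "Phishing e-mail reaches a user": (2, 3, 2),  # e.g. after employee training
    "Ransomware on file server": (2, 4, 1),       # e.g. after better monitoring
}

print(review(before))
print(review(after))
```

In this made-up register, the ransomware entry is flagged before correction even though its criticality (24) is just under the threshold, because a severity of 4 is paired with a detection index of 3; after correction, neither entry is flagged.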