Root Cause of Alerts
Intelligent decision aids created by ThinkTank Maths help human operators of complex systems to deal with overwhelming alert floods, reducing alert levels by a factor of 10.
Being a systems operator for a large retail bank is a challenging job that requires years of training and experience. Systems operators must stay constantly vigilant while monitoring a stream of alerts from the bank’s vast server estate, which runs its customer-facing services. They have to distinguish important messages from irrelevant ones and escalate serious problems to support services as quickly as possible. However, the complexity of the computer infrastructure required in banking has increased to the point where operators face an endless flood of mostly low-priority alerts.
The bank’s Group Architecture team asked ThinkTank Maths (TTM) to explore mathematical ways to stem the alert flood and to provide a decision aid to the operators.
Based on our Trusted Reasoning Architecture, TTM implemented an intelligent decision aid that filtered out unnecessary alerts and highlighted the underlying issue to the operator. For example, many alerts can stem from the same underlying fault: when a server fails and causes its client machines to crash, the resulting flood of alerts takes up all of the operator’s attention, even though every one of them has a single root cause.
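To make the server-and-clients example concrete, the following is a minimal sketch of grouping alerts by root cause. It is not TTM's Trusted Reasoning Architecture, whose internals are not described here. It assumes a known dependency map between machines, and every name in it (`DEPENDS_ON`, `root_cause`, `group_by_root_cause`, the host names and messages) is hypothetical.

```python
from collections import defaultdict

# Hypothetical dependency map: each machine -> the server it relies on.
# None marks a machine with no upstream dependency.
DEPENDS_ON = {
    "client-1": "server-a",
    "client-2": "server-a",
    "client-3": "server-a",
    "server-a": None,
    "server-b": None,
}


def root_cause(host, alerting, depends_on):
    """Walk up the dependency chain for as long as the upstream host is also alerting."""
    seen = {host}  # guards against cycles in the dependency map
    parent = depends_on.get(host)
    while parent is not None and parent in alerting and parent not in seen:
        seen.add(parent)
        host = parent
        parent = depends_on.get(host)
    return host


def group_by_root_cause(alerts, depends_on):
    """Group (host, message) alerts under the most upstream alerting host."""
    alerting = {host for host, _ in alerts}
    groups = defaultdict(list)
    for host, message in alerts:
        groups[root_cause(host, alerting, depends_on)].append((host, message))
    return dict(groups)


# Illustrative alerts: one failed server, three dependent clients, one unrelated warning.
alerts = [
    ("server-a", "host down"),
    ("client-1", "service crashed"),
    ("client-2", "service crashed"),
    ("client-3", "service crashed"),
    ("server-b", "disk usage high"),
]

groups = group_by_root_cause(alerts, DEPENDS_ON)
for root, members in groups.items():
    print(f"{root}: {len(members)} alert(s)")
```

In this toy case the operator sees two items instead of five alerts: the failure of `server-a`, with the client crashes folded underneath it, and the unrelated warning on `server-b`. A real system would also have to cope with incomplete or uncertain dependency information, which this sketch ignores.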
Experiments showed that this approach reduced alert levels by a factor of 10, giving the operators the breathing space they need to keep the bank’s critical infrastructure up and running.