Complex software-intensive systems, especially distributed systems,
generate logs for troubleshooting. The logs are text messages recording
system events, which can help engineers determine the system’s runtime
status. This paper proposes a novel approach named ADR (Anomaly
Detection by workflow Relations) that employs the matrix nullspace
to mine numerical relations from log data. The mined relations can be
used for both offline and online anomaly detection and facilitate fault
diagnosis. We have evaluated ADR on log data collected from two
distributed systems, HDFS (Hadoop Distributed File System) and BGL (the
IBM Blue Gene/L supercomputer system). ADR successfully mined 87 and 669
numerical relations from the HDFS and BGL logs, respectively, and used
them to detect anomalies with
high precision and recall. For online anomaly detection, ADR employs PSO
(Particle Swarm Optimization) to find the optimal sliding-window size
and achieves fast anomaly detection.
The experimental results confirm that ADR is effective for both offline
and online anomaly detection.
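
To illustrate the idea of mining numerical relations through a matrix
nullspace, the following is a minimal sketch and not the ADR
implementation. It assumes that each log session has already been reduced
to a vector of event counts; the event names (open, write, close), the
sample counts, and the helper names nullspace and violates are all
hypothetical. Every vector v in the nullspace of the count matrix A
satisfies A v = 0, so it encodes a linear relation that holds in all
normal sessions, and a session whose counts break such a relation can be
flagged as a candidate anomaly.

```python
from fractions import Fraction


def nullspace(rows):
    """Return a basis of {v : A v = 0} using exact Gauss-Jordan elimination."""
    a = [[Fraction(x) for x in row] for row in rows]
    n_cols = len(a[0])
    pivots = []  # pivots[i] is the pivot column of reduced row i
    r = 0
    for c in range(n_cols):
        # Look for a row at or below r with a non-zero entry in column c.
        p = next((i for i in range(r, len(a)) if a[i][c] != 0), None)
        if p is None:
            continue  # no pivot here, so column c is a free column
        a[r], a[p] = a[p], a[r]
        pivot = a[r][c]
        a[r] = [x / pivot for x in a[r]]
        # Clear column c in every other row.
        for i in range(len(a)):
            if i != r and a[i][c] != 0:
                f = a[i][c]
                a[i] = [x - f * y for x, y in zip(a[i], a[r])]
        pivots.append(c)
        r += 1
        if r == len(a):
            break
    free = [c for c in range(n_cols) if c not in pivots]
    basis = []
    for fc in free:
        # Set one free variable to 1 and solve for the pivot variables.
        v = [Fraction(0)] * n_cols
        v[fc] = Fraction(1)
        for row_idx, pc in enumerate(pivots):
            v[pc] = -a[row_idx][fc]
        basis.append(v)
    return basis


def violates(counts, relations):
    """True if the count vector breaks at least one mined relation."""
    return any(sum(c * w for c, w in zip(counts, v)) != 0 for v in relations)


# Hypothetical event counts per normal session: [open, write, close].
normal_sessions = [
    [1, 3, 1],
    [2, 5, 2],
    [1, 2, 1],
    [3, 7, 3],
]

# The single basis vector found here is [-1, 0, 1], i.e. close - open = 0.
relations = nullspace(normal_sessions)

ok_session = [2, 4, 2]   # open and close counts match
bad_session = [2, 4, 1]  # one close event is missing
print(violates(ok_session, relations), violates(bad_session, relations))
```

Exact rational arithmetic is used here only to keep the sketch free of
floating-point tolerances; on real log data a numerical method with a
threshold would likely be needed.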