ADmM: Anomaly Detection for Microservice Systems with Incomplete Metrics

2026 · 0 citations · Journal Article · Hybrid Open Access

Authors

Zekun Zhang · Wuhan University

Jian Wang · Wuhan University

Bing Li · Wuhan University

Liuxiaoxiao Zhang · AIR Worldwide (United States)

Yina Liu · Wuhan University

Patrick C. K. Hung · Ontario Tech University

Abstract

The rapid development of the internet has led to an exponential increase in the scale of computing, storage, networking, and service resources. Traditional monolithic architectures are increasingly inadequate for managing this complexity. In contrast, microservice architectures have emerged as the mainstream solution because of their inherent flexibility in deployment and scalability. To ensure system reliability, modern microservice architectures rely heavily on observability data, including logs, metrics, and traces. However, network instability, service instance restarts, and system overloads frequently cause intermittent loss of metric data. These missing data points impede comprehensive assessment of system health and significantly threaten system stability and reliability. To address this challenge, we propose an anomaly detection model, ADmM, which integrates logs, metrics, and traces. ADmM first extracts template-level and semantic-level features from multimodal inputs. Then, a multi-scale autoencoder module imputes the missing metrics. For anomaly detection, the model represents microservice dependencies as a directed acyclic graph and leverages a graph neural network to learn generative patterns from normal system behavior. By measuring the deviation between observed and reconstructed values, ADmM assigns anomaly scores to identify anomalies. Experiments on three open-source benchmarks demonstrate that ADmM outperforms state-of-the-art methods across multiple anomaly detection metrics. Notably, it achieves F1-score improvements of 5.77%, 5.48%, and 2.16% in scenarios with 40% incomplete metrics.
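
The impute-then-score idea in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: mean imputation stands in for the multi-scale autoencoder, the reconstruction is supplied by hand rather than produced by a graph neural network, and the names `impute_missing`, `anomaly_score`, and `is_anomalous`, along with the threshold values, are hypothetical.

```python
from typing import List, Optional, Sequence


def impute_missing(series: Sequence[Optional[float]]) -> List[float]:
    """Fill missing points (None) with the mean of the observed points.

    Assumption: a simple stand-in for the multi-scale autoencoder
    imputation described in the abstract.
    """
    observed = [x for x in series if x is not None]
    if not observed:
        raise ValueError("series has no observed values to impute from")
    fill = sum(observed) / len(observed)
    return [fill if x is None else x for x in series]


def anomaly_score(observed: Sequence[float], reconstructed: Sequence[float]) -> float:
    """Mean squared deviation between observed and reconstructed values."""
    if len(observed) != len(reconstructed):
        raise ValueError("observed and reconstructed must have equal length")
    squared = [(o - r) ** 2 for o, r in zip(observed, reconstructed)]
    return sum(squared) / len(squared)


def is_anomalous(score: float, threshold: float) -> bool:
    """Flag a window whose score exceeds a (hypothetical) threshold."""
    return score > threshold


# A metric window with two lost samples.
window = [1.0, None, 3.0, None]
filled = impute_missing(window)

# Hand-written placeholder for what a model trained on normal
# behavior might reconstruct for this window.
reconstruction = [1.0, 2.0, 2.0, 2.0]
score = anomaly_score(filled, reconstruction)
```

Here the third sample deviates from its reconstruction, so the window's score is nonzero. In practice the threshold would be chosen from scores observed on normal data.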

Topics & Keywords

Software System Performance and Reliability · Network Security and Intrusion Detection · Cloud Computing and Resource Management

UN Sustainable Development Goals

Industry, innovation and infrastructure

Publication Details

Published in: ACM Transactions on the Web

DOI: 10.1145/3801729

Field-Weighted Citation Impact: 0.00