Summary
In large‐scale, distributed actor systems, message processing in one actor sometimes fails because of a failure that occurred earlier elsewhere in the system. In such cases, tracking down the origin of the failure is difficult, since existing monitoring tools only provide ways to collect metrics and statistical information about system execution. In this paper, we describe a new tool for tracing distributed actor systems, Akka Tracing Tool, a library that allows users to generate a trace graph of messages. To address the distributed nature of the environment, we propose an efficient data collection mechanism based on the one‐way replication technique implemented in CouchDB, a popular document database. The tool was evaluated on a real application, a car traffic simulation, in a distributed environment of up to 50 nodes set up in the Amazon Web Services (AWS) computing cloud. The measured overhead when tracing all messages was between 39% and 45% on average. The library also proved to be user‐friendly and scalable with respect to the number of nodes in the actor system. Owing to these properties, we expect that the tool can simplify finding errors and speed up the development of actor systems.
Published in: Concurrency and Computation: Practice and Experience
Volume 30, Issue 22
DOI: 10.1002/cpe.4637