Search for a command to run...
Given a large graph, how to generate a compact summary graph that is configurable by the user and supports multiple graph queries with either no loss or with high accuracy ? The ever growing size of graph datasets makes the above question on graph summarization very pertinent. Although, there are several approaches, there does not exist a configurable graph summarization method that offers high compression along with support for multiple graph queries on the summary graph with high accuracy, and allows the user to configure the summarization based on: (1) lossless or lossy summarization, (2) amount of tolerable neighborhood loss, (3) the type of loss it can tolerate, in terms of false positive edges (i.e., extra edges), false negative edges (i.e., missing edges), or neither, in both the (a) reconstructed graph and the (b) query answers. To overcome these limitations, we propose a novel graph summarization framework CGS (Configurable Graph Summarizer) that builds upon the idea of aggregating nodes with common neighborhoods . The CGS framework consists of three summarization variants, CGS-E, CGS-I and CGS-U. While CGS-E is a lossless scheme, CGS-I and CGS-U are lossy schemes that allow reconstruction of the input graph with no false positive edges and no false negative edges, respectively. To bound the graph reconstruction loss, we introduce a user-specified parameter neighborhood loss tolerance threshold , that limits the maximum loss allowed in the neighborhood of each node. This allows graph reconstruction and neighborhood query evaluation with either no loss or with bounded loss guarantees . This, in turn, enables retrieval of multiple graph queries such as shortest path and reachability queries with either no loss or with fairly high accuracy. Empirical evaluation on several synthetic and real-world graphs shows that CGS offers superior summarization than the state-of-the-art methods, and can answer graph queries with fairly high accuracy and efficiency. The implementation code and the datasets are available at https://github.com/sonaelzasimon/CGS_Configurable_Graph_Summarization .