Search for a command to run...
Subgraph matching is a core primitive across a number of disciplines, ranging from data mining, databases, information retrieval, computer vision to natural language processing. Despite decades of efforts, it is still highly challenging to balance between the matching accuracy and the computational efficiency, especially when the query graph and/or the data graph are large. In this paper, we propose an index-based algorithm (G-FINDER) to find the top-k approximate matching subgraphs. At the heart of the proposed algorithm are two techniques, including (1) a novel auxiliary data structure (LOOKUP-TABLE) in conjunction with a neighborhood expansion method to effectively and efficiently index candidate vertices, and (2) a dynamic filtering and refinement strategy to prune the false candidates at an early stage. The proposed G-FINDER bears some distinctive features, including (1) generality, being able to handle different types of inexact matching (e.g., missing nodes, missing edges, intermediate vertices) on node attributed and/or edge attributed graphs or multigraphs; (2) effectiveness, achieving up to 30% Fl-Score improvement over the best known competitor; and (3) efficiency, scaling near-linearly w.r.t. the size of the data graph as well as the query graph.