Search for a command to run...
The neighbourhood function N(t) of a graph G gives, for each t, the number of\npairs of nodes <x, y> such that y is reachable from x in less that t hops. The\nneighbourhood function provides a wealth of information about the graph (e.g.,\nit easily allows one to compute its diameter), but it is very expensive to\ncompute it exactly. Recently, the ANF algorithm (approximate neighbourhood\nfunction) has been proposed with the purpose of approximating NG(t) on large\ngraphs. We describe a breakthrough improvement over ANF in terms of speed and\nscalability. Our algorithm, called HyperANF, uses the new HyperLogLog counters\nand combines them efficiently through broadword programming; our implementation\nuses overdecomposition to exploit multi-core parallelism. With HyperANF, for\nthe first time we can compute in a few hours the neighbourhood function of\ngraphs with billions of nodes with a small error and good confidence using a\nstandard workstation. Then, we turn to the study of the distribution of the\nshortest paths between reachable nodes (that can be efficiently approximated by\nmeans of HyperANF), and discover the surprising fact that its index of\ndispersion provides a clear-cut characterisation of proper social networks vs.\nweb graphs. We thus propose the spid (Shortest-Paths Index of Dispersion) of a\ngraph as a new, informative statistics that is able to discriminate between the\nabove two types of graphs. We believe this is the first proposal of a\nsignificant new non-local structural index for complex networks whose\ncomputation is highly scalable.\n