Finding dense substructures in a graph is a fundamental graph mining operation, with applications
in bioinformatics, social networks, and visualization to name a few. Yet most standard
formulations of this problem (like clique, quasi-clique, k-densest subgraph) are NP-hard.
Furthermore, the goal is rarely to find the "true optimum", but to identify many (if not all) dense
substructures, understand their distribution in the graph, and ideally determine
relationships among them. Current dense subgraph finding algorithms usually
optimize some objective, and only find a few such subgraphs without providing any structural
relations.
We define the nucleus decomposition of a graph, which represents the graph
as a forest of nuclei. Each nucleus is a subgraph where smaller cliques
are present in many larger cliques. The forest of nuclei is a hierarchy by containment,
where the edge density increases as we proceed towards leaf nuclei. Sibling nuclei
can have limited intersections, which allows for discovery of overlapping dense subgraphs.
With the right parameters, the nucleus decomposition generalizes the classic notions of k-cores and k-trusses.
We give provable efficient algorithms for nucleus decompositions, and empirically
evaluate their behavior in a variety of real graphs. The tree of nuclei consistently
gives a global, hierarchical snapshot of dense substructures, and
outputs dense subgraphs of higher quality than other state-of-the-art solutions.
Our algorithm can process graphs with tens of millions of edges in less than an hour.
If you use Nucleus Decomposition, please cite:
Latest release: Nucleus-3-6-2015
Dataset: data examples