TDAlab @ GA Tech.

Software

S³G²: A Scalable Shell Sequence Graph Generator

General information

Graphs are commonly used to model the relationships between various entities. These graphs can be enormously large and thus, scalable graph analysis has been the subject of many research efforts. However, the lack of publicly available data is a major setback. Many researchers have since focused their efforts to generate realistic synthetic graph generators to alleviate this problem and significant progress has been made on scalable graph generation that preserve some important graph properties (e.g., degree distribution, clustering coefficients). In this work, we study how to sample a graph from the space of graphs with a given shell distribution. Shell distribution is related to the k-core, which is the largest subgraph where each vertex is connected to at least k other vertices. A k-shell is the subset of vertices that are in k-core but not (k+1)-core, and the shell distribution comprises the sizes of these shells. Core decompositions are widely used to extract information from graphs and to speed up other computations. We present a scalable graph generator that, given a shell decomposition of a graph, generates a random graph that conforms to it.

S³G² is a software package, implementing a sequential, shared memory parallel, and a distributed memory version of a Shell Sequence Graph Generator. The main sequential idea is proposed by [Karwa et al.’17]. We focus on preserving the statistical properties of the sequential implementation [Karwa et al.’17]. The approaches are ultimately the same, aside from the intermediate steps to reach there, thus, any claim made about the graphs generated by [Karwa et al.’17] applies to the implementations in this package as well. The software package generates binary CSR format graphs with directed edges. The MPI implementation can also generate directed edge lists per processor. These can be concatenated and reported. with the bash script runAndMerge.sh.

The S³G² consists of 3 parts. Sequential is a new C++ implementation of [Karwa et al.’17] with some minor optimizations.

Shmem has the implementation for shared memory NUMA systems. It uses C++ Threads library and some utilities from C++17. It has been tested with G++-8.3 and G++-9.2.

MPI has the distributed memory implementation. Has been successfully compiled and run with Intel MPI, OpenMPI-3/4, and MVAPICH-2.2/2.3

The library relies on some open-source and public packages:

parallel_hashmap
SPart (directly embedded to the source folder at the time of usage).

[Karwa et al.’17] V. Karwa, M. J. Pelsmajer, S. Petrovic ́, D. Stasi, and D. Wilburne, “Statistical models for cores decomposition of an undirected random graph,” Electron. J. Statist., vol. 11, no. 1, pp. 1949–1982, 2017.

If you use S³G², please cite:

M. Yusuf Özkaya, M. Fatih Balin, Ali Pinar, Ümit V. Çatalyürek,
"A scalable graph generation algorithm to sample over a given shell distribution",
2020 IEEE International Parallel and Distributed Processing Symposium Workshops (IPDPSW), Workshop on Graphs, Architectures, Programming, and Learning, May 2020. to appear .

Download

Latest release: S3G2 (Updated 03/02/2020)

If you have any questions or comments, please contact TDALab.

Software

S3G2: A Scalable Shell Sequence Graph Generator

General information

Download

S³G²: A Scalable Shell Sequence Graph Generator