Article
Version 1
Preserved in Portico. This version is not peer-reviewed.
A Scalable Data Structure for Efficient Graph Analytics and In-Place Mutations
Received: 22 September 2023 / Approved: 27 September 2023 / Online: 27 September 2023 (15:10:25 CEST)
A peer-reviewed article of this Preprint also exists.
Firmli, S.; Chiadmi, D. A Scalable Data Structure for Efficient Graph Analytics and In-Place Mutations. Data 2023, 8, 166.
Abstract
The graph model enables a broad range of analyses, which makes graph processing an invaluable tool in data analytics. At the heart of every graph processing system lies a concurrent graph data structure that stores the graph. Such a data structure needs to be highly efficient for both graph algorithms and queries. Because real-world graphs evolve continuously, are sparse, and are scale-free, graph processing systems face the challenge of providing a graph data structure that supports both fast analytical workloads and fast, memory-efficient graph mutations. Existing graph structures impose a hard tradeoff among read-only performance, update friendliness, and memory consumption upon updates. In this paper, we introduce CSR++, a new graph data structure that removes these tradeoffs and enables both fast read-only analytics and quick, memory-friendly mutations. CSR++ combines ideas from CSR, the fastest read-only data structure, and adjacency lists to achieve the best of both worlds. We compare CSR++ to CSR, adjacency lists from the Boost Graph Library, and state-of-the-art update-friendly graph structures: LLAMA, STINGER, GraphOne, and Teseo. In our evaluation, which is based on popular graph processing algorithms executed over real-world graphs, we show that CSR++ remains close to CSR in read-only concurrent performance (within 10% on average), while significantly outperforming CSR (by an order of magnitude) and LLAMA (by almost 2×) under frequent updates. We also show that both CSR++’s update throughput and analytics performance exceed those of several state-of-the-art graph structures, while maintaining low memory consumption when the workload includes updates.
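To make the tradeoff mentioned in the abstract concrete, the following minimal sketch contrasts the two baseline layouts that CSR++ is said to draw on: a plain CSR and a plain adjacency list. It does not reproduce CSR++ itself, whose design is not described in the abstract. The class names `CSRGraph` and `AdjacencyListGraph` and their methods are hypothetical and chosen for illustration only.

```python
class CSRGraph:
    """Plain CSR (illustrative): the neighbors of v live in
    neighbors[offsets[v]:offsets[v + 1]], packed into one array."""

    def __init__(self, num_vertices, edges):
        # Count the out-degree of each source, then prefix-sum into offsets.
        self.offsets = [0] * (num_vertices + 1)
        for src, _ in edges:
            self.offsets[src + 1] += 1
        for v in range(num_vertices):
            self.offsets[v + 1] += self.offsets[v]
        # Fill the packed neighbor array, grouped by source vertex.
        self.neighbors = [0] * len(edges)
        cursor = list(self.offsets[:-1])
        for src, dst in sorted(edges):
            self.neighbors[cursor[src]] = dst
            cursor[src] += 1

    def out_neighbors(self, v):
        # Reads are a single contiguous slice, which is cache friendly.
        return self.neighbors[self.offsets[v]:self.offsets[v + 1]]

    def add_edge(self, src, dst):
        # An in-place insert shifts every later neighbor and touches every
        # later offset, so the cost grows with the size of the graph.
        self.neighbors.insert(self.offsets[src + 1], dst)
        for v in range(src + 1, len(self.offsets)):
            self.offsets[v] += 1


class AdjacencyListGraph:
    """Plain adjacency list (illustrative): one growable list per vertex."""

    def __init__(self, num_vertices, edges):
        self.adj = [[] for _ in range(num_vertices)]
        for src, dst in sorted(edges):
            self.adj[src].append(dst)

    def out_neighbors(self, v):
        return list(self.adj[v])

    def add_edge(self, src, dst):
        # An insert touches only the list of the source vertex, but the
        # per-vertex lists are scattered in memory, which hurts scans.
        self.adj[src].append(dst)
```

In this sketch both structures return the same neighbor lists after the same sequence of insertions; they differ in how much data an insertion has to move and in how contiguous the data is when an algorithm scans it.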
Keywords
Data Structures; Concurrency; Graph Processing; Graph Mutations
Subject
Computer Science and Mathematics, Data Structures, Algorithms and Complexity
Copyright: This is an open access article distributed under the Creative Commons Attribution License which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.