Version 1: Received: 24 September 2024 / Approved: 24 September 2024 / Online: 25 September 2024 (17:49:39 CEST)
How to cite:
Hong, A.; Boucher, C. Enhancing Data Compression: Recent Innovations in LZ77 Algorithms. Preprints 2024, 2024091877. https://doi.org/10.20944/preprints202409.1877.v1
APA Style
Hong, A., & Boucher, C. (2024). Enhancing Data Compression: Recent Innovations in LZ77 Algorithms. Preprints. https://doi.org/10.20944/preprints202409.1877.v1
Chicago/Turabian Style
Hong, A., and Christina Boucher. 2024. "Enhancing Data Compression: Recent Innovations in LZ77 Algorithms." Preprints. https://doi.org/10.20944/preprints202409.1877.v1
Abstract
The burgeoning volume of genomic data, fueled by advances in sequencing technologies,
demands efficient data compression solutions. Traditional algorithms like Lempel-Ziv 77 (LZ77) have
been foundational in offering lossless compression, yet they often fall short when applied to the
highly repetitive structures typical of genomic sequences. This review delves into the evolution of
LZ77 and its derivatives, exploring specialized algorithms such as prefix-free parsing, AVL grammars,
and LZ-based methods tailored for genomic data. Innovations in this field have led to enhanced
compression ratios and processing efficiencies by leveraging the intrinsic redundancy within genomic
datasets. We critically examine a spectrum of LZ77-based algorithms, including newer adaptations
for external and semi-external memory settings, and contrast their efficacy in managing large-scale
genomic data. Additionally, we discuss the potential of these algorithms to facilitate the construction
of data structures such as compressed suffix trees, crucial for genomic analyses. This paper aims to
provide a comprehensive guide to the current landscape and future directions of data compression
technologies, equipping researchers and practitioners with insights to tackle the escalating data
challenges in genomics and beyond.
Keywords
LZ77; Data compression; Data structures; Suffix array; Burrows-Wheeler transform
Subject
Computer Science and Mathematics, Data Structures, Algorithms and Complexity
Copyright:
This is an open access article distributed under the Creative Commons Attribution License which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.