Electronic Resource

Compression Algorithm for Colored de Bruijn Graphs

Bibliographic Details
Title: Compression Algorithm for Colored de Bruijn Graphs
Authors: Rahman, Amatur; Dufresne, Yoann; Medvedev, Paul
Publication Information: Schloss Dagstuhl – Leibniz-Zentrum für Informatik 2023
Document Type: Electronic Resource
Abstract: A colored de Bruijn graph (also called a set of k-mer sets) is a set of k-mers with every k-mer assigned a set of colors. Colored de Bruijn graphs are used in a variety of applications, including variant calling, genome assembly, and database search. However, their size has posed a scalability challenge to algorithm developers and users. Numerous indexing data structures have been proposed that store the graph compactly while supporting fast query operations. However, disk compression algorithms, which do not need to support queries on the compressed data and can thus be more space-efficient, have received little attention. The dearth of specialized compression tools has been a detriment to tool developers, tool users, and reproducibility efforts. In this paper, we develop a new tool that compresses colored de Bruijn graphs to disk, building on previous ideas for compression of k-mer sets and indexing colored de Bruijn graphs. We test our tool, called ESS-color, on various datasets, including both sequencing data and whole genomes. ESS-color achieves better compression than all evaluated tools on all datasets, with no other tool able to consistently achieve less than 44% space overhead.
Index Terms: colored de Bruijn graphs, disk compression, k-mer sets, simplitigs, spectrum-preserving string sets, InProceedings, Text, doc-type:ResearchArticle, publishedVersion
DOI: 10.4230/LIPIcs.WABI.2023.17
URL: https://drops.dagstuhl.de/entities/document/10.4230/LIPIcs.WABI.2023.17
Is Part Of LIPIcs, Volume 273, 23rd International Workshop on Algorithms in Bioinformatics (WABI 2023)
Availability: Open access content
https://creativecommons.org/licenses/by/4.0/legalcode
Note: application/pdf
English
Other Numbers: DEDAG oai:drops-oai.dagstuhl.de:18643
doi:10.4230/LIPIcs.WABI.2023.17
urn:nbn:de:0030-drops-186434
1402193350
Contributing Source: SCHLOSS DAGSTUHL LEIBNIZ ZENTRUM GMBH
From OAIster®, provided by the OCLC Cooperative.
Accession Number: edsoai.on1402193350
Database: OAIster