Electronic Resource

Eulertigs: Minimum Plain Text Representation of k-mer Sets Without Repetitions in Linear Time

التفاصيل البيبلوغرافية
العنوان: Eulertigs: Minimum Plain Text Representation of k-mer Sets Without Repetitions in Linear Time
المؤلفون: Sebastian Schmidt and Jarno N. Alanko, Schmidt, Sebastian, Alanko, Jarno N.
بيانات النشر: Schloss Dagstuhl – Leibniz-Zentrum für Informatik 2022
نوع الوثيقة: Electronic Resource
مستخلص: A fundamental operation in computational genomics is to reduce the input sequences to their constituent k-mers. For maximum performance of downstream applications it is important to store the k-mers in small space, while keeping the representation easy and efficient to use (i.e. without k-mer repetitions and in plain text). Recently, heuristics were presented to compute a near-minimum such representation. We present an algorithm to compute a minimum representation in optimal (linear) time and use it to evaluate the existing heuristics. For that, we present a formalisation of arc-centric bidirected de Bruijn graphs and carefully prove that it accurately models the k-mer spectrum of the input. Our algorithm first constructs the de Bruijn graph in linear time in the length of the input strings (for a fixed-size alphabet). Then it uses a Eulerian-cycle-based algorithm to compute the minimum representation, in time linear in the size of the output.
مصطلحات الفهرس: Spectrum preserving string sets, Eulerian cycle, Suffix tree, Bidirected arc-centric de Bruijn graph, k-mer based methods, InProceedings, Text, doc-type:ResearchArticle, publishedVersion
DOI: 10.4230.LIPIcs.WABI.2022.2
URL: https://drops.dagstuhl.de/entities/document/10.4230/LIPIcs.WABI.2022.2
Is Part Of LIPIcs, Volume 242, 22nd International Workshop on Algorithms in Bioinformatics (WABI 2022)
الاتاحة: Open access content. Open access content
https://creativecommons.org/licenses/by/4.0/legalcode
ملاحظة: application/pdf
English
Other Numbers: DEDAG oai:drops-oai.dagstuhl.de:17036
doi:10.4230/LIPIcs.WABI.2022.2
urn:nbn:de:0030-drops-170361
1358732491
المصدر المساهم: SCHLOSS DAGSTUHL LEIBNIZ ZENTRUM GMBH
From OAIster®, provided by the OCLC Cooperative.
رقم الانضمام: edsoai.on1358732491
قاعدة البيانات: OAIster
الوصف
DOI:10.4230.LIPIcs.WABI.2022.2