Academic Journal

Tabular deep learning: a comparative study applied to multi-task genome-wide prediction

Bibliographic Details
Title: Tabular deep learning: a comparative study applied to multi-task genome-wide prediction
Authors: Yuhua Fan, Patrik Waldmann
Source: BMC Bioinformatics, Vol 25, Iss 1, Pp 1-20 (2024)
Publisher Information: BMC, 2024.
Publication Year: 2024
Collection: LCC:Computer applications to medicine. Medical informatics
LCC:Biology (General)
Subject Terms: Tabular data, Multi-trait, Genome-wide prediction (GWP), Non-linear models, Computer applications to medicine. Medical informatics, R858-859.7, Biology (General), QH301-705.5
Description: Abstract
Purpose: More accurate prediction of phenotype traits can increase the success of genomic selection in both plant and animal breeding studies and provide more reliable disease risk prediction in humans. Traditional approaches typically use regression models that assume a linear relationship between the genetic markers and the traits of interest. Non-linear models have been considered as an alternative tool for modeling genomic interactions (i.e. non-additive effects) and other subtle non-linear patterns between markers and phenotype. Deep learning has become a state-of-the-art non-linear prediction method for sound, image and language data. However, genomic data is better represented in a tabular format. The existing literature on deep learning for tabular data proposes a wide range of novel architectures and reports successful results on various datasets, but tabular deep learning applications in genome-wide prediction (GWP) are still rare. In this work, we provide an overview of the main families of recent deep learning architectures for tabular data and apply them to multi-trait regression and multi-class classification for GWP on real gene datasets.
Methods: The study involves an extensive overview of recent deep learning architectures for tabular data learning: NODE, TabNet, TabR, TabTransformer, FT-Transformer, AutoInt, GANDALF, SAINT and LassoNet. These architectures are applied to multi-trait GWP. Comprehensive benchmarks of the tabular deep learning methods are conducted to identify best practices and determine their effectiveness compared to traditional methods.
Results: Extensive experimental results on several genomic datasets (three for multi-trait regression and two for multi-class classification) highlight LassoNet as a standout performer, surpassing both other tabular deep learning models and the highly efficient tree-based LightGBM method in terms of both prediction accuracy and computing efficiency.
Conclusion: Through a series of evaluations on real-world genomic datasets, the study identifies LassoNet as a standout performer, surpassing decision tree methods like LightGBM and other tabular deep learning architectures in terms of both predictive accuracy and computing efficiency. Moreover, the inherent variable selection property of LassoNet provides a systematic way to find important genetic markers that contribute to phenotype expression.
Document Type: article
File Description: electronic resource
Language: English
ISSN: 1471-2105
Relation: https://doaj.org/toc/1471-2105
DOI: 10.1186/s12859-024-05940-1
Access URL: https://doaj.org/article/5bb474cef61943dcb602bd7f1aae00de
Accession Number: edsdoj.5bb474cef61943dcb602bd7f1aae00de
Database: Directory of Open Access Journals