Report
Deep Learning and Random Forest-Based Augmentation of sRNA Expression Profiles
Title: | Deep Learning and Random Forest-Based Augmentation of sRNA Expression Profiles |
---|---|
Authors: | Fiosina, Jelena; Fiosins, Maksims; Bonn, Stefan |
Source: | Lecture Notes in Computer Science, 11490 (2019) |
Publication Year: | 2019 |
Collection: | Computer Science; Quantitative Biology |
Subject Terms: | Quantitative Biology - Genomics, Computer Science - Machine Learning, Quantitative Biology - Quantitative Methods |
Description: | The lack of well-structured annotations in the growing amount of RNA expression data complicates data interoperability and reusability. Commonly used text-mining methods extract annotations from existing unstructured data descriptions and often produce inaccurate output that requires manual curation. Automatic data-based augmentation (generating annotations from the expression data themselves) can considerably improve annotation quality but has not been well studied. We formulate automatic augmentation of small RNA-seq expression data as a classification problem and investigate deep learning (DL) and random forest (RF) approaches to solve it. We generate tissue and sex annotations from small RNA-seq expression data for tissues and cell lines of Homo sapiens. We validate our approach on 4,243 annotated small RNA-seq samples from the Small RNA Expression Atlas (SEA) database. The average prediction accuracy is 98% for tissue groups, 96.5% for tissues, and 77% for sex (all DL). The "one dataset out" average accuracy for tissue group prediction is 83% (DL) and 59% (RF). On average, DL provides better results than RF and considerably improves classification performance for "unseen" datasets. |
Document Type: | Working Paper |
DOI: | 10.1007/978-3-030-20242-2_14 |
Access URL: | http://arxiv.org/abs/1909.11943 |
Accession Number: | edsarx.1909.11943 |
Database: | arXiv |
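The "one dataset out" accuracy in the description measures how well a classifier generalizes to a dataset it never saw during training: every sample from one dataset is held out, the model is trained on all remaining datasets, and the held-out dataset is scored. The snippet below is a minimal sketch of that evaluation loop, under several assumptions. Each sample is assumed to carry a dataset ID. The nearest-centroid classifier is a hypothetical stand-in, not the DL or RF models from the paper, and any `fit`/`predict` pair can be plugged in. The final figure is an unweighted mean over held-out datasets, which is one possible averaging choice. All function names, dataset IDs and expression values are invented for illustration.

```python
from collections import defaultdict
from math import dist


def fit_nearest_centroid(X, y):
    """Hypothetical stand-in classifier: store the mean expression profile of each label."""
    sums = {}
    counts = defaultdict(int)
    for profile, label in zip(X, y):
        if label not in sums:
            sums[label] = [0.0] * len(profile)
        sums[label] = [s + v for s, v in zip(sums[label], profile)]
        counts[label] += 1
    return {label: [s / counts[label] for s in total] for label, total in sums.items()}


def predict_nearest_centroid(centroids, X):
    """Assign each profile the label of the closest centroid (Euclidean distance)."""
    return [min(centroids, key=lambda label: dist(centroids[label], p)) for p in X]


def leave_one_dataset_out(X, y, dataset_ids, fit, predict):
    """Hold out every sample of one dataset at a time and train on all the others."""
    per_dataset = {}
    for held_out in sorted(set(dataset_ids)):
        train = [i for i, d in enumerate(dataset_ids) if d != held_out]
        test = [i for i, d in enumerate(dataset_ids) if d == held_out]
        model = fit([X[i] for i in train], [y[i] for i in train])
        predicted = predict(model, [X[i] for i in test])
        correct = sum(p == y[i] for p, i in zip(predicted, test))
        per_dataset[held_out] = correct / len(test)
    # Unweighted mean over held-out datasets (an assumed averaging choice).
    mean_accuracy = sum(per_dataset.values()) / len(per_dataset)
    return per_dataset, mean_accuracy


if __name__ == "__main__":
    # Toy two-feature "expression profiles"; values, labels and IDs are invented.
    X = [[5.0, 0.1], [4.8, 0.3], [0.2, 5.1], [0.1, 4.9], [5.2, 0.0], [0.3, 5.3]]
    y = ["brain", "brain", "blood", "blood", "brain", "blood"]
    dataset_ids = ["D1", "D1", "D2", "D2", "D3", "D3"]
    per_dataset, mean_accuracy = leave_one_dataset_out(
        X, y, dataset_ids, fit_nearest_centroid, predict_nearest_centroid
    )
    print(per_dataset, mean_accuracy)
```

Keeping the split at the level of whole datasets, rather than individual samples, prevents batch effects shared within one study from leaking into the training data. That leakage is the reason a per-sample accuracy can look much higher than a "one dataset out" accuracy.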