Academic Journal

Contamination Survey of Insect Genomic and Transcriptomic Data.

Bibliographic Details
Title: Contamination Survey of Insect Genomic and Transcriptomic Data.
Authors: Zhou, Jiali1 (AUTHOR) 15395870850@163.com, Zhang, Xinrui1 (AUTHOR), Wang, Yujie1 (AUTHOR), Liang, Haoxian1 (AUTHOR), Yang, Yuhao1 (AUTHOR), Huang, Xiaolei1 (AUTHOR) huangxl@fafu.edu.cn, Deng, Jun1 (AUTHOR) huangxl@fafu.edu.cn
Source: Animals (2076-2615). Dec2024, Vol. 14 Issue 23, p3432. 10p.
Subject Terms: *INSECT surveys, *NUCLEOTIDE sequencing, *DATABASES, *INSECTS, *HYMENOPTERA
Abstract: Simple Summary: The ignorance of data quality such as data contamination will cause incorrect conclusions and misdirection. Insects are the most diverse group of animals, and the data are increasing rapidly. Although some researchers are aware of the existence of contamination, they mainly detect contamination for individual species and lack systematic evaluation of Insecta data. Here, this study highlights the serious issue of contamination in public databases and emphasizes the importance of verifying data quality before researchers re-use them. The rapid advancement of high-throughput sequencing has led to a great increase in sequencing data, resulting in a significant accumulation of contamination, for example, sequences from non-target species may be present in the target species' sequencing data. Insecta, the most diverse group within Arthropoda, still lacks a comprehensive evaluation of contamination prevalence in public databases and an analysis of potential contamination causes. In this study, COI barcodes were used to investigate contamination from insects and mammals in GenBank's genomic and transcriptomic data across four insect orders. Among the 2796 WGS and 1382 TSA assemblies analyzed, contamination was detected in 32 (1.14%) WGS and 152 (11.0%) TSA assemblies. Key findings from this study include the following: (1) TSA data exhibited more severe contamination than WGS data; (2) contamination levels varied significantly among the four orders, with Hemiptera showing 9.22%, Coleoptera 3.48%, Hymenoptera 7.66%, and Diptera 1.89% contamination rates; (3) possible causes of contamination, such as food, parasitism, sample collection, and cross-contamination, were analyzed. Overall, this study proposes a workflow for checking the existence of contamination in WGS and TSA data and some suggestions to mitigate it. [ABSTRACT FROM AUTHOR]
Database: Academic Search Index
Description
ISSN:20762615
DOI:10.3390/ani14233432