Academic Journal

Co-Inference of Data Mislabelings Reveals Improved Models in Genomics and Breast Cancer Diagnostics

Bibliographic Details
Title: Co-Inference of Data Mislabelings Reveals Improved Models in Genomics and Breast Cancer Diagnostics
Authors: Gerber, Susanne; Pospisil, Lukas; Sys, Stanislav; Hewel, Charlotte; Torkamani, Ali; Horenko, Illia
Source: Frontiers in Artificial Intelligence ; volume 4 ; ISSN 2624-8212
Publisher Information: Frontiers Media SA
Publication Year: 2022
Collection: Frontiers (Publisher - via CrossRef)
Description: Mislabeling of cases as well as controls in case–control studies is a frequent source of strong bias in prognostic and diagnostic tests and algorithms. Common data processing methods available to researchers in the biomedical community do not allow for consistent and robust treatment of labeled data in situations where both the case and the control groups contain a non-negligible proportion of mislabeled data instances. This is an especially prominent issue in studies regarding late-onset conditions, where individuals who may convert to cases may populate the control group, and for screening studies that often have high false-positive/-negative rates. To address this problem, we propose a method for simultaneous robust inference of Lasso-reduced discriminative models and of latent group-specific mislabeling risks, not requiring any exactly labeled data. We apply it to a standard breast cancer imaging dataset and infer the mislabeling probabilities (the rates of false-negative and false-positive core-needle biopsies) together with a small set of simple diagnostic rules, outperforming the state-of-the-art BI-RADS diagnostics on these data. The inferred mislabeling rates for breast cancer biopsies agree with published purely empirical studies. Applying the method to human genomic data from a healthy-ageing cohort reveals a previously unreported compact combination of single-nucleotide polymorphisms that are strongly associated with a healthy-ageing phenotype for Caucasians. It determines that 7.5% of Caucasians in the 1000 Genomes dataset (selected as a control group) carry a pattern characteristic of healthy ageing.
Document Type: article in journal/newspaper
Language: unknown
DOI: 10.3389/frai.2021.739432
Availability: http://dx.doi.org/10.3389/frai.2021.739432
https://www.frontiersin.org/articles/10.3389/frai.2021.739432/full
Rights: https://creativecommons.org/licenses/by/4.0/
Accession Number: edsbas.62E52B43
Database: BASE