Report
Subtle Data Crimes: Naively training machine learning algorithms could lead to overly-optimistic results
Title: Subtle Data Crimes: Naively training machine learning algorithms could lead to overly-optimistic results
Authors: Shimron, Efrat; Tamir, Jonathan I.; Wang, Ke; Lustig, Michael
Publication Year: 2021
Collection: Computer Science
Subject Terms: Computer Science - Machine Learning
Description: While open databases are an important resource in the Deep Learning (DL) era, they are sometimes used "off-label": data published for one task are used for training algorithms for a different one. This work aims to highlight that in some cases, this common practice may lead to biased, overly-optimistic results. We demonstrate this phenomenon for inverse problem solvers and show how their biased performance stems from hidden data preprocessing pipelines. We describe two preprocessing pipelines typical of open-access databases and study their effects on three well-established algorithms developed for Magnetic Resonance Imaging (MRI) reconstruction: Compressed Sensing (CS), Dictionary Learning (DictL), and DL. In this large-scale study we performed extensive computations. Our results demonstrate that the CS, DictL, and DL algorithms yield systematically biased results when naively trained on seemingly appropriate data: the Normalized Root Mean Square Error (NRMSE) improves consistently with the preprocessing extent, showing an artificial increase of 25%-48% in some cases. Since this phenomenon is generally unknown, biased results are sometimes published as state-of-the-art; we refer to that as subtle data crimes. This work hence raises a red flag regarding naive off-label usage of Big Data and reveals the vulnerability of modern inverse problem solvers to the resulting bias. Comment: 16 pages, 7 figures, two tables. Submitted to a journal
Document Type: Working Paper
Access URL: http://arxiv.org/abs/2109.08237
Accession Number: edsarx.2109.08237
Database: arXiv
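The abstract quantifies the reported bias in terms of the Normalized Root Mean Square Error (NRMSE). For readers unfamiliar with the metric, the snippet below is a minimal sketch of one common NRMSE convention (the L2 norm of the reconstruction error divided by the L2 norm of the reference image); the function name `nrmse` and the choice of normalization are illustrative assumptions and are not taken from the paper itself, which may use a different convention (e.g., normalization by the reference's dynamic range).

```python
import numpy as np


def nrmse(reference: np.ndarray, reconstruction: np.ndarray) -> float:
    """Normalized Root Mean Square Error between a reference image and a
    reconstruction, normalized by the L2 norm of the reference.

    Note: this normalization is a common convention, assumed here for
    illustration; it is not necessarily the one used in the paper.
    """
    error = np.linalg.norm(reconstruction - reference)
    return float(error / np.linalg.norm(reference))


if __name__ == "__main__":
    # Toy example: a reconstruction contaminated with small Gaussian noise.
    # A lower NRMSE indicates a (seemingly) better reconstruction.
    rng = np.random.default_rng(0)
    ref = rng.standard_normal((128, 128))
    rec = ref + 0.05 * rng.standard_normal((128, 128))
    print(f"NRMSE: {nrmse(ref, rec):.4f}")
```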