Amalur: Data Integration Meets Machine Learning

Bibliographic Details
Title: Amalur: Data Integration Meets Machine Learning
Authors: Hai, Rihan; Koutras, Christos; Ionescu, Andra; Li, Ziyu; Sun, Wenbo; van Schijndel, Jessie; Kang, Yan; Katsifodimos, Asterios
Publication Year: 2022
Collection: Computer Science
Subject Terms: Computer Science - Databases
Description: The data needed for machine learning (ML) model training can reside in separate sites, often termed data silos. For data-intensive ML applications, data silos pose a major challenge: the integration and transformation of data demand a lot of manual work and computational resources. With data privacy and security constraints, data often cannot leave the local sites, and a model has to be trained in a decentralized manner. In this work, we present a vision of how to bridge traditional data integration (DI) techniques with the requirements of modern machine learning. We explore the possibilities of utilizing metadata obtained from data integration processes for improving the effectiveness and efficiency of ML models. We analyze two common use cases over data silos: feature augmentation and federated learning. Bringing data integration and machine learning together, we highlight the new research opportunities from the aspects of systems, representations, factorized learning, and federated learning.
Comment: Accepted at ICDE2023 -- Special track (Vision)
Document Type: Working Paper
Access URL: http://arxiv.org/abs/2205.09681
Accession Number: edsarx.2205.09681
Database: arXiv