Building data curation processes with crowd intelligence

Bibliographic details
Title: Building data curation processes with crowd intelligence
Authors: Chen, Tianwa; Han, Lei; Demartini, Gianluca; Indulska, Marta; Sadiq, Shazia
Publisher: Springer International Publishing
Publication year: 2020
Collection: The University of Queensland: UQ eSpace
Subject terms: Data curation, Data quality, Crowd intelligence, 1403 Business and International Management, 1404 Management Information Systems, 1710 Information Systems, 1802 Information Systems and Management, 2207 Control and Systems Engineering, 2611 Modelling and Simulation
Description: Data curation processes constitute a number of activities, such as transforming, filtering or de-duplicating data. These processes consume an excessive amount of time in data science projects, due to datasets often being external, re-purposed and generally not ready for analytics. Overall, data curation processes are difficult to automate and require human input, which results in a lack of repeatability and potential errors propagating into analytical results. In this paper, we explore a crowd intelligence-based approach to building robust data curation processes. We study how data workers engage with data curation activities, specifically related to data quality detection, and how to build a robust and effective data curation process by learning from the wisdom of the crowd. With the help of a purpose-designed data curation platform based on iPython Notebook, we conducted a lab experiment with data workers and collected a multi-modal dataset that includes measures of task performance and behaviour data. Our findings identify avenues by which effective data curation processes can be built through crowd intelligence.
Document type: conference object
Language: English
ISSN: 1865-1348; 1865-1356
Relation: orcid:0000-0002-5135-0313; orcid:0000-0002-7311-3693; orcid:0000-0002-2156-4097; orcid:0000-0001-6739-4145
Availability: https://espace.library.uq.edu.au/view/UQ:f21b940
Accession number: edsbas.CE80519A
Database: BASE