Academic Journal

How to Host An Effective Data Competition: Statistical Advice for Competition Design and Analysis.

Bibliographic Details
Title: How to Host An Effective Data Competition: Statistical Advice for Competition Design and Analysis.
Authors: Anderson‐Cook, Christine M., Myers, Kary L., Lu, Lu, Fugate, Michael L., Quinlan, Kevin R., Pawley, Norma
Source: Statistical Analysis & Data Mining; Aug 2019, Vol. 12 Issue 4, p271-289, 19p
Subject Terms: DESIGN competitions, STATISTICS, SUPERVISED learning, RADIOACTIVE substances, URBAN ecology (Sociology), COMPETITION (Biology), COMPETITION (Psychology)
Abstract: Data competitions rely on real‐time leaderboards to rank competitor entries and stimulate algorithm improvement. While such competitions have become quite popular and prevalent, particularly in supervised learning formats, their implementations by the host are highly variable. Without careful planning, a supervised learning competition is vulnerable to overfitting, where the winning solutions are so closely tuned to the particular set of provided data that they cannot generalize to the underlying problem of interest to the host. This paper outlines some important considerations for strategically designing relevant and informative data sets to maximize the learning outcome from hosting a competition based on our experience. It also describes a postcompetition analysis that enables robust and efficient assessment of the strengths and weaknesses of solutions from different competitors, as well as greater understanding of the regions of the input space that are well‐solved. The postcompetition analysis, which complements the leaderboard, uses exploratory data analysis and generalized linear models (GLMs). The GLMs not only expand the range of results we can explore, they also provide more detailed analysis of individual subquestions including similarities and differences between algorithms across different types of scenarios, universally easy or hard regions of the input space, and different learning objectives. When coupled with a strategically planned data generation approach, the methods provide richer and more informative summaries to enhance the interpretation of results beyond just the rankings on the leaderboard. The methods are illustrated with a recently completed competition to evaluate algorithms capable of detecting, identifying, and locating radioactive materials in an urban environment. [ABSTRACT FROM AUTHOR]
Copyright of Statistical Analysis & Data Mining is the property of Wiley-Blackwell and its content may not be copied or emailed to multiple sites or posted to a listserv without the copyright holder's express written permission. However, users may print, download, or email articles for individual use. This abstract may be abridged. No warranty is given about the accuracy of the copy. Users should refer to the original published version of the material for the full abstract. (Copyright applies to all Abstracts.)
Database: Complementary Index
Description
ISSN: 1932-1864
DOI:10.1002/sam.11404