Academic Journal

To crop or not to crop: Comparing whole‐image and cropped classification on a large dataset of camera trap images.

Bibliographic Details
Title: To crop or not to crop: Comparing whole‐image and cropped classification on a large dataset of camera trap images.
Authors: Gadot, Tomer1 (AUTHOR) tomerg@google.com, Istrate, Ștefan1 (AUTHOR), Kim, Hyungwon1 (AUTHOR), Morris, Dan1 (AUTHOR), Beery, Sara1,2 (AUTHOR), Birch, Tanya1 (AUTHOR), Ahumada, Jorge3 (AUTHOR)
Source: IET Computer Vision (Wiley-Blackwell). Dec 2024, Vol. 18, Issue 8, pp. 1193-1208 (16 pp.).
Subject Terms: OBJECT recognition (Computer vision), IMAGE recognition (Computer vision), IMAGE converters, IMAGE analysis, COMPUTER vision
Abstract: Camera traps facilitate non‐invasive wildlife monitoring, but their widespread adoption has created a data processing bottleneck: a camera trap survey can create millions of images, and the labour required to review those images strains the resources of conservation organisations. AI is a promising approach for accelerating image review, but AI tools for camera trap data are imperfect; in particular, classifying small animals remains difficult, and accuracy falls off outside the ecosystems in which a model was trained. It has been proposed that incorporating an object detector into an image analysis pipeline may help address these challenges, but the benefit of object detection has not been systematically evaluated in the literature. In this work, the authors assess the hypothesis that classifying animals cropped from camera trap images using a species‐agnostic detector yields better accuracy than classifying whole images. We find that incorporating an object detection stage into an image classification pipeline yields a macro‐average F1 improvement of around 25% on a large, long‐tailed dataset; this improvement is reproducible on a large public dataset and a smaller public benchmark dataset. The authors describe a classification architecture that performs well for both whole and detector‐cropped images, and demonstrate that this architecture yields state‐of‐the‐art benchmark accuracy. [ABSTRACT FROM AUTHOR]
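Note on the metric: the abstract reports its gain as macro‐average F1 on a long‐tailed dataset. The following is a minimal sketch of why that choice matters, not the authors' evaluation code. The species labels and predictions are invented for illustration, and averaging only over classes that appear in the ground truth is an assumption; the abstract does not state the exact convention used in the paper.

```python
from collections import Counter


def accuracy(y_true, y_pred):
    """Fraction of images whose predicted label matches the true label."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


def macro_f1(y_true, y_pred):
    """Unweighted mean of per-class F1.

    Assumption: the mean is taken over classes present in y_true, so a
    rare species counts as much as a common one.
    """
    tp, fp, fn = Counter(), Counter(), Counter()
    for t, p in zip(y_true, y_pred):
        if t == p:
            tp[t] += 1
        else:
            fp[p] += 1  # predicted class gains a false positive
            fn[t] += 1  # true class gains a false negative
    scores = []
    for c in sorted(set(y_true)):
        denom = 2 * tp[c] + fp[c] + fn[c]
        scores.append(2 * tp[c] / denom if denom else 0.0)
    return sum(scores) / len(scores)


# Hypothetical long-tailed toy set: 8 images of a common species, 2 of a rare one.
y_true = ["deer"] * 8 + ["ocelot"] * 2
# A classifier that always predicts the common species.
y_pred = ["deer"] * 10

print(f"accuracy = {accuracy(y_true, y_pred):.3f}")
print(f"macro F1 = {macro_f1(y_true, y_pred):.3f}")
```

In this toy case plain accuracy is 0.8 while macro F1 is about 0.44, because the missed rare class contributes an F1 of zero to the unweighted average. That sensitivity to rare classes is why a macro‐averaged score is informative on long‐tailed data.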
Database: Business Source Index
Description
ISSN: 1751-9632
DOI: 10.1049/cvi2.12318