Electronic Resource

Three Methods for Occupation Coding Based on Statistical Learning

Bibliographic Details
Title: Three Methods for Occupation Coding Based on Statistical Learning
Authors: Gweon, Hyukjun; Schonlau, Matthias; Kaczmirek, Lars; Blohm, Michael; Steiner, Stefan
Source: Journal of Official Statistics, vol. 33, no. 1, pp. 101-122
Publication Info: DEU 2019-02-28T09:53:50Z 2017
Document Type: Electronic Resource
Abstract: Occupation coding, an important task in official statistics, refers to coding a respondent's text answer into one of many hundreds of occupation codes. To date, occupation coding is still at least partially conducted manually, at great expense. We propose three methods for automatic coding: a method that combines separate models for the detailed occupation codes and for aggregate occupation codes, a hybrid method that combines a duplicate-based approach with a statistical learning algorithm, and a modified nearest neighbor approach. Using data from the German General Social Survey (ALLBUS), we show that the proposed methods improve on both the coding accuracy of the underlying statistical learning algorithm and the coding accuracy of duplicates where duplicates exist. Further, we find that defining duplicates based on n-gram variables (a concept from text mining) is preferable to defining them based on exact string matches.
Index Terms: Sozialwissenschaften, Soziologie, Social sciences, sociology, anthropology, Automated coding; Machine learning; ISCO-88, Erhebungstechniken und Analysetechniken der Sozialwissenschaften, Methods and Techniques of Data Collection and Data Analysis, Statistical Methods, Computer Methods, official statistics, ALLBUS, occupation, algorithm, method, coding, Codierung, Beruf, Algorithmus, amtliche Statistik, Methode, journal article, Zeitschriftenartikel
URL: https://www.ssoar.info/ssoar/handle/document/61576
https://doi.org/10.1515/JOS-2017-0006
Availability: Open access content.
Creative Commons - Attribution-Noncommercial-No Derivative Works 4.0
Creative Commons - Namensnennung, Nicht kommerz., Keine Bearbeitung 4.0
Other Numbers: DEGES oai:gesis.izsoz.de:document/61576
2001-7367
1256789892
Contributing Source: LEIBNIZ INST FOR THE SOCIAL SCIS GESIS
From OAIster®, provided by the OCLC Cooperative.
Accession Number: edsoai.on1256789892
Database: OAIster
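
Illustrative note: the abstract contrasts duplicates defined by exact string match with duplicates defined on n-gram variables, and describes a hybrid of a duplicate-based approach with a statistical learning algorithm. The sketch below is one possible reading of that idea, not the authors' implementation. It assumes word-level n-grams (unigrams by default) computed after lowercasing and stripping punctuation, a majority vote among matching training answers, and a caller-supplied fallback classifier. The names `ngram_key`, `build_duplicate_index`, `hybrid_code`, and the codes `T01` and `N01` are hypothetical.

```python
import re
from collections import Counter, defaultdict


def ngram_key(text, n=1):
    """Return a hashable bag of word n-grams for a free-text answer.

    Assumption: answers are lowercased and split on non-word characters.
    Two answers with the same key count as duplicates.
    """
    tokens = re.findall(r"\w+", text.lower())
    grams = [" ".join(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]
    return frozenset(Counter(grams).items())


def build_duplicate_index(answers, codes, key=ngram_key):
    """Map each n-gram key to a tally of the codes seen in training."""
    index = defaultdict(Counter)
    for answer, code in zip(answers, codes):
        index[key(answer)][code] += 1
    return index


def hybrid_code(answer, index, fallback, key=ngram_key):
    """Use the duplicates' majority code if any exist, else the fallback.

    `fallback` stands in for any statistical learning model; it is a
    hypothetical callable that takes the answer text and returns a code.
    """
    votes = index.get(key(answer))
    if votes:
        return votes.most_common(1)[0][0]
    return fallback(answer)


# Toy training data with made-up codes (not real ISCO-88 codes).
train_answers = ["primary school teacher", "Teacher, primary school", "nurse"]
train_codes = ["T01", "T01", "N01"]
index = build_duplicate_index(train_answers, train_codes)
```

Under these assumptions, "Teacher, primary school" and "primary school teacher" share one key although the strings differ, so an n-gram definition finds duplicates that exact string matching would miss. Answers with no duplicate in the training data are passed to the fallback model.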