Academic Journal

Connecting firm's web scraped textual content to body of science: Utilizing Microsoft Academic Graph hierarchical topic modeling

Bibliographic Details
Title: Connecting firm's web scraped textual content to body of science: Utilizing Microsoft Academic Graph hierarchical topic modeling
Authors: Hajikhani, Arash; Suominen, Arho; Ashouri, Sajad; Pukelis, Lukas; Schubert, Torben; Notten, Ad; Cunningham, Scott
Source: Hajikhani, A, Suominen, A, Ashouri, S, Pukelis, L, Schubert, T, Notten, A & Cunningham, S 2022, 'Connecting firm's web scraped textual content to body of science: Utilizing Microsoft Academic Graph hierarchical topic modeling', MethodsX, vol. 9, 101650. https://doi.org/10.1016/j.mex.2022.101650
Publication Year: 2022
Collection: Maastricht University Research Publications
Subject Terms: atira/keywords/jel_classifications/o32, o32 - Management of Technological Innovation and R&D, atira/keywords/jel_classifications/o31, o31 - Innovation and Invention: Processes and Incentives, atira/keywords/jel_classifications/o34, o34 - Intellectual Property Rights, Natural language processing, Economic classification scheme, Knowledge transformation, Web scraping
Description: This paper demonstrates a method to transform textual information scraped from companies' websites and link it to the scientific body of knowledge. The method illustrates the benefit of Natural Language Processing (NLP) in creating links between established economic classification systems and the novel, agile constructs that new data sources enable. To this end, we experimented with the European classification of economic activities (known as NACE) at the sectoral and company levels. We established a connection with Microsoft Academic Graph hierarchical topic modeling based on companies' website content. Central to the operationalization of our method are a web scraping process, NLP, and a data transformation/linkage procedure.
Document Type: article in journal/newspaper
Language: English
DOI: 10.1016/j.mex.2022.101650
Availability: https://cris.maastrichtuniversity.nl/en/publications/dded6a30-af5e-4794-9fb8-6983836fa1f9
https://doi.org/10.1016/j.mex.2022.101650
Rights: info:eu-repo/semantics/openAccess
Accession Number: edsbas.6375AB5B
Database: BASE