Dissertation/Thesis
Using Exploratory Data Analysis and Support Vector Machine to Build Media Classifiers on Sport News
Title: | Using Exploratory Data Analysis and Support Vector Machine to Build Media Classifiers on Sport News |
---|---|
Alternate Title: | 運用資料探勘及支持向量機建立運動新聞媒體分類器 |
Authors: | Chu, Cheng-Wei, 褚承威 |
Thesis Advisors: | 薛慧敏 |
Publication Year: | 2018 |
Collection: | National Digital Library of Theses and Dissertations in Taiwan |
Description: | News reports describe the state of a problem, event, or process at a particular time. In the past, newspapers were the most common medium for spreading news, but the rapid growth of the Internet and social media has changed reading habits, and most people now prefer to read digital news rather than printed newspapers. This study aims to develop a classifier that predicts which newspaper published a given digital news article. More than four thousand sports news articles published in December 2017 by four major Taiwanese newspapers (United Daily News, Apple Daily, China Times, and Liberty Times) were collected as training data. Each digital news item typically consists of a title, text content, and photos, so the first and most essential step of the analysis is to quantify input variables (features) from the information available in each article. To explore the characteristic patterns of each newspaper and to improve computational efficiency, an initial exploratory data analysis (EDA) is conducted on the input variables, and the relatively important variables are selected for classifier development. For the text data, term frequency-inverse document frequency (TF-IDF) is applied as the keyword selection method. The selected variables are then used to build newspaper classifiers with a support vector machine (SVM). We find that a simple classifier based on 19 non-text input variables can achieve high accuracy, and that the image dimensions are the most critical of these variables. When only text information is considered, a small number of text variables can still yield excellent classification results. |
Original Identifier: | 106NCCU5337017 |
Document Type: | Thesis |
File Description: | 38 |
Availability: | http://ndltd.ncl.edu.tw/handle/aw43vv |
Accession Number: | edsndl.TW.106NCCU5337017 |
Database: | Networked Digital Library of Theses & Dissertations |
Description not available. |
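The abstract says TF-IDF was used to select keywords but does not say how term scores were aggregated across articles or thresholded. The following is a minimal sketch under stated assumptions, not the thesis's method: documents are assumed to be already tokenized (Chinese news would first need word segmentation), the unsmoothed variant tf = count/length and idf = log(N/df) is used, and each term is ranked by its highest TF-IDF weight in any single article. The function names and example tokens are hypothetical.

```python
import math
from collections import Counter


def tf_idf(docs):
    """Return one {term: weight} dict per tokenized document.

    tf(t, d) = count(t in d) / len(d)
    idf(t)   = log(N / df(t)), df(t) = number of documents containing t
    (an assumed, unsmoothed variant; other TF-IDF definitions exist)
    """
    n_docs = len(docs)
    # Document frequency: count each term at most once per document.
    df = Counter(term for doc in docs for term in set(doc))
    weights = []
    for doc in docs:
        if not doc:  # an empty document has no terms to weight
            weights.append({})
            continue
        counts = Counter(doc)
        weights.append({t: (c / len(doc)) * math.log(n_docs / df[t])
                        for t, c in counts.items()})
    return weights


def top_keywords(docs, k):
    """Hypothetical selection rule: score each term by its highest TF-IDF
    weight in any single document and keep the k best (ties alphabetical)."""
    best = {}
    for doc_weights in tf_idf(docs):
        for term, w in doc_weights.items():
            best[term] = max(best.get(term, 0.0), w)
    return sorted(best, key=lambda t: (-best[t], t))[:k]


# Hypothetical, already-segmented sports headlines (not data from the thesis).
docs = [["game", "baseball", "lamigo"],
        ["game", "baseball", "cpbl"],
        ["game", "nba", "lakers"]]
print(top_keywords(docs, 5))  # ['cpbl', 'lakers', 'lamigo', 'nba', 'baseball']
```

Because "game" appears in every article, its idf is log(1) = 0 and it is never selected: a term shared by all documents carries no information for telling publishers apart.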
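The abstract reports that a classifier built on non-text variables, led by image dimensions, works well, but it does not state the SVM kernel, software, or multi-class strategy. The sketch below is therefore an illustration under assumptions, not the thesis's classifier: it uses a linear SVM trained by full-batch subgradient descent on the regularized hinge loss, a one-vs-rest scheme (one of several standard multi-class strategies) for four publishers, and only two features, image width and height. Features are standardized first because SVMs are sensitive to feature scale. All function names, hyperparameters (`lam`, `lr`, `epochs`), publisher labels, and pixel values are hypothetical.

```python
import math


def scale(row, means, stds):
    return [(v - m) / s for v, m, s in zip(row, means, stds)]


def standardize(rows):
    """Scale each feature column to zero mean and unit population std."""
    n = len(rows)
    cols = list(zip(*rows))
    means = [sum(c) / n for c in cols]
    # Fall back to 1.0 for a constant column so we never divide by zero.
    stds = [math.sqrt(sum((v - m) ** 2 for v in c) / n) or 1.0
            for c, m in zip(cols, means)]
    return [scale(r, means, stds) for r in rows], (means, stds)


def train_linear_svm(X, y, lam=0.01, lr=0.1, epochs=300):
    """Binary linear SVM (labels +1 / -1) trained by full-batch subgradient
    descent on  lam/2 * ||w||^2 + mean_i max(0, 1 - y_i * (w . x_i + b))."""
    n, d = len(X), len(X[0])
    w, b = [0.0] * d, 0.0
    for _ in range(epochs):
        grad_w = [lam * wj for wj in w]  # gradient of the L2 penalty
        grad_b = 0.0                     # the bias is not regularized
        for xi, yi in zip(X, y):
            margin = yi * (sum(wj * xj for wj, xj in zip(w, xi)) + b)
            if margin < 1:  # hinge term is active for this sample
                for j in range(d):
                    grad_w[j] -= yi * xi[j] / n
                grad_b -= yi / n
        w = [wj - lr * gj for wj, gj in zip(w, grad_w)]
        b -= lr * grad_b
    return w, b


def train_one_vs_rest(X, labels, **kwargs):
    """One binary SVM per class: that class is +1, every other class is -1."""
    return {cls: train_linear_svm(X, [1 if lab == cls else -1 for lab in labels],
                                  **kwargs)
            for cls in sorted(set(labels))}


def predict(models, x):
    """Pick the class whose binary SVM gives the largest decision value."""
    def decision(model):
        w, b = model
        return sum(wj * xj for wj, xj in zip(w, x)) + b
    return max(models, key=lambda cls: decision(models[cls]))


# Hypothetical (image_width, image_height) pairs in pixels for four
# hypothetical publishers A-D; these are NOT data from the thesis.
images = [(640, 480), (600, 450), (640, 360), (600, 390),
          (480, 480), (520, 450), (480, 360), (520, 390)]
publishers = ["A", "A", "B", "B", "C", "C", "D", "D"]

X, (means, stds) = standardize(images)
models = train_one_vs_rest(X, publishers)
train_acc = sum(predict(models, x) == p for x, p in zip(X, publishers)) / len(X)
print(train_acc, predict(models, scale((650, 470), means, stds)))
```

A new article must be scaled with the training set's means and standard deviations (as `scale` does above), not with its own statistics. In practice a library SVM with a tuned kernel and regularization constant would replace this hand-written trainer.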