Dissertation/Thesis

Structured Data Processing on MapReduce in NoSQL Database

Bibliographic Details
Title: Structured Data Processing on MapReduce in NoSQL Database
Alternate Title: MapReduce於非關聯式資料庫之結構化資料處理
Authors: Lin, Hung-Pin, 林弘斌
Thesis Advisors: Chung, Yeh-Ching, 鍾葉青
Publication Year: 2012
Collection: National Digital Library of Theses and Dissertations in Taiwan
Description: 100
With the rapid growth of data in recent years, data storage and processing are receiving more attention as a means of extracting important information. Finding a scalable solution for processing large-scale data is a critical issue in both relational database systems and the emerging NoSQL databases. Since Google published several techniques that it had operated successfully in-house, the literature on distributed data storage and processing has been greatly influenced, and a new paradigm, so-called Cloud Computing, has been advanced. MapReduce is one of the critical techniques for processing massive data in parallel. With its inherent scalability and fault tolerance, MapReduce is attractive for large-scale data processing. Several studies have presented the use of MapReduce to support SQL or SQL-like queries. Most previous work focuses on the Hadoop Distributed File System. However, from the viewpoint of some enterprises, data residing in a database may change frequently as updates occur. Accordingly, a flexible data store such as Bigtable or HBase is needed, not only to place the data on a scale-out storage system but also to manipulate changeable data transparently. In this thesis, we propose a systematic method that uses MapReduce for structured data processing in a NoSQL database. Using HBase as the underlying NoSQL database, we analyze some major data manipulation statements of ANSI SQL and provide corresponding queries to manipulate the data residing in the NoSQL database. To organize the data with less complexity, we also introduce a remapping strategy that translates the data model from the relational database to the NoSQL database. Experimental results show that our approaches can outperform the conventional approach in terms of efficiency and scalability on large-scale data sets.
Original Identifier: 100NTHU5392106
Document Type: 學位論文 ; thesis
File Description: 26
Availability: http://ndltd.ncl.edu.tw/handle/58850616709254764470
Accession Number: edsndl.TW.100NTHU5392106
Database: Networked Digital Library of Theses & Dissertations