Academic Journal

Compact full-text indexing . . .

Bibliographic Details
Title: Compact full-text indexing . . .
Authors: Jinru He, Hao Yan, Torsten Suel
Contributors: The Pennsylvania State University CiteSeerX Archives
Source: http://cis.poly.edu/~suel/papers/archive.pdf.
Publication Year: 2009
Collection: CiteSeerX
Subject Terms: search engines, inverted index compression, versioned documents
Description: We study the problem of creating highly compressed full-text index structures for versioned document collections, that is, collections that contain multiple versions of each document. Important examples of such collections are Wikipedia or the web page archive maintained by the Internet Archive. A straightforward indexing approach would simply treat each document version as a separate document, such that index size scales linearly with the number of versions. However, several authors have recently studied approaches that exploit the significant similarities between different versions of the same document to obtain much smaller index sizes. In this paper, we propose new techniques for organizing and compressing inverted index structures for such collections. We also perform a detailed experimental comparison of new techniques and the existing techniques in the literature. Our results on an archive of the English version of Wikipedia, and on a subset of the Internet Archive collection, show significant benefits over previous approaches.
Document Type: text
File Description: application/pdf
Language: English
Relation: http://citeseerx.ist.psu.edu/viewdoc/summary?doi=10.1.1.158.2944; http://cis.poly.edu/~suel/papers/archive.pdf
Availability: http://citeseerx.ist.psu.edu/viewdoc/summary?doi=10.1.1.158.2944
http://cis.poly.edu/~suel/papers/archive.pdf
Rights: Metadata may be used without restrictions as long as the oai identifier remains attached to it.
Accession Number: edsbas.1B5CB7AB
Database: BASE