Academic Journal

Solving Data Imbalance in Text Classification With Constructing Contrastive Samples

Bibliographic Details
Title: Solving Data Imbalance in Text Classification With Constructing Contrastive Samples
Authors: Xi Chen, Wei Zhang, Shuai Pan, Jiayin Chen
Source: IEEE Access, Vol 11, Pp 90554-90562 (2023)
Publisher Information: IEEE, 2023.
Publication Year: 2023
Collection: LCC:Electrical engineering. Electronics. Nuclear engineering
Subject Terms: Data imbalance, contrastive learning, data augmentation, hard negative samples, text classification, Electrical engineering. Electronics. Nuclear engineering, TK1-9971
Description: Contrastive learning (CL) has been successfully applied in Natural Language Processing (NLP) as a powerful representation learning method and has shown promising results in various downstream tasks. Recent research has highlighted the importance of constructing effective contrastive samples through data augmentation. However, current data augmentation methods primarily rely on random word deletion, substitution, and cropping, which may introduce noisy samples and hinder representation learning. In this article, we propose a novel approach to address data imbalance in text classification by constructing contrastive samples. Our method involves the use of a Label-indicative Component to generate high-quality positive samples for the minority class, along with the introduction of a Hard Negative Mixing strategy to synthesize challenging negative samples at the feature level. By applying supervised contrastive learning to these samples, we are able to obtain superior text representations, which significantly benefit text classification tasks with imbalanced data. Our approach effectively mitigates distributional biases and promotes noise-resistant representation learning. To validate the effectiveness of our method, we conducted experiments on benchmark datasets (THUCNews, AG’s News, 20NG) as well as the imbalanced FDCNews dataset. The code for our method is publicly available at the following GitHub repository: https://github.com/hanggun/CLDMTC.
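The description names two mechanisms: feature-level Hard Negative Mixing and supervised contrastive learning. The sketch below illustrates how such pieces commonly fit together; it is not the authors' implementation (that is in the CLDMTC repository linked above). Assumptions: the mixing rule is a MoCHi-style convex combination of the negatives most similar to the anchor, the loss is the per-anchor form of the supervised contrastive (SupCon) loss, embeddings are toy random vectors, and the function names `l2_normalize`, `mix_hard_negatives`, and `anchor_supcon_loss` are hypothetical. The Label-indicative Component for building minority-class positives is not modelled; positives are simply given.

```python
import numpy as np


def l2_normalize(x, axis=-1, eps=1e-12):
    """Scale vectors to unit length so dot products are cosine similarities."""
    return x / (np.linalg.norm(x, axis=axis, keepdims=True) + eps)


def mix_hard_negatives(anchor, negatives, n_hardest=4, n_synth=4, rng=None):
    """Synthesize extra negatives as convex mixtures of the hardest existing ones.

    Assumed MoCHi-style rule (not necessarily the paper's exact method):
    "hardest" means most cosine-similar to the anchor.
    """
    rng = np.random.default_rng(0) if rng is None else rng
    sims = negatives @ anchor                      # cosine similarity to the anchor
    k = min(n_hardest, len(negatives))
    hardest = negatives[np.argsort(-sims)[:k]]     # top-k most similar negatives
    i = rng.integers(0, k, size=n_synth)           # random pairs of hard negatives
    j = rng.integers(0, k, size=n_synth)
    alpha = rng.uniform(0.0, 1.0, size=(n_synth, 1))
    mixed = alpha * hardest[i] + (1.0 - alpha) * hardest[j]
    return l2_normalize(mixed)                     # project back onto the unit sphere


def anchor_supcon_loss(anchor, positives, negatives, temperature=0.1):
    """Supervised contrastive loss for one anchor (SupCon "L_out" form).

    Averages -log softmax over the positives; the denominator sums over
    every non-anchor sample (positives and negatives).
    """
    pos = positives @ anchor / temperature
    neg = negatives @ anchor / temperature
    logits = np.concatenate([pos, neg])
    m = logits.max()                               # log-sum-exp stabilisation
    log_denom = m + np.log(np.exp(logits - m).sum())
    return float(-np.mean(pos - log_denom))


if __name__ == "__main__":
    rng = np.random.default_rng(42)
    dim = 16
    anchor = l2_normalize(rng.normal(size=dim))
    # Hypothetical embeddings: positives near the anchor, negatives random.
    positives = l2_normalize(anchor + 0.3 * rng.normal(size=(3, dim)))
    negatives = l2_normalize(rng.normal(size=(8, dim)))

    synth = mix_hard_negatives(anchor, negatives, rng=rng)
    base = anchor_supcon_loss(anchor, positives, negatives)
    harder = anchor_supcon_loss(anchor, positives, np.vstack([negatives, synth]))
    print(f"loss without mixed negatives: {base:.4f}")
    print(f"loss with mixed negatives:    {harder:.4f}")
```

Appending the mixed negatives can only enlarge the softmax denominator, so the loss for the same anchor rises; that added difficulty is the intended effect of synthesizing harder negatives at the feature level rather than through noisy text-level edits.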
Document Type: article
File Description: electronic resource
Language: English
ISSN: 2169-3536
Relation: https://ieeexplore.ieee.org/document/10225302/; https://doaj.org/toc/2169-3536
DOI: 10.1109/ACCESS.2023.3306805
Access URL: https://doaj.org/article/5c96aa58ba084e5fa1400fae0ceb4c81
Accession Number: edsdoj.5c96aa58ba084e5fa1400fae0ceb4c81
Database: Directory of Open Access Journals