FashionFAE: Fine-grained Attributes Enhanced Fashion Vision-Language Pre-training

Bibliographic Details
Title:	FashionFAE: Fine-grained Attributes Enhanced Fashion Vision-Language Pre-training
Authors:	Huang, Jiale; Gao, Dehong; Zhang, Jinxia; Zhan, Zechao; Hu, Yang; Wang, Xin
Publication Year:	2024
Collection:	Computer Science
Subject Terms:	Computer Science - Computer Vision and Pattern Recognition
Description:	Large-scale Vision-Language Pre-training (VLP) has demonstrated remarkable success in the general domain. However, in the fashion domain, items are distinguished by fine-grained attributes such as texture and material, which are crucial for tasks like retrieval. Existing models often fail to leverage these fine-grained attributes from both the text and image modalities. To address these issues, we propose Fine-grained Attributes Enhanced VLP (FashionFAE), a novel approach for the fashion domain that focuses on the detailed characteristics of fashion data. An attribute-emphasized text prediction task is proposed to predict the fine-grained attributes of items, which forces the model to focus on salient attributes in the text modality. Additionally, a novel attribute-promoted image reconstruction task is proposed, which further enhances the model's fine-grained ability by leveraging representative attributes in the image modality. Extensive experiments show that FashionFAE significantly outperforms State-Of-The-Art (SOTA) methods, achieving 2.9% and 5.2% improvements in retrieval on the sub-test and full test sets, respectively, and a 1.6% average improvement on recognition tasks.
Comment:	5 pages, Accepted by ICASSP 2025, full paper
Document Type:	Working Paper
Access URL:	http://arxiv.org/abs/2412.19997
Accession Number:	edsarx.2412.19997
Database:	arXiv
