Bibliographic Details
Title:
GRNet: a graph reasoning network for enhanced multi-modal learning in scene text recognition.
Authors:
Jia, Zeguang (AUTHOR); Wang, Jianming (AUTHOR); Jin, Rize (AUTHOR)
Source:
Computer Journal. Dec 2024, Vol. 67, Issue 12, p3239-3250. 12p.
Subject Terms:
*LANGUAGE models, *TEXT recognition, *SEMANTICS, *RECOGNITION (Psychology), *FORECASTING
Abstract:
Recent advances in scene text recognition have predominantly focused on leveraging textual semantics. However, over-reliance on linguistic priors can impede a model's ability to handle irregular text scenes, including non-standard word usage, occlusion, severe distortion, or stretching. The key challenges lie in effectively localizing occlusions, perceiving multi-scale text, and inferring text from scene context. To address these challenges and strengthen visual capability, we introduce the Graph Reasoning Model (GRM). The GRM employs a novel feature fusion method to align spatial context information across scales: a feature aggregation stage first extracts rich spatial contextual information from feature maps at various resolutions, and visual reasoning representations are then obtained through graph convolution. We integrate the GRM with a language model to form a two-stream architecture called GRNet, which combines pure visual predictions with joint visual-linguistic predictions to produce the final recognition results. Additionally, we propose a dynamic iteration refinement scheme for the language model that prevents over-correction of predictions, ensuring a balanced contribution from visual and linguistic cues. Extensive experiments demonstrate that GRNet achieves state-of-the-art average recognition accuracy across six mainstream benchmarks. These results highlight the efficacy of our multi-modal approach to scene text recognition, particularly in challenging scenarios where visual reasoning plays a crucial role. [ABSTRACT FROM AUTHOR]
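To make the GRM description concrete, the following is a minimal PyTorch sketch of the general pattern the abstract describes: multi-scale feature maps are aligned and aggregated, the aggregated positions are treated as graph nodes, and a graph-convolution step produces the visual reasoning representation. All class and parameter names here are assumptions for illustration, not the authors' code, and the learnable-adjacency graph construction is only one plausible choice.

```python
# Illustrative sketch only; names (GraphReasoningModule, adj, gcn_weight)
# are hypothetical and the graph construction is an assumption, not the
# paper's published design.
import torch
import torch.nn as nn
import torch.nn.functional as F

class GraphReasoningModule(nn.Module):
    def __init__(self, channels: int, num_nodes: int):
        super().__init__()
        # Shared 1x1 projection applied to every scale before fusion.
        self.proj = nn.Conv2d(channels, channels, kernel_size=1)
        # Learnable adjacency over aggregated positions; num_nodes must
        # equal H * W of the reference (first) feature map.
        self.adj = nn.Parameter(torch.eye(num_nodes))
        self.gcn_weight = nn.Linear(channels, channels)

    def forward(self, feature_maps):
        # feature_maps: list of (B, C, H_i, W_i) tensors from different scales.
        target = feature_maps[0].shape[-2:]
        # Align every scale to the reference resolution, then aggregate.
        fused = sum(
            F.interpolate(self.proj(f), size=target, mode="bilinear",
                          align_corners=False)
            for f in feature_maps
        )
        b, c, h, w = fused.shape
        nodes = fused.flatten(2).transpose(1, 2)   # (B, H*W, C) graph nodes
        adj = torch.softmax(self.adj, dim=-1)      # row-normalized adjacency
        reasoned = adj @ self.gcn_weight(nodes)    # one graph-convolution step
        return F.relu(reasoned).transpose(1, 2).reshape(b, c, h, w)
```

Row-normalizing the adjacency keeps each node update a weighted average of its neighbors, a common stabilizing choice for graph convolutions over dense spatial grids.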
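The abstract's two-stream decision and "dynamic iteration refinement" can be read as iterative language-model refinement with an early-stopping test. The sketch below is one plausible interpretation under that assumption; `two_stream_decode`, `language_model`, and `max_iters` are hypothetical names, and the paper's actual stopping criterion may differ.

```python
# Hypothetical reading of two-stream fusion with dynamic iteration
# refinement: refinement stops as soon as predictions stabilize, so the
# linguistic prior cannot keep "correcting" a confident visual read.
import torch

def two_stream_decode(visual_logits, language_model, max_iters=3):
    """visual_logits: (B, T, V) per-character logits from the vision stream."""
    fused = visual_logits
    prev = fused.argmax(dim=-1)
    for _ in range(max_iters):
        linguistic = language_model(fused)   # assumed: logits -> refined logits
        fused = visual_logits + linguistic   # joint visual-linguistic prediction
        pred = fused.argmax(dim=-1)
        if torch.equal(pred, prev):          # dynamic stop: output has settled
            break
        prev = pred
    return fused
```

Anchoring each iteration on the original visual logits, rather than overwriting them, is what keeps the visual stream's contribution from being washed out across refinement steps.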
Database:
Academic Search Index |