Academic Journal

GRNet: a graph reasoning network for enhanced multi-modal learning in scene text recognition.

Bibliographic Details
Title: GRNet: a graph reasoning network for enhanced multi-modal learning in scene text recognition.
Authors: Jia, Zeguang1 (AUTHOR), Wang, Jianming2 (AUTHOR), Jin, Rize3 (AUTHOR)
Source: Computer Journal. Dec2024, Vol. 67 Issue 12, p3239-3250. 12p.
Subject Terms: *LANGUAGE models, *TEXT recognition, *SEMANTICS, *RECOGNITION (Psychology), *FORECASTING
Abstract: Recent advancements in scene text recognition have predominantly focused on leveraging textual semantics. However, an over-reliance on linguistic priors can impede a model's ability to handle irregular text scenes, including non-standard word usage, occlusions, severe distortions, or stretching. The key challenges lie in effectively localizing occlusions, perceiving multi-scale text, and inferring text based on scene context. To address these challenges and enhance visual capabilities, we introduce the Graph Reasoning Model (GRM). The GRM employs a novel feature fusion method to align spatial context information across different scales, beginning with a feature aggregation stage that extracts rich spatial contextual information from various feature maps. Visual reasoning representations are then obtained through graph convolution. We integrate the GRM module with a language model to form a two-stream architecture called GRNet. This architecture combines pure visual predictions with joint visual-linguistic predictions to produce the final recognition results. Additionally, we propose a dynamic iteration refinement for the language model to prevent over-correction of prediction results, ensuring a balanced contribution from both visual and linguistic cues. Extensive experiments demonstrate that GRNet achieves state-of-the-art average recognition accuracy across six mainstream benchmarks. These results highlight the efficacy of our multi-modal approach in scene text recognition, particularly in challenging scenarios where visual reasoning plays a crucial role. [ABSTRACT FROM AUTHOR]
Database: Academic Search Index
Description
ISSN: 0010-4620
DOI: 10.1093/comjnl/bxae085