GRNet: a graph reasoning network for enhanced multi-modal learning in scene text recognition.

Bibliographic Details
Title:	GRNet: a graph reasoning network for enhanced multi-modal learning in scene text recognition.
Authors:	Jia, Zeguang¹ (AUTHOR), Wang, Jianming² (AUTHOR), Jin, Rize³ (AUTHOR)
Source:	Computer Journal. Dec2024, Vol. 67 Issue 12, p3239-3250. 12p.
Subject Terms:	LANGUAGE models, TEXT recognition, SEMANTICS, RECOGNITION (Psychology), *FORECASTING
Abstract:	Recent advancements in scene text recognition have predominantly focused on leveraging textual semantics. However, an over-reliance on linguistic priors can impede a model's ability to handle irregular text scenes, including non-standard word usage, occlusions, severe distortions, or stretching. The key challenges lie in effectively localizing occlusions, perceiving multi-scale text, and inferring text based on scene context. To address these challenges and enhance visual capabilities, we introduce the Graph Reasoning Model (GRM). The GRM employs a novel feature fusion method to align spatial context information across different scales, beginning with a feature aggregation stage that extracts rich spatial contextual information from various feature maps. Visual reasoning representations are then obtained through graph convolution. We integrate the GRM module with a language model to form a two-stream architecture called GRNet. This architecture combines pure visual predictions with joint visual-linguistic predictions to produce the final recognition results. Additionally, we propose a dynamic iteration refinement for the language model to prevent over-correction of prediction results, ensuring a balanced contribution from both visual and linguistic cues. Extensive experiments demonstrate that GRNet achieves state-of-the-art average recognition accuracy across six mainstream benchmarks. These results highlight the efficacy of our multi-modal approach in scene text recognition, particularly in challenging scenarios where visual reasoning plays a crucial role. [ABSTRACT FROM AUTHOR]
Database:	Academic Search Index

Description
ISSN:	00104620
DOI:	10.1093/comjnl/bxae085