Academic Journal

A Hybrid CNN-Transformer Network for Object Detection in Optical Remote Sensing Images: Integrating Local and Global Feature Fusion

Bibliographic Details
Title: A Hybrid CNN-Transformer Network for Object Detection in Optical Remote Sensing Images: Integrating Local and Global Feature Fusion
Authors: Youxiang Huang, Donglai Jiao, Xingru Huang, Tiantian Tang, Guan Gui
Source: IEEE Journal of Selected Topics in Applied Earth Observations and Remote Sensing, Vol 18, Pp 241-254 (2025)
Publication Information: IEEE, 2025.
Publication Year: 2025
Collection: LCC:Ocean engineering
LCC:Geophysics. Cosmic physics
Subject Terms: Convolutional neural networks (CNNs), feature fusion, local and global attention (LGA), optical remote sensing images (RSIs), vision transformer, Ocean engineering, TC1501-1800, Geophysics. Cosmic physics, QC801-809
Description: Object detection in remote sensing images (RSIs) is important for natural disaster management, urban planning, and resource exploration. However, because RSIs differ substantially from natural images (NIs), most existing object detectors designed for NIs cannot be applied directly to RSIs. Most existing models based on convolutional neural networks (CNNs) require additional, specifically designed attention modules to relate small targets in RSIs to global positional relationships. In contrast, transformer-based models need added modules to obtain more detailed information. These additions impose extra computational overhead for deployment on edge devices. To address this problem, we propose a hybrid CNN and transformer model (DConvTrans-LKA) that enhances the model's ability to acquire features, and we design a local and global attention mechanism that fuses local features with global location information. To better fuse the feature and location information extracted by the model, we introduce a feature residual pyramid network that strengthens the fusion of multiscale feature maps. Finally, we conduct experiments on three representative optical RSI datasets (NWPU VHR-10, HRRSD, and DIOR) to verify the effectiveness of the proposed DConvTrans-LKA method. The experimental results show that the proposed method reaches 61.7%, 82.1%, and 61.3% mAP at 0.5 on these datasets, respectively, demonstrating its potential for RSI object detection tasks.
Document Type: article
File Description: electronic resource
Language: English
ISSN: 1939-1404
2151-1535
Relation: https://ieeexplore.ieee.org/document/10721373/; https://doaj.org/toc/1939-1404; https://doaj.org/toc/2151-1535
DOI: 10.1109/JSTARS.2024.3483253
Access URL: https://doaj.org/article/075bf58b679d40af9b13bf3cadb5b2a4
Accession Number: edsdoj.075bf58b679d40af9b13bf3cadb5b2a4
Database: Directory of Open Access Journals
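
Note on the reported metric: the description quotes results as mAP at 0.5, which conventionally means a predicted box counts as a correct detection only when its intersection over union (IoU) with a ground-truth box reaches 0.5. The sketch below illustrates that matching criterion only. It is not taken from the article; the function names `iou` and `is_true_positive`, the `(x1, y1, x2, y2)` box layout, and the use of `>=` at the threshold are assumptions made for illustration (some benchmarks use a strict `>`).

```python
def iou(box_a, box_b):
    """Intersection over union of two axis-aligned boxes.

    Boxes are (x1, y1, x2, y2) with x1 < x2 and y1 < y2 (assumed layout).
    """
    # Corners of the overlapping rectangle, if any.
    ix1 = max(box_a[0], box_b[0])
    iy1 = max(box_a[1], box_b[1])
    ix2 = min(box_a[2], box_b[2])
    iy2 = min(box_a[3], box_b[3])

    # Clamp to zero when the boxes do not overlap.
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)

    area_a = (box_a[2] - box_a[0]) * (box_a[3] - box_a[1])
    area_b = (box_b[2] - box_b[0]) * (box_b[3] - box_b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


def is_true_positive(pred_box, gt_box, threshold=0.5):
    """Hypothetical helper: does a prediction match a ground-truth box
    at the given IoU threshold (0.5 for mAP at 0.5)?"""
    return iou(pred_box, gt_box) >= threshold
```

Mean average precision then averages, over object classes, the area under each class's precision-recall curve built from detections matched this way.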