Partial Visual-Tactile Fused Learning for Robotic Object Recognition
| Title | Partial Visual-Tactile Fused Learning for Robotic Object Recognition |
|---|---|
| Authors | Tao Zhang, Jiahua Dong, Yang Cong, Dongdong Hou |
| Source | IEEE Transactions on Systems, Man, and Cybernetics: Systems. 52:4349-4361 |
| Publication Data | Institute of Electrical and Electronics Engineers (IEEE), 2022. |
| Publication Year | 2022 |
| Subject Terms | Modality (human–computer interaction), business.industry, Computer science, Cognitive neuroscience of visual object recognition, ENCODE, Linear subspace, Computer Science Applications, Human-Computer Interaction, Control and Systems Engineering, Completeness (order theory), Learning methods, Computer vision, Artificial intelligence, Electrical and Electronic Engineering, business, Encoder, Software, Subspace topology |
| Description | Visual-tactile fusion learning for robotic object recognition has achieved appealing performance because visual and tactile data offer complementary information. However: 1) the distinct gap between vision and touch makes it difficult to fully exploit this complementary information, which leads to performance degradation, and 2) most existing visual-tactile fused learning methods assume that visual and tactile data are complete, an assumption that is often hard to satisfy in real-world applications. In this article, we propose a partial visual-tactile fused (PVTF) framework for robotic object recognition to address these challenges. Specifically, we first employ two modality-specific (MS) encoders to encode partial visual-tactile data into two incomplete subspaces (i.e., a visual subspace and a tactile subspace). Then, a modality gap mitigated (MGM) network is adopted to discover modality-invariant high-level label information, which is used to compute a gap loss and, in turn, to update the MS encoders so that they generate relatively consistent visual and tactile subspaces. In this way, the large gap between vision and touch is mitigated, which further helps mine the complementary visual-tactile information. Finally, to achieve data completeness and complementary visual-tactile information exploration simultaneously, a cycle subspace learning technique is proposed to project the incomplete subspaces into a complete subspace by fully exploiting all obtainable samples, so that complete latent representations with maximum complementary information can be learned. Extensive comparative experiments on three visual-tactile datasets validate the advantage of the proposed PVTF framework over state-of-the-art baselines. |
| ISSN | 2168-2232, 2168-2216 |
| DOI | 10.1109/tsmc.2021.3096235 |
| Access URL | https://explore.openaire.eu/search/publication?articleId=doi_________::b83c98961057fa120599f159e16935ac https://doi.org/10.1109/tsmc.2021.3096235 |
| Rights | CLOSED |
| Accession Number | edsair.doi...........b83c98961057fa120599f159e16935ac |
| Database | OpenAIRE |
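The pipeline described in the abstract — modality-specific encoders mapping each modality into its own subspace, a gap objective pulling paired embeddings together, and completion of partially observed data in a shared subspace — can be sketched roughly as follows. This is a minimal illustrative sketch, not the paper's actual method: the dimensions, the linear encoders, the mean-squared gap loss, and the averaging-based completion are all assumptions made for demonstration.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical feature and subspace dimensions (illustrative, not from the paper).
D_VIS, D_TAC, D_SUB = 64, 32, 16

# Modality-specific (MS) encoders, sketched here as fixed linear projections.
W_vis = rng.normal(size=(D_VIS, D_SUB))
W_tac = rng.normal(size=(D_TAC, D_SUB))

def encode(x, W):
    """Project one modality's features into its (incomplete) subspace."""
    return x @ W

def gap_loss(z_vis, z_tac):
    """Mean squared distance between paired visual/tactile embeddings;
    a stand-in for the paper's modality-gap objective."""
    return float(np.mean((z_vis - z_tac) ** 2))

# Partial visual-tactile data: tactile readings missing for the last two objects.
x_vis = rng.normal(size=(5, D_VIS))
x_tac = rng.normal(size=(5, D_TAC))
x_tac[3:] = np.nan

z_vis = encode(x_vis, W_vis)
z_tac = encode(x_tac, W_tac)

# Gap loss is only defined on objects observed by both modalities.
have_tac = ~np.isnan(z_tac).any(axis=1)
gap = gap_loss(z_vis[have_tac], z_tac[have_tac])

# Complete latent representation: average the two views where both exist,
# fall back to the available (visual) view otherwise.
z_complete = np.where(
    have_tac[:, None],
    (z_vis + np.nan_to_num(z_tac)) / 2.0,
    z_vis,
)
```

In the actual framework the encoders and the completion step are trained jointly (the gap loss updating the MS encoders, and cycle subspace learning filling the missing view); here the completion is a simple average purely to show the data flow from partial inputs to a complete latent representation.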