.:: Natural Sciences Publishing ::.

Applied Mathematics & Information Sciences

An International Journal


	Volumes > Volume 20 > No. 3


	AI-Powered Robotics Framework for Assistive Navigation Industrial Automation and Smart Service Applications

	PP: 697-708

	doi:10.18576/amis/200310

	Author(s)

	Ahmed I. Taloba, Inam A. Alanazi, Osama R. Shahin, Islam Abdalla Mohamed Abass, Loay F. Hussein, Waleed M. Abdelfattah, Nicu Bizon, Mohammed Sallah, Khaled Bedair

	Abstract

	The growing importance of intelligent assistive and industrial robotic systems calls for architectures that can interpret complex environments and human commands at the same time. Prior work mostly relies on unimodal perception, either vision-only or speech-only, which in dynamic tasks tends to yield weaker contextual understanding and lower task performance. To overcome these limitations, this paper presents a multimodal fusion system that combines vision and speech inputs, using YOLO for object detection, GPT for semantic information, and LXMERT for cross-modal alignment, to support context-aware robot decision-making. The system was trained in Python with PyTorch and Hugging Face Transformers and evaluated on the COCO 2017 dataset for vision and the Arabic Speech Commands dataset for speech. Preprocessing consisted of image normalization, data augmentation, silence trimming, and MFCC feature extraction, which provided clean inputs for feature extraction and fusion. The reinforcement learning (RL) agent trained on the fused embeddings achieved success rates of 83-92% across navigation, object picking, and obstacle avoidance tasks, a relative accuracy improvement of 8-10% over unimodal baselines. Attention heatmaps were used to verify correct object-command correspondence, supporting the reliability and interpretability of the decisions. These findings indicate that multimodal fusion can improve task performance and generalization while preserving explainability. The proposed framework is scalable and flexible, and it offers a direction for developing autonomous robotic systems that perform context-dependent tasks. Reinforcement learning on cross-modal embeddings is presented as a practical approach with potential impact on future assistive and industrial robotics research.
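	Two of the preprocessing steps named in the abstract, silence trimming and image normalization, can be illustrated with a minimal sketch. This is not the authors' code: the abstract gives no parameters, so the frame length, the energy threshold, the function names, and the per-channel mean and standard deviation (the values commonly used with ImageNet-pretrained models) are all assumptions made for illustration. MFCC extraction and data augmentation are left out.

```python
from typing import List, Sequence, Tuple


def trim_silence(samples: Sequence[float],
                 frame_len: int = 4,
                 threshold: float = 0.01) -> List[float]:
    """Drop leading and trailing frames whose mean energy is below threshold.

    frame_len and threshold are illustrative assumptions, not values
    taken from the paper.
    """
    # Split the signal into consecutive, non-overlapping frames.
    frames = [samples[i:i + frame_len] for i in range(0, len(samples), frame_len)]
    # Mean squared amplitude per frame is used as a simple energy measure.
    energies = [sum(x * x for x in frame) / len(frame) for frame in frames]
    voiced = [i for i, energy in enumerate(energies) if energy >= threshold]
    if not voiced:
        return []  # the whole clip is below the threshold
    start = voiced[0] * frame_len
    end = min((voiced[-1] + 1) * frame_len, len(samples))
    return list(samples[start:end])


def normalize_pixel(rgb: Tuple[int, int, int],
                    mean: Tuple[float, float, float] = (0.485, 0.456, 0.406),
                    std: Tuple[float, float, float] = (0.229, 0.224, 0.225),
                    ) -> Tuple[float, ...]:
    """Scale an 8-bit RGB pixel to [0, 1], then standardize each channel.

    The default mean/std are assumed constants, not values from the paper.
    """
    return tuple((c / 255.0 - m) / s for c, m, s in zip(rgb, mean, std))


if __name__ == "__main__":
    clip = [0.0] * 8 + [0.5, -0.5, 0.4, -0.4] + [0.0] * 4
    print(trim_silence(clip))
    print(normalize_pixel((255, 128, 0)))
```

	In a pipeline like the one described, the trimmed audio would be passed on to MFCC extraction and the normalized images to the object detector; both stages are outside the scope of this sketch.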

Copyright naturalspublishing.com. All Rights Reserved