Applied Mathematics & Information Sciences
An International Journal
Volume 20, No. 3

 
   

AI-Powered Robotics Framework for Assistive Navigation Industrial Automation and Smart Service Applications

PP: 697-708
doi:10.18576/amis/200310        
Author(s)
Ahmed I. Taloba, Inam A. Alanazi, Osama R. Shahin, Islam Abdalla Mohamed Abass, Loay F. Hussein, Waleed M. Abdelfattah, Nicu Bizon, Mohammed Sallah, Khaled Bedair
Abstract
The growing importance of intelligent assistive and industrial robotic systems calls for architectures that can interpret complex surroundings and human commands at the same time. Existing work mostly relies on unimodal perception, either vision-only or speech-only, which in dynamic tasks tends to capture less context and yield lower task performance. To overcome these constraints, this paper presents a new multimodal fusion system that combines vision and speech inputs, using YOLO to identify objects, GPT to provide semantic information, and LXMERT to align information across modalities, so that the robot can make effective, context-responsive decisions. The system was implemented in Python with PyTorch and Hugging Face Transformers and evaluated on the COCO 2017 dataset for vision and the Arabic Speech Commands dataset for speech. Preprocessing comprised image normalization, data augmentation, silence trimming, and MFCC feature extraction, to ensure high-quality inputs for feature extraction and fusion. A reinforcement learning (RL) agent trained on the fused embeddings achieved success rates of 83-92% across navigation, object picking, and obstacle avoidance tasks, an accuracy improvement of roughly 8-10% relative to unimodal baselines. Attention heatmaps were used to verify correct object-command correspondence, supporting the reliability and interpretability of the decision-making. These findings indicate that multimodal fusion can improve task performance and generalization without sacrificing explainability. The proposed framework is scalable and flexible, and it offers a direction for developing autonomous robotic systems that can carry out context-dependent tasks.
Reinforcement learning on cross-modal embeddings is put forward as a practical approach with the potential to make a significant impact on future assistive and industrial robotics research.
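
Two of the preprocessing steps named in the abstract, image normalization and silence trimming, can be illustrated with a minimal pure-Python sketch. This is not the authors' implementation: the function names, the amplitude threshold of 0.02, and the mean and standard deviation values are assumptions chosen for illustration, and a real pipeline would more likely operate on PyTorch tensors.

```python
from typing import List, Sequence


def normalize_pixels(pixels: Sequence[float], mean: float, std: float) -> List[float]:
    """Scale 0-255 intensities to [0, 1], then standardize with mean and std.

    The mean and std are supplied by the caller; the values used in the
    paper are not given in the abstract.
    """
    if std <= 0:
        raise ValueError("std must be positive")
    return [((p / 255.0) - mean) / std for p in pixels]


def trim_silence(samples: Sequence[float], threshold: float = 0.02) -> List[float]:
    """Remove leading and trailing samples quieter than the threshold.

    The threshold is a hypothetical amplitude cutoff. Quiet samples between
    the first and last loud sample are kept, so pauses inside a spoken
    command are preserved.
    """
    loud = [i for i, s in enumerate(samples) if abs(s) >= threshold]
    if not loud:
        return []
    return list(samples[loud[0]: loud[-1] + 1])


if __name__ == "__main__":
    # Toy waveform: silence, a short utterance with an internal dip, silence.
    audio = [0.0, 0.001, 0.3, -0.4, 0.01, 0.2, 0.0, 0.0]
    print(trim_silence(audio))

    # Darkest and brightest pixel values with an assumed mean and std of 0.5.
    print(normalize_pixels([0, 255], mean=0.5, std=0.5))
```

Trimming only the edges rather than every quiet sample keeps the timing of the command intact, which matters for frame-based features such as MFCCs computed afterwards.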

Copyright naturalspublishing.com. All Rights Reserved