Applied Mathematics & Information Sciences
An International Journal

Volume 19, No. 5


Transcriptomic Pattern Analysis in Breast Cancer Patients: A Machine Learning Approach

PP: 1183-1192
doi:10.18576/amis/190517
Author(s)
Natália Padre, Monique Borges Seixas, Paulo Victor dos Santos, Glaucia Maria Bressan, Heron Oliveira dos Santos Lima, Marcella Scoczynski
Abstract
Breast cancer remains one of the leading causes of mortality among women and is one of the most prevalent cancers worldwide. Advancing accurate diagnosis, treatment, and prevention requires a deeper understanding of the genetic alterations involved in tumorigenesis. Integrating genomic and transcriptomic data offers a powerful approach to uncovering the molecular mechanisms underlying the disease. Transcriptomic profiling, which involves sequencing RNA to analyze gene expression, enables the identification of biomarkers for disease progression and supports the discovery of novel therapeutic targets. However, transcriptomic samples often contain a mix of cancerous and non-cancerous cells, so robust analytical methods are required to distinguish between them and ensure meaningful interpretation. In this study, we propose a comprehensive pipeline for transcriptomic analysis using gene expression data from The Cancer Genome Atlas (TCGA). The dataset is pre-processed and normalized using the TCGAbiolinks package within the R software environment. Machine learning algorithms are then employed to classify samples as tumor or normal tissue. Seven models are evaluated, with Random Forest and Radial Basis Function Support Vector Machine (RBF SVM) achieving the highest performance. RBF SVM reaches an accuracy of 99.55%, precision of 99.55%, recall of 99.64%, and F1-score of 99.64%, while Random Forest obtains an accuracy of 99.38%, precision of 99.38%, recall of 99.50%, and F1-score of 99.50%. Stratified 5-fold cross-validation confirms the models' robustness, showing low variance across folds. Feature selection is performed to enhance interpretability, and five key genes are identified: ENSG00000152256.14 (PDK1), ENSG00000155875.15 (SAXO1), ENSG00000165194.15 (PCDH19), ENSG00000176884.16 (GRIN1), and ENSG00000180910.8 (TTTY11).
These genes are further investigated using Ensembl for biological interpretation: PDK1 is involved in cancer metabolism; SAXO1 is linked to cytoskeletal stability; PCDH19 is associated with cell adhesion; GRIN1 is related to glutamate signaling; and TTTY11 is a pseudogene with potential regulatory roles. This study demonstrates the potential of machine learning in transcriptomic data analysis and offers a framework for identifying key biomarkers, contributing to precision oncology in breast cancer research.
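The evaluation steps named in the abstract (stratified 5-fold cross-validation, then accuracy, precision, recall, and F1-score for the tumor class) can be illustrated with a minimal pure-Python sketch. This is not the authors' code: the helper names `stratified_folds` and `binary_metrics`, the toy labels, and the choice of "tumor" as the positive class are all assumptions made for illustration, and the abstract does not say how the reported metrics were averaged.

```python
import random
from collections import defaultdict


def stratified_folds(labels, k=5, seed=0):
    """Split sample indices into k folds, keeping each class's share roughly equal.

    Hypothetical helper: a plain round-robin assignment per class.
    """
    rng = random.Random(seed)
    by_class = defaultdict(list)
    for idx, label in enumerate(labels):
        by_class[label].append(idx)

    folds = [[] for _ in range(k)]
    for indices in by_class.values():
        rng.shuffle(indices)
        # Deal the shuffled indices of this class across the folds in turn.
        for position, idx in enumerate(indices):
            folds[position % k].append(idx)
    return folds


def binary_metrics(y_true, y_pred, positive="tumor"):
    """Compute accuracy, precision, recall and F1 for one positive class."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    tn = len(pairs) - tp - fp - fn

    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    # F1 is the harmonic mean of precision and recall.
    f1 = (2 * precision * recall / (precision + recall)
          if (precision + recall) else 0.0)
    return {
        "accuracy": (tp + tn) / len(pairs),
        "precision": precision,
        "recall": recall,
        "f1": f1,
    }


# Toy, made-up labels (not TCGA data): 20 tumor and 5 normal samples.
labels = ["tumor"] * 20 + ["normal"] * 5
folds = stratified_folds(labels, k=5)

# Toy predictions for a single held-out fold.
y_true = ["tumor", "tumor", "tumor", "normal", "normal"]
y_pred = ["tumor", "tumor", "normal", "normal", "tumor"]
scores = binary_metrics(y_true, y_pred)
```

With these toy labels, every fold receives four tumor samples and one normal sample, so each held-out fold preserves the class ratio of the full set. In a real run, a classifier would be trained on four folds and scored on the fifth, and the per-fold scores would be compared to assess variance.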
