Applied Mathematics & Information Sciences
An International Journal
Volume 19, No. 4

 
   

Optimized Learning Methods for Classifying Breast Cancer Subtypes Based on Gene Expression Data

PP: 819-830
doi:10.18576/amis/190408
Author(s)
Ana Beatriz M. Valentin, Glaucia M. Bressan, Elisângela Ap. S. Lizzi, Leonardo Canuto Jr.
Abstract
Understanding the characteristics of tumors and breast cancer subtypes from gene expression data is crucial for identifying the cancer type, obtaining a more accurate diagnosis, and promptly directing appropriate treatment. In this context, the objective of this study is to apply machine learning and deep learning methods to the multi-class classification of genes associated with breast cancer, using gene expression datasets, and to evaluate the predictive performance of these methods. The dataset is obtained from The Cancer Genome Atlas repository and is preprocessed, with dimensionality reduction applied because of the high number of variables. First, principal component analysis is used to reduce the dimensionality of the data. Next, traditional machine learning methods (Logistic Regression, Support Vector Machine, and Random Forest) are applied, along with the deep learning models Multilayer Perceptron (MLP) and Convolutional Neural Network (CNN). To enhance the performance of these models, the Optuna library is used for hyperparameter optimization, and the algorithms are evaluated both with and without optimization. A performance comparison among the algorithms showed that Support Vector Machine achieved high accuracy. However, the MLP and CNN models, especially when optimized with Optuna, also showed competitive results. Optimization adjusted crucial parameters such as the learning rate and the number of layers, which resulted in significant performance improvements. Although Random Forest was less affected by optimization, MLP and CNN showed substantial gains. The analysis highlighted that hyperparameter optimization can be essential for improving classifier accuracy. A feature importance analysis was also conducted to identify which genes are most relevant to the classification task.

Copyright naturalspublishing.com. All Rights Reserved