Comparative Analysis of River Water Quality in Asia and the Arab Region Using Machine Learning-Based Classification and Prediction Models
PP: 1501-1525
doi:10.18576/amis/190620

Author(s)
Hassan Shaheed,
Mohd Hafiz Zawawi,
Gasim Hayder,
Karim Sherif Mostafa,
Norbaya Sidek,
Mohamed A. Hafez

Abstract
Water quality is a key pillar of environmental stability and human health, but it is increasingly threatened by a variety of contaminants. These pollutants disturb water balance and pose serious public health problems. As water quality challenges grow, simulating and forecasting water quality are becoming essential for addressing contamination. Recent research has focused on developing comprehensive models that classify and predict water quality index (WQI) values from predefined parameters using advanced machine-learning algorithms. The synthetic pollution index (SPI) and the WQI are among the most widely used methods for classifying water quality and reflecting pollution risk in a given area [46]. Data were gathered from local rivers in Malaysia, Iraq, and India. The WQI was calculated using 32 parameters, including temperature, dissolved oxygen, hardness, pH, coliforms, and chloride concentrations. Data pre-processing addressed class imbalance and outliers and applied standardization. An automated water quality assessment system was developed using a hybrid approach that combines CatBoost, Support Vector Machine (SVM), Naive Bayes, and Light Gradient Boosting Machine (LGBM) classification models with Random Forest (RF), EML Regressor, Decision Tree (DT), and M5 Model Tree (MLM) regression models. CatBoost achieved the highest classification accuracy of 94.55 percent, with a Matthews correlation coefficient (MCC) of 93.31 percent, on the Malaysian dataset. In the regression analysis, the M5 model tree gave the best predictive performance on most datasets, with strong MAE, MSE, and R² results, while Random Forest performed better on the Malaysian dataset. The results highlight the spatial diversity of water quality across the study regions and confirm that machine learning models can support data-driven water quality management. The proposed models showed high accuracy and reliability in WQI classification.
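
The evaluation metrics named in the abstract (accuracy, MCC, MAE, MSE, and R²) follow standard definitions, and the sketch below shows one way to compute them in plain Python. It is illustrative only and is not the authors' code: the function names and the class labels in the usage note are hypothetical, and the MCC uses the usual multiclass generalization. MCC is returned on a scale from -1 to 1, so a value reported as 93.31 percent would correspond to 0.9331 here.

```python
from collections import Counter
from math import sqrt


def accuracy(y_true, y_pred):
    """Fraction of samples whose predicted class equals the true class."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


def matthews_corrcoef(y_true, y_pred):
    """Multiclass Matthews correlation coefficient, in the range [-1, 1]."""
    s = len(y_true)                                      # total samples
    c = sum(t == p for t, p in zip(y_true, y_pred))      # correctly classified
    true_counts = Counter(y_true)                        # occurrences per true class
    pred_counts = Counter(y_pred)                        # occurrences per predicted class
    labels = set(true_counts) | set(pred_counts)
    cov_tp = c * s - sum(true_counts[k] * pred_counts[k] for k in labels)
    cov_tt = s * s - sum(v * v for v in true_counts.values())
    cov_pp = s * s - sum(v * v for v in pred_counts.values())
    denom = sqrt(cov_tt * cov_pp)
    # Convention: return 0.0 when the coefficient is undefined (zero denominator).
    return cov_tp / denom if denom else 0.0


def mean_absolute_error(y_true, y_pred):
    """Average absolute difference between observed and predicted values."""
    return sum(abs(t - p) for t, p in zip(y_true, y_pred)) / len(y_true)


def mean_squared_error(y_true, y_pred):
    """Average squared difference between observed and predicted values."""
    return sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true)


def r2_score(y_true, y_pred):
    """Coefficient of determination: 1 - SS_res / SS_tot.

    Assumes y_true is not constant (otherwise SS_tot is zero).
    """
    mean_true = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean_true) ** 2 for t in y_true)
    return 1 - ss_res / ss_tot
```

For example, `matthews_corrcoef(["good", "poor", "good"], ["good", "poor", "poor"])` scores a made-up three-sample classification, and `r2_score(observed_wqi, predicted_wqi)` would score a regressor given two equal-length lists of WQI values.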