Air Quality Forecasting Based on Socio-Economic and Environmental Indicators: Combining Statistical and Machine Learning Techniques
Abstract
This research aims to model and predict greenhouse gas (GHG) emissions in Saudi Arabia by examining their association with key socio-economic and environmental factors. Using annual data from 1980 to 2023, the study treats three emission series as dependent variables: carbon dioxide (CO₂) emissions from the power sector, methane (CH₄) emissions from the power sector, and nitrous oxide (N₂O) emissions from industrial activities. The independent variables are agricultural land area, urban population, GDP growth, exports, trade openness, foreign direct investment, and manufacturing output. Six modeling approaches were compared: Ordinary Least Squares (OLS), Ridge Regression, Least Absolute Shrinkage and Selection Operator (LASSO), Elastic Net (Enet), Random Forest (RF), and a new hybrid method that merges Elastic Net and Random Forest (ENRF). Model performance was evaluated using Mean Squared Error (MSE) and Root Mean Squared Error (RMSE). The ENRF model consistently outperformed both the traditional statistical and the machine learning techniques, achieving the lowest MSE and RMSE values. These results underscore the efficacy of hybrid statistical and machine learning models for reliably predicting emissions and informing environmental policy in complex, big data contexts.
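The two evaluation criteria named in the abstract can be illustrated with a minimal sketch. The function names, the model labels `model_a` and `model_b`, and all numbers below are hypothetical values chosen for illustration; they are not data or results from the study, and the sketch does not reproduce the authors' fitting procedure.

```python
import math


def mse(actual, predicted):
    """Mean squared error: the average of the squared residuals."""
    if len(actual) != len(predicted) or not actual:
        raise ValueError("inputs must be non-empty and of equal length")
    return sum((a - p) ** 2 for a, p in zip(actual, predicted)) / len(actual)


def rmse(actual, predicted):
    """Root mean squared error: the square root of the MSE."""
    return math.sqrt(mse(actual, predicted))


def rank_models(actual, predictions):
    """Return (name, mse, rmse) tuples sorted from lowest to highest RMSE."""
    scores = [
        (name, mse(actual, pred), rmse(actual, pred))
        for name, pred in predictions.items()
    ]
    return sorted(scores, key=lambda row: row[2])


# Toy observed values and toy predictions (illustrative only, not study data).
actual = [10.0, 12.0, 14.0, 16.0]
predictions = {
    "model_a": [11.0, 11.0, 15.0, 15.0],  # every residual has magnitude 1
    "model_b": [10.0, 12.0, 16.0, 18.0],  # two residuals of magnitude 2
}

for name, model_mse, model_rmse in rank_models(actual, predictions):
    print(f"{name}: MSE={model_mse:.3f}, RMSE={model_rmse:.3f}")
```

Because RMSE is a monotone transformation of MSE, the two criteria always rank models in the same order; RMSE is reported alongside MSE mainly because it is expressed in the units of the emission variable.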