Feature Selection-Driven Supervised Machine Learning for Dissolved Oxygen Prediction Using Multidomain Environmental Parameters in a Major Fishing Lake
Al R. ROMANO, Eden May B. DELA PENA, Karl Ezra S. PILARIO
Abstract. Dissolved oxygen (DO) plays an important role in water quality monitoring because of its impact on the aquatic organisms inhabiting a water body. Sampaloc Lake, located in San Pablo City, Laguna Province, Philippines, is one of the major fishing lakes and has been affected by declining DO concentrations, leading to recurring fish kill incidents. This study used supervised machine learning models, with filter and wrapper methods for feature selection, to predict DO levels in Sampaloc Lake. The goal of the study is to provide a baseline for building an early warning system for fish kill events in the lake. Water quality, physical, and hydroclimatic parameters were considered as candidate predictors to create a reliable DO predictive model. Based on the feature selection process, water temperature, atmospheric temperature, biochemical oxygen demand, surface pressure, and pH were consistently important features for predicting DO under both the filter and wrapper methods. Overall, the eXtreme Gradient Boosting Regression (XGBR) model with filter-based feature selection (R² = 0.95, RMSE = 0.14, MAE = 0.10) showed reliable performance among the predictive models tested in the study (the others being Multiple Linear Regression, Random Forest, Kernel Ridge Regression, and Bootstrap Aggregating).
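As a rough illustration of two ideas named in the abstract, the sketch below ranks candidate predictors with a simple filter criterion and computes R², RMSE, and MAE for a set of predictions. It is not the authors' pipeline: the abstract does not state which filter statistic was used, so absolute Pearson correlation is an assumption here, and the function names (`filter_rank`, `regression_metrics`) and the toy values are hypothetical.

```python
import math

def pearson(x, y):
    """Pearson correlation between two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)

def filter_rank(features, target):
    """Rank feature names by |Pearson r| with the target, strongest first.

    Assumption: a correlation-based filter; the study's actual filter
    criterion is not given in the abstract.
    """
    scores = {name: abs(pearson(col, target)) for name, col in features.items()}
    return sorted(scores, key=scores.get, reverse=True)

def regression_metrics(y_true, y_pred):
    """Return R2, RMSE, and MAE for paired observations and predictions."""
    n = len(y_true)
    errors = [t - p for t, p in zip(y_true, y_pred)]
    ss_res = sum(e * e for e in errors)
    mean_true = sum(y_true) / n
    ss_tot = sum((t - mean_true) ** 2 for t in y_true)
    return {
        "R2": 1.0 - ss_res / ss_tot,
        "RMSE": math.sqrt(ss_res / n),
        "MAE": sum(abs(e) for e in errors) / n,
    }

# Toy values for illustration only (not data from the study).
do_observed = [6.0, 5.5, 5.0, 4.5, 4.0]
candidate_features = {
    "water_temp": [26.0, 27.0, 28.0, 29.0, 30.0],
    "ph": [7.9, 8.1, 7.8, 8.0, 7.7],
}
ranking = filter_rank(candidate_features, do_observed)
metrics = regression_metrics(do_observed, [5.9, 5.6, 5.0, 4.4, 4.1])
print(ranking, metrics)
```

A wrapper method would differ in that it scores feature subsets by repeatedly fitting the regression model itself, rather than by a model-independent statistic such as the correlation used above.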
Keywords
Dissolved Oxygen, Machine Learning, Water Quality Prediction, Aquaculture, Feature Selection
Published online 5/10/2026, 17 pages
Copyright © 2026 by the author(s)
Published under license by Materials Research Forum LLC., Millersville PA, USA
Citation: Al R. ROMANO, Eden May B. DELA PENA, Karl Ezra S. PILARIO, Feature Selection-Driven Supervised Machine Learning for Dissolved Oxygen Prediction Using Multidomain Environmental Parameters in a Major Fishing Lake, Materials Research Proceedings, Vol. 66, pp 406-422, 2026
DOI: https://doi.org/10.21741/9781644904152-37
The article was published as article 37 of the book Advanced Materials and Sustainable Energy Technologies
Content from this work may be used under the terms of the Creative Commons Attribution 3.0 license. Any further distribution of this work must maintain attribution to the author(s) and the title of the work, journal citation and DOI.