Analysis of the Accuracy of CatBoost, Random Forest, and SVM Models in Poverty Level Classification in Indonesia

Authors
  • Pasha Khatami Hasibuan

    Yogyakarta State University
    Author
  • Rezkya Nurdiana

    Author
  • Atika Pratiwi Harahap

    Author
Keywords:
Poverty Rate, CatBoost, Random Forest, SVM, Indonesia, Classification, Analysis
Abstract

Introduction: Poverty alleviation is a key focus of Indonesia's national development agenda, but the effectiveness of social assistance distribution is often hampered by exclusion errors and slow data updates.

Objective: This study compares the performance of three widely used machine learning algorithms, namely CatBoost, Random Forest, and Support Vector Machine (SVM), in classifying poverty levels across districts/cities in Indonesia, with the goal of improving the accuracy of policy targeting.

Methods: Using a dataset from the Central Statistics Agency (BPS) covering 11 socio-economic indicators for 514 administrative regions, this study applied standard data pre-processing techniques, split the data by stratified sampling (80% training, 20% testing), and validated the models with a 5-fold cross-validation scheme to check the consistency of the results.
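The splitting and validation steps described in the Methods can be sketched as follows. This is a minimal illustration using only the Python standard library, not the authors' code: the function names, the random seed, and the label distribution (an assumed 77 "poor" regions out of 514) are hypothetical. In practice a library routine such as scikit-learn's `train_test_split(..., stratify=y)` and `StratifiedKFold` would typically be used instead.

```python
import random
from collections import defaultdict


def stratified_split(labels, test_fraction=0.2, seed=42):
    """Return (train_idx, test_idx), sampling each class separately so that
    both parts keep roughly the same class proportions."""
    rng = random.Random(seed)
    by_class = defaultdict(list)
    for idx, label in enumerate(labels):
        by_class[label].append(idx)
    train_idx, test_idx = [], []
    for class_indices in by_class.values():
        rng.shuffle(class_indices)
        n_test = round(len(class_indices) * test_fraction)
        test_idx.extend(class_indices[:n_test])
        train_idx.extend(class_indices[n_test:])
    return sorted(train_idx), sorted(test_idx)


def stratified_k_folds(labels, indices, k=5, seed=42):
    """Yield (fit_idx, val_idx) pairs. Each class is shuffled and dealt
    round-robin into k folds, so every fold has a similar class mix."""
    rng = random.Random(seed)
    by_class = defaultdict(list)
    for idx in indices:
        by_class[labels[idx]].append(idx)
    folds = [[] for _ in range(k)]
    for class_indices in by_class.values():
        rng.shuffle(class_indices)
        for position, idx in enumerate(class_indices):
            folds[position % k].append(idx)
    for i in range(k):
        val_idx = sorted(folds[i])
        fit_idx = sorted(
            idx for j, fold in enumerate(folds) if j != i for idx in fold
        )
        yield fit_idx, val_idx


# Hypothetical labels: 514 regions, 77 assumed to be in the "poor" class (1).
labels = [1] * 77 + [0] * 437

train_idx, test_idx = stratified_split(labels, test_fraction=0.2)
folds = list(stratified_k_folds(labels, train_idx, k=5))

for fold_number, (fit_idx, val_idx) in enumerate(folds, start=1):
    poor_share = sum(labels[i] for i in val_idx) / len(val_idx)
    print(f"fold {fold_number}: fit={len(fit_idx)} val={len(val_idx)} "
          f"poor share in val={poor_share:.2f}")
```

The held-out test indices are never passed to the fold generator, so cross-validation scores are computed on training data only and the 20% test set remains an independent check.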

Results: The CatBoost model achieved the best predictive performance, with an accuracy of 94.17%, F1-Macro of 86.79%, MCC of 73.90%, and AUC-ROC of 95.97%, outperforming the Random Forest model in terms of generalization (accuracy of 94.17%) and the SVM model (accuracy of 86.41%). The feature importance analysis indicates that Average School Age, Expenditure per Capita, and Access to Decent Sanitation are the three most influential factors in the poverty conditions of a region.
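Because the reported accuracy is the same for CatBoost and Random Forest, the other metrics carry the comparison. The sketch below shows how accuracy, F1-Macro, and MCC are computed from a confusion matrix, and how two models with identical accuracy can still differ on the other two. It assumes a binary poor/non-poor task, which the abstract does not state, and the confusion-matrix counts are hypothetical, chosen for illustration rather than taken from the study.

```python
from math import sqrt


def accuracy(tp, fp, fn, tn):
    """Share of all predictions that are correct."""
    return (tp + tn) / (tp + fp + fn + tn)


def f1_macro(tp, fp, fn, tn):
    """Unweighted mean of the per-class F1 scores, so the minority class
    counts as much as the majority class."""
    pos_denom = 2 * tp + fp + fn
    neg_denom = 2 * tn + fn + fp
    f1_pos = 2 * tp / pos_denom if pos_denom else 0.0
    f1_neg = 2 * tn / neg_denom if neg_denom else 0.0
    return (f1_pos + f1_neg) / 2


def mcc(tp, fp, fn, tn):
    """Matthews correlation coefficient for a 2x2 confusion matrix."""
    denom = sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    return (tp * tn - fp * fn) / denom if denom else 0.0


# Hypothetical confusion matrices on a 103-region test set
# (15 poor, 88 non-poor). Both models get 97 predictions right.
model_a = dict(tp=11, fp=2, fn=4, tn=86)
model_b = dict(tp=9, fp=0, fn=6, tn=88)

for name, cm in (("model A", model_a), ("model B", model_b)):
    print(f"{name}: accuracy={accuracy(**cm):.4f} "
          f"F1-macro={f1_macro(**cm):.4f} MCC={mcc(**cm):.4f}")
```

Model B misses more poor regions than model A, which lowers its F1-Macro and MCC even though both accuracies are equal. On imbalanced data this is the usual reason to report these metrics alongside accuracy.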

Conclusion: This study shows that a gradient boosting algorithm (CatBoost) is more efficient and robust than bagging or kernel-based methods in handling the heterogeneity of Indonesian demographic data. The results encourage the government to adopt a data-driven approach to setting program targets, with an emphasis on interventions that improve the quality of human resources and basic infrastructure.

 

Published
2025-12-31
Section
Articles

How to Cite

[1]
P. K. Hasibuan, R. Nurdiana, and A. P. Harahap, “Analysis of the Accuracy of CatBoost, Random Forest, and SVM Models in Poverty Level Classification in Indonesia,” JOSTEN, vol. 1, no. 1, pp. 31–38, Dec. 2025. Accessed: Jan. 31, 2026. [Online]. Available: https://josten.aksanesia.com/index.php/josten/article/view/10
