Machine Learning Analytics for Predicting Tax Revenue Potential

Raden David Febriminanto, Meditya Wasesa

Abstract

In line with the rapid digitalization of business processes at the Directorate General of Taxes, the volume of data stored by the institution has grown exponentially. However, the institution still struggles to generate value from these data assets. This research addresses that problem with machine-learning-based predictive analytics, showing how taxpayers' trigger data can serve as a decision support system for discovering and realizing unexplored tax potential. More specifically, it presents predictive analytics models that can accurately predict which potential taxpayers are likely to pay what they owe. We developed three machine learning models: logistic regression, random forest, and decision tree. We analyzed 5,562 tax revenue potential data samples with eight predictors: nominal value of the trigger data, distance to the tax office, type of taxpayer, tax report medium, type of tax, report status, taxpayer's registration year, and area coverage. The random forest model provided the best prediction performance. The resulting attribute weights indicated that tax report status ranked in the top tier of variable importance for predicting tax revenue potential. The analytics can help tax officers identify the potential taxpayers most likely to pay what they owe. Given the size of the data records, this approach can give tax administrators a powerful tool to increase work efficiency, combat tax evasion, and provide better customer service.
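As a rough illustration of what "variable importance" can mean for tree-based models such as the decision tree and random forest named above, the sketch below scores two predictors by how much a single split on each one reduces Gini impurity. The abstract does not state which importance measure the authors used, so Gini decrease is an assumption here. The eight records are invented for the example, and the field names `report_status`, `taxpayer_type`, and `paid` are hypothetical.

```python
from collections import Counter, defaultdict

# Toy records invented for this sketch (NOT the study's data).
# Field names are hypothetical stand-ins for two of the eight predictors.
records = [
    {"report_status": "filed",     "taxpayer_type": "corporate",  "paid": 1},
    {"report_status": "filed",     "taxpayer_type": "individual", "paid": 1},
    {"report_status": "filed",     "taxpayer_type": "corporate",  "paid": 1},
    {"report_status": "filed",     "taxpayer_type": "individual", "paid": 0},
    {"report_status": "not_filed", "taxpayer_type": "corporate",  "paid": 0},
    {"report_status": "not_filed", "taxpayer_type": "individual", "paid": 0},
    {"report_status": "not_filed", "taxpayer_type": "individual", "paid": 1},
    {"report_status": "not_filed", "taxpayer_type": "corporate",  "paid": 0},
]


def gini(labels):
    """Gini impurity of a list of class labels (0.0 means perfectly pure)."""
    n = len(labels)
    if n == 0:
        return 0.0
    counts = Counter(labels)
    return 1.0 - sum((c / n) ** 2 for c in counts.values())


def gini_decrease(rows, feature, target="paid"):
    """Impurity reduction from splitting `rows` once on a categorical feature."""
    parent = gini([r[target] for r in rows])
    groups = defaultdict(list)
    for r in rows:
        groups[r[feature]].append(r[target])
    # Impurity of each child group, weighted by its share of the rows.
    weighted = sum(len(g) / len(rows) * gini(g) for g in groups.values())
    return parent - weighted


features = ["report_status", "taxpayer_type"]
scores = {f: gini_decrease(records, f) for f in features}
ranking = sorted(features, key=lambda f: scores[f], reverse=True)

for name in ranking:
    print(f"{name}: {scores[name]:.3f}")
# report_status: 0.125
# taxpayer_type: 0.000
```

In this toy data, splitting on `report_status` separates payers from non-payers better than splitting on `taxpayer_type`, so it receives the higher score. A random forest typically averages this kind of impurity reduction over many trees and many splits.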

Article Details

How to Cite
Febriminanto, R., & Wasesa, M. (2022). Machine Learning Analytics for Predicting Tax Revenue Potential. Indonesian Treasury Review: Jurnal Perbendaharaan, Keuangan Negara Dan Kebijakan Publik, 7(3), 193-205. https://doi.org/10.33105/itrev.v7i3.497