Comparative Evaluation of Rule-Based and Large Language Models for Financial Transaction Extraction in Chatbots

Authors

  • Verdymas Atma Yulianto, Universitas Dian Nuswantoro
  • Erba Lutfina
  • Galuh Wilujeng Saraswati

DOI:

10.33395/sinkron.v10i2.16020

Keywords:

chatbot, financial management, messaging applications, natural language processing, prototyping method

Abstract

The growing use of digital financial services calls for tools that let users record and manage personal financial transactions efficiently. However, many conventional applications still rely on form-based interfaces, which may reduce user engagement in daily financial recording. This study evaluates a conversational personal financial recording system that uses natural language processing (NLP) to convert informal user messages into structured transaction data. The system was developed using the prototyping method and consists of a messaging-based interface, a backend service, and an NLP module. The evaluation used a dataset of 300 conversational financial messages annotated with intent, amount, and category. The study compares a rule-based baseline with two open-source large language models (LLMs) on intent accuracy, entity extraction metrics, and output validity, and assesses the system through user acceptance testing (UAT) and the System Usability Scale (SUS). The results show that the open-source LLMs achieved the best performance in the NLP evaluation, with strong intent classification, high entity extraction quality, and complete output validity. UAT with 30 participants produced an average success rate of 97.3%, and the SUS score reached 82.5, indicating excellent usability. These findings suggest that prompt-constrained LLMs can improve conversational financial recording by providing reliable structured extraction and an accessible user experience.
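To make the task concrete, the following is a minimal sketch of what a rule-based baseline and an output validity check could look like. It is not the authors' implementation: the keyword tables, the label names ("expense", "food", and so on), the amount pattern, and the function names are all assumptions chosen for illustration, and the English example messages stand in for the study's actual data.

```python
import json
import re

# Hypothetical label sets and keywords; the study's real ones are not given here.
INTENT_KEYWORDS = {
    "expense": ("bought", "paid", "spent"),
    "income": ("received", "salary", "earned"),
}
CATEGORY_KEYWORDS = {
    "food": ("lunch", "coffee", "dinner"),
    "transport": ("taxi", "fuel", "bus"),
    "salary": ("salary",),
}
REQUIRED_FIELDS = {"intent", "amount", "category"}

# Matches amounts such as "25000", "25.000", or "25k" (k = thousand).
AMOUNT_RE = re.compile(r"(\d+(?:[.,]\d{3})*)(k)?\b", re.IGNORECASE)


def first_match(text, table, default):
    """Return the first label with a keyword found in text (plain substring match)."""
    for label, words in table.items():
        if any(word in text for word in words):
            return label
    return default


def extract(message):
    """Turn an informal message into a structured transaction record."""
    text = message.lower()
    match = AMOUNT_RE.search(text)
    amount = None
    if match:
        # Drop thousands separators, then apply the optional "k" multiplier.
        amount = int(re.sub(r"[.,]", "", match.group(1)))
        if match.group(2):
            amount *= 1000
    return {
        "intent": first_match(text, INTENT_KEYWORDS, "unknown"),
        "amount": amount,
        "category": first_match(text, CATEGORY_KEYWORDS, "other"),
    }


def is_valid_output(raw):
    """Check that a raw model response parses as a JSON object with all three fields."""
    try:
        data = json.loads(raw)
    except json.JSONDecodeError:
        return False
    return isinstance(data, dict) and REQUIRED_FIELDS.issubset(data)


def intent_accuracy(predictions, gold):
    """Fraction of records whose predicted intent equals the annotated intent."""
    correct = sum(p["intent"] == g["intent"] for p, g in zip(predictions, gold))
    return correct / len(gold)
```

Under these assumptions, `extract("Paid 25k for lunch")` yields an expense of 25000 in the food category, while `is_valid_output` rejects any response that is not a JSON object carrying all three fields. A keyword table like this is brittle by design (for example, "bus" also matches "business"), which is the kind of limitation a comparison against LLM-based extraction is meant to expose.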

How to Cite

Yulianto, V. A., Lutfina, E., & Saraswati, G. W. (2026). Comparative Evaluation of Rule-Based and Large Language Models for Financial Transaction Extraction in Chatbots. Sinkron : Jurnal Dan Penelitian Teknik Informatika, 10(2), 1207–1219. https://doi.org/10.33395/sinkron.v10i2.16020