Optimizing Questions Using Refined Query in a Question-Answering System for Hadith Books

  • Andy Huang Wijaya, Universitas Islam Negeri Sultan Syarif Kasim
  • Nazruddin Safaat Harahap, Universitas Islam Negeri Sultan Syarif Kasim
  • Muhammad Irsyad, Universitas Islam Negeri Sultan Syarif Kasim
  • Febi Yanto, Universitas Islam Negeri Sultan Syarif Kasim
Keywords: Refined Query, Hadith, Large Language Models, Question-Answering System, Generative Pre-training, Transformer

Abstract

This research aims to enhance a question-answering system for Hadith texts by incorporating Refined Query techniques and Large Language Models (LLMs), specifically OpenAI's GPT-4. Using a dataset of 62,169 Hadith from nine major Hadith collections, the study follows a methodology covering data collection, data analysis and preprocessing, and the integration of LangChain with OpenAI's Chat Model for optimized querying. System performance was evaluated in three ways: a comparison of answers before and after applying Refined Query, BERTScore for text quality, and user-based quality assessments. The results show that Refined Query significantly improves the system's ability to produce accurate, contextually relevant answers: it not only increased answer precision but also enabled the system to answer questions for which it previously produced no answer. An average BERTScore of 0.80351, together with average user-assessed answer quality of 87.3% in the student test and 90.3% in the hadith-expert test, further supports the system's effectiveness. This research advances Islamic information systems by demonstrating a successful integration of advanced computational techniques with religious texts, offering a foundational step toward more accessible understanding of Islamic jurisprudence.
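To make the BERTScore figure more concrete, the sketch below shows the greedy-matching computation defined by Zhang et al.: each reference token is matched to its most similar candidate token (recall), each candidate token to its most similar reference token (precision), and the two are combined into an F1 score. This is a minimal illustration, not the authors' evaluation code. It assumes token embeddings are already available (real BERTScore obtains them from a BERT-style encoder; here the caller passes toy vectors), and it omits the optional idf weighting and baseline rescaling. The abstract does not state which encoder or options produced the reported 0.80351, so the function name `bertscore_f1` and its inputs are hypothetical.

```python
import math
from typing import Dict, List, Sequence

Vector = Sequence[float]


def _cosine(u: Vector, v: Vector) -> float:
    """Cosine similarity between two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0
    return dot / (norm_u * norm_v)


def bertscore_f1(candidate: List[Vector], reference: List[Vector]) -> Dict[str, float]:
    """Greedy-matching BERTScore without idf weighting or baseline rescaling.

    candidate, reference: one embedding per token (hypothetical inputs; a real
    setup would take contextual embeddings from a BERT-style model).
    """
    if not candidate or not reference:
        raise ValueError("both token lists must be non-empty")
    # Similarity matrix: rows are reference tokens, columns are candidate tokens.
    sim = [[_cosine(r, c) for c in candidate] for r in reference]
    # Recall: average best match for each reference token.
    recall = sum(max(row) for row in sim) / len(reference)
    # Precision: average best match for each candidate token.
    precision = sum(
        max(sim[i][j] for i in range(len(reference))) for j in range(len(candidate))
    ) / len(candidate)
    # F1 is the harmonic mean; guard against a zero denominator.
    total = precision + recall
    f1 = 2 * precision * recall / total if total != 0 else 0.0
    return {"precision": precision, "recall": recall, "f1": f1}


if __name__ == "__main__":
    reference = [[1.0, 0.0], [0.0, 1.0]]  # two toy reference tokens
    candidate = [[1.0, 0.0]]              # candidate covers only the first one
    print(bertscore_f1(candidate, reference))
```

With these toy vectors the candidate's single token matches the first reference token exactly, so precision is 1.0, recall is 0.5 (the second reference token has no match), and F1 is about 0.667. Greedy matching is what lets BERTScore reward paraphrases: a generated answer scores well when each of its tokens has a semantically close counterpart in the reference, even if the wording differs.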

Published: 2024-06-20
How to Cite
Wijaya, A. H., Harahap, N. S., Irsyad, M., & Yanto, F. (2024). Optimasi Pertanyaan Menggunakan Refined Query Dalam Sistem Tanya Jawab Kitab Hadis [Optimizing Questions Using Refined Query in a Question-Answering System for Hadith Books]. SATIN - Sains Dan Teknologi Informasi, 10(1), 56-69. https://doi.org/10.33372/stn.v10i1.1116