15th International Conference on Computer and Knowledge Engineering
A Hybrid Architecture to Optimize Persian FAQ Retrieval using Semantic Similarity Search
Authors :
Seyed Amir Mohammad Hosseini (1)
Fatemeh Dehbashi (2)
Setare Kahnemuee (3)
Mohsen Kahani (4)
Morteza Fardin (5)
1- Computer Engineering Department Ferdowsi University of Mashhad
2- Computer Engineering Department Ferdowsi University of Mashhad
3- Computer Engineering Department Ferdowsi University of Mashhad
4- Computer Engineering Department Ferdowsi University of Mashhad
5- Baran Software Group
Keywords :
Persian Language Processing, FAQ Retrieval, Hybrid Clustering, Sentence Embeddings, Semantic Search, Evaluation Framework
Abstract :
Traditional FAQ-based Question Answering (QA) systems, which rely on lexical matching, often fail to capture the semantic nuances of morphologically rich languages like Persian. Furthermore, their performance is typically measured by rigid metrics that underestimate a model's true capabilities. This paper addresses these challenges by proposing a novel, two-stage Hybrid Clustering-Based Search architecture designed to improve both the accuracy and the efficiency of semantic retrieval. We introduce an iterative "Hierarchical DBSCAN" method to cluster a real-world Persian knowledge base, enabling a focused, coarse-to-fine search pipeline. To evaluate our system robustly, we created a new multi-faceted dataset containing formal, informal, and challenging queries. Our experiments show that the proposed hybrid architecture achieves a Top-1 accuracy of 79% and a Top-3 accuracy of 86%, with an answering delay of only 0.115 seconds. This is an improvement in both accuracy and speed over the optimal E5-Base model in the standard Direct Semantic Search baseline. Our work provides an efficient and validated blueprint for developing practical semantic QA systems for the Persian language.
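The abstract does not spell out the pipeline, so the following is only a minimal sketch of what a generic coarse-to-fine search over clustered FAQ embeddings could look like, not the authors' implementation. It assumes the cluster labels are already available (for example from a DBSCAN-style clustering step) and uses toy 2-D vectors in place of real sentence embeddings. The function names `normalize` and `coarse_to_fine_search` and the parameters `top_clusters` and `top_k` are hypothetical.

```python
import numpy as np


def normalize(x):
    """Scale vectors to unit length so a dot product equals cosine similarity."""
    x = np.asarray(x, dtype=float)
    return x / np.linalg.norm(x, axis=-1, keepdims=True)


def coarse_to_fine_search(query, embeddings, labels, top_clusters=1, top_k=3):
    """Hypothetical two-stage search: pick clusters first, then rank FAQs inside them.

    Returns a list of (faq_index, cosine_score) pairs, best match first.
    A noise label such as -1 is simply treated as one more cluster here.
    """
    embeddings = normalize(embeddings)
    query = normalize(query)
    labels = np.asarray(labels)

    # Build one unit-length centroid per cluster.
    cluster_ids = np.unique(labels)
    centroids = normalize(
        np.stack([embeddings[labels == c].mean(axis=0) for c in cluster_ids])
    )

    # Stage 1 (coarse): keep the clusters whose centroids are closest to the query.
    nearest = np.argsort(-(centroids @ query))[:top_clusters]
    selected = cluster_ids[nearest]

    # Stage 2 (fine): rank only the FAQ entries that belong to those clusters.
    candidates = np.flatnonzero(np.isin(labels, selected))
    scores = embeddings[candidates] @ query
    order = np.argsort(-scores)[:top_k]
    return [(int(candidates[i]), float(scores[i])) for i in order]


# Toy data: four "FAQ embeddings" in two clusters (assumed labels).
faq_embeddings = [[1.0, 0.0], [0.9, 0.1], [0.0, 1.0], [0.1, 0.9]]
faq_labels = [0, 0, 1, 1]
results = coarse_to_fine_search([0.2, 1.0], faq_embeddings, faq_labels)
print(results)
```

With `top_clusters=1`, only the entries of the single nearest cluster are scored, which is where the speed-up over scoring every FAQ entry would come from in a design like this. Raising `top_clusters` trades some of that speed for a lower risk of missing an answer that sits in a neighbouring cluster.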