15th International Conference on Computer and Knowledge Engineering
Transformer-Gather, Fuzzy-Reconsider: A Scalable Hybrid Framework for Entity Resolution
Authors :
Mohammadreza Sharifi (1), Danial Ahmadzadeh (2)
1- Ferdowsi University of Mashhad
2- other
Keywords :
Entity Resolution Problem, Natural Language Processing, Machine Learning, Approximate String Matching, Transformers, Deep Learning
Abstract :
Entity resolution plays a significant role in enterprise systems where data integrity must be rigorously maintained. Traditional methods often struggle with noisy data and lack semantic understanding, while modern methods suffer from high computational cost or a heavy reliance on parallel computation. In this study, we introduce a scalable hybrid framework designed to address three problems: scalability, robustness to noise, and reliability of results. We use a pre-trained language model to encode each structured record into a semantic embedding vector. After retrieving a semantically relevant subset of candidates, we apply a syntactic verification stage that uses fuzzy string matching to refine classification on the unlabeled data. We applied this approach to a real-world entity resolution task that involved linking a central user management database to records from numerous shared hosting servers. Compared to other methods, the approach performs well in both processing time and robustness, making it a reliable solution for a server-side product. Crucially, this efficiency does not compromise result quality: the system maintains a retrieval recall of approximately 0.97. The 'Transformer-Gather, Fuzzy-Reconsider' framework scales well enough to be deployed on standard CPU-based infrastructure, offering a practical and effective solution for enterprise-level data integrity auditing.
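The two stages named in the title can be illustrated with a minimal sketch. It is not the authors' implementation, and every name in it is an assumption: `embed` is a hypothetical stand-in that hashes character trigrams instead of calling a pre-trained language model, `SequenceMatcher` from the Python standard library stands in for the unspecified fuzzy string matching technique, and `gather`, `reconsider`, `k`, `threshold`, and the sample records are invented for illustration.

```python
import math
import zlib
from difflib import SequenceMatcher


def embed(text, dim=256):
    # Hypothetical stand-in for a pre-trained language model encoder:
    # hashes character trigrams into a fixed-size, L2-normalized vector.
    vec = [0.0] * dim
    padded = f"  {text.lower()} "
    for i in range(len(padded) - 2):
        gram = padded[i:i + 3]
        vec[zlib.crc32(gram.encode("utf-8")) % dim] += 1.0
    norm = math.sqrt(sum(v * v for v in vec))
    return [v / norm for v in vec] if norm else vec


def gather(query, records, k=3):
    # Stage 1 ("gather"): keep the k records whose embeddings have the
    # highest cosine similarity to the query embedding.
    q = embed(query)
    scored = [(sum(a * b for a, b in zip(q, embed(r))), r) for r in records]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [r for _, r in scored[:k]]


def reconsider(query, candidates, threshold=0.8):
    # Stage 2 ("reconsider"): return the candidate with the highest fuzzy
    # similarity ratio at or above the threshold, or None if none qualifies.
    best, best_score = None, threshold
    for cand in candidates:
        score = SequenceMatcher(None, query.lower(), cand.lower()).ratio()
        if score >= best_score:
            best, best_score = cand, score
    return best


def resolve(query, records, k=3, threshold=0.8):
    # Full pipeline: semantic retrieval followed by syntactic verification.
    return reconsider(query, gather(query, records, k), threshold)


# Invented sample data: central records and a noisy server-side entry.
records = [
    "john.smith@example.com",
    "jon.smith@example.com",
    "alice.wong@example.org",
    "bob.lee@example.net",
]
match = resolve("john.smtih@example.com", records)
print(match)
```

In a deployment of this kind, the record embeddings would normally be computed once and stored in an index rather than recomputed per query as in this sketch. The fuzzy stage then runs only on the `k` retrieved candidates, which is what keeps the verification cost small.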