15th International Conference on Computer and Knowledge Engineering
Transformer-Gather, Fuzzy-Reconsider: A Scalable Hybrid Framework for Entity Resolution
Authors :
Mohammadreza Sharifi (1), Danial Ahmadzadeh (2)
1- Ferdowsi University of Mashhad
2- other
Keywords :
Entity Resolution Problem, Natural Language Processing, Machine Learning, Approximate String Matching, Transformers, Deep Learning
Abstract :
Entity resolution plays a significant role in enterprise systems where data integrity must be rigorously maintained. Traditional methods often struggle with noisy data and lack semantic understanding, while modern methods suffer from high computational cost or heavy reliance on parallel computation. In this study, we introduce a scalable hybrid framework designed to address several important problems, including scalability, noise robustness, and reliability of results. We use a pre-trained language model to encode each structured record into a semantic embedding vector. After retrieving a semantically relevant subset of candidates, we apply a syntactic verification stage that uses fuzzy string matching to refine classification on the unlabeled data. We applied this approach to a real-world entity resolution task: linking a central user management database to numerous shared hosting server records. Compared to other methods, the approach performs strongly in both processing time and robustness, making it a reliable solution for a server-side product. Crucially, this efficiency does not compromise results: the system maintains a retrieval recall of approximately 0.97. The scalability of the 'Transformer-Gather, Fuzzy-Reconsider' framework makes it deployable on standard CPU-based infrastructure, offering a practical and effective solution for enterprise-level data integrity auditing.
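The two-stage idea in the abstract (gather candidates by embedding similarity, then reconsider them with fuzzy string matching) can be illustrated with a minimal sketch. This is not the authors' implementation: the `embed` function is a toy character n-gram stand-in for a pre-trained language model, and the function names, the candidate count `k`, the `threshold` value, and the sample records are all assumptions made for illustration.

```python
import math
from collections import Counter
from difflib import SequenceMatcher


def embed(text, n=3):
    """Toy stand-in for a pre-trained language model encoder.

    Returns a bag of character n-grams. This is NOT a semantic embedding;
    a real system would call a transformer encoder here instead.
    """
    padded = f"  {text.lower()}  "
    return Counter(padded[i:i + n] for i in range(len(padded) - n + 1))


def cosine(a, b):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(count * b[gram] for gram, count in a.items())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def gather(query, records, k=2):
    """Stage 1: retrieve the k records closest to the query in vector space."""
    q = embed(query)
    ranked = sorted(records, key=lambda r: cosine(q, embed(r)), reverse=True)
    return ranked[:k]


def reconsider(query, candidates, threshold=0.8):
    """Stage 2: accept the best candidate only if its fuzzy ratio passes."""
    if not candidates:
        return None
    scored = [
        (SequenceMatcher(None, query.lower(), c.lower()).ratio(), c)
        for c in candidates
    ]
    best_score, best = max(scored)
    return best if best_score >= threshold else None


def resolve(query, records, k=2, threshold=0.8):
    """Run both stages: gather candidates, then verify syntactically."""
    return reconsider(query, gather(query, records, k), threshold)


# Hypothetical records standing in for a central user database.
records = [
    "john.smith@example.com",
    "jane.doe@example.com",
    "acme hosting ltd",
]
match = resolve("jon.smith@example.com", records)
no_match = resolve("zzzz qqqq", records)
```

The design point the sketch tries to show is that the expensive pairwise string comparison runs only on the small gathered subset rather than on every record, which is one plausible reading of why such a pipeline could remain practical on CPU-only infrastructure.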