15th International Conference on Computer and Knowledge Engineering
Transformer-Gather, Fuzzy-Reconsider: A Scalable Hybrid Framework for Entity Resolution
Authors :
Mohammadreza Sharifi (1), Danial Ahmadzadeh (2)
1- Ferdowsi University of Mashhad
2- other
Keywords :
Entity Resolution Problem, Natural Language Processing, Machine Learning, Approximate String Matching, Transformers, Deep Learning
Abstract :
Entity resolution plays a significant role in enterprise systems where data integrity must be rigorously maintained. Traditional methods often struggle with noisy data and lack semantic understanding, while modern methods suffer from high computational cost or an excessive need for parallel computation. In this study, we introduce a scalable hybrid framework designed to address scalability, noise robustness, and reliability of results. We use a pre-trained language model to encode each structured record into a semantic embedding vector. After retrieving a semantically relevant subset of candidates, we apply a syntactic verification stage that uses fuzzy string matching to refine classification on the unlabeled data. The approach was applied to a real-world entity resolution task, which involved linking a central user management database to numerous shared hosting server records. Compared to other methods, it performs strongly in both processing time and robustness, making it a reliable solution for a server-side product. Crucially, this efficiency does not compromise results: the system maintains a retrieval recall of approximately 0.97. The scalability of the 'Transformer-Gather, Fuzzy-Reconsider' framework makes it deployable on standard CPU-based infrastructure, offering a practical and effective solution for enterprise-level data integrity auditing.
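The two-stage idea in the abstract (semantic retrieval of candidates, then syntactic verification) can be illustrated with a minimal sketch. This is not the authors' implementation. To stay dependency-free, a character-trigram bag stands in for the pre-trained language model, and Python's `difflib.SequenceMatcher` stands in for the fuzzy string matcher. The function names `embed`, `gather`, `reconsider`, and `resolve`, the candidate count `k`, and the `threshold` value are all hypothetical choices made for illustration.

```python
import math
from collections import Counter
from difflib import SequenceMatcher


def embed(text, n=3):
    """Stand-in encoder (assumption): a bag of character trigrams.

    The paper uses a pre-trained language model here; this is only a placeholder.
    """
    padded = f"  {text.lower()} "
    return Counter(padded[i:i + n] for i in range(len(padded) - n + 1))


def cosine(a, b):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(count * b[gram] for gram, count in a.items())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def gather(query, records, k=3):
    """Stage 1 (gather): return the k records closest to the query in embedding space."""
    q = embed(query)
    ranked = sorted(records, key=lambda r: cosine(q, embed(r)), reverse=True)
    return ranked[:k]


def reconsider(query, candidates, threshold=0.8):
    """Stage 2 (reconsider): accept the best candidate only if its fuzzy ratio clears the threshold."""
    if not candidates:
        return None
    scored = [(SequenceMatcher(None, query.lower(), c.lower()).ratio(), c) for c in candidates]
    score, best = max(scored)
    return best if score >= threshold else None


def resolve(query, records, k=3, threshold=0.8):
    """Run both stages: retrieve candidates, then verify them syntactically."""
    return reconsider(query, gather(query, records, k), threshold)
```

In this sketch, `resolve` returns the matched record string, or `None` when no retrieved candidate passes the syntactic check. The design point it illustrates is that the expensive pairwise string comparison runs only on the small candidate subset returned by the retrieval stage, not on the full record set.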