15th International Conference on Computer and Knowledge Engineering
Transformer-Gather, Fuzzy-Reconsider: A Scalable Hybrid Framework for Entity Resolution
Authors :
Mohammadreza Sharifi (1), Danial Ahmadzadeh (2)
1- Ferdowsi University of Mashhad
2- other
Keywords :
Entity Resolution Problem, Natural Language Processing, Machine Learning, Approximate String Matching, Transformers, Deep Learning
Abstract :
Entity resolution plays a significant role in enterprise systems where data integrity must be rigorously maintained. Traditional methods often struggle with noisy data and lack semantic understanding, while modern methods suffer from high computational cost or depend heavily on parallel computation. In this study, we introduce a scalable hybrid framework designed to address several of these problems: scalability, robustness to noise, and reliability of results. We use a pre-trained language model to encode each structured record into a semantic embedding vector. After retrieving a semantically relevant subset of candidates, we apply a syntactic verification stage based on fuzzy string matching to refine classification on the unlabeled data. The approach was applied to a real-world entity resolution task that involved linking a central user management database with numerous shared hosting server records. Compared to other methods, it performs well in both processing time and robustness, making it a reliable solution for a server-side product. Crucially, this efficiency does not compromise results: the system maintains a high retrieval recall of approximately 0.97. The scalability of the 'Transformer-Gather, Fuzzy-Reconsider' framework makes it deployable on standard CPU-based infrastructure, offering a practical and effective solution for enterprise-level data integrity auditing.
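The two-stage flow described in the abstract (gather candidates by embedding similarity, then reconsider them with fuzzy string matching) can be illustrated with a minimal sketch. This is not the authors' implementation. To stay self-contained, the pre-trained language model is replaced by a hypothetical stand-in encoder based on character trigram counts, and the fuzzy stage uses `difflib.SequenceMatcher` from the Python standard library. The function names, `k`, the `threshold` value, and the sample records are all assumptions made for illustration.

```python
import math
from collections import Counter
from difflib import SequenceMatcher


def embed(text, n=3):
    """Stand-in encoder (assumption): character n-gram counts.
    The paper uses a pre-trained language model here instead."""
    padded = f" {text.lower()} "
    return Counter(padded[i:i + n] for i in range(len(padded) - n + 1))


def cosine(a, b):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(count * b[gram] for gram, count in a.items())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def gather(query, records, k=3):
    """Stage 1: keep the k records most similar to the query in vector space."""
    q = embed(query)
    ranked = sorted(records, key=lambda r: cosine(q, embed(r)), reverse=True)
    return ranked[:k]


def reconsider(query, candidates, threshold=0.8):
    """Stage 2: accept the best candidate only if its fuzzy string
    similarity to the query reaches the (hypothetical) threshold."""
    scored = [
        (SequenceMatcher(None, query.lower(), c.lower()).ratio(), c)
        for c in candidates
    ]
    best_score, best = max(scored, default=(0.0, None))
    return best if best_score >= threshold else None


def resolve(query, records, k=3, threshold=0.8):
    """Run both stages: semantic gather, then syntactic verification."""
    return reconsider(query, gather(query, records, k), threshold)


if __name__ == "__main__":
    # Hypothetical records standing in for a central user database.
    users = [
        "john.smith@example.com",
        "jane.doe@example.com",
        "bob.stone@example.org",
    ]
    print(resolve("jon.smith@example.com", users, k=2))
    print(resolve("zzzz-unrelated", users, k=2))
```

The first stage only narrows the search space, so the more expensive pairwise string comparison runs on `k` candidates rather than on every record. In a real deployment the sorted scan in `gather` would presumably be replaced by a vector index, which is outside the scope of this sketch.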