15th International Conference on Computer and Knowledge Engineering
Transformer-Gather, Fuzzy-Reconsider: A Scalable Hybrid Framework for Entity Resolution
Authors :
Mohammadreza Sharifi 1
Danial Ahmadzadeh 2
1- Ferdowsi University of Mashhad
2- other
Keywords :
Entity Resolution Problem, Natural Language Processing, Machine Learning, Approximate String Matching, Transformers, Deep Learning
Abstract :
Entity resolution plays a significant role in enterprise systems where data integrity must be rigorously maintained. Traditional methods often struggle with noisy data or semantic understanding, while modern methods suffer from high computational cost or an excessive need for parallel computation. In this study, we introduce a scalable hybrid framework designed to address several important problems, including scalability, noise robustness, and the reliability of results. We use a pre-trained language model to encode each structured record into a corresponding semantic embedding vector. Then, after retrieving a semantically relevant subset of candidates, we apply a syntactic verification stage that uses fuzzy string matching to refine classification on the unlabeled data. This approach was applied to a real-world entity resolution task, which involved linking a central user management database with numerous shared hosting server records. Compared to other methods, this approach performs strongly in both processing time and robustness, making it a reliable solution for a server-side product. Crucially, this efficiency does not compromise results, as the system maintains a high retrieval recall of approximately 0.97. The scalability of the 'Transformer-Gather, Fuzzy-Reconsider' framework makes it deployable on standard CPU-based infrastructure, offering a practical and effective solution for enterprise-level data integrity auditing.
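The two-stage idea described in the abstract (gather candidates by vector similarity, then reconsider them with fuzzy string matching) can be illustrated with a minimal sketch. This is not the authors' implementation. To keep it runnable without downloading a model, the pre-trained language model is replaced by a stand-in character-trigram encoder, and the fuzzy stage uses the standard library's `difflib.SequenceMatcher`. The function names (`embed`, `gather`, `reconsider`, `resolve`), the parameters `k` and `threshold`, and the sample records are all hypothetical.

```python
import math
from collections import Counter
from difflib import SequenceMatcher


def embed(text, n=3):
    """Stand-in encoder (assumption): a bag of character trigrams.
    The paper uses a pre-trained language model here instead."""
    padded = f"  {text.lower()} "
    return Counter(padded[i:i + n] for i in range(len(padded) - n + 1))


def cosine(a, b):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(count * b[gram] for gram, count in a.items())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def gather(query, records, k=3):
    """Stage 1: retrieve the k records closest to the query in vector space."""
    q = embed(query)
    ranked = sorted(records, key=lambda r: cosine(q, embed(r)), reverse=True)
    return ranked[:k]


def reconsider(query, candidates, threshold=0.8):
    """Stage 2: accept the best candidate only if its fuzzy string
    similarity to the query reaches the (hypothetical) threshold."""
    if not candidates:
        return None
    scored = [
        (SequenceMatcher(None, query.lower(), c.lower()).ratio(), c)
        for c in candidates
    ]
    best_score, best = max(scored)
    return best if best_score >= threshold else None


def resolve(query, records, k=3, threshold=0.8):
    """Run both stages: semantic gather, then syntactic verification."""
    return reconsider(query, gather(query, records, k), threshold)


# Hypothetical records standing in for a central user database.
central_db = [
    "john.smith@example.com",
    "jane.doe@example.com",
    "j.smithers@example.org",
]

print(resolve("jon.smith@example.com", central_db))   # noisy variant of a known record
print(resolve("totally-unrelated-host", central_db))  # no plausible match
```

In a setting closer to the one described, `embed` would call a transformer encoder and `gather` would query a vector index rather than sort every record. The second stage is cheap because it only scores the `k` retrieved candidates.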