15th International Conference on Computer and Knowledge Engineering
ParsHomo: A T5-Powered Approach to High-Precision Persian Homograph Disambiguation
Authors :
Hasan Jalali (1), Taha Mohaddesi (2)
1- School of Electrical and Computer Engineering, College of Engineering, University of Tehran
2- ASMIN AI Research Center
Keywords :
Homograph Disambiguation, Persian Language Processing, Grapheme-to-Phoneme Conversion, T5 Transformer, Natural Language Processing, Text-to-Speech
Abstract :
This paper presents ParsHomo, a high-precision model for Persian homograph disambiguation built on fine-tuned T5-based architectures (mT5 and ByT5). Starting from the HomoRich dataset, we manually curated and refined the human-labeled portion to ensure high data quality. The cleaned dataset was used to fine-tune the T5 models, leading to substantial improvements over previous systems. We evaluated several input formatting strategies (question-style, label-based, and separator-based) and used the trained model to relabel large-scale datasets such as HomoRich-GPT4o and GE2PE, resulting in a bootstrapped corpus of over 4.19 million sentences. ParsHomo offers the most accurate and scalable homograph disambiguation system for Persian to date, demonstrating the effectiveness of transformer-based models for context-aware phoneme generation in low-resource languages. Our best-performing model (mT5 with the label-based format) achieved an accuracy of 94.55% and a Character Error Rate of 1.71%, outperforming prior approaches by a wide margin.
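The abstract reports two metrics, accuracy and Character Error Rate (CER), without defining them. As a minimal sketch, assuming the conventional definitions (CER as total character-level edit distance divided by total reference length, accuracy as the fraction of exact matches), they could be computed as below. The function names and the toy phoneme strings are illustrative assumptions, not taken from the paper, and the authors' evaluation may differ in detail.

```python
def edit_distance(ref: str, hyp: str) -> int:
    """Levenshtein distance: minimum insertions, deletions, substitutions."""
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        curr = [i]
        for j, h in enumerate(hyp, start=1):
            cost = 0 if r == h else 1
            curr.append(min(prev[j] + 1,          # delete a reference character
                            curr[j - 1] + 1,      # insert a hypothesis character
                            prev[j - 1] + cost))  # substitute or match
        prev = curr
    return prev[-1]


def character_error_rate(refs, hyps):
    """Total edit distance over total reference characters (assumed definition)."""
    total_edits = sum(edit_distance(r, h) for r, h in zip(refs, hyps))
    total_chars = sum(len(r) for r in refs)
    return total_edits / total_chars


def exact_match_accuracy(refs, hyps):
    """Fraction of predictions identical to the reference (assumed definition)."""
    return sum(r == h for r, h in zip(refs, hyps)) / len(refs)


# Toy example (hypothetical romanized pronunciations, not data from the paper)
refs = ["mard", "kerm"]
hyps = ["mord", "kerm"]
print(character_error_rate(refs, hyps))  # 1 edit over 8 reference characters
print(exact_match_accuracy(refs, hyps))  # 1 of 2 predictions matches exactly
```

Under these definitions a single wrong vowel counts as a full error for accuracy but only as one character for CER, which is consistent with a reported CER far below the reported error rate on exact matches.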