15th International Conference on Computer and Knowledge Engineering
ParsHomo: A T5-Powered Approach to High-Precision Persian Homograph Disambiguation
Authors:
Hasan Jalali (1), Taha Mohaddesi (2)
1- School of Electrical and Computer Engineering, College of Engineering, University of Tehran
2- ASMIN AI Research Center
Keywords:
Homograph Disambiguation, Persian Language Processing, Grapheme-to-Phoneme Conversion, T5 Transformer, Natural Language Processing, Text-to-Speech
Abstract :
This paper presents ParsHomo, a high-precision model for Persian homograph disambiguation built on fine-tuned T5-based architectures (mT5 and ByT5). Starting from the HomoRich dataset, we manually curated and refined its human-labeled portion to ensure high data quality. We then fine-tuned the T5 models on the cleaned dataset, which led to substantial improvements over previous systems. We evaluated several input formatting strategies (question-style, label-based, and separator-based) and used the trained model to relabel large-scale datasets such as HomoRich-GPT4o and GE2PE, producing a bootstrapped corpus of over 4.19 million sentences. Our best-performing model (mT5 with the label-based format) achieved an accuracy of 94.55% and a Character Error Rate of 1.71%, outperforming prior approaches by a wide margin. ParsHomo is the most accurate and scalable homograph disambiguation system for Persian to date, and it demonstrates the effectiveness of transformer-based models for context-aware phoneme generation in low-resource languages.
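The two reported metrics can be illustrated with a minimal sketch. It assumes that Character Error Rate is the total Levenshtein edit distance between predicted and reference phoneme strings divided by the total number of reference characters, and that accuracy is the exact-match rate over sequences. The abstract does not state the exact definitions or any text normalization the authors used, so the function names and conventions below are illustrative assumptions rather than the paper's evaluation code.

```python
from typing import Sequence


def edit_distance(ref: str, hyp: str) -> int:
    """Levenshtein distance (insertions, deletions, substitutions)."""
    prev = list(range(len(hyp) + 1))  # distances from the empty reference prefix
    for i, r in enumerate(ref, start=1):
        curr = [i]  # deleting i reference characters reaches the empty hypothesis
        for j, h in enumerate(hyp, start=1):
            cost = 0 if r == h else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # substitution or match
        prev = curr
    return prev[-1]


def character_error_rate(references: Sequence[str], hypotheses: Sequence[str]) -> float:
    """Corpus-level CER: total edits over total reference characters (assumed definition)."""
    if len(references) != len(hypotheses):
        raise ValueError("references and hypotheses must have the same length")
    total_chars = sum(len(r) for r in references)
    if total_chars == 0:
        raise ValueError("references contain no characters")
    total_edits = sum(edit_distance(r, h) for r, h in zip(references, hypotheses))
    return total_edits / total_chars


def exact_match_accuracy(references: Sequence[str], hypotheses: Sequence[str]) -> float:
    """Fraction of hypotheses identical to their reference (assumed definition)."""
    if len(references) != len(hypotheses):
        raise ValueError("references and hypotheses must have the same length")
    if not references:
        raise ValueError("no examples to score")
    return sum(r == h for r, h in zip(references, hypotheses)) / len(references)
```

Under these assumptions, a model can have a low CER while its exact-match accuracy is noticeably lower, because a single wrong vowel in a long phoneme string counts as one character error but a full sequence miss.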