14th International Conference on Computer and Knowledge Engineering
Enhancing Persian Word Sense Disambiguation with Large Language Models: Techniques and Applications
Authors :
Fatemeh Zahra Arshia (1), Saeedeh Sadat Sadidpour (2)
1- Faculty of Electronic & Computer Engineering, Malek Ashtar University of Technology
2- Faculty of Electronic & Computer Engineering, Malek Ashtar University of Technology
Keywords :
Word Sense Disambiguation (WSD), Large Language Models (LLMs), Persian Disambiguation
Abstract :
Word sense disambiguation (WSD) is a core task in NLP: given a word in the source text, it selects the meaning that fits the context. It is therefore key to effective NLP for Persian, a morphologically rich and highly polysemous language. Recent improvements in language models have greatly advanced the capabilities of NLP and opened new avenues for improving WSD performance. This paper presents the integration of large language models (LLMs) to improve WSD in Persian, taking into account the linguistic challenges specific to the language. We consider four Persian language models: FaBERT, AriaBERT, GPT-2 Persian, and PersianMind-v1.0, and apply supervised fine-tuning on the SBU-WSD-Corpus. Our methodology consists of preprocessing the Persian WSD corpus, fine-tuning the models on it, and measuring their performance. The results indicate that LLM-based methods significantly improve WSD accuracy over traditional methods, with FaBERT achieving the best accuracy. We further discuss real-life applications, such as sentiment analysis, to show the effect of this advance on general NLP tasks. The study concludes with insights into future research directions, underlining the potential role of LLMs in further transforming WSD and related fields.
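To make the evaluation step concrete, the following minimal sketch frames WSD as choosing a sense label for a target word in context and scores predictions by accuracy. It is not the authors' pipeline: the toy sentences are invented for illustration and do not come from the SBU-WSD-Corpus, the sense labels are hypothetical, and a simple most-frequent-sense baseline stands in for the classifier. A fine-tuned model would replace the `predict` function.

```python
from collections import Counter, defaultdict

# Hypothetical toy data: (target word, context sentence, gold sense label).
# The Persian word "شیر" (shir) can mean "milk", "lion", or "tap".
train = [
    ("شیر", "او یک لیوان شیر نوشید", "milk"),
    ("شیر", "شیر گرم برای کودک خوب است", "milk"),
    ("شیر", "شیر در جنگل غرش کرد", "lion"),
]
test = [
    ("شیر", "کودک شیر خورد", "milk"),
    ("شیر", "شیر سلطان جنگل است", "lion"),
]

def fit_most_frequent_sense(examples):
    """Map each target word to the sense seen most often in training."""
    counts = defaultdict(Counter)
    for word, _context, sense in examples:
        counts[word][sense] += 1
    return {word: c.most_common(1)[0][0] for word, c in counts.items()}

def accuracy(predict, examples):
    """Fraction of examples whose predicted sense equals the gold sense."""
    correct = sum(predict(word, context) == sense
                  for word, context, sense in examples)
    return correct / len(examples)

mfs = fit_most_frequent_sense(train)

# The baseline ignores context entirely; a context-aware model would not.
baseline_accuracy = accuracy(lambda word, context: mfs.get(word), test)
print(baseline_accuracy)
```

Because the baseline always predicts the most frequent training sense, it misses every occurrence of a less common sense, which is the gap a context-aware model is meant to close.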