0% Complete
Home
/
11th International Conference on Computer and Knowledge Engineering
Semi-automatic Detection of Persian Stopwords using FastText Library
Authors :
Mohammad Dehghani
1
Mohammad Manthouri
2
1- Tarbiat Modares University
2- Shahed University
Keywords :
stopword, natural language processing, fastText, word embedding, text mining
Abstract :
A stopword is a word that does not add much semantic information to the text that despite of its very high frequency. Stopwords include prepositions, conjunctions, and pronouns. One of the steps in natural language processing is to remove stopwords to reduce dataset size and process faster. In this study, a semi-automatic method for collecting the Persian language's stopwords is proposed. The proposed method lists the stopwords of each text depending on its subject. For this purpose, based on a corpus of news texts, the Inverse Document Frequency (IDF) weight in the text is calculated for each word and the stopwords candidates are determined. Then, using the fastText library, the vector of each word is obtained. In the next step, five neighbors are found for each vector. Next, by removing duplicate words, the final list of stopwords (1014 stopwords) is collected. The result of simulations show the accuracy of detecting stopwords by the k-nearest neighbor method is 94.6%.
Papers List
List of archived papers
Investigation of topological characteristics of Iranian railway network: A network science approach
Sina Firuzbakht - Mohammad Khansari
Standardized ReACT Logits: An Effective Approach for Anomaly Segmentation in Self-driving Cars
Mahdi Farhadi - Seyede Mahya Hazavei - Shahriar Baradaran Shokouhi
Designing a High Perfomance and High Profit P2P Energy Trading System Using a Consortium Blockchain Network
Poonia Taheri Makhsoos - Behnam Bahrak - Fattaneh Taghiyareh
African Vultures Optimization Algorithm for Optimal Damping Controllers Design in the Electrical Power Grid System
Aliyu Sabo - Theophilus Ebuka Odoh - Samuel Habu - Hossein Shahinzadeh - Farshad Ebrahimi
Enhancing Vehicle Make and Model Recognition with 3D Attention Modules
Narges Semiromizadeh - Omid Nejati Manzari - Shahriar B. Shokouhi - Sattar Mirzakuchaki
Supervised Contrastive Learning for Short Text Classification in Natural Language Processing
Mitra Esmaeili - Hamed Vahdat nejad
Joint mobility-aware offloading and UAV position optimization in Blockchain-enabled 5G
Zeinab Rabbani - Zeinab Movahedi
Optimizing MR Image Registration for Accurate Brain Volume Measurement in Children with Autism Spectrum Disorder
Shiva Sanati - Mahdi Saadatmand
Predicting the Recovery Rate of COVID-19 Using a Novel Hybrid Method
Fatemeh Ahouz - Ebrahim Sayahi
An interactive user groups recommender system based on reinforcement learning
Hediyeh Naderi Allaf - Mohsen Kahani
more
Samin Hamayesh - Version 42.2.1