0% Complete
Home
/
11th International Conference on Computer and Knowledge Engineering
Semi-automatic Detection of Persian Stopwords using FastText Library
Authors :
Mohammad Dehghani
1
Mohammad Manthouri
2
1- Tarbiat Modares University
2- Shahed University
Keywords :
stopword, natural language processing, fastText, word embedding, text mining
Abstract :
A stopword is a word that does not add much semantic information to the text that despite of its very high frequency. Stopwords include prepositions, conjunctions, and pronouns. One of the steps in natural language processing is to remove stopwords to reduce dataset size and process faster. In this study, a semi-automatic method for collecting the Persian language's stopwords is proposed. The proposed method lists the stopwords of each text depending on its subject. For this purpose, based on a corpus of news texts, the Inverse Document Frequency (IDF) weight in the text is calculated for each word and the stopwords candidates are determined. Then, using the fastText library, the vector of each word is obtained. In the next step, five neighbors are found for each vector. Next, by removing duplicate words, the final list of stopwords (1014 stopwords) is collected. The result of simulations show the accuracy of detecting stopwords by the k-nearest neighbor method is 94.6%.
Papers List
List of archived papers
Parallel Local Feature Selection For High-dimensional Data
Zhaleh Manbari - Chiman Salavati - Fardin AkhlaghianTab - Barzan Saeedpoor - Himan Delbina - Mahmud Abdulla Mohammad
WBT-GAN:Wavelet based Generative Adversarial Network for Texture Synthesis
Sara Saberi moghadam - Reza Azmi - Maral Zarvani
A Graph-based Feature Selection using Class-Feature Association Map (CFAM)
Motahare Akhavan - Seyed Mohammad Hossein Hasheminejad
Automatic Infrared-Based Volume and Mass Estimation System for Agricultural Products
Seyed Muhammad Hossein Mousavi - S. Muhammad Hassan Mosavi
Optimizing Text-Based Protocol Clustering in Reverse Engineering with Auto-Encoders and Fine-Tuned Parameters
Shiva Mahmoudzadeh - Mohaddese Nemati - Mehdi Teimouri
FAHP-OF: A New Method for Load Balancing in RPL-based Internet of Things (IoT)
Mohammad Koosha - Behnam Farzaneh - Emad Alizadeh - Shahin Farzaneh
Energy-Aware Dynamic Digital Twin Placement in Mobile Edge Computing
Mahdi Hematyar - Zeinab Movahedi
Multi-Task Transformer for Stock Market Trend Prediction
Seyed Morteza Mirjebreili - Ata Solouki - Hamidreza Soltanalizadeh - Mohammad Sabokrou
Leveraging Self-Supervised Models for Automatic Whispered Speech Recognition
Aref Farhadipour - Homa Asadi - Volker Dellwo
Automatic Generation of XACML Code using Model-Driven Approach
Athareh Fatemian - Bahman Zamani - Marzieh Masoumi - Mehran Kamranpour - Behrouz Tork Ladani - Shekoufeh Kolahdouz Rahimi
more
Samin Hamayesh - Version 41.5.3