11th International Conference on Computer and Knowledge Engineering
Semi-automatic Detection of Persian Stopwords using FastText Library
Authors :
Mohammad Dehghani (1)
Mohammad Manthouri (2)
(1) Tarbiat Modares University
(2) Shahed University
Keywords :
stopword, natural language processing, fastText, word embedding, text mining
Abstract :
A stopword is a word that adds little semantic information to a text despite its very high frequency. Stopwords include prepositions, conjunctions, and pronouns. Removing stopwords is a common natural language processing step that reduces dataset size and speeds up processing. In this study, a semi-automatic method for collecting Persian stopwords is proposed. The proposed method lists the stopwords of each text according to its subject. To this end, the Inverse Document Frequency (IDF) weight of each word is calculated over a corpus of news texts, and the stopword candidates are determined. Then, the vector of each word is obtained using the fastText library. In the next step, the five nearest neighbors of each vector are found. Finally, after removing duplicate words, the final list of stopwords (1014 words) is collected. The simulation results show that the accuracy of detecting stopwords with the k-nearest neighbor method is 94.6%.
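The candidate-selection and neighbor-expansion steps described in the abstract can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: documents are assumed to be pre-tokenized lists of words, the IDF cutoff `threshold` is a hypothetical parameter (the abstract does not say how candidates are thresholded), neighbors are found by brute-force cosine similarity, and each candidate is assumed to be kept alongside its neighbors. Word vectors are passed in as a plain dictionary instead of being loaded from fastText, so the sketch runs without the library, and the manual review implied by "semi-automatic" is not modeled.

```python
import math
from collections import Counter


def idf_scores(documents):
    """IDF(w) = log(N / df(w)), where df(w) counts documents containing w."""
    n_docs = len(documents)
    doc_freq = Counter()
    for doc in documents:
        doc_freq.update(set(doc))  # count each word at most once per document
    return {word: math.log(n_docs / df) for word, df in doc_freq.items()}


def stopword_candidates(idf, threshold):
    """Low IDF means the word occurs in most documents, so it is a candidate.

    `threshold` is a hypothetical cutoff chosen for illustration only.
    """
    return sorted(word for word, score in idf.items() if score <= threshold)


def cosine(u, v):
    """Cosine similarity of two equal-length vectors (0.0 for a zero vector)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0
    return dot / (norm_u * norm_v)


def nearest_neighbors(word, vectors, k=5):
    """Brute-force k nearest neighbors of `word` by cosine similarity."""
    target = vectors[word]
    scored = [(cosine(target, vec), other)
              for other, vec in vectors.items() if other != word]
    # Highest similarity first; ties broken alphabetically for determinism.
    scored.sort(key=lambda pair: (-pair[0], pair[1]))
    return [other for _, other in scored[:k]]


def expand_stopwords(candidates, vectors, k=5):
    """Union of candidates and their k neighbors; the set removes duplicates."""
    result = set()
    for word in candidates:
        if word not in vectors:  # assumption: skip words without a vector
            continue
        result.add(word)
        result.update(nearest_neighbors(word, vectors, k))
    return sorted(result)
```

With a trained fastText model, the `vectors` dictionary could be built from `model.words` and `model.get_word_vector(w)`, and the brute-force search could be swapped for the library's `model.get_nearest_neighbors(w, k=5)`, which returns (similarity, word) pairs. The brute-force version is kept here only so the sketch stays dependency-free.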
Papers List
List of archived papers
MIPS-Core Application Specific Instruction-Set Processor for IDEA Cryptography − Comparison between Single-Cycle and Multi-Cycle Architectures
Ahmad Ahmadi - Reza Faghih Mirzaee
Real-Time Vehicle Detection and Classification in UAV imagery Using Improved YOLOv5
Mohammad Hossein Hamzenejadi - Hadis Mohseni
Optimizing MR Image Registration for Accurate Brain Volume Measurement in Children with Autism Spectrum Disorder
Shiva Sanati - Mahdi Saadatmand
Age Estimation Based on Facial Images Using Hybrid Features and Particle Swarm Optimization
Niloufar Mehrabi - Sayed Pedram Haeri Boroujeni
Adaptive Channel Estimation for MIMO-OFDM Systems in Impulsive Noise Environments
Mojtaba Hajiabadi
Facial Emotion Recognition Under Mask Coverage Using a Data Augmentation Technique
Aref Farhadipour - Pouya Taghipour
Hybrid Flow-Rule Placement Method of Proactive and Reactive in SDNs
Mohammadreza Khoobbakht - Mohammadreza Noei - Mohammadreza Parvizimosaed
Android Malware Detection using Supervised Deep Graph Representation Learning
Fatemeh Deldar - Mahdi Abadi - Mohammad Ebrahimifard
InfOnto: An ontology for fashion influencer marketing based on Instagram
Somaye Sultani - Mohsen Kahani
Degarbayan-SC: A Colloquial Paraphrase Farsi Subtitles Dataset
Mohammad Javad Aghajani - Mohammad Ali Keyvanrad