11th International Conference on Computer and Knowledge Engineering
Semi-automatic Detection of Persian Stopwords using FastText Library
Authors :
Mohammad Dehghani (1), Mohammad Manthouri (2)
1- Tarbiat Modares University
2- Shahed University
Keywords :
stopword, natural language processing, fastText, word embedding, text mining
Abstract :
A stopword is a word that adds little semantic information to a text despite its very high frequency. Stopwords include prepositions, conjunctions, and pronouns. Removing stopwords is a common step in natural language processing because it reduces dataset size and speeds up processing. In this study, a semi-automatic method for collecting Persian stopwords is proposed. The proposed method lists the stopwords of each text depending on its subject. For this purpose, based on a corpus of news texts, the Inverse Document Frequency (IDF) weight of each word is calculated and the stopword candidates are determined. Then, using the fastText library, the vector of each word is obtained. In the next step, the five nearest neighbors of each vector are found. Finally, after duplicate words are removed, the final list of 1014 stopwords is compiled. The simulation results show that the accuracy of detecting stopwords with the k-nearest neighbor method is 94.6%.
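The pipeline described in the abstract (IDF weighting, candidate selection, nearest-neighbor expansion, deduplication) can be illustrated with a minimal sketch. This is not the authors' implementation: the function names, the IDF threshold `max_idf`, the English toy corpus, and the hand-written two-dimensional vectors are all assumptions made for illustration. In a real setting the vectors would come from a trained fastText model rather than a hard-coded dictionary.

```python
import math
from collections import Counter

def idf_scores(docs):
    """Return IDF = log(N / df) for every word in a tokenized corpus."""
    n_docs = len(docs)
    # Document frequency: count each word once per document.
    df = Counter(word for doc in docs for word in set(doc))
    return {word: math.log(n_docs / count) for word, count in df.items()}

def stopword_candidates(docs, max_idf):
    """Words with IDF at or below max_idf, i.e. words found in most documents.

    max_idf is a hypothetical threshold; the abstract does not state one.
    """
    return {w for w, score in idf_scores(docs).items() if score <= max_idf}

def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0

def nearest_neighbors(word, vectors, k=5):
    """The k words whose vectors are most similar to the vector of `word`."""
    sims = [(cosine(vectors[word], vec), other)
            for other, vec in vectors.items() if other != word]
    return [other for _, other in sorted(sims, reverse=True)[:k]]

def expand_candidates(candidates, vectors, k=5):
    """Add the k nearest neighbors of each candidate; the set removes duplicates."""
    expanded = set(candidates)
    for word in candidates:
        if word in vectors:
            expanded.update(nearest_neighbors(word, vectors, k))
    return expanded

# Toy corpus and toy embeddings (assumptions, standing in for news texts
# and fastText vectors).
docs = [
    ["the", "cat", "sat", "on", "the", "mat"],
    ["the", "dog", "ran", "in", "the", "park"],
    ["a", "bird", "flew", "over", "the", "house"],
    ["the", "cat", "and", "the", "dog"],
]
vectors = {
    "the": [1.0, 0.0],
    "a": [0.9, 0.1],
    "cat": [0.0, 1.0],
    "dog": [0.1, 0.9],
}

candidates = stopword_candidates(docs, max_idf=0.3)
stopwords = expand_candidates(candidates, vectors, k=1)
```

Here "the" appears in every document, so its IDF is 0 and it is the only candidate; its nearest neighbor in the toy vector space is "a", which is added to the list. The abstract describes the method as semi-automatic, so a list produced this way would presumably still be reviewed by a person.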