11th International Conference on Computer and Knowledge Engineering
Semi-automatic Detection of Persian Stopwords using FastText Library
Authors :
Mohammad Dehghani (1)
Mohammad Manthouri (2)
1- Tarbiat Modares University
2- Shahed University
Keywords :
stopword, natural language processing, fastText, word embedding, text mining
Abstract :
A stopword is a word that adds little semantic information to a text despite its very high frequency. Stopwords include prepositions, conjunctions, and pronouns. Removing stopwords is a common step in natural language processing because it reduces dataset size and speeds up processing. In this study, a semi-automatic method for collecting Persian stopwords is proposed. The proposed method lists the stopwords of each text depending on its subject. For this purpose, the Inverse Document Frequency (IDF) weight of each word is calculated over a corpus of news texts, and the stopword candidates are determined. Then, using the fastText library, the vector of each word is obtained. In the next step, the five nearest neighbors of each vector are found. Finally, duplicate words are removed to produce the final list of 1014 stopwords. Simulation results show that the k-nearest neighbor method detects stopwords with an accuracy of 94.6%.
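The pipeline described in the abstract (IDF weighting, candidate selection, neighbor lookup in embedding space, de-duplication) can be sketched roughly as follows. This is a minimal illustration, not the authors' implementation. The function names, the `max_idf` threshold, the use of cosine similarity, and the tiny English corpus with hand-written two-dimensional vectors are all assumptions made for the example. In practice the vectors would come from a fastText model trained on Persian news text, and the abstract uses five neighbors per word.

```python
import math
from collections import Counter


def idf_scores(documents):
    """Return {word: idf}, where idf = log(N / df) and df is the number
    of documents that contain the word at least once."""
    n_docs = len(documents)
    doc_freq = Counter()
    for doc in documents:
        doc_freq.update(set(doc.split()))
    return {word: math.log(n_docs / df) for word, df in doc_freq.items()}


def stopword_candidates(documents, max_idf):
    """Words whose IDF is at or below max_idf (an assumed threshold)."""
    scores = idf_scores(documents)
    return sorted(w for w, s in scores.items() if s <= max_idf)


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def nearest_neighbors(word, vectors, k=5):
    """The k words most similar to `word` (cosine similarity is assumed)."""
    target = vectors[word]
    scored = [(cosine(target, v), w) for w, v in vectors.items() if w != word]
    scored.sort(key=lambda pair: (-pair[0], pair[1]))
    return [w for _, w in scored[:k]]


def expand_candidates(candidates, vectors, k=5):
    """Add each candidate's neighbors; the set removes duplicate words."""
    expanded = set(candidates)
    for word in candidates:
        if word in vectors:
            expanded.update(nearest_neighbors(word, vectors, k))
    return sorted(expanded)


# Toy data standing in for the news corpus and the fastText vectors.
corpus = [
    "the cat sat on the mat",
    "the dog ran in the park",
    "a bird flew over the park",
]
toy_vectors = {
    "the": [1.0, 0.0],
    "a": [0.9, 0.1],
    "cat": [0.0, 1.0],
    "dog": [0.1, 0.9],
}

candidates = stopword_candidates(corpus, max_idf=0.0)
expanded = expand_candidates(candidates, toy_vectors, k=1)
print(candidates, expanded)
```

In this toy run only "the" occurs in every document, so it is the sole IDF candidate, and the neighbor step then pulls in "a" because its vector lies closest. Since the method is semi-automatic, a list expanded this way would still need manual review.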