0% Complete
Home
/
11th International Conference on Computer and Knowledge Engineering
FarSick: A Persian Semantic Textual Similarity And Natural Language Inference Dataset
Authors :
Zahra Ghasemi
1
Mohammad Ali Keyvanrad
2
1- Original author
2- coauthored
Keywords :
Persian dataset, Semantic Textual Similarity, Natural Language Inference, paraphrase expressions, plagiarism detection, deep learning, Natural Language Processing
Abstract :
Semantic textual similarity(STS) and natural language inference(NLI) are important tasks in natural language processing(NLP) such as information retrieval, text classification, subject extraction, text summarization, machine translation and plagiarism detection. Lack of appropriate datasets in Persian is a major obstacle to progress in this area. Therefore, in this paper, we present a new dataset for STS and NLI tasks in the Persian language. It includes 9804 pairs of Persian sentences with labels for similarity and inference for each pair of sentences. This dataset is collected by translating and editing the sentences of SICK dataset. We also measured the performance of traditional, statistical and deep learning models on it, e.g. transformers, Convolution Neural Networks, Bidirectional LSTMs, weighted average of word vectors, etc. We used different pre-trained embeddings, word2vec, glove, fastText and Bert sentence transformer. We used accuracy metric to test NLI tasks and Pearson metric to test STS tasks.
Papers List
List of archived papers
Leveraging the Power of Object Detection Models in Identifying Litter for a Significant Reduction in Environmental Pollution
Lim Zhen Xian - Ervin Gubin Moung - Jason Teo Tze Wi - Nordin Saad - Farashazillah Yahya - Tiong Lin Rui - Ali Farzamnia
Financial Market Prediction Using Deep Neural Networks with Hardware Acceleration
Dara Rahmati - Mohammad Hadi Foroughi - Ali Bagherzadeh - Mehdi Foroughi - Saeid Gorgin
To Transfer or Not To Transfer (TNT): Action Recognition in Still Image Using Transfer Learning
Ali Soltani Nezhad - Hojat Asgarian Dehkordi - Seyed Sajad Ashrafi - Shahriar Baradaran Shokouhi
Word-level Persian Lipreading Dataset
Javad Peymanfard - Ali Lashini - Samin Heydarian - Hossein Zeinali - Nasser Mozayani
Underwater Image Super-Resolution using Generative Adversarial Network-based Model
Alireza Aghelan - Modjtaba Rouhani
Detecting Non-Spherical Clusters Using Modified CURE Algorithm
Arezou Safdari - Pedram Salehpour
A Cost-Sensitive Genetic Algorithm for Customer Segmentation in Auto Insurances
Alireza Khajenoori - Mohammad Saniee Abadeh - Mohsen Mohammadzadeh
Soccer Video Event Detection Using Metric Learning
Ali Karimi - Ramin Toosi - Mohammad Ali Akhaee
A New Application of Machine Learning Based Methods for Disk Space Variation Fault Diagnosis in Transformer Windings
Reza Behkam - Amir Lotfi - Gevork B. Gharehpetian
A Survey of the AVOA Metaheuristic Algorithm and its Suitability for Power System Optimization and Damping Controller Design
Aliyu Sabo - Theophilus Ebuka Odoh - Samuel Habu - Hossien Shahinzadeh - Farshad Ebrahimi
more
Samin Hamayesh - Version 41.3.1