11th International Conference on Computer and Knowledge Engineering
FarSick: A Persian Semantic Textual Similarity And Natural Language Inference Dataset
Authors :
Zahra Ghasemi (1), Mohammad Ali Keyvanrad (2)
1- Original author
2- Co-author
Keywords :
Persian dataset, Semantic Textual Similarity, Natural Language Inference, paraphrase expressions, plagiarism detection, deep learning, Natural Language Processing
Abstract :
Semantic textual similarity (STS) and natural language inference (NLI) are important tasks in natural language processing (NLP), with applications such as information retrieval, text classification, subject extraction, text summarization, machine translation and plagiarism detection. The lack of appropriate datasets in Persian is a major obstacle to progress in this area. In this paper, we therefore present a new dataset for the STS and NLI tasks in the Persian language. It contains 9804 pairs of Persian sentences, each labeled for both similarity and inference. The dataset was built by translating and editing the sentences of the SICK dataset. We also measured the performance of traditional, statistical and deep learning models on it, e.g. transformers, Convolutional Neural Networks, Bidirectional LSTMs and weighted averages of word vectors. We used several pre-trained embeddings: word2vec, GloVe, fastText and a BERT sentence transformer. We used accuracy to evaluate the NLI task and the Pearson correlation coefficient to evaluate the STS task.
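As an illustration of the two evaluation metrics named in the abstract, the following is a minimal sketch in plain Python. It is not the authors' evaluation code. The function names `accuracy` and `pearson`, the label strings (borrowed from the SICK convention of ENTAILMENT, NEUTRAL and CONTRADICTION) and the sample scores are assumptions chosen for the example.

```python
import math
from typing import Sequence


def accuracy(gold: Sequence[str], pred: Sequence[str]) -> float:
    """Fraction of sentence pairs whose predicted NLI label matches the gold label."""
    if len(gold) != len(pred) or len(gold) == 0:
        raise ValueError("gold and pred must be non-empty and of equal length")
    return sum(g == p for g, p in zip(gold, pred)) / len(gold)


def pearson(x: Sequence[float], y: Sequence[float]) -> float:
    """Pearson correlation between gold and predicted similarity scores."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("x and y must have equal length of at least 2")
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    # Covariance and variances, all left unnormalized (the 1/n factors cancel).
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for constant input")
    return cov / math.sqrt(var_x * var_y)


# Hypothetical labels and scores, for illustration only (not taken from the dataset).
gold_labels = ["ENTAILMENT", "NEUTRAL", "CONTRADICTION", "NEUTRAL"]
pred_labels = ["ENTAILMENT", "NEUTRAL", "NEUTRAL", "NEUTRAL"]
gold_scores = [4.5, 3.2, 1.0, 2.8]
pred_scores = [4.1, 3.0, 1.7, 2.5]

nli_accuracy = accuracy(gold_labels, pred_labels)
sts_pearson = pearson(gold_scores, pred_scores)
```

Accuracy suits NLI because each pair receives a single categorical label, while Pearson correlation suits STS because it measures how well predicted scores track the gold scores linearly, regardless of their scale.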