11th International Conference on Computer and Knowledge Engineering
FarSick: A Persian Semantic Textual Similarity And Natural Language Inference Dataset
Authors :
Zahra Ghasemi (1), Mohammad Ali Keyvanrad (2)
1- Original author
2- Co-author
Keywords :
Persian dataset, Semantic Textual Similarity, Natural Language Inference, paraphrase expressions, plagiarism detection, deep learning, Natural Language Processing
Abstract :
Semantic textual similarity (STS) and natural language inference (NLI) are important tasks in natural language processing (NLP), with applications such as information retrieval, text classification, subject extraction, text summarization, machine translation, and plagiarism detection. The lack of appropriate datasets in Persian is a major obstacle to progress in this area. Therefore, in this paper, we present a new dataset for the STS and NLI tasks in the Persian language. It includes 9804 pairs of Persian sentences, each labeled for both similarity and inference. The dataset was collected by translating and editing the sentences of the SICK dataset. We also measured the performance of traditional, statistical, and deep learning models on it, including transformers, convolutional neural networks, bidirectional LSTMs, and weighted averages of word vectors. We used different pre-trained embeddings: word2vec, GloVe, fastText, and a BERT sentence transformer. We used accuracy to evaluate the NLI task and the Pearson correlation coefficient to evaluate the STS task.
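As an illustration of the two evaluation metrics named in the abstract, the following is a minimal sketch in plain Python. It is not the authors' evaluation code. The function names, the label strings (taken here to follow the three-way SICK scheme), and the toy scores are assumptions made only for this example.

```python
import math


def accuracy(gold_labels, predicted_labels):
    """Fraction of sentence pairs whose predicted NLI label matches the gold label."""
    if len(gold_labels) != len(predicted_labels) or not gold_labels:
        raise ValueError("label lists must be non-empty and of equal length")
    correct = sum(g == p for g, p in zip(gold_labels, predicted_labels))
    return correct / len(gold_labels)


def pearson(gold_scores, predicted_scores):
    """Pearson correlation between gold and predicted STS similarity scores."""
    if len(gold_scores) != len(predicted_scores) or len(gold_scores) < 2:
        raise ValueError("score lists must have equal length of at least 2")
    n = len(gold_scores)
    mean_g = sum(gold_scores) / n
    mean_p = sum(predicted_scores) / n
    # Sum of products of deviations, and sums of squared deviations.
    cov = sum((g - mean_g) * (p - mean_p)
              for g, p in zip(gold_scores, predicted_scores))
    var_g = sum((g - mean_g) ** 2 for g in gold_scores)
    var_p = sum((p - mean_p) ** 2 for p in predicted_scores)
    if var_g == 0 or var_p == 0:
        raise ValueError("correlation is undefined for constant scores")
    return cov / math.sqrt(var_g * var_p)


if __name__ == "__main__":
    # Hypothetical toy data, not taken from the dataset.
    gold_nli = ["ENTAILMENT", "NEUTRAL", "CONTRADICTION", "NEUTRAL"]
    pred_nli = ["ENTAILMENT", "NEUTRAL", "NEUTRAL", "NEUTRAL"]
    print("NLI accuracy:", accuracy(gold_nli, pred_nli))

    gold_sts = [4.5, 3.2, 1.0, 2.8]
    pred_sts = [4.1, 3.0, 1.6, 3.3]
    print("STS Pearson:", pearson(gold_sts, pred_sts))
```

Accuracy suits the NLI task because its labels are categorical, while the Pearson coefficient suits the STS task because it measures how well continuous predicted scores track the gold scores linearly.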