12th International Conference on Computer and Knowledge Engineering
Degarbayan-SC: A Colloquial Paraphrase Farsi Subtitles Dataset
Authors :
Mohammad Javad Aghajani (1), Mohammad Ali Keyvanrad (2)
1- Maleke-ashtar University of Technology
2- Maleke-ashtar University of Technology
Keywords :
colloquial dataset, deep learning, movie subtitles, Natural Language Processing, paraphrase detection, paraphrase generation, Persian dataset, supervised learning
Abstract :
Paraphrase generation and paraphrase detection are important tasks in Natural Language Processing (NLP), with applications such as information retrieval, text simplification, question answering, and chatbots. The lack of comprehensive Persian paraphrase datasets is a major obstacle to progress in this area. Despite the importance of such datasets, no large-scale corpus has been made available so far, given the difficulty of creating one and the intensive labor required. This paper describes the construction of Degarbayan-SC from movie subtitles, together with some of the difficulties we encountered during data extraction and sentence alignment. Movie subtitles are written in colloquial language, which differs from formal language. To the best of our knowledge, Degarbayan-SC is the first freely released large-scale (on the order of a million words) Persian paraphrase corpus. We expect this newly introduced dataset to support further work on Persian paraphrasing. We tested the dataset on neural network models and compared the performance of different attention-based models (transformers) and a GRU model on it. We also report the sentences generated by the neural networks and evaluate them with human metrics.