12th International Conference on Computer and Knowledge Engineering
Degarbayan-SC: A Colloquial Paraphrase Farsi Subtitles Dataset
Authors :
Mohammad Javad Aghajani (1)
Mohammad Ali Keyvanrad (2)
1- Maleke-ashtar University of Technology
2- Maleke-ashtar University of Technology
Keywords :
colloquial dataset, deep learning, movie subtitles, Natural Language Processing, paraphrase detection, paraphrase generation, Persian dataset, supervised learning
Abstract :
Paraphrase generation and paraphrase detection are important tasks in Natural Language Processing (NLP), with applications such as information retrieval, text simplification, question answering, and chatbots. The lack of comprehensive datasets for Persian paraphrasing is a major obstacle to progress in this area. Despite the importance of these tasks, no large-scale Persian corpus has been made available so far, given the difficulty and intensive labor involved in creating one. This paper describes the construction of Degarbayan-SC from movie subtitles, together with some of the difficulties we encountered during data extraction and sentence alignment. Movie subtitles are written in colloquial language, which differs from formal language. To the best of our knowledge, Degarbayan-SC is the first freely released large-scale (on the order of a million words) Persian paraphrase corpus, and we expect it to support further research on Persian paraphrasing. We tested the dataset on neural network models and compared the performance of different attention-based models (transformers) and a GRU model on it. We also report the sentences generated by the neural networks and evaluate them with human judgments.