12th International Conference on Computer and Knowledge Engineering
Degarbayan-SC: A Colloquial Paraphrase Farsi Subtitles Dataset
Authors :
Mohammad Javad Aghajani (1)
Mohammad Ali Keyvanrad (2)
1- Maleke-ashtar University of Technology
2- Maleke-ashtar University of Technology
Keywords :
colloquial dataset, deep learning, movie subtitles, Natural Language Processing, paraphrase detection, paraphrase generation, Persian dataset, supervised learning
Abstract :
Paraphrase generation and paraphrase detection are important tasks in Natural Language Processing (NLP), with applications such as information retrieval, text simplification, question answering, and chatbots. The lack of comprehensive Persian paraphrase datasets is a major obstacle to progress in this area. Despite the importance of such datasets, no large-scale corpus has been made available so far, given the difficulty of creating one and the intensive labor required. In this paper, we describe the construction of Degarbayan-SC from movie subtitles, together with some of the difficulties we encountered during data extraction and sentence alignment. Movie subtitles are written in colloquial language, which differs from formal language. To the best of our knowledge, Degarbayan-SC is the first freely released large-scale (on the order of a million words) Persian paraphrase corpus. We expect this newly introduced dataset to support further work on Persian paraphrasing. We have tested our dataset on neural network models and compared the performance of different attention-based models (transformers) and a GRU model on it. We also present the sentences generated by the neural networks and evaluate them with human judgments.