12th International Conference on Computer and Knowledge Engineering
Degarbayan-SC: A Colloquial Paraphrase Farsi Subtitles Dataset
Authors :
Mohammad Javad Aghajani (1), Mohammad Ali Keyvanrad (2)
1- Maleke-ashtar University of Technology
2- Maleke-ashtar University of Technology
Keywords :
colloquial dataset, deep learning, movie subtitles, Natural Language Processing, paraphrase detection, paraphrase generation, Persian dataset, supervised learning
Abstract :
Paraphrase generation and paraphrase detection are important tasks in Natural Language Processing (NLP), with applications such as information retrieval, text simplification, question answering, and chatbots. The lack of comprehensive Persian paraphrase datasets is a major obstacle to progress in this area. Despite the importance of such datasets, no large-scale corpus has been made available so far, given the difficulty and intensive labor involved in creating one. This paper describes the construction of Degarbayan-SC from movie subtitles, together with some of the difficulties we encountered during data extraction and sentence alignment. Movie subtitles are written in colloquial language, which differs from formal language. To the best of our knowledge, Degarbayan-SC is the first freely released large-scale (on the order of a million words) Persian paraphrase corpus, and we expect it to support further work on Persian paraphrasing. We tested the dataset on neural network models and compared the performance of different attention-based models (transformers) and a GRU model on it. We also present the sentences generated by the neural networks and evaluate them with human metrics.