0% Complete
Home
/
12th International Conference on Computer and Knowledge Engineering
Degarbayan-SC: A Colloquial Paraphrase Farsi Subtitles Dataset
Authors :
Mohammad Javad Aghajani
1
Mohammad Ali Keyvanrad
2
1- Maleke-ashtar University of Technology
2- Maleke-ashtar University of Technology
Keywords :
colloquial dataset،deep learning،movie subtitles،Natural Language Processing،paraphrase detection،paraphrase generation،Persian dataset،supervised learning
Abstract :
Paraphrase generation and paraphrase detection are important tasks in Natural Language Processing (NLP), such as information retrieval, text simplification, question answering, and chatbots. The lack of comprehensive datasets in the Persian paraphrase is a major obstacle to progress in this area. In spite of their importance, no large-scale corpus has been made available so far, given the difficulties in its creation and the intensive labor required. In this paper, the construction process of Degarbayan-SC using movie subtitles, together with some of the difficulties we experienced during data extraction and sentence alignment, is addressed. As you know, movie subtitles are in Colloquial language. It is different from formal language. To the best of our knowledge, Degarbayan-SC is the first freely released large-scale (in the order of a million words) Persian paraphrase corpus. Furthermore, this newly introduced dataset will help the growth of Persian paraphrase. We have tested our dataset on neural network models and compared the performances of different attention-based models (transformers) and the GRU model on it. We have also declared the sentences generated by the neural networks and performed human metrics on them.
Papers List
List of archived papers
An Overview of Regression Methods in Early Prediction of Movie Ratings
Houmaan Chamani - Zhivar Sourati Hassanzadeh - Behnam Bahrak
A Language-Independent Approach to Classification of Textual File Fragments: Case Study of Persian, English, and Chinese Languages
Fatemeh Mansouri Hanis - Hamidreza Khoshvaghti - Mehdi Teimouri - Hadi Veisi
Zone-Based Federated Learning in Indoor Positioning
Omid Tasbaz - Vahideh Moghtadaiee - Bahar Farahani
Low-Cost and Hardware Efficient Implementation of Pooling Layers for Stochastic CNN Accelerators
Mobin Vaziri - Hadi Jahanirad
An Exploratory Study of the Relationship between SATD and Other Software Development Activities
Shima Esfandiari - Ashkan Sami
Standardized ReACT Logits: An Effective Approach for Anomaly Segmentation in Self-driving Cars
Mahdi Farhadi - Seyede Mahya Hazavei - Shahriar Baradaran Shokouhi
Computational Microscopy Based on Fourier Ptychography using Embedded Architecture
Rezvan Mir - Abedin Vahedian
Data Clustering using Chimp Optimization Algorithm
SAYED PEDRAM HAERI BOROUJENI - ELNAZ PASHAEI
A Cloud Broker with Gap Analysis Perspective for Scheduling Multi-Workflows Across On-Demand and Reserved Resources
Negin Shafinezhad - Hamidreza Abrishami - Saeid Abrishami
Real-Time Gender Recognition with a Deep Neural Network
Samad Azimi Abriz - Majid Meghdadi
more
Samin Hamayesh - Version 41.3.1