0% Complete
Home
/
12th International Conference on Computer and Knowledge Engineering
Degarbayan-SC: A Colloquial Paraphrase Farsi Subtitles Dataset
Authors :
Mohammad Javad Aghajani
1
Mohammad Ali Keyvanrad
2
1- Maleke-ashtar University of Technology
2- Maleke-ashtar University of Technology
Keywords :
colloquial dataset،deep learning،movie subtitles،Natural Language Processing،paraphrase detection،paraphrase generation،Persian dataset،supervised learning
Abstract :
Paraphrase generation and paraphrase detection are important tasks in Natural Language Processing (NLP), such as information retrieval, text simplification, question answering, and chatbots. The lack of comprehensive datasets in the Persian paraphrase is a major obstacle to progress in this area. In spite of their importance, no large-scale corpus has been made available so far, given the difficulties in its creation and the intensive labor required. In this paper, the construction process of Degarbayan-SC using movie subtitles, together with some of the difficulties we experienced during data extraction and sentence alignment, is addressed. As you know, movie subtitles are in Colloquial language. It is different from formal language. To the best of our knowledge, Degarbayan-SC is the first freely released large-scale (in the order of a million words) Persian paraphrase corpus. Furthermore, this newly introduced dataset will help the growth of Persian paraphrase. We have tested our dataset on neural network models and compared the performances of different attention-based models (transformers) and the GRU model on it. We have also declared the sentences generated by the neural networks and performed human metrics on them.
Papers List
List of archived papers
Intensity-Image Reconstruction Using Event Camera Data by Changing in LSTM Update
Arezoo Rahmati Soltangholi - Ahad Harati - Abedin Vahedian
Reliability Evaluation of 4:2 Compressors Based on Hammock Networks
Farshad Safaei - Mohammad mahdi Emadi Kouchak - Sara Talebpour
Weakly Supervised Convolutional Neural Network for Automatic Gleason Grading of Prostate Cancer
Maryam Kamareh - Mohammad Sadegh Helfroush - Kamran Kazemi
SAT Based Analogy Evaluation Framework For Persian Word Embeddings
Seyed Ehsan Mahmoudi - Mehrnoush Shamsfard
Optimal PMU Placement Considering Reliability of Measurement System in Smart Grids
Mohammad Shahraeini - Shahla Khormali - Ahad Alvandi
EEMC: Energy Efficient Multi-Clustering Using Grey Wolf Optimizer in WSNs
Maryam Ghorbanvirdi - Sayyed Majid Mazinani
An influence maximization algorithm based on community detection using topological features
Zahra Aghaee - Afsaneh Fatemi
Lempel-Ziv-based Hyper-Heuristic Solution for Longest Common Subsequence Problem
Mahdi Nasrollahi - Reza Shami Tanha - Mohsen Hooshmand
Enhancing Lighter Neural Network Performance with Layer-wise Knowledge Distillation and Selective Pixel Attention
Siavash Zaravashan - Sajjad Torabi - Hesam Zaravashan
Sports News Summarization Using Ensebmle Learning
Moein Sartakhti.salimi@gmail.com - Mohammad Javad Maleki Kahaki - Ahmad Yoosofan - Seyyed Vahid Moravvej
more
Samin Hamayesh - Version 41.5.3