0% Complete
Home
/
12th International Conference on Computer and Knowledge Engineering
Degarbayan-SC: A Colloquial Paraphrase Farsi Subtitles Dataset
Authors :
Mohammad Javad Aghajani
1
Mohammad Ali Keyvanrad
2
1- Maleke-ashtar University of Technology
2- Maleke-ashtar University of Technology
Keywords :
colloquial dataset،deep learning،movie subtitles،Natural Language Processing،paraphrase detection،paraphrase generation،Persian dataset،supervised learning
Abstract :
Paraphrase generation and paraphrase detection are important tasks in Natural Language Processing (NLP), such as information retrieval, text simplification, question answering, and chatbots. The lack of comprehensive datasets in the Persian paraphrase is a major obstacle to progress in this area. In spite of their importance, no large-scale corpus has been made available so far, given the difficulties in its creation and the intensive labor required. In this paper, the construction process of Degarbayan-SC using movie subtitles, together with some of the difficulties we experienced during data extraction and sentence alignment, is addressed. As you know, movie subtitles are in Colloquial language. It is different from formal language. To the best of our knowledge, Degarbayan-SC is the first freely released large-scale (in the order of a million words) Persian paraphrase corpus. Furthermore, this newly introduced dataset will help the growth of Persian paraphrase. We have tested our dataset on neural network models and compared the performances of different attention-based models (transformers) and the GRU model on it. We have also declared the sentences generated by the neural networks and performed human metrics on them.
Papers List
List of archived papers
Intelligent Interpretation of Frequency Response Signatures to Diagnose Radial Deformation in Transformer Windings Using Artificial Neural Network
Reza Behkam - Hossein Karami - Mehdi Salay Naderi - Gevork B. Gharehpetian
Stock market prediction using multi-objective optimization
Mahshid Zolfaghari - Hamid Fadishei - Mohsen Tajgardan - Reza Khoshkangini
Probabilistic Short-Term Load Forecasting Using GBDT-Based Sister Forecasts and Ensemble Methods
Hossein Shahinzadeh - Hamed Nafisi - Amirafshin Zamani - Saiedeh Mehrabani-Najafabadi - Arezou Mahmoudi - Farshad Ebrahimi
Span-prediction of Unknown Values for Long-sequence Dialogue State Tracking
Marzieh Naghdi Dorabati - Reza Ramezani - Mohammad Ali Nematbakhsh
Instance Selection from Skewed Class Distributions by Using the multi-objective optimizer
Mona Moradi - Javad Hamidzadeh
A Comparative Analysis of Clinical Note Categories for Mortality Prediction in ICU Patients
Maryam Karrabi - Mohsen Kahani - Mina Afzali - Nadieh Armin
Financial Market Prediction Using Deep Neural Networks with Hardware Acceleration
Dara Rahmati - Mohammad Hadi Foroughi - Ali Bagherzadeh - Mehdi Foroughi - Saeid Gorgin
Deep Learning-based Processing of Autonomous Vehicle Radar Data to Achieve High Resolution
Nima Abdolrahimi Shahamat - Vahideh Moghtadaiee - Esfandiar Mehrshahi
Multi-Layered Defense Against Modern Phishing: A Dual-Sandbox and CDR Approach
Mahdi Seyfipoor - Mohammad Mahdi Eskandari
SingAll: Scalable Control Flow Checking for Multi-Process Embedded Systems
Mehdi Amininasab - Ahmad Patooghy - Mahdi Fazeli
more
Samin Hamayesh - Version 42.7.0