12th International Conference on Computer and Knowledge Engineering
Degarbayan-SC: A Colloquial Paraphrase Farsi Subtitles Dataset
Authors :
Mohammad Javad Aghajani 1
Mohammad Ali Keyvanrad 2
1- Maleke-ashtar University of Technology
2- Maleke-ashtar University of Technology
Keywords :
colloquial dataset, deep learning, movie subtitles, Natural Language Processing, paraphrase detection, paraphrase generation, Persian dataset, supervised learning
Abstract :
Paraphrase generation and paraphrase detection are important tasks in Natural Language Processing (NLP), with applications such as information retrieval, text simplification, question answering, and chatbots. The lack of comprehensive Persian paraphrase datasets is a major obstacle to progress in this area. Despite the importance of such datasets, no large-scale corpus has been made available so far, given the difficulty of creating one and the intensive labor required. This paper describes the construction of Degarbayan-SC from movie subtitles, together with some of the difficulties we experienced during data extraction and sentence alignment. Movie subtitles are written in colloquial language, which differs from formal language. To the best of our knowledge, Degarbayan-SC is the first freely released large-scale (on the order of a million words) Persian paraphrase corpus. We expect this newly introduced dataset to support further work on Persian paraphrasing. We tested the dataset on neural network models and compared the performance of different attention-based models (transformers) and a GRU model on it. We also present the sentences generated by the neural networks and evaluate them with human judgments.