12th International Conference on Computer and Knowledge Engineering
PeQa: a Massive Persian Question-Answering and Chatbot Dataset
Authors :
Fatemeh Zahra Arshia (1)
Mohammad Ali Keyvanrad (2)
Saeedeh Sadat Sadidpour (3)
Sayyid Mohammad Reza Mohammadi (4)
1- Faculty of Electrical & Computer Engineering, Malek-Ashtar University of Technology, Tehran, Iran
2- Faculty of Electrical & Computer Engineering, Malek-Ashtar University of Technology, Tehran, Iran
3- Faculty of Electrical & Computer Engineering, Malek-Ashtar University of Technology, Tehran, Iran
4- Faculty of Electrical & Computer Engineering, Malek-Ashtar University of Technology, Tehran, Iran
Keywords :
Question-Answering System, Twitter Dataset, Persian QA, Chatbot
Abstract :
A question-answering (QA) system is an application that communicates with humans using natural language processing. Modelling dialogue between humans and machines is considered one of the most important tasks in Artificial Intelligence (AI), and building a chatbot that models human-machine conversation well remains an open challenge in this field. Although chatbots have many applications, in general they must understand what users mean from their words and provide relevant answers. In the past, chatbot architectures relied mainly on rules or statistical methods. With the advent of deep learning, trainable neural networks soon replaced these traditional models. Such deep models depend heavily on the data they are trained on, and no sufficiently large dataset is available for the Persian language. We present a dataset built from 14 million Persian tweets collected from Twitter and carefully processed into a collection of 420,000 question-answer pairs. We also present modelling results with Transformers, evaluated with the Sensibleness and Specificity Average (SSA) and the BLEU metric. We will release our dataset, modelling code, and models publicly.
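As a rough illustration of the SSA metric named in the abstract, the sketch below averages the per-response sensibleness and specificity rates from human labels. It is not the authors' evaluation code. The `Judgment` type and `ssa` function are hypothetical names, and the rule that a response judged not sensible is also counted as not specific is an assumption taken from the usual definition of SSA, not from this abstract.

```python
from dataclasses import dataclass
from typing import Sequence, Tuple


@dataclass(frozen=True)
class Judgment:
    """One human label for a single model response (hypothetical type)."""
    sensible: bool
    specific: bool


def ssa(judgments: Sequence[Judgment]) -> Tuple[float, float, float]:
    """Return (sensibleness, specificity, SSA) as fractions in [0, 1]."""
    if not judgments:
        raise ValueError("need at least one judgment")
    n = len(judgments)
    sensibleness = sum(j.sensible for j in judgments) / n
    # Assumption: a response that is not sensible is never counted as specific.
    specificity = sum(j.sensible and j.specific for j in judgments) / n
    return sensibleness, specificity, (sensibleness + specificity) / 2


labels = [
    Judgment(sensible=True, specific=True),
    Judgment(sensible=True, specific=False),
    Judgment(sensible=False, specific=False),
    Judgment(sensible=True, specific=True),
]
sens, spec, score = ssa(labels)
print(f"sensibleness={sens:.3f} specificity={spec:.3f} SSA={score:.3f}")
```

The labels above are made-up toy values used only to exercise the function; they are not results from the paper.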