12th International Conference on Computer and Knowledge Engineering
PeQa: a Massive Persian Question-Answering and Chatbot Dataset
Authors :
Fatemeh Zahra Arshia (1)
Mohammad Ali Keyvanrad (2)
Saeedeh Sadat Sadidpour (3)
Sayyid Mohammad Reza Mohammadi (4)
1- Faculty of Electrical & Computer Engineering, Malek-Ashtar University of Technology, Tehran, Iran
2- Faculty of Electrical & Computer Engineering, Malek-Ashtar University of Technology, Tehran, Iran
3- Faculty of Electrical & Computer Engineering, Malek-Ashtar University of Technology, Tehran, Iran
4- Faculty of Electrical & Computer Engineering, Malek-Ashtar University of Technology, Tehran, Iran
Keywords :
Question-Answering System, Twitter Dataset, Persian QA, Chatbot
Abstract :
A question-answering (QA) system is an application that communicates with humans using natural language processing. Modelling a dialogue between humans and machines is considered one of the most important tasks in Artificial Intelligence (AI), and building a chatbot that models human-machine conversations well remains an unsolved challenge in the field. Although chatbots have many applications, in general they must understand what users mean from their words and provide relevant answers. In the past, chatbot architectures relied mainly on rules or statistical methods. With the advent of deep learning, trainable neural networks soon replaced these traditional models. Such deep models are strongly affected by the dataset they are trained on, and no sufficiently large dataset is available for the Persian language. We present a dataset of 14 million Persian tweets collected from Twitter, carefully processed to create a rich collection of 420,000 question-answer pairs. We also present modelling results with Transformers, evaluated using the Sensibleness and Specificity Average (SSA) and the BLEU metric. We will release our dataset, modelling code, and models publicly.
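The abstract names SSA without defining it. As a rough illustration, the sketch below assumes the usual formulation of the metric: human raters mark each model response as sensible or not and specific or not, a response that is not sensible is never counted as specific, and SSA is the mean of the two resulting rates. The `Label` class, the `ssa` function, and the example labels are hypothetical and are not taken from the paper.

```python
from dataclasses import dataclass
from typing import Sequence, Tuple


@dataclass
class Label:
    """One human judgment of a single chatbot response (hypothetical schema)."""
    sensible: bool  # does the response make sense in context?
    specific: bool  # is the response specific to the context?


def ssa(labels: Sequence[Label]) -> Tuple[float, float, float]:
    """Return (sensibleness, specificity, SSA) for a set of labeled responses."""
    if not labels:
        raise ValueError("at least one label is required")
    n = len(labels)
    # Sensibleness: fraction of responses judged sensible.
    sensibleness = sum(l.sensible for l in labels) / n
    # Specificity: a response counts as specific only if it is also sensible.
    specificity = sum(l.sensible and l.specific for l in labels) / n
    # SSA: simple average of the two rates.
    return sensibleness, specificity, (sensibleness + specificity) / 2


if __name__ == "__main__":
    # Made-up labels for four responses, for illustration only.
    example = [
        Label(True, True),
        Label(True, False),
        Label(False, False),
        Label(False, True),  # not sensible, so not counted as specific
    ]
    print(ssa(example))
```

Under these assumptions, a model that gives sensible but generic answers scores high on sensibleness and low on specificity, which is why the two rates are averaged rather than reported as one number.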