15th International Conference on Computer and Knowledge Engineering
PersianILP: Construction and Evaluation of a Standard Persian Dataset for Inductive Link Prediction
Authors :
Mohammad Rahimi (1), Afsaneh Fatemi (2), Ahmad Baraani (3)
1- Dept. of Software Engineering
2- Dept. of Software Engineering
3- Dept. of Software Engineering
Keywords :
Inductive Link Prediction, Knowledge Graph Completion, PersianILP Dataset
Abstract :
Link prediction in knowledge graphs is a key task aimed at addressing the challenge of graph sparsity. In inductive link prediction, a model is trained on one graph and evaluated on another containing unseen entities. While twelve inductive datasets have been introduced for English to benchmark models in this domain, no such dataset exists for Persian. This study introduces PersianILP, the first Persian dataset designed for inductive link prediction. PersianILP is constructed through a purposeful combination of real-world data extracted from the FarsiBase knowledge graph and synthetic data generated using the DeepSeek language model. To evaluate PersianILP, key criteria such as structural and semantic diversity, statistical alignment between synthetic and real data, and adherence to inductive evaluation principles were considered. The dataset is compared with twelve benchmark datasets, including WN18RR, FB15k-237, and NELL-995. PersianILP contains 16,306 semantic triples, 10,693 entities, and 432 unique relations, exhibiting a highly sparse structure with a sparsity rate of 0.99. Evaluation using a baseline inductive link prediction model confirms the dataset’s high quality and effectiveness. Statistical analyses further demonstrate that PersianILP meets all essential requirements for research in inductive link prediction and can serve as a standard resource for studies in Persian language processing, semantic web, and recommender systems.
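The two properties the abstract relies on, an inductive split and a high sparsity rate, can be illustrated with a minimal Python sketch. It is not the authors' code. The abstract does not say how its sparsity rate is computed, so the formula below (one minus the fraction of possible directed entity pairs that carry a triple, ignoring relation types) is an assumption. The disjointness check is the strictest reading of "unseen entities". The names `entities`, `is_inductive_split`, `sparsity`, and the toy triples are hypothetical.

```python
from typing import Iterable, Set, Tuple

Triple = Tuple[str, str, str]  # (head, relation, tail)


def entities(triples: Iterable[Triple]) -> Set[str]:
    """Collect every entity that appears as a head or a tail."""
    found: Set[str] = set()
    for head, _relation, tail in triples:
        found.add(head)
        found.add(tail)
    return found


def is_inductive_split(train: Iterable[Triple], test: Iterable[Triple]) -> bool:
    """True when the training and test graphs share no entities.

    Assumption: "unseen entities" is read strictly as fully disjoint
    entity sets; relations may still be shared between the two graphs.
    """
    return entities(train).isdisjoint(entities(test))


def sparsity(num_triples: int, num_entities: int) -> float:
    """Assumed definition: 1 - triples / possible directed entity pairs."""
    possible_pairs = num_entities * (num_entities - 1)
    return 1.0 - num_triples / possible_pairs


# Toy triples, invented purely for illustration.
TRAIN = [("Tehran", "capital_of", "Iran"), ("Hafez", "born_in", "Shiraz")]
TEST = [("Isfahan", "located_in", "Isfahan_Province")]

if __name__ == "__main__":
    print(is_inductive_split(TRAIN, TEST))
    # Counts taken from the abstract: 16,306 triples and 10,693 entities.
    print(sparsity(16306, 10693))
```

Under this assumed definition, the counts reported in the abstract give a value above 0.99, which is consistent with the stated sparsity rate.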