15th International Conference on Computer and Knowledge Engineering

Towards Transparent and Accurate Story Point Estimation via Interpretable BERT-based Modeling

Authors :

Seyed Emad Baradaran Hosseini¹, Maryam Khodabakhsh², Alireza Tajary³, Seyedehfatemeh Karimi⁴

1- Master's Student of Computer Engineering, Shahrood University of Technology
2- Faculty of Computer Engineering, Shahrood University of Technology, Shahrood, Iran
3- Faculty of Computer Engineering, Shahrood University of Technology, Shahrood, Iran
4- Department of Engineering, Ferdowsi University of Mashhad, Mashhad, Iran

Keywords :

Agile Software Development, Story Point Estimation, Natural Language Processing, BERT Classifier, Interpretable

Abstract :

This study proposes a novel approach to estimating story points in agile software projects that combines advanced natural language processing (NLP) models with interpretability techniques. Task descriptions are first transformed into semantic embedding vectors and then classified into four categories (Small, Medium, Large, and Huge) by a BERT-based classifier. To make the model interpretable, the CLS embedding vectors are extracted, reduced in dimensionality, and clustered with K-Means to reveal class boundaries and overlaps. Experimental results show an overall accuracy of 88.71% and an average F1-score exceeding 0.87, significantly outperforming baseline methods. Analysis of the confusion matrices and the semantic clusters shows that the Small and Medium classes are difficult to distinguish, a problem that could be alleviated by incorporating richer contextual features. By providing interpretable insights alongside robust accuracy, the proposed framework is an important step towards more transparent and trustworthy story point estimation systems for agile projects. Recommendations for future work include employing more advanced language models, optimizing model performance, and expanding the training datasets to improve generalizability.
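
As an illustration of the evaluation described in the abstract, the following minimal sketch derives overall accuracy, per-class and macro-averaged F1, and the most frequently confused class pair from a four-class confusion matrix. The counts are hypothetical and chosen only for illustration; they are not the paper's results. The function names are assumptions, and the abstract does not say whether its "average F1-score" is macro- or weighted-averaged, so macro averaging is assumed here.

```python
from typing import Dict, List, Tuple

CLASSES = ["Small", "Medium", "Large", "Huge"]

# Hypothetical counts for illustration only; these are NOT the paper's results.
# Rows are true classes, columns are predicted classes, both in CLASSES order.
CONFUSION = [
    [30, 18, 2, 0],   # true Small
    [14, 32, 4, 0],   # true Medium
    [1, 5, 42, 2],    # true Large
    [0, 0, 3, 47],    # true Huge
]


def accuracy(cm: List[List[int]]) -> float:
    """Fraction of samples on the diagonal (correct predictions)."""
    total = sum(sum(row) for row in cm)
    correct = sum(cm[i][i] for i in range(len(cm)))
    return correct / total


def per_class_f1(cm: List[List[int]], labels: List[str]) -> Dict[str, float]:
    """F1 per class, using F1 = 2*TP / (2*TP + FP + FN)."""
    scores = {}
    for i, label in enumerate(labels):
        tp = cm[i][i]
        fp = sum(row[i] for row in cm) - tp  # predicted as class i, but wrong
        fn = sum(cm[i]) - tp                 # true class i, but missed
        denom = 2 * tp + fp + fn
        scores[label] = 2 * tp / denom if denom else 0.0
    return scores


def macro_f1(cm: List[List[int]], labels: List[str]) -> float:
    """Unweighted mean of the per-class F1 scores (an assumed averaging)."""
    scores = per_class_f1(cm, labels)
    return sum(scores.values()) / len(scores)


def most_confused_pair(cm: List[List[int]], labels: List[str]) -> Tuple[str, str]:
    """Pair of classes with the most mutual misclassifications."""
    best, best_count = (labels[0], labels[1]), -1
    for i in range(len(labels)):
        for j in range(i + 1, len(labels)):
            count = cm[i][j] + cm[j][i]
            if count > best_count:
                best, best_count = (labels[i], labels[j]), count
    return best


if __name__ == "__main__":
    print("accuracy:", accuracy(CONFUSION))
    print("per-class F1:", per_class_f1(CONFUSION, CLASSES))
    print("macro F1:", macro_f1(CONFUSION, CLASSES))
    print("most confused:", most_confused_pair(CONFUSION, CLASSES))
```

With these illustrative counts, the off-diagonal mass concentrates between Small and Medium, mirroring the kind of overlap the abstract reports for those two classes.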