13th International Conference on Computer and Knowledge Engineering
AgeNet-AT: An End-to-End Model for Robust Joint Speaker Age Estimation and Gender Recognition Based on Attention Mechanism and Titanet
Authors :
Mahsa Zamani Tarashandeh 1
Amirhossein Torkanloo 2
Mohammad Hossein Moattar 3
1- Department of Electrical Engineering, Faculty of Engineering, Ferdowsi University of Mashhad
2- Department of Computer Engineering, Ferdowsi University of Mashhad, Mashhad, Iran
3- Department of Computer Engineering, Mashhad Branch, Islamic Azad University, Mashhad, Iran
Keywords :
Age estimation, Gender classification, Multi-task learning, Attention mechanism, Titanet
Abstract :
Speaker age estimation has become popular in recent years due to its potential applications in various fields, including forensics and human-computer interaction. However, robustness to noise and to varying utterance length is a key factor in the performance of existing approaches. In this work, a robust age estimation and gender recognition model named AgeNet-AT is proposed, based on an attention mechanism and the Titanet model. The proposed approach uses Titanet as the embedding extractor and an attention mechanism to create an end-to-end architecture for age estimation. Titanet is chosen as the embedding extractor because it is designed to distinguish different speaker identities, so it is hypothesized that some of its extracted features contain properties that can differentiate speakers' age and gender. An attention layer is then used to focus on the most valuable features for age estimation. Furthermore, an auxiliary task of gender classification is added to the model in order to improve the estimation performance. The experiments are conducted on the TIMIT dataset under different evaluation conditions, such as various utterance lengths and noise levels. The experimental results indicate the robustness of the AgeNet-AT model. The model outperforms the state-of-the-art age estimation results on the TIMIT dataset, with a Root Mean Square Error (RMSE) of 5.92 and 6.85 and a Mean Absolute Error (MAE) of 4.30 and 4.73 for male and female speakers, respectively.
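For readers unfamiliar with the two reported metrics, the following is a minimal sketch of how RMSE and MAE can be computed separately for male and female speakers. It uses the standard definitions, RMSE = sqrt(mean((y - y_hat)^2)) and MAE = mean(|y - y_hat|). The function names, the record layout, and the sample ages are all hypothetical and chosen only for illustration; they are not taken from the paper or from the TIMIT dataset.

```python
import math


def rmse(y_true, y_pred):
    """Root Mean Square Error between two equal-length sequences."""
    if len(y_true) != len(y_pred) or len(y_true) == 0:
        raise ValueError("inputs must be non-empty and of equal length")
    squared = [(t - p) ** 2 for t, p in zip(y_true, y_pred)]
    return math.sqrt(sum(squared) / len(squared))


def mae(y_true, y_pred):
    """Mean Absolute Error between two equal-length sequences."""
    if len(y_true) != len(y_pred) or len(y_true) == 0:
        raise ValueError("inputs must be non-empty and of equal length")
    absolute = [abs(t - p) for t, p in zip(y_true, y_pred)]
    return sum(absolute) / len(absolute)


def per_gender_errors(records):
    """Group (gender, true_age, predicted_age) records by gender
    and return {gender: (rmse, mae)}."""
    grouped = {}
    for gender, true_age, pred_age in records:
        truths, preds = grouped.setdefault(gender, ([], []))
        truths.append(true_age)
        preds.append(pred_age)
    return {g: (rmse(t, p), mae(t, p)) for g, (t, p) in grouped.items()}


# Hypothetical example data (ages in years), for illustration only.
records = [
    ("male", 25, 28),
    ("male", 32, 30),
    ("female", 41, 45),
    ("female", 58, 52),
]

for gender, (r, m) in per_gender_errors(records).items():
    print(f"{gender}: RMSE={r:.2f}, MAE={m:.2f}")
```

Because RMSE squares each error before averaging, it penalizes large individual errors more heavily than MAE, which is why the reported RMSE values are higher than the corresponding MAE values.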