13th International Conference on Computer and Knowledge Engineering
AgeNet-AT: An End-to-End Model for Robust Joint Speaker Age Estimation and Gender Recognition Based on Attention Mechanism and Titanet
Authors:
Mahsa Zamani Tarashandeh (1)
Amirhossein Torkanloo (2)
Mohammad Hossein Moattar (3)
1- Department of Electrical Engineering, Faculty of Engineering, Ferdowsi University of Mashhad
2- Department of Computer Engineering, Ferdowsi University of Mashhad, Mashhad, Iran
3- Department of Computer Engineering, Mashhad Branch, Islamic Azad University, Mashhad, Iran
Keywords:
Age estimation, Gender classification, Multi-task learning, Attention mechanism, Titanet
Abstract:
Speaker age estimation has attracted growing interest in recent years because of its potential applications in fields such as forensics and human-computer interaction. However, robustness to noise and to varying utterance length is a key factor in the performance of such approaches. This work proposes AgeNet-AT, a robust model for joint age estimation and gender recognition based on an attention mechanism and the Titanet model. The proposed approach uses Titanet as the embedding extractor and an attention mechanism to form an end-to-end architecture for age estimation. Titanet is chosen as the embedding extractor because it is designed to distinguish speaker identities, and it is hypothesized that some of its extracted features also carry properties that differentiate speakers' age and gender. An attention layer focuses the model on the features most valuable for age estimation, and an auxiliary gender classification task is added to improve estimation performance. Experiments are conducted on the TIMIT dataset under different evaluation conditions, including various utterance lengths and noise levels. The results indicate that AgeNet-AT is robust under these conditions. The model outperforms state-of-the-art age estimation results on TIMIT, with a Root Mean Square Error (RMSE) of 5.92 and 6.85 and a Mean Absolute Error (MAE) of 4.30 and 4.73 for male and female speakers, respectively.
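The abstract reports RMSE and MAE separately for male and female speakers. As a minimal sketch of how such per-gender metrics are commonly computed, the Python below uses the standard definitions of the two errors. The function names, the `"M"`/`"F"` labels, and the toy values are assumptions made for illustration; they are not taken from the paper or from TIMIT.

```python
import math

def rmse(y_true, y_pred):
    """Root mean square error between true and predicted ages."""
    squared = [(t - p) ** 2 for t, p in zip(y_true, y_pred)]
    return math.sqrt(sum(squared) / len(squared))

def mae(y_true, y_pred):
    """Mean absolute error between true and predicted ages."""
    absolute = [abs(t - p) for t, p in zip(y_true, y_pred)]
    return sum(absolute) / len(absolute)

def evaluate_by_gender(ages, preds, genders):
    """Compute RMSE and MAE separately for each gender label (hypothetical helper)."""
    results = {}
    for g in sorted(set(genders)):
        idx = [i for i, label in enumerate(genders) if label == g]
        t = [ages[i] for i in idx]
        p = [preds[i] for i in idx]
        results[g] = {"rmse": rmse(t, p), "mae": mae(t, p)}
    return results

# Toy values, made up for illustration only (not TIMIT data).
ages = [25.0, 40.0, 31.0, 52.0]
preds = [28.0, 36.0, 30.0, 55.0]
genders = ["M", "M", "F", "F"]

scores = evaluate_by_gender(ages, preds, genders)
for g, s in scores.items():
    print(g, round(s["rmse"], 2), round(s["mae"], 2))
```

Because RMSE squares each error before averaging, it penalizes large individual errors more heavily than MAE, which is why the reported RMSE values are higher than the corresponding MAE values.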