13th International Conference on Computer and Knowledge Engineering
Distilling Knowledge from CNN-Transformer Models for Enhanced Human Action Recognition
Authors :
Hamid Ahmadabadi (1)
Omid Nejati Manzari (2)
Ahmad Ayatollahi (3)
1- School of Electrical Engineering, Iran University of Science and Technology, Tehran, Iran
2- School of Electrical Engineering, Iran University of Science and Technology, Tehran, Iran
3- School of Electrical Engineering, Iran University of Science and Technology, Tehran, Iran
Keywords :
Action Recognition, Still Image, Deep Learning, Knowledge Distillation
Abstract :
This paper studies how knowledge distillation, combined with CNN and Vision Transformer (ViT) models, can improve human action recognition. The goal is to improve the performance and efficiency of smaller student models by transferring knowledge from larger teacher models. The proposed method uses a Vision Transformer as the student model and a convolutional network as the teacher model. The teacher extracts local image features, whereas the student captures global features through an attention mechanism. The ViT architecture is introduced as a robust framework for capturing global dependencies in images, and its advanced variants, namely PVT, ConViT, MViT, Swin Transformer, and Twins, are discussed along with their contributions to computer vision tasks. ConvNeXt, known for its efficiency and effectiveness in computer vision, is introduced as the teacher model. Results for human action recognition on the Stanford 40 dataset compare the accuracy and mean average precision (mAP) of student models trained with and without knowledge distillation. The proposed approach significantly improves both accuracy and mAP compared with training the networks under regular settings, which highlights the potential of combining local and global features in action recognition tasks.
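The abstract does not state which distillation objective is used. As a rough illustration of the general idea of transferring knowledge from a teacher to a student, the sketch below assumes the widely used soft-target formulation: a weighted sum of the cross-entropy on the ground-truth label and the temperature-scaled KL divergence between teacher and student class distributions. The function names, the `temperature` and `alpha` parameters, and the example logits are all hypothetical and are not taken from the paper.

```python
import math


def softmax(logits, temperature=1.0):
    """Numerically stable softmax over a list of logits, with temperature scaling."""
    scaled = [z / temperature for z in logits]
    peak = max(scaled)
    exps = [math.exp(z - peak) for z in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def cross_entropy(student_logits, label):
    """Standard cross-entropy of the student's prediction against the true class index."""
    probs = softmax(student_logits)
    return -math.log(probs[label])


def kl_divergence(p, q):
    """KL(p || q) for two discrete distributions of equal length."""
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0.0)


def distillation_loss(student_logits, teacher_logits, label,
                      temperature=4.0, alpha=0.5):
    """Assumed soft-target distillation loss (not necessarily the paper's exact loss).

    alpha weights the hard-label term; (1 - alpha) weights the soft-target term.
    The soft term is multiplied by temperature**2, a common convention that keeps
    its gradient scale comparable to the hard-label term.
    """
    hard = cross_entropy(student_logits, label)
    p_teacher = softmax(teacher_logits, temperature)
    p_student = softmax(student_logits, temperature)
    soft = kl_divergence(p_teacher, p_student) * temperature ** 2
    return alpha * hard + (1.0 - alpha) * soft


if __name__ == "__main__":
    # Toy three-class example; a real model for Stanford 40 would output 40 logits.
    teacher = [3.0, 1.0, 0.2]   # hypothetical teacher (CNN) logits
    student = [1.5, 1.2, 0.3]   # hypothetical student (ViT) logits
    print(distillation_loss(student, teacher, label=0))
```

When the student's logits match the teacher's, the soft-target term vanishes and only the weighted hard-label term remains, so the extra term penalizes the student only where its class distribution departs from the teacher's.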