13th International Conference on Computer and Knowledge Engineering
Distilling Knowledge from CNN-Transformer Models for Enhanced Human Action Recognition
Authors :
Hamid Ahmadabadi 1, Omid Nejati Manzari 2, Ahmad Ayatollahi 3
1- School of Electrical Engineering, Iran University of Science and Technology, Tehran, Iran
2- School of Electrical Engineering, Iran University of Science and Technology, Tehran, Iran
3- School of Electrical Engineering, Iran University of Science and Technology, Tehran, Iran
Keywords :
Action Recognition, Still Image, Deep Learning, Knowledge Distillation
Abstract :
This paper studies how knowledge distillation, combined with CNN and Vision Transformer (ViT) models, can improve human action recognition. The goal is to improve the performance and efficiency of a smaller student model by transferring knowledge from a larger teacher model. The proposed method uses a Vision Transformer as the student and a convolutional network as the teacher: the teacher extracts local image features, whereas the student captures global features through its attention mechanism. The paper introduces the ViT architecture as a robust framework for modeling global dependencies in images and discusses several advanced variants (PVT, ConViT, MViT, Swin Transformer, and Twins) and their contributions to computer vision tasks. ConvNeXt, known for its efficiency and effectiveness in computer vision, serves as the teacher model. Results on the Stanford 40 dataset compare the accuracy and mAP of student models trained with and without knowledge distillation. Distillation significantly improves both accuracy and mAP over training under regular settings, which highlights the potential of combining local and global features in action recognition tasks.
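The abstract does not state which distillation objective is used, so the following is only a minimal sketch of a common soft-target formulation (a weighted sum of hard-label cross-entropy and a temperature-scaled KL term between teacher and student outputs). The function names, the `temperature` and `alpha` parameters, and their default values are assumptions made for illustration, not details taken from the paper.

```python
import math

def softmax(logits, temperature=1.0):
    """Numerically stable softmax over a list of logits."""
    scaled = [z / temperature for z in logits]
    peak = max(scaled)
    exps = [math.exp(z - peak) for z in scaled]
    total = sum(exps)
    return [e / total for e in exps]

def distillation_loss(student_logits, teacher_logits, label,
                      temperature=4.0, alpha=0.5):
    """Soft-target distillation loss for a single example (illustrative).

    temperature and alpha are hypothetical hyperparameters; the paper's
    actual settings are not given in the abstract.
    """
    # Hard-label term: cross-entropy of the student against the true class.
    p_student = softmax(student_logits)
    cross_entropy = -math.log(p_student[label])

    # Soft-label term: KL(teacher || student) on temperature-softened outputs.
    p_teacher_soft = softmax(teacher_logits, temperature)
    p_student_soft = softmax(student_logits, temperature)
    kl = sum(pt * math.log(pt / ps)
             for pt, ps in zip(p_teacher_soft, p_student_soft) if pt > 0)

    # The temperature**2 factor keeps the soft-term gradients on a scale
    # comparable to the hard-label term.
    return (1 - alpha) * cross_entropy + alpha * temperature ** 2 * kl
```

In the setting described above, `teacher_logits` would come from the convolutional teacher (ConvNeXt) and `student_logits` from the ViT-style student for the same image. When both models produce identical logits the KL term vanishes and only the hard-label term remains.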