14th International Conference on Computer and Knowledge Engineering
Improve the utility of tensor cores by compacting sparse matrix technique
Authors:
Mohammad.S Abazari (1)
Mahsa Zahedi (2)
Abdorreza Savadi (3)
1- Ferdowsi University of Mashhad
2- Ferdowsi University of Mashhad
3- Ferdowsi University of Mashhad
Keywords:
Tensor Cores, Neural Networks, Convolution Operations, Graphics Processing Unit
Abstract:
Neural networks have demanding computational requirements, particularly in matrix multiplication. To address this, we propose a model that combines network pruning with matrix compression and runs on NVIDIA tensor cores, which are specialized for efficient matrix operations. We compress the network weights according to the tensor core structure and perform convolutions on the tensor cores using the compressed weight matrix. Our model incorporates neural network pruning, mixed-precision training, and compression of the network weight tensors using the im2col algorithm and the CSR (compressed sparse row) format. Multiplication is carried out by tensor core kernels operating on 16x16 blocks. We evaluate several variants: a pruned model, a model optimized with automatic mixed precision (AMP), a model combining pruning and AMP, and our proposed model. The evaluation shows a significant performance improvement over a simple baseline model. We review related work, establish the foundational concepts, present the proposed model, and report the results obtained.
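The abstract names im2col and CSR but does not give implementation details. The following is a minimal pure-Python sketch of how these pieces can fit together. It is not the authors' code. It assumes stride 1, no padding, and simple magnitude pruning with a hypothetical threshold. The helper names (`im2col`, `prune`, `to_csr`, `csr_matmul`, `nonzero_tiles`) are illustrative.

The sketch lowers a convolution to a matrix product via im2col, stores the pruned weight matrix in CSR, and checks the sparse product against a direct convolution. `nonzero_tiles` only lists which 16x16 tiles of the weight matrix contain any non-zero entry. That is one plausible way to relate sparsity to tensor-core tile granularity. It does not run on a GPU.

```python
import random
from typing import List

Matrix = List[List[float]]


def im2col(x, kh, kw):
    """x has shape [C][H][W]; stride 1, no padding (assumption).
    Returns a (C*kh*kw) x (out_h*out_w) matrix whose columns are flattened patches."""
    c_in, h, w = len(x), len(x[0]), len(x[0][0])
    out_h, out_w = h - kh + 1, w - kw + 1
    cols = [[0.0] * (out_h * out_w) for _ in range(c_in * kh * kw)]
    for c in range(c_in):
        for i in range(kh):
            for j in range(kw):
                row = (c * kh + i) * kw + j  # same (c, i, j) order as the weight rows
                for oy in range(out_h):
                    for ox in range(out_w):
                        cols[row][oy * out_w + ox] = x[c][oy + i][ox + j]
    return cols, out_h, out_w


def prune(w_mat: Matrix, threshold: float) -> Matrix:
    """Hypothetical magnitude pruning: zero every weight with |w| < threshold."""
    return [[v if abs(v) >= threshold else 0.0 for v in row] for row in w_mat]


def to_csr(dense: Matrix):
    """Compressed sparse row: non-zero values, their column indices, and row pointers."""
    values, col_idx, row_ptr = [], [], [0]
    for row in dense:
        for j, v in enumerate(row):
            if v != 0.0:
                values.append(v)
                col_idx.append(j)
        row_ptr.append(len(values))
    return values, col_idx, row_ptr


def csr_matmul(values, col_idx, row_ptr, b: Matrix) -> Matrix:
    """Sparse (CSR) x dense product that touches only the stored non-zeros."""
    n_rows, n_cols = len(row_ptr) - 1, len(b[0])
    out = [[0.0] * n_cols for _ in range(n_rows)]
    for r in range(n_rows):
        for k in range(row_ptr[r], row_ptr[r + 1]):
            v, b_row, o_row = values[k], b[col_idx[k]], out[r]
            for j in range(n_cols):
                o_row[j] += v * b_row[j]
    return out


def conv2d_direct(x, w):
    """Reference convolution (cross-correlation, stride 1, no padding)."""
    k_out, c_in, kh, kw = len(w), len(w[0]), len(w[0][0]), len(w[0][0][0])
    out_h, out_w = len(x[0]) - kh + 1, len(x[0][0]) - kw + 1
    return [[[sum(w[k][c][i][j] * x[c][oy + i][ox + j]
                  for c in range(c_in) for i in range(kh) for j in range(kw))
              for ox in range(out_w)] for oy in range(out_h)] for k in range(k_out)]


def nonzero_tiles(dense: Matrix, tile: int = 16):
    """(row_tile, col_tile) coordinates of tiles holding at least one non-zero."""
    return sorted({(r // tile, c // tile)
                   for r, row in enumerate(dense)
                   for c, v in enumerate(row) if v != 0.0})


# Small integer-valued example so floating-point sums are exact.
rng = random.Random(0)
C, H, W, K, KH, KW = 2, 5, 5, 3, 3, 3
x = [[[float(rng.randint(-4, 4)) for _ in range(W)] for _ in range(H)] for _ in range(C)]
w = [[[[float(rng.randint(-3, 3)) for _ in range(KW)] for _ in range(KH)]
      for _ in range(C)] for _ in range(K)]

# Flatten the weights to K x (C*KH*KW), prune them, and compress to CSR.
w_mat = [[w[k][c][i][j] for c in range(C) for i in range(KH) for j in range(KW)]
         for k in range(K)]
w_pruned = prune(w_mat, threshold=2.0)
values, col_idx, row_ptr = to_csr(w_pruned)

# Convolution as a sparse matrix product on the im2col matrix.
cols, out_h, out_w = im2col(x, KH, KW)
y_mat = csr_matmul(values, col_idx, row_ptr, cols)
y = [[[y_mat[k][oy * out_w + ox] for ox in range(out_w)] for oy in range(out_h)]
     for k in range(K)]

# Check against a direct convolution that uses the same pruned weights.
w_pruned_4d = [[[[w_pruned[k][(c * KH + i) * KW + j] for j in range(KW)]
                 for i in range(KH)] for c in range(C)] for k in range(K)]
ref = conv2d_direct(x, w_pruned_4d)
print(y == ref, len(values), "non-zeros,", nonzero_tiles(w_pruned))
```

On a real GPU, the dense tiles identified this way would be handed to tensor-core fragments, for example through CUDA's WMMA API. That mapping is not shown here and is an assumption about how CSR storage and 16x16 blocks interact.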