International Conference on Computer and Knowledge Engineering

12th International Conference on Computer and Knowledge Engineering

MultiPath ViT OCR: A Lightweight Visual Transformer-based License Plate Optical Character Recognition

Authors :

Alireza Azadbakht¹ Saeed Reza Kheradpisheh² Hadi Farahani³

1- Shahid Beheshti University 2- Shahid Beheshti University 3- Shahid Beheshti University

Keywords :

Visual Transformer, Optical Character Recognition (OCR), License Plate OCR, Persian License Plate OCR

Abstract :

Because license plate images are captured under uncontrolled natural conditions, Optical Character Recognition (OCR) of these images is generally a challenging problem, and it is typically deployed on edge devices with limited computation power. Despite the considerable progress of deep neural networks, state-of-the-art models are not always a good solution for this problem: most have a large number of parameters and, in practice, need substantial resources to train, maintain, and deploy on edge devices. We propose a lightweight model based on the Visual Transformer architecture that achieves competitive results against traditional CRNN models. Due to the lack of a rich, large-scale dataset for Persian license plates, we gathered and annotated 1.3M images of license plates in various natural conditions, taken from different viewpoints and with different cameras. We call this dataset LicenseNet. On the LicenseNet test set, our proposed model achieves 77.25% accuracy, compared with 75.18% for CNN models and 60.37% for the OCR models embedded in cameras. Furthermore, it achieves this higher accuracy with 3.21 times fewer trainable parameters than previously proposed models.