14th International Conference on Computer and Knowledge Engineering
TriMAE: Fashion visual search with Triplet Masked Auto Encoder Vision Transformer
Authors :
Lachin Zamani (1), Reza Azmi (2)
1- Department of Computer Engineering, Faculty of Engineering, Alzahra University, Tehran, Iran
2- Department of Computer Engineering, Faculty of Engineering, Alzahra University, Tehran, Iran
Keywords :
Visual Search, Triplet Network, Masked Auto Encoders Vision Transformer
Abstract :
Visual search is a technology that retrieves images similar to a given query image and presents the results ranked by similarity. In apparel, it lets users find desired items based on visual preference. Despite its potential to significantly improve user experience, visual search remains a challenging problem. The challenges include differences in fine details, multiple garments in a single image, discrepancies between user-taken and catalog images, and the inherent flexibility of clothing. Selecting robust features and improving how similarity and dissimilarity between images are learned can yield better results, and the method proposed here targets both. Convolutional Neural Networks and Vision Transformers are commonly used as the backbone of triplet neural networks for visual search tasks; these networks are designed to better learn the similarities and differences between images. In this research, we combine a triplet neural network with a masked auto-encoder vision transformer model and train the network with a triplet loss function to learn the similarity between images. We evaluate our method on the DeepFashion In-shop dataset, which comprises different categories of clothing images. In extensive experiments on this benchmark, our model achieves a Recall@1 of 93.2% for visual search.
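The abstract names two ingredients without giving formulas: a triplet loss for training and Recall@1 for evaluation. The sketch below illustrates both in their common textbook form. It is not the authors' implementation: the Euclidean distance, the margin value of 0.2, and the function names `l2_distance`, `triplet_loss`, and `recall_at_k` are all assumptions made for illustration, and the embeddings are plain lists standing in for the output of the backbone.

```python
import math


def l2_distance(a, b):
    """Euclidean distance between two embedding vectors (assumed metric)."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def triplet_loss(anchor, positive, negative, margin=0.2):
    """Hinge-form triplet loss: max(0, d(a, p) - d(a, n) + margin).

    The loss is zero once the negative is at least `margin` farther
    from the anchor than the positive. The margin is an assumed value.
    """
    d_pos = l2_distance(anchor, positive)
    d_neg = l2_distance(anchor, negative)
    return max(0.0, d_pos - d_neg + margin)


def recall_at_k(query_embs, query_labels, gallery_embs, gallery_labels, k=1):
    """Fraction of queries with at least one same-label item in the top k.

    Gallery items are ranked by ascending distance to each query.
    """
    hits = 0
    for q, q_label in zip(query_embs, query_labels):
        ranked = sorted(
            range(len(gallery_embs)),
            key=lambda i: l2_distance(q, gallery_embs[i]),
        )
        if any(gallery_labels[i] == q_label for i in ranked[:k]):
            hits += 1
    return hits / len(query_embs)
```

With this definition, a triplet whose positive is already much closer to the anchor than its negative contributes no loss, so training effort concentrates on triplets that still violate the margin. Recall@1 then checks, per query, only whether the single nearest gallery item shares the query's item label.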
Papers List
List of archived papers
MIPS-Core Application Specific Instruction-Set Processor for IDEA Cryptography − Comparison between Single-Cycle and Multi-Cycle Architectures
Ahmad Ahmadi - Reza Faghih Mirzaee
Cross-project Defect Prediction with An Enhanced Transfer Boosting Algorithm
Nazgol Nikravesh - Mohammad Reza Keyvanpour
An influence maximization algorithm based on community detection using topological features
Zahra Aghaee - Afsaneh Fatemi
Deep Learning-Based Malaysian Sign Language (MSL) Recognition: Exploring the Impact of Color Spaces
Ervin Gubin Moung - Precilla Fiona Suwek - Maisarah Mohd Sufian - Valentino Liaw - Ali Farzamnia - Wei Leong Khong
Distilling Knowledge from CNN-Transformer Models for Enhanced Human Action Recognition
Hamid Ahmadabadi - Omid Nejati Manzari - Ahmad Ayatollahi
Hardware-Efficient Pruned CNN Optimized by Neural Architecture Search and Genetic Algorithm for Diabetic Retinopathy Detection on STM32F746
Omid Askari Haddad - Sara Ershadi-Nasab
FaaScaler: An Automatic Vertical and Horizontal Scaler for Serverless Computing Environments
Zahra Rezaei - Saeid Abrishami - Seid Nima Moeintaghavi
A Hybrid Echo State Network for Hypercomplex Pattern Recognition, Classification, and Big Data Analysis
Mohammad Jamshidi - Fatemeh Daneshfar
Object Detection on Detecting Skin Lesion using Dab-DETR
Sheida Shadman - Amirreza Rouhbakhshmeghrazi - Shayan Nalbandian - Bo Li - Shaghayegh Shadman - Malik Muhammad Owais Siddique
Enhanced Hate Speech Detection Using Focal Loss and Multi-Head Attention for Imbalanced Social Media Text
Ali Rezazadeh - Hadi Shahriar Shahhoseini