14th International Conference on Computer and Knowledge Engineering
TriMAE: Fashion visual search with Triplet Masked Auto Encoder Vision Transformer
Authors :
Lachin Zamani (1)
Reza Azmi (2)
1- Department of Computer Engineering, Faculty of Engineering, Alzahra University, Tehran, Iran
2- Department of Computer Engineering, Faculty of Engineering, Alzahra University, Tehran, Iran
Keywords :
Visual Search, Triplet Network, Masked Auto Encoders Vision Transformer
Abstract :
Visual search identifies images similar to a provided query image and presents results ranked by similarity. In the apparel domain, it changes how people shop by letting users find desired items based on visual preference. Despite its potential to significantly enhance user experience, visual search remains a challenging problem. The challenges include differences in minute details, the presence of multiple garments in a single image, discrepancies between user-taken and catalog images, and the inherent flexibility of clothing. Selecting robust features and improving how similarity and dissimilarity between images are learned can yield better results, and we propose a method to that end. Convolutional Neural Networks and Vision Transformers are commonly used as the backbone of triplet neural networks for visual search; these networks are designed to better learn the similarities and differences between images. In this research, we combine a triplet neural network with a masked auto-encoder vision transformer model, and we train the network with a triplet loss function to learn the similarity between images. We evaluate our method on the DeepFashion In-shop dataset, which comprises different categories of clothing images. In extensive experiments on this benchmark, our model achieves a Recall@1 of 93.2% for visual search.
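The two technical ingredients named in the abstract, a triplet loss and the Recall@1 metric, can be illustrated with a minimal sketch. This is not the authors' implementation: the Euclidean distance, the margin value of 0.2, and the function names `triplet_loss` and `recall_at_k` are assumptions chosen for illustration, and the embeddings are toy vectors rather than outputs of a masked auto-encoder vision transformer.

```python
import math


def euclidean(a, b):
    """Euclidean distance between two equal-length embedding vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def triplet_loss(anchor, positive, negative, margin=0.2):
    """Hinge-style triplet loss: max(0, d(a, p) - d(a, n) + margin).

    The margin of 0.2 is an assumed default, not a value from the paper.
    The loss is zero once the negative is farther from the anchor than
    the positive by at least the margin.
    """
    d_pos = euclidean(anchor, positive)
    d_neg = euclidean(anchor, negative)
    return max(0.0, d_pos - d_neg + margin)


def recall_at_k(queries, gallery, k=1):
    """Fraction of queries whose k nearest gallery items include a match.

    queries and gallery are lists of (embedding, item_id) pairs; a match
    is a gallery entry with the same item_id as the query.
    """
    hits = 0
    for q_emb, q_id in queries:
        # Rank gallery items by distance to the query embedding.
        ranked = sorted(gallery, key=lambda g: euclidean(q_emb, g[0]))
        if any(g_id == q_id for _, g_id in ranked[:k]):
            hits += 1
    return hits / len(queries)


# Toy data: two queries, three gallery items (hypothetical item ids).
queries = [([0.0, 0.0], "a"), ([10.0, 10.0], "b")]
gallery = [([0.0, 1.0], "a"), ([9.0, 9.0], "c"), ([20.0, 20.0], "b")]

easy = triplet_loss([0.0, 0.0], [0.0, 1.0], [3.0, 4.0])  # negative is far away
hard = triplet_loss([0.0, 0.0], [3.0, 4.0], [0.0, 1.0])  # negative is closer
r1 = recall_at_k(queries, gallery, k=1)
r3 = recall_at_k(queries, gallery, k=3)
```

In this toy setup the first triplet already satisfies the margin and contributes no loss, while the second is penalized. Recall@1 counts a query as correct only if its single nearest gallery item shares its identity, which is the sense in which the abstract's 93.2% figure is reported.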