Size : 4738 images of animals
This is a partial implementation of the pipeline of this paper with some modifications and a new dataset.
The following convolutional autoencoder model is first trained on the images and then used as the backbone network to extract local features of size 256 * 32 * 32. Training minimizes the mean squared error (MSE) between the decoded output and the original image.
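As a rough illustration of the reconstruction objective, the MSE between a decoded image and its original can be sketched as below. This uses NumPy rather than the project's actual training code, and the function name `mse_loss` and the array shapes are assumptions made for the example only.

```python
import numpy as np


def mse_loss(decoded, original):
    """Mean squared error over all pixels and channels (hypothetical helper)."""
    decoded = np.asarray(decoded, dtype=np.float64)
    original = np.asarray(original, dtype=np.float64)
    if decoded.shape != original.shape:
        raise ValueError("decoded and original must have the same shape")
    return float(np.mean((decoded - original) ** 2))


# Toy example: a 3-channel 4x4 "image" and a reconstruction that is off by 0.5 everywhere.
original = np.zeros((3, 4, 4))
decoded = np.full((3, 4, 4), 0.5)
loss = mse_loss(decoded, original)  # 0.5 ** 2 averaged over all entries
```

A perfect reconstruction gives a loss of zero, so lower values mean the decoder reproduces the input more closely.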
The highlights of the pipeline for image retrieval are summarized below.
- Generalized Mean Pooling : The output of the network is passed through a generalized mean pooling (GeM) layer.
- L2 norm feature selection : The 200 of the 256 2D local feature maps with the highest L2 norms are selected.
- Cosine Similarity : The similarity of two images is computed as the cosine similarity between their features.
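The three steps above can be sketched in NumPy as follows. This is a minimal sketch under stated assumptions, not the project's implementation: the function names, the GeM exponent `p = 3`, and the clipping constant `eps` are all hypothetical choices, and the way the steps are chained in the actual code is not specified here, so each operation is shown on its own.

```python
import numpy as np


def select_by_l2_norm(features, k=200):
    """Keep the k feature maps (channels) with the largest L2 norm.

    features: array of shape (C, H, W). Returns an array of shape (k, H, W).
    """
    flat = features.reshape(features.shape[0], -1)
    norms = np.linalg.norm(flat, axis=1)      # one L2 norm per 2D map
    top = np.sort(np.argsort(norms)[-k:])     # k largest, original channel order kept
    return features[top]


def gem_pool(features, p=3.0, eps=1e-6):
    """Generalized mean pooling over the spatial axes: (C, H, W) -> (C,).

    p = 1 is average pooling; larger p moves the result toward max pooling.
    Values are clipped to eps because GeM assumes non-negative activations.
    """
    clipped = np.clip(features, eps, None)
    return np.mean(clipped ** p, axis=(1, 2)) ** (1.0 / p)


def cosine_similarity(a, b, eps=1e-12):
    """Cosine of the angle between two 1D descriptors."""
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b) + eps))


# Random stand-ins for two backbone outputs of size 256 * 32 * 32.
rng = np.random.default_rng(0)
feats_a = rng.random((256, 32, 32))
feats_b = rng.random((256, 32, 32))

selected = select_by_l2_norm(feats_a, k=200)  # shape (200, 32, 32)
desc_a = gem_pool(feats_a)                    # shape (256,)
desc_b = gem_pool(feats_b)
similarity = cosine_similarity(desc_a, desc_b)
```

For retrieval, the query descriptor would be compared against every database descriptor and the images ranked by decreasing similarity. Note that if channel selection is done per image, the selected channels can differ between images, so the descriptors being compared need a consistent channel layout.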
Note : K-means clustering, which is part of the paper's pipeline, is not implemented here, and several other steps are also omitted. This is a heavily simplified and modified pipeline.
This project was done as part of the Advanced Digital Image Processing course.
The codebase is adapted from this work.