Tuesday, October 3, 2017

Similar image finder by CNN and Distance

Overview

In this article, I’ll show one method for finding images that are similar to a specific target image.

When we build a system to find similar items, there are several possible approaches, and we should choose one or more of them according to the purpose. Here, I’ll adopt a distance-based method that uses the predictions of a supervised learning model.

For example, when we look for images similar to the leftmost image below, the other images are the ones this method picks.
enter image description here



How to find similar images?


There are several ways to make a system find similar images. We should choose one according to the purpose and other factors such as available resources.

One of the most popular ways is to use distance.
enter image description here

For example, in the image above, to find the item most similar to the red circle, we calculate two distances: (red, blue) and (red, green). In this case, the distance between the red and blue circles is shorter than the distance between the red and green circles, so we can judge that the blue circle is more similar to the red circle than the green circle is.
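
As a minimal sketch of this comparison, assuming made-up 2-D coordinates for the three circles (the values are illustrative and not taken from the figure), the two Euclidean distances can be compared with NumPy:

```python
import numpy as np

# Hypothetical 2-D coordinates for the three circles (illustrative values only)
red = np.array([1.0, 1.0])
blue = np.array([2.0, 1.5])
green = np.array([4.0, 5.0])

def euclidean(a, b):
    """Euclidean (straight-line) distance between two points."""
    return float(np.linalg.norm(a - b))

d_blue = euclidean(red, blue)
d_green = euclidean(red, green)

# The item with the smaller distance is judged more similar to the red circle
closest = "blue" if d_blue < d_green else "green"
print(d_blue, d_green, closest)
```

With these assumed coordinates, the blue circle is the closer one, which matches the judgment described above.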

There are several types of distance; I’ll come back to this point later.
As the approach here, I used the distance-based method. The key question is how to get the features for the distance calculation.
Roughly speaking, the features should represent the original image, especially in terms of the differences between images. There are several ideas for obtaining them.

For example, the output of an autoencoder’s hidden layer can be used as the features. In the image below, the model is trained with the same image as both input and output. The network is horizontally symmetric, and the layer of red circles has the smallest number of nodes. We can use the output of that red-circle layer as the features.

enter image description here

I think this is a very well-mannered approach. But in this article, I adopted a simpler method.
When we build a neural network model for classification, the softmax function can be used to give a score to each label. Usually, when we want to know the predicted label of an image, we just check which label has the highest score.

Here, for the calculation of similarity, those scores can be regarded as the features.

enter image description here

In the image above, the network gives a score to each label (the red circles). I used those scores for the similarity calculation.
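
To make the idea concrete, here is a small sketch of a softmax function in NumPy. The raw outputs (`logits`) are made-up values for three labels, not results from the model in this article. It shows both the usual use (take the top-scoring label) and the use here (keep the whole score vector as features).

```python
import numpy as np

def softmax(logits):
    """Convert raw network outputs into scores that sum to 1."""
    shifted = logits - np.max(logits)  # subtract the max for numerical stability
    exp = np.exp(shifted)
    return exp / exp.sum()

# Hypothetical raw outputs for three labels (illustrative values only)
logits = np.array([2.0, 1.0, 0.1])
scores = softmax(logits)

predicted_label = int(np.argmax(scores))  # usual use: take the top-scoring label
feature_vector = scores                   # use here: keep the whole vector as features
print(scores, predicted_label)
```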

Data


For this experiment, I used CIFAR-10, a color image data set with 10 classes.
The mosaic image below is composed of some of its images.

enter image description here

Classification model


To get scores from the softmax function, we first need to build a classification model. This time, I’ll use a fine-tuned InceptionV3 model.
For fine-tuning itself, please check the article below.

How to make Fine tuning model by Keras

Fine-tuning is one of the important methods to make big-scale model with a small amount of data. Usually, deep learning model needs a massive amount of data for training. But it is not always easy to get enough amount of data for that.


From now on, let’s make the model.

import random
from keras.models import Model
from keras.layers import Dense, GlobalAveragePooling2D, Dropout
from keras.optimizers import SGD
from keras.applications.inception_v3 import InceptionV3
from keras.datasets import cifar10
from keras.utils import to_categorical
import cv2
import numpy as np

# read data
(x_train, y_train), (x_test, y_test) = cifar10.load_data()

The code above imports the libraries and loads the CIFAR-10 data.
Because a fine-tuned model is used here, a smaller amount of data is enough.

On my laptop, a MacBook Pro, using all of the data takes too much time, so I’ll limit the amount of training data. In addition, the fine-tuning model needs input images of a certain size or larger, so the images are upscaled.

# limit the amount of the data
# train data
ind_train = random.sample(list(range(x_train.shape[0])), 2000)
x_train = x_train[ind_train]
y_train = y_train[ind_train]

def resize_data(data):
    data_upscaled = np.zeros((data.shape[0], 140, 140, 3))
    for i, img in enumerate(data):
        large_img = cv2.resize(img, dsize=(140, 140), interpolation=cv2.INTER_CUBIC)
        data_upscaled[i] = large_img

    return data_upscaled

# resize train and test data
x_train_resized = resize_data(x_train)
x_test_resized = resize_data(x_test)

# one-hot encode the labels
y_train_hot_encoded = to_categorical(y_train)
y_test_hot_encoded = to_categorical(y_test)

At this point, the data preparation is done. Now we can tackle the fine-tuning. In short, it takes the following steps.
  • load trained model
  • add new layers
  • select the layers to train
  • re-training

Fine-tuning in Keras involves some steps that don’t usually appear in other work. For the details, please check the article mentioned above, “How to make Fine tuning model by Keras”.


inc_model = InceptionV3(weights='imagenet', include_top=False)

# get layers and add average pooling layer
x = inc_model.output
x = GlobalAveragePooling2D()(x)

# add fully-connected layer
x = Dense(512, activation='relu')(x)

# add output layer
predictions = Dense(10, activation='softmax')(x)

model = Model(inputs=inc_model.input, outputs=predictions)

# freeze all layers of the pre-trained base model
for layer in inc_model.layers:
    layer.trainable = False

# first, train only the newly added layers (one epoch by default)
model.compile(optimizer='rmsprop', loss='categorical_crossentropy')
model.fit(x_train_resized, y_train_hot_encoded)

# choose the layers to update: freeze the first 279 layers, train the rest
for layer in model.layers[:279]:
    layer.trainable = False

for layer in model.layers[279:]:
    layer.trainable = True

# training
model.compile(optimizer=SGD(lr=0.01, momentum=0.9), loss='categorical_crossentropy', metrics=['accuracy'])
history = model.fit(x_train_resized, y_train_hot_encoded, batch_size=128, epochs=5, shuffle=True,  validation_split=0.3)

The model has now been re-trained.
With this model, we can get the scores from the softmax function. The scores look like the following.

def predict_scores(img):
    # upscale to the model's input size, then return the softmax scores
    resized_img = resize_data(img)
    return model.predict(resized_img)

print(predict_scores(x_test[:5]))
[[  1.61296407e-06   9.51382099e-05   2.37692002e-05   9.76684690e-01
    4.52165346e-04   2.16267668e-02   7.62085139e-04   1.12884321e-04
    2.09616283e-05   2.19967915e-04]
 [  3.01904708e-01   1.32671535e-01   1.43772051e-01   6.68983087e-02
    1.24166757e-02   9.27795283e-03   1.82968169e-03   3.77773158e-02
    1.57927290e-01   1.35524467e-01]
 [  6.83718361e-03   4.18923125e-02   1.57647650e-03   7.44932215e-04
    1.16984920e-04   8.38102715e-05   8.74071702e-05   1.66274331e-04
    9.28343117e-01   2.01514605e-02]
 [  1.94290634e-02   2.89632641e-02   2.46831984e-03   3.44800651e-02
    6.95743016e-04   3.16253165e-03   1.41907483e-03   5.08636050e-03
    1.34879360e-02   8.90807629e-01]
 [  6.60003908e-03   2.27375864e-03   2.24278837e-01   1.55730307e-01
    1.05771888e-02   2.47983169e-02   5.67786813e-01   4.16748226e-03
    1.84030388e-03   1.94699306e-03]]

Find similar images


Roughly, the whole flow for finding similar images is as follows.
By prediction, each image gets its scores.

enter image description here

When we want to find images similar to “image-1”, the distances between the scores of “image-1” and the scores of the other images indicate which images are similar. Concretely, the shorter the distance, the more similar the image is to “image-1”.

In this small experiment, predicting all the test images takes a long time, so I limited the data size.

# sampling test data (sample indices from the test set, not the train set)
ind_test = random.sample(list(range(x_test.shape[0])), 2000)
x_test = x_test[ind_test]

In the code below, the scores are stored in the variable predicted.

predicted = predict_scores(x_test)

The scipy library makes it easy to get the distances between scores. I won’t discuss the kinds of distance here, but there are several, and we should choose the one that suits the situation. For example, although I don’t use it here, cosine distance can be a strong candidate for natural language and images.
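
To illustrate why the choice of distance matters, the sketch below compares Euclidean and cosine distance on made-up vectors (they are not scores from the model in this article, and the helper names are only for illustration). It is written in plain NumPy; as far as I know, scipy’s cdist can also compute cosine distance through its metric argument (metric='cosine').

```python
import numpy as np

def euclidean_distances(query, candidates):
    """Euclidean distance from one vector to each row of candidates."""
    return np.linalg.norm(candidates - query, axis=1)

def cosine_distances(query, candidates):
    """Cosine distance (1 - cosine similarity) from one vector to each row."""
    dots = candidates @ query
    norms = np.linalg.norm(candidates, axis=1) * np.linalg.norm(query)
    return 1.0 - dots / norms

# Hypothetical vectors (illustrative values only)
query = np.array([1.0, 1.0, 0.0])
candidates = np.array([
    [3.0, 3.0, 0.0],   # same direction as the query, but three times longer
    [1.0, 0.5, 0.5],   # close to the query, but pointing in a different direction
])

# Indices of the candidates, nearest first, under each distance
euclid_order = np.argsort(euclidean_distances(query, candidates))
cosine_order = np.argsort(cosine_distances(query, candidates))
print(euclid_order, cosine_order)
```

With these values the two distances rank the candidates in opposite order: Euclidean distance prefers the nearby vector, while cosine distance prefers the vector that points in the same direction.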

from scipy.spatial.distance import cdist

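def find_similar_image(target, predicted, imgs, n):
    # softmax scores of the target image
    score = predict_scores(np.array([target]))
    # Euclidean distance (the cdist default) from the target to every image
    distances = cdist(score, predicted)[0]
    # indices of the n closest images
    nearest = np.argsort(distances)[:n]
    return [target, imgs[nearest]]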

similar_imgs = find_similar_image(x_test[11], predicted, x_test, 10)

To check visually whether it really found similar images, we can make a plot.

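import matplotlib.pyplot as plt

def make_plot(similar_imgs):
    plt.figure(figsize=(32,32))
    # leftmost panel: the target image
    plt.subplot(1, 11, 1)
    plt.imshow(similar_imgs[0])
    # remaining panels: the similar images, nearest first
    for i, img in enumerate(similar_imgs[1]):
        plt.subplot(1, 11, i+2)
        plt.imshow(img)
    plt.show()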

make_plot(similar_imgs)
enter image description here

In my case, this outputs the plot above. The leftmost image is the target image, and the other ten images are the ones similar to it.
In this example, the target image is included in the test data, so the second image from the left is the same as the leftmost one.

Let’s show some more examples.
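
If that duplicate is not wanted, one option is to drop the target’s own index before taking the nearest images. The sketch below is only an illustration under assumptions: the function name rank_without_query is hypothetical, the score values are made up, and it assumes the position of the target inside the score array is known.

```python
import numpy as np

def rank_without_query(scores, query_index, n):
    """Return indices of the n rows closest to scores[query_index], excluding itself."""
    distances = np.linalg.norm(scores - scores[query_index], axis=1)
    order = np.argsort(distances)
    order = order[order != query_index]  # drop the query itself
    return order[:n]

# Hypothetical softmax scores for four images over three labels
scores = np.array([
    [0.90, 0.05, 0.05],
    [0.80, 0.10, 0.10],
    [0.10, 0.80, 0.10],
    [0.85, 0.10, 0.05],
])
print(rank_without_query(scores, query_index=0, n=2))
```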
Cat
enter image description here

Cat again
enter image description here

Frog
enter image description here

Bird
enter image description here

Truck
enter image description here

With a rough model, a rough distance function, and rough sampling, the accuracy is only so-so. With a proper model, features, distance function, and sampling, the outcome will become more interesting.