Overview
Fine-tuning is one of the important methods for building a large-scale model with only a small amount of data. Usually, a deep learning model needs a massive amount of data for training, but it is not always easy to collect enough. On top of that, training such a model from scratch often takes a long time; nobody enjoys watching a single epoch run from sunrise to sunset. In some areas, such as image classification, the fine-tuning method can solve this situation.
For example, when you build an image classification model, a very deep CNN often works well (though not always). To train that kind of model from scratch, you need to prepare a huge amount of data. However, if you start from a model that has already been trained on other data, it is enough to add one or a few layers on top and train only those. This saves a lot of time and data.
Here, I show this type of method, fine-tuning, with Keras.
What is fine-tuning?
Fine-tuning is a simple method. It takes an already trained network and re-trains part of it on a new data set. In the image below, each line represents a weight. In this case, the architecture made of the red lines and blue nodes is fixed, and the red lines, meaning the weights, have already been trained on a huge amount of data. You can add one or more layers right after that architecture and train only those parts (sometimes including some of the preceding layers).
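As a minimal conceptual sketch of that idea in Keras (the full, runnable version appears later in the article; base_model here stands for any pre-trained network):
from keras.models import Model
from keras.layers import Dense, GlobalAveragePooling2D
# freeze every pre-trained layer so its weights keep their trained values
for layer in base_model.layers:
    layer.trainable = False
# add a small new head on top; only this part will be trained
x = GlobalAveragePooling2D()(base_model.output)
predictions = Dense(10, activation='softmax')(x)
fine_tuned_model = Model(inputs=base_model.input, outputs=predictions)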
Data
This time I used CIFAR-10, a freely available color image data set, for training and evaluation. It has 10 classes and a large number of color images per class; the images below are a sample from the CIFAR-10 data set. The purpose of this article is to try out some fine-tuned models, and the advantage of fine-tuning is that it saves both time and training data. So I randomly limited the data to 1,000 images for training and 1,000 for evaluation.
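For example, the data set can be loaded directly through Keras, and checking its shape is a quick sanity check (a minimal sketch; the actual preparation code appears later in the article):
from keras.datasets import cifar10
(x_train, y_train), (x_test, y_test) = cifar10.load_data()
# 50,000 training images of 32x32x3 pixels and their integer class labels
print(x_train.shape, y_train.shape)  # (50000, 32, 32, 3) (50000, 1)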
Model
From Keras, we can easily use some image classification models:
- Xception
- VGG16
- VGG19
- ResNet50
- InceptionV3
These models are large and have already been trained on a huge amount of data. For details, check the Applications page of the Keras official documentation. I also strongly recommend reading the paper behind each model to deepen your knowledge of neural network architectures.
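For example, instantiating one of these models with its ImageNet weights and inspecting its architecture takes only a few lines (a minimal sketch):
from keras.applications.vgg16 import VGG16
# include_top=False drops the original classification head, as used later in this article
base_model = VGG16(weights='imagenet', include_top=False)
print(len(base_model.layers))
base_model.summary()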
References
- Xception: Deep Learning with Depthwise Separable Convolutions
- Very Deep Convolutional Networks for Large-Scale Image Recognition
- Deep Residual Learning for Image Recognition
- Rethinking the Inception Architecture for Computer Vision
Check one by one
The whole code is shown at the bottom of the article. From here, I show how to build a fine-tuned model with Keras, step by step.
Import libraries
This just imports necessary libraries.
import random
import cv2
from keras.datasets import cifar10
from keras.utils import to_categorical
from keras.models import Model
from keras.layers import Dense, GlobalAveragePooling2D
from keras.optimizers import SGD
from keras.applications.inception_v3 import InceptionV3
from keras.applications.resnet50 import ResNet50
from keras.applications.vgg19 import VGG19
from keras.applications.vgg16 import VGG16
from keras.applications.xception import Xception
import numpy as np
Making data
This prepares the data for training and evaluation. There are three points to take care of:
- Limit the amount of data
- Resize the images
- One-hot encode the explained variable
# read data
(x_train, y_train), (x_test, y_test) = cifar10.load_data()
# limit the amount of the data
# train data
ind_train = random.sample(list(range(x_train.shape[0])), 1000)
x_train = x_train[ind_train]
y_train = y_train[ind_train]
# test data
ind_test = random.sample(list(range(x_test.shape[0])), 1000)
x_test = x_test[ind_test]
y_test = y_test[ind_test]
def resize_data(data):
data_upscaled = np.zeros((data.shape[0], 320, 320, 3))
for i, img in enumerate(data):
large_img = cv2.resize(img, dsize=(320, 320), interpolation=cv2.INTER_CUBIC)
data_upscaled[i] = large_img
return data_upscaled
# resize train and test data
x_train_resized = resize_data(x_train)
x_test_resized = resize_data(x_test)
# make explained variable hot-encoded
y_train_hot_encoded = to_categorical(y_train)
y_test_hot_encoded = to_categorical(y_test)
If I were doing this properly, I should care about the balance of the amount of data between the classes; in some cases, simple random sampling leads to an unfavorable split. So when you actually do this, it is better to check the class balance after sampling.
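For example, a minimal sketch of such a check with NumPy (y_train here is the sampled label array from the code above):
# count how many sampled images fall into each of the 10 classes;
# ideally the counts should be roughly even
classes, counts = np.unique(y_train, return_counts=True)
print(dict(zip(classes.tolist(), counts.tolist())))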
Set model
This code sets up the fine-tuning. This part is the main theme of the article, so let's follow it by splitting it into small pieces.
def model(x_train, y_train, base_model):
# get layers and add average pooling layer
x = base_model.output
x = GlobalAveragePooling2D()(x)
# add fully-connected layer
x = Dense(512, activation='relu')(x)
# add output layer
predictions = Dense(10, activation='softmax')(x)
model = Model(inputs=base_model.input, outputs=predictions)
# freeze pre-trained model area's layer
for layer in base_model.layers:
layer.trainable = False
# update the weight that are added
model.compile(optimizer='rmsprop', loss='categorical_crossentropy')
model.fit(x_train, y_train)
# choose the layers which are updated by training
layer_num = len(model.layers)
for layer in model.layers[:int(layer_num * 0.9)]:
layer.trainable = False
for layer in model.layers[int(layer_num * 0.9):]:
layer.trainable = True
# update the weights
model.compile(optimizer=SGD(lr=0.0001, momentum=0.9), loss='categorical_crossentropy', metrics=['accuracy'])
history = model.fit(x_train, y_train, epochs=5)
return history
inception_model = InceptionV3(weights='imagenet', include_top=False)
res_50_model = ResNet50(weights='imagenet', include_top=False)
vgg_19_model = VGG19(weights='imagenet', include_top=False)
vgg_16_model = VGG16(weights='imagenet', include_top=False)
xception_model = Xception(weights='imagenet', include_top=False)
In this part, new layers are added to the pre-trained model. Concretely, a global average pooling layer, a fully-connected layer, and an output layer are added.
# get layers and add average pooling layer
x = base_model.output
x = GlobalAveragePooling2D()(x)
# add fully-connected layer
x = Dense(512, activation='relu')(x)
# add output layer
predictions = Dense(10, activation='softmax')(x)
model = Model(inputs=base_model.input, outputs=predictions)
After adding the new layers, you can train them, or concretely speaking, their weights. In this case it is not necessary to update the weights that the pre-trained model's layers already have. So, by setting each layer's trainable attribute to False, you can freeze those layers during training.
# freeze pre-trained model area's layer
for layer in base_model.layers:
layer.trainable = False
# update the weight that are added
model.compile(optimizer='rmsprop', loss='categorical_crossentropy')
model.fit(x_train, y_train)
Here, I chose which layers to train and which not to train. This time I simply froze the first 90% of the layers and trained the remaining 10%. In some articles only the last two layers are updated by training and the others are kept as they are, or only the part that was added is trained (a sketch of that variant is shown after the code below). I did not dwell on this point, because the precise number of layers to update is not the essence of fine-tuning.
The image below shows what this part does. The blue circles and red links are the pre-trained part; you don't need to update the weights of this part during training. The orange circles and green links are what you added to the pre-trained part; this is the training target.
# choose the layers which are updated by training
layer_num = len(model.layers)
for layer in model.layers[:int(layer_num * 0.9)]:
layer.trainable = False
for layer in model.layers[int(layer_num * 0.9):]:
layer.trainable = True
# update the weights
model.compile(optimizer=SGD(lr=0.0001, momentum=0.9), loss='categorical_crossentropy', metrics=['accuracy'])
history = model.fit(x_train, y_train, epochs=5)
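As mentioned above, another common variant is to unfreeze only the last couple of layers, or only the layers you added. A minimal sketch of that variant, assuming the same model object and imports as in the function above (my own illustration, not part of the original code):
# freeze everything, then unfreeze only the last two layers instead of the last 10%
for layer in model.layers:
    layer.trainable = False
for layer in model.layers[-2:]:
    layer.trainable = True
model.compile(optimizer=SGD(lr=0.0001, momentum=0.9), loss='categorical_crossentropy', metrics=['accuracy'])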
Evaluation
Let’s evaluate the models on the test data. Strictly speaking, a simple hold-out evaluation like this is not robust enough, but appropriate evaluation is not the purpose of this article, so I keep it simple. (Each history_* object below comes from calling the model() function on one of the pre-trained base models, as shown in the whole code at the bottom.) In addition, when you build models in practice, I suggest checking how training and validation accuracy change over the epochs, so you can see whether training is going well and whether overfitting appears.
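For reference, a minimal sketch of what that monitoring could look like (my own illustration, not part of this article's code): inside the model() function, the final fit call can hold out part of the training data for validation, and the recorded history can then be inspected.
# hold out 20% of the training data for validation during the final training step
history = model.fit(x_train, y_train, epochs=5, validation_split=0.2)
# per-epoch training and validation metrics; the exact key names ('acc'/'val_acc' or
# 'accuracy'/'val_accuracy') depend on the Keras version
print(history.history)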
# check accuracy
evaluation_inception_v3 = history_inception_v3.model.evaluate(x_test_resized,y_test_hot_encoded)
evaluation_res_50 = history_res_50.model.evaluate(x_test_resized,y_test_hot_encoded)
evaluation_vgg_19 = history_vgg_19.model.evaluate(x_test_resized,y_test_hot_encoded)
evaluation_vgg_16 = history_vgg_16.model.evaluate(x_test_resized,y_test_hot_encoded)
evaluation_xception = history_xception.model.evaluate(x_test_resized,y_test_hot_encoded)
print("inception_v3:{}".format(evaluation_inception_v3))
print("res_50:{}".format(evaluation_res_50))
print("vgg_19:{}".format(evaluation_vgg_19))
print("vgg_16:{}".format(evaluation_vgg_16))
print("xception:{}".format(evaluation_xception))
The outcome is like this:
inception_v3:[1.5071683092117309, 0.622]
res_50:[1.324682321548462, 0.55100000000000005]
vgg_19:[1.0588858013153075, 0.63600000000000001]
vgg_16:[2.7452965526580813, 0.57899999999999996]
xception:[0.89050176906585699, 0.70999999999999996]
As you can see, we can easily make a new model from a pre-trained one. Of course, a model fully trained on enough data is better than a fine-tuned model from the viewpoint of accuracy, but this one is easy to make and does not require much time or data; this time it took just 5 epochs of training.
Summary
Fine-tuning is very versatile. Without much time or data, you can build a reasonably good model. In many cases, such as small experiments where you want to wait and see, or where you just want a rough idea of how accurate a model can get on your data, fine-tuning works well at a small cost. And although it depends on the purpose of the model, with proper tuning and evaluation it can even serve as the final output.
In this article, I showed how to build a fine-tuned model with Keras. If you have read this far, I suggest reading the official Keras documentation. It is concise and easy to read, yet covers many important aspects of deep learning.
Whole code
import random
import cv2
from keras.datasets import cifar10
from keras.utils import to_categorical
from keras.models import Model
from keras.layers import Dense, GlobalAveragePooling2D
from keras.optimizers import SGD
from keras.applications.inception_v3 import InceptionV3
from keras.applications.resnet50 import ResNet50
from keras.applications.vgg19 import VGG19
from keras.applications.vgg16 import VGG16
from keras.applications.xception import Xception
import numpy as np
# read data
(x_train, y_train), (x_test, y_test) = cifar10.load_data()
# limit the amount of the data
# train data
ind_train = random.sample(list(range(x_train.shape[0])), 1000)
x_train = x_train[ind_train]
y_train = y_train[ind_train]
# test data
ind_test = random.sample(list(range(x_test.shape[0])), 1000)
x_test = x_test[ind_test]
y_test = y_test[ind_test]
def resize_data(data):
data_upscaled = np.zeros((data.shape[0], 320, 320, 3))
for i, img in enumerate(data):
large_img = cv2.resize(img, dsize=(320, 320), interpolation=cv2.INTER_CUBIC)
data_upscaled[i] = large_img
return data_upscaled
# resize train and test data
x_train_resized = resize_data(x_train)
x_test_resized = resize_data(x_test)
# make explained variable hot-encoded
y_train_hot_encoded = to_categorical(y_train)
y_test_hot_encoded = to_categorical(y_test)
def model(x_train, y_train, base_model):
# get layers and add average pooling layer
x = base_model.output
x = GlobalAveragePooling2D()(x)
# add fully-connected layer
x = Dense(512, activation='relu')(x)
# add output layer
predictions = Dense(10, activation='softmax')(x)
model = Model(inputs=base_model.input, outputs=predictions)
# freeze pre-trained model area's layer
for layer in base_model.layers:
layer.trainable = False
# update the weight that are added
model.compile(optimizer='rmsprop', loss='categorical_crossentropy')
model.fit(x_train, y_train)
# choose the layers which are updated by training
layer_num = len(model.layers)
for layer in model.layers[:int(layer_num * 0.9)]:
layer.trainable = False
for layer in model.layers[int(layer_num * 0.9):]:
layer.trainable = True
# update the weights
model.compile(optimizer=SGD(lr=0.0001, momentum=0.9), loss='categorical_crossentropy', metrics=['accuracy'])
history = model.fit(x_train, y_train, epochs=5)
return history
inception_model = InceptionV3(weights='imagenet', include_top=False)
res_50_model = ResNet50(weights='imagenet', include_top=False)
vgg_19_model = VGG19(weights='imagenet', include_top=False)
vgg_16_model = VGG16(weights='imagenet', include_top=False)
xception_model = Xception(weights='imagenet', include_top=False)
history_inception_v3 = model(x_train_resized, y_train_hot_encoded, inception_model)
history_res_50 = model(x_train_resized, y_train_hot_encoded, res_50_model)
history_vgg_19 = model(x_train_resized, y_train_hot_encoded, vgg_19_model)
history_vgg_16 = model(x_train_resized, y_train_hot_encoded, vgg_16_model)
history_xception = model(x_train_resized, y_train_hot_encoded, xception_model)
# check accuracy
evaluation_inception_v3 = history_inception_v3.model.evaluate(x_test_resized,y_test_hot_encoded)
evaluation_res_50 = history_res_50.model.evaluate(x_test_resized,y_test_hot_encoded)
evaluation_vgg_19 = history_vgg_19.model.evaluate(x_test_resized,y_test_hot_encoded)
evaluation_vgg_16 = history_vgg_16.model.evaluate(x_test_resized,y_test_hot_encoded)
evaluation_xception = history_xception.model.evaluate(x_test_resized,y_test_hot_encoded)
print("inception_v3:{}".format(evaluation_inception_v3))
print("res_50:{}".format(evaluation_res_50))
print("vgg_19:{}".format(evaluation_vgg_19))
print("vgg_16:{}".format(evaluation_vgg_16))
print("xception:{}".format(evaluation_xception))