Friday, June 16, 2017

Convolutional neural network scale experiment with Keras

Overview


It is not easy to understand how a convolutional neural network's performance changes when you change the number of nodes in each layer, the number of layers, and other factors.
To get a practical feel for this, I experimented with several types of convolutional neural networks.



Data



This time, I used CIFAR-10.
CIFAR-10 is one of the most popular image datasets, like MNIST. It has 10 categories and all images are in colour.

The purpose of the model is to predict the image's category.
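
For reference, the 10 categories are the following (this label mapping comes from the CIFAR-10 documentation, not from the original post):

cifar10_labels = ['airplane', 'automobile', 'bird', 'cat', 'deer',
                  'dog', 'frog', 'horse', 'ship', 'truck']
# e.g. a label value of 6 means 'frog'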

Preparation


Import the libraries and prepare the data.

import numpy as np
import keras
from keras.datasets import cifar10
from keras.models import Sequential
from keras.layers import Dense, Dropout, Conv2D, MaxPooling2D, Flatten, Activation
from keras.utils import to_categorical
from keras import backend as K



# read data
(x_train, y_train), (x_test, y_test) = cifar10.load_data()

The first time you call cifar10.load_data(), it takes a bit of time to download the dataset.
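
As a quick sanity check, you can print the shapes of the arrays (a small sketch; the shapes below are those of the standard CIFAR-10 split shipped with Keras, assuming the channels-last image format):

print(x_train.shape)  # (50000, 32, 32, 3)
print(y_train.shape)  # (50000, 1)
print(x_test.shape)   # (10000, 32, 32, 3)

# The experiments below feed the raw 0-255 pixel values to the network.
# Scaling to [0, 1] is a common extra step (not done in the original experiments):
# x_train = x_train.astype('float32') / 255.0
# x_test = x_test.astype('float32') / 255.0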

Model


A convolutional neural network has many design choices, such as:
  • the number of convolutional layers
  • the number of nodes (filters) each layer has
  • the position of the pooling layers
  • the number of layers after Flatten
  • dropout
  • regularization

These are only some of them. In my environment, a MacBook Pro, it takes a lot of time even to try a few patterns.
So I fixed almost everything except the number of nodes each convolutional layer has and the number of nodes the layer after Flatten has.

Caution: In this article, I meant to show the difference between models with and without batch normalization, but I forgot to include it. So the models below don't have batch normalization. In practice, you should use batch normalization; a rough sketch of where it would go follows.
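
For reference, a minimal sketch of where batch normalization could be placed (this block is only an illustration and is not used in the experiments below):

from keras.models import Sequential
from keras.layers import Conv2D, Activation, BatchNormalization

bn_example = Sequential()
bn_example.add(Conv2D(32, (3, 3), input_shape=(32, 32, 3)))
bn_example.add(BatchNormalization())   # normalize before the non-linearity
bn_example.add(Activation('relu'))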

# a simple model with one convolutional layer
def make_model_1(x_train, y_train, conv_num, dense_num):
    input_shape = x_train.shape[1:]

    # one-hot encode the labels
    y_train = to_categorical(y_train, 10)

    # set model
    model = Sequential()
    model.add(Conv2D(conv_num, (3, 3), activation='relu', input_shape=input_shape))
    model.add(MaxPooling2D(pool_size=(2, 2)))
    model.add(Flatten())
    model.add(Dense(dense_num, activation='relu'))
    model.add(Dropout(0.4))
    model.add(Dense(10, activation='softmax'))
    model.compile(loss=keras.losses.categorical_crossentropy,
                  optimizer=keras.optimizers.Adadelta(),
                  metrics=['accuracy'])
    # training
    history = model.fit(x_train, y_train, batch_size=256, epochs=50,
                        shuffle=True, validation_split=0.1)
    return history

In the model above, I used dropout. For more about dropout, please check the article How Dropout Works on Neural Network.
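
As a rough idea of what dropout does during training (a conceptual sketch only, not the actual Keras implementation):

import numpy as np

# Inverted dropout: each activation is zeroed with probability `rate`,
# and the survivors are scaled up so the expected value stays the same.
def dropout_forward(x, rate=0.4):
    mask = (np.random.rand(*x.shape) >= rate).astype('float32')
    return x * mask / (1.0 - rate)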

The model's architecture can be expressed as the image below. It is very simple and actually not powerful enough to classify images well.





In the code above, conv_num and dense_num mean the following:

  • conv_num: the number of nodes (filters) the convolutional layer has
  • dense_num: the number of nodes the dense (fully connected) layer has

Execute

conv_num = [i for i in range(2, 12)]
dense_num = [pow(2, i+1) for i in range(8)]
outcome = []
for conv in conv_num:
    for dense in dense_num:
        history = make_model_1(x_train, y_train, conv, dense)
        outcome.append((conv, dense, history.history['val_acc'][-1]))

In this loop, a model is trained for every combination of the parameters (10 values of conv_num × 8 values of dense_num, so 80 models in total), and the final validation accuracy for each combination is recorded.

Check the outcome


import pandas as pd
import matplotlib.pyplot as plt

outcome_df = pd.DataFrame(outcome)
outcome_df.columns = ['conv_num', 'dense_num', 'accuracy']
plt.scatter(outcome_df['conv_num'], outcome_df['dense_num'], c=outcome_df['accuracy'], cmap='Blues') 
plt.colorbar()


The outcome is like this. The darker the colour, the higher the accuracy. From this graph, we can see the tendency that as both values go up, the accuracy goes up.
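
For a tabular view of the same results, you can also pivot the dataframe (a sketch using the outcome_df defined above):

# rows: conv_num, columns: dense_num, values: final validation accuracy
accuracy_table = outcome_df.pivot(index='conv_num', columns='dense_num', values='accuracy')
print(accuracy_table)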

Examine more


I can say that the bigger the number of nodes becomes, the better the model can fit the data. So far I only checked validation accuracy, not training accuracy. From here, I fix conv_num and dense_num at 10 and 256 and add one more convolutional layer.

# add one more convolutional layer
def make_model_2(x_train, y_train, conv_num, dense_num):
    input_shape = x_train.shape[1:]

    # one-hot encode the labels
    y_train = to_categorical(y_train, 10)

    # set model
    model = Sequential()
    model.add(Conv2D(conv_num, (3, 3), activation='relu', input_shape=input_shape))
    model.add(Conv2D(conv_num, (3, 3), activation='relu'))
    model.add(MaxPooling2D(pool_size=(2, 2)))
    model.add(Flatten())
    model.add(Dense(dense_num, activation='relu'))
    model.add(Dropout(0.4))
    model.add(Dense(10, activation='softmax'))
    model.compile(loss=keras.losses.categorical_crossentropy,
                  optimizer=keras.optimizers.Adadelta(),
                  metrics=['accuracy'])
    # training
    history = model.fit(x_train, y_train, batch_size=256, epochs=50,
                        shuffle=True, validation_split=0.1)
    return history

history = make_model_2(x_train, y_train, 10, 256)

The model's architecture is this.



The outcome is as follows.

Epoch 50/50
45000/45000 [==============================] - 49s - loss: 0.3162 - acc: 0.8898 - val_loss: 1.9862 - val_acc: 0.5562

The training accuracy is almost 0.9 while the validation accuracy is only 0.55. We can judge this as overfitting.
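
One quick way to quantify this gap is to compare the final training and validation accuracy stored in the history object (a small sketch using the history returned by make_model_2 above):

# a large positive gap between training and validation accuracy
# is a sign of overfitting
final_train_acc = history.history['acc'][-1]
final_val_acc = history.history['val_acc'][-1]
print('train: %.3f  val: %.3f  gap: %.3f'
      % (final_train_acc, final_val_acc, final_train_acc - final_val_acc))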

There are some ways to solve this overfitting:
  • use more dropout (it randomly deactivates some nodes during training)
  • use regularization such as l1, l2, or l1_l2
Regularization adds a penalty term to the loss function and prevents the parameters from becoming too big. With these, the model can gain generalization performance.
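
As a rough illustration (not part of the original experiment), the penalty that an l1_l2 regularizer adds to the loss for a layer's weight matrix W looks like this:

import numpy as np

# loss += l1 * sum(|w|) + l2 * sum(w^2), summed over all entries of W
def l1_l2_penalty(W, l1=0.01, l2=0.01):
    return l1 * np.sum(np.abs(W)) + l2 * np.sum(np.square(W))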

But these remedies have their own problems. If you apply dropout and strong regularization to every layer, sometimes even the training accuracy does not go up, meaning the model cannot even fit the training data well.

So, regarding the order of work, it can be better to first build a deep model that reaches good accuracy even if it overfits, and only after that try to reduce the overfitting with those remedies. It is often hard to search for a better model while worrying about overfitting at the same time.

Deep model

To investigate further, let's try a more complex model.

def make_model_3(x_train, y_train, conv_num, dense_num):
    input_shape = x_train.shape[1:]

    # one-hot encode the labels
    y_train = to_categorical(y_train, 10)

    # set model
    model = Sequential()
    model.add(Conv2D(conv_num, (3, 3), activation='relu', input_shape=input_shape))
    model.add(Conv2D(conv_num, (3, 3), activation='relu'))
    model.add(Dropout(0.2))
    model.add(MaxPooling2D(pool_size=(2, 2)))

    model.add(Conv2D(conv_num * 2, (3, 3), activation='relu'))
    model.add(Conv2D(conv_num * 2, (3, 3), activation='relu'))
    model.add(Dropout(0.2))
    model.add(MaxPooling2D(pool_size=(2, 2)))

    model.add(Flatten())
    model.add(Dense(dense_num, activation='relu'))
    model.add(Dropout(0.2))
    model.add(Dense(int(dense_num * 0.6), activation='relu'))
    model.add(Dense(10, activation='softmax'))
    model.compile(loss=keras.losses.categorical_crossentropy,
                  optimizer=keras.optimizers.Adadelta(),
                  metrics=['accuracy'])
    # training
    history = model.fit(x_train, y_train, batch_size=256, epochs=15,
                        shuffle=True, validation_split=0.1)
    return history

history = make_model_3(x_train, y_train, 32, 256)

The model's architecture is this.


From here on, my MacBook Pro actually cries with huge fan noise.

The outcome is

Epoch 15/50
45000/45000 [==============================] - 194s - loss: 0.6627 - acc: 0.7702 - val_loss: 0.8667 - val_acc: 0.7092

This model has many layers and each layer has many nodes. It takes too long to wait for a full 50-epoch training, so this is the outcome after only 15 epochs.

The validation accuracy is over 0.7 and the training accuracy is 0.77.
This shows two things:
  • bigger and more numerous layers lead to better accuracy
  • you can still see some overfitting

Next, I try to solve this overfitting.

from keras.regularizers import l1_l2

def make_model_4(x_train, y_train, conv_num, dense_num):
    input_shape = x_train.shape[1:]

    # one-hot encode the labels
    y_train = to_categorical(y_train, 10)

    # set model
    model = Sequential()
    model.add(Conv2D(conv_num, (3, 3), activation='relu', input_shape=input_shape))
    model.add(Conv2D(conv_num, (3, 3), activation='relu'))
    model.add(Dropout(0.2))
    model.add(MaxPooling2D(pool_size=(2, 2)))

    model.add(Conv2D(conv_num * 2, (3, 3), activation='relu'))
    model.add(Conv2D(conv_num * 2, (3, 3), activation='relu'))
    model.add(Dropout(0.2))
    model.add(MaxPooling2D(pool_size=(2, 2)))

    model.add(Flatten())
    # kernel_regularizer is the Keras 2 name for the old W_regularizer argument
    model.add(Dense(dense_num, activation='relu',
                    kernel_regularizer=l1_l2(l1=0.01, l2=0.01)))
    model.add(Dropout(0.2))
    model.add(Dense(int(dense_num * 0.6), activation='relu',
                    kernel_regularizer=l1_l2(l1=0.01, l2=0.01)))
    model.add(Dense(10, activation='softmax'))
    model.compile(loss=keras.losses.categorical_crossentropy,
                  optimizer=keras.optimizers.Adadelta(),
                  metrics=['accuracy'])
    # training
    history = model.fit(x_train, y_train, batch_size=256, epochs=15,
                        shuffle=True, validation_split=0.1)
    return history

history = make_model_4(x_train, y_train, 32, 256)

On the dense layers after the convolutional part, I set L1_L2 regularization. This prevents the weights (parameters) from becoming too big. Here, I expect two things: that the overfitting is solved, and that the validation accuracy does not decrease (if it even improves thanks to better generalization, all the better).
But the outcome is like this.

Epoch 15/15
45000/45000 [==============================] - 207s - loss: 3.9736 - acc: 0.5038 - val_loss: 4.0522 - val_acc: 0.4896

You can't see overfitting anymore, but the training and validation accuracy are both around 0.5. Let's look deeper with a plot.

import matplotlib.pyplot as plt

plt.plot(history.history['acc'])
plt.plot(history.history['val_acc'])
plt.ylabel('accuracy')
plt.xlabel('epoch')
plt.legend(['train_accuracy', 'validation_accuracy'], loc='best')
plt.show()

When the accuracy is not satisfactory, we need to check whether this is the ceiling of the network (model) or the training simply has not run for enough epochs. The plot shows that the slope of the accuracy curve flattens in the range from 0.4 to 0.5, but there are not enough epochs to judge whether this is the model's ceiling.
So I trained the same model for 100 epochs.
The outcome is like this.

Epoch 100/100
45000/45000 [==============================] - 200s - loss: 6.8363 - acc: 0.6719 - val_loss: 6.9292 - val_acc: 0.6504

This reached 65% and no overfitting is observed. But it took 100 epochs and the accuracy is still only 65%.
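
If I wanted to keep training until the validation accuracy stops improving, instead of guessing an epoch count, an EarlyStopping callback would be one option (a sketch, not used in this post):

from keras.callbacks import EarlyStopping

# stop when val_acc has not improved for 10 consecutive epochs
early_stopping = EarlyStopping(monitor='val_acc', patience=10)
# history = model.fit(x_train, y_train, batch_size=256, epochs=100,
#                     shuffle=True, validation_split=0.1,
#                     callbacks=[early_stopping])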

By adding regularization, I succeeded in removing the overfitting, but training became much slower and the accuracy got worse.
It is important to choose an appropriate regularization method and to choose carefully where to apply it.