Sunday, June 25, 2017

How to write a diverged-type neural network in Keras

How to write a diverged neural network

Overview


A simple sequential neural network, where layers are stacked in a single chain, is easy to write with any deep learning framework.

[Figure: schematic of a simple sequential network]
This time, I make a diverged neural network, whose route to the output splits into branches and merges again later.

[Figure: schematic of a diverged network, where the route splits after the input and merges before the output]

The purpose of this article is the following two points:
  • to see how to write a diverged-type neural network
  • to see how accurate this type of model is
I used Keras. Although this kind of network is a bit more troublesome to write than a simple sequential one, Keras lets you build a diverged model in a relatively easy manner.
As for the model's characteristics and accuracy, they are difficult to judge, because there is no simple model that corresponds directly to the diverged one. So we can only check them roughly.



Make a model using simple data


As a first step, I wrote a diverged-type neural network using the iris data set.

from sklearn import datasets
from sklearn.model_selection import train_test_split
from keras.models import Model
from keras.layers import Dense, Input, concatenate
import keras
from keras.utils import np_utils

# data preparation
iris = datasets.load_iris()
features = iris.data
targets = np_utils.to_categorical(iris.target)

x_train, x_test, y_train, y_test = train_test_split(features, targets, train_size=0.7)

# model
inputs = Input(shape=(4,))

# branch 1
x_1 = Dense(8, activation='relu')(inputs)
x_1 = Dense(5, activation='relu')(x_1)
# branch 2
x_2 = Dense(7, activation='relu')(inputs)
x_2 = Dense(5, activation='relu')(x_2)

# merge the two branches by concatenation
x = concatenate([x_1, x_2])
predictions = Dense(3, activation='softmax')(x)

model = Model(inputs=inputs, outputs=predictions)


model.compile(loss=keras.losses.categorical_crossentropy,
              optimizer=keras.optimizers.Adadelta(),
              metrics=['accuracy'])

# training
history = model.fit(x_train, y_train, batch_size=5, epochs=100, shuffle=True,  validation_split=0.1)
To write this type of neural network in Keras, you need to use the Keras functional API, which lets you compose layers like functions and route one tensor into several layers.
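For intuition, here is a minimal sketch of mine (not from the original code) showing a small stack written in both the Sequential style and the functional style; the functional style is what makes branching possible:

from keras.models import Sequential, Model
from keras.layers import Dense, Input

# Sequential style: one fixed chain of layers
seq = Sequential()
seq.add(Dense(8, activation='relu', input_shape=(4,)))
seq.add(Dense(3, activation='softmax'))

# functional style: each layer is called on a tensor, so the same
# tensor can be fed into several layers, which enables branching
inp = Input(shape=(4,))
h = Dense(8, activation='relu')(inp)
out = Dense(3, activation='softmax')(h)
func = Model(inputs=inp, outputs=out)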
Let’s look at the model part’s detail.
# model
inputs = Input(shape=(4,))

# branch 1
x_1 = Dense(8, activation='relu')(inputs)
x_1 = Dense(5, activation='relu')(x_1)
# branch 2
x_2 = Dense(7, activation='relu')(inputs)
x_2 = Dense(5, activation='relu')(x_2)

# merge the two branches by concatenation
x = concatenate([x_1, x_2])
predictions = Dense(3, activation='softmax')(x)

model = Model(inputs=inputs, outputs=predictions)

With 'Input', we define the dimension of the input data. The iris data set has 4 explanatory variables, so the input shape is (4,).
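You can confirm this directly from the data (a quick check of mine, not in the original post):

# iris: 150 samples, each with 4 features
print(features.shape)  # (150, 4)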

The functional API is easy to understand: each layer acts like a function that takes a tensor and returns a new one, and you can assign the output to a new or existing variable.
In this case, the route diverges right after the input into two branches, x_1 and x_2.

Those branches are merged into x later by concatenation.
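Concatenation is not the only way to merge branches. Keras also offers element-wise merges such as add, which require both branches to have the same output shape; since both branches here end in Dense(5), that would also work. A minimal sketch of mine, as an alternative to the concatenate call above:

from keras.layers import add

# element-wise sum instead of concatenation;
# both inputs must have the same shape (here (None, 5))
x = add([x_1, x_2])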
This model is just for checking how to write a diverged model, but just in case, I plotted the training history.

import matplotlib.pyplot as plt
def show_history(history):
    plt.plot(history.history['acc'])
    plt.plot(history.history['val_acc'])
    plt.ylabel('accuracy')
    plt.xlabel('epoch')
    plt.legend(['train_accuracy', 'test_accuracy'], loc='best')
    plt.show()

show_history(history)
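Note that the test split created by train_test_split is never used above. If you want to check generalization, a minimal addition (mine, not in the original code) is:

# evaluate on the held-out test set
loss, acc = model.evaluate(x_test, y_test, verbose=0)
print('test accuracy: %.3f' % acc)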


Make a diverged-type convolutional neural network (CNN)


Now we know how to write a diverged-type neural network. Using the CIFAR-10 data set, let's check how it works.
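As a reminder of what the data looks like (a quick check of mine, not in the original post), CIFAR-10 consists of 60,000 32x32 color images in 10 classes:

from keras.datasets import cifar10

(x_train, y_train), (x_test, y_test) = cifar10.load_data()
print(x_train.shape)  # (50000, 32, 32, 3): 50,000 RGB images of 32x32 pixels
print(y_train.shape)  # (50000, 1): integer class labels from 0 to 9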

First, I made a usual CNN model as a baseline for comparison.

from keras.utils import to_categorical
from keras.models import Sequential
from keras.layers import Dense, Conv2D, Dropout, MaxPooling2D, Flatten
from keras.regularizers import l1_l2
from keras.datasets import cifar10

(x_train, y_train), (x_test, y_test) = cifar10.load_data()

def model_1(x_train, y_train, conv_num, dense_num):
    input_shape = x_train.shape[1:]

    # one-hot encode the labels
    y_train = to_categorical(y_train, 10)

    # set model
    model = Sequential()
    model.add(Conv2D(conv_num, (3,3), activation='relu', input_shape=input_shape))
    model.add(Dropout(0.2))
    model.add(Conv2D(conv_num, (3,3), activation='relu'))
    model.add(Dropout(0.2))
    model.add(MaxPooling2D(pool_size=(2,2)))

    model.add(Conv2D(conv_num * 2, (3,3), activation='relu'))
    model.add(Conv2D(conv_num * 2, (3,3), activation='relu'))
    model.add(Dropout(0.2))
    model.add(MaxPooling2D(pool_size=(2,2)))

    model.add(Flatten())
    model.add(Dense(dense_num, activation='relu', kernel_regularizer=l1_l2(0.01)))
    model.add(Dropout(0.2))
    model.add(Dense(int(dense_num * 0.6), activation='relu', kernel_regularizer=l1_l2(0.01)))
    model.add(Dense(10, activation='softmax'))
    model.compile(loss=keras.losses.categorical_crossentropy,
              optimizer=keras.optimizers.Adadelta(),
              metrics=['accuracy'])
    # training
    history = model.fit(x_train, y_train, batch_size=256, epochs=50, shuffle=True, validation_split=0.1)
    return history
history_1 = model_1(x_train, y_train, 32, 256)
show_history(history_1)

The outcome is like this.

[Plot: training and validation accuracy of the baseline CNN over 50 epochs]

Unfortunately, in my environment (a MacBook Pro), even one epoch of training takes roughly 180 seconds. So, although I know this number of epochs is not enough, I only tried 50.
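If you want to measure the per-epoch time yourself, a small custom callback works; this is my own sketch, not part of the original code:

import time
from keras.callbacks import Callback

class EpochTimer(Callback):
    # record how long each epoch takes
    def on_epoch_begin(self, epoch, logs=None):
        self.start = time.time()

    def on_epoch_end(self, epoch, logs=None):
        print('epoch %d took %.1f seconds' % (epoch, time.time() - self.start))

# usage: pass it to fit, e.g. model.fit(..., callbacks=[EpochTimer()])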

And the code below is the diverged-type neural network.

from keras.layers import Input, concatenate
def model_2(x_train, y_train):
    inputs = Input(x_train.shape[1:])

    # one-hot encode the labels
    y_train = to_categorical(y_train, 10)

    # set model: a shared stem, then two branches
    x_orig = Conv2D(32, (3,3), activation='relu')(inputs)
    x_orig = Dropout(0.2)(x_orig)

    # branch 1: two wider conv layers
    x_1 = Conv2D(32, (3,3), activation='relu', padding='same')(x_orig)
    x_1 = Conv2D(24, (3,3), activation='relu', padding='same')(x_1)

    # branch 2: three narrower conv layers
    x_2 = Conv2D(12, (3,3), activation='relu', padding='same')(x_orig)
    x_2 = Conv2D(8, (3,3), activation='relu', padding='same')(x_2)
    x_2 = Conv2D(4, (3,3), activation='relu', padding='same')(x_2)

    # merge the branches along the channel axis
    x = concatenate([x_1, x_2])

    x = Conv2D(32, (3,3), activation='relu')(x)
    x = Dropout(0.2)(x)

    x = Conv2D(64, (3,3), activation='relu')(x)
    x = Conv2D(64, (3,3), activation='relu')(x)
    x = Dropout(0.2)(x)
    x = MaxPooling2D(pool_size=(2,2))(x)

    x = Flatten()(x)
    x = Dense(256, activation='relu', kernel_regularizer=l1_l2(0.01))(x)

    predictions = Dense(10, activation='softmax')(x)

    model = Model(inputs=inputs, outputs=predictions)

    model.compile(loss=keras.losses.categorical_crossentropy,
              optimizer=keras.optimizers.Adadelta(),
              metrics=['accuracy'])
    # training
    history = model.fit(x_train, y_train, batch_size=256, epochs=50, shuffle=True, validation_split=0.1)
    return history
history_2 = model_2(x_train, y_train)
show_history(history_2)

The outcome is like this.

[Plot: training and validation accuracy of the diverged CNN over 50 epochs]

At the 50-epoch mark, this model's accuracy is a bit lower than the usual convolutional model's. As for training time, one epoch takes roughly 700 seconds.
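To see where the extra time comes from, it helps to compare the two models: the diverged model keeps larger feature maps (it uses 'same' padding and pools only once, while the baseline pools twice). The functions above return only the training history, so this is a hypothetical tweak of mine, but keeping a reference to each model lets you compare them directly:

# baseline_model and diverged_model are hypothetical references,
# assuming model_1 and model_2 were changed to also return the model
print('baseline CNN parameters:', baseline_model.count_params())
print('diverged CNN parameters:', diverged_model.count_params())
# model.summary() also prints a per-layer breakdown, including both branches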

Summary


In this article, I showed how to write a diverged-type neural network. I could not properly show how accurate this type of model can be, because the training was not long enough.