Saturday, July 1, 2017

Practical hacks for building a deep learning model

Overview

A neural network has a lot of flexibility in its design: there are many components and options you can choose and set. Because of that, to build a well-optimized network, you need to know the procedure for adjusting them so you can update your network efficiently.
Here, I lay out a neural network's components and the order in which they should be adjusted.



Neural network’s components

The components of a neural network can be separated into necessary elements and optional elements.

Necessary elements

  • Layer
  • Node
  • Activation function

Layer, Node

[Figure: a three-layer network; the circles are nodes]

L_1, L_2, and L_3 in the drawing above are layers; the circles are nodes.
The scale of a neural network is fixed by the number of layers and nodes.
In this drawing, L_1, L_2, and L_3 correspond to the input layer, the hidden layer, and the output layer. Fundamentally, the input layer depends on the shape of the input data, and the output layer on the shape of the output, meaning on what you want from the model. So the part you actually need to optimize is the hidden layer.
As the number of layers and nodes increases, the model can fit the input data more closely, but training takes more time.
A major part of the model's accuracy depends on the scale and adjustment of these, meaning that when you start making a model, the first thing to tackle is trial and error on the hidden layer's scale.
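
For example, with keras you can run this trial and error by sweeping the hidden layer's size (a minimal sketch; the sizes 4, 8, 16 and the layer shapes are placeholder assumptions):

from keras.models import Sequential
from keras.layers import Dense

# try several hidden layer sizes and compare the resulting losses
for hidden_units in [4, 8, 16]:
    model = Sequential()
    model.add(Dense(hidden_units, input_dim=4, activation='relu'))  # hidden layer
    model.add(Dense(1, activation='sigmoid'))                       # output layer
    # ... compile, fit, and record the loss for this size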

Activation function

[Figure: an activation function applied at each node]

The activation function is the function applied to each node's input (the weighted sum coming from the previous layer).
As the drawing shows, the function is set on each node, but when you build a neural network model with a library, you usually set it per layer.
The output layer's activation function depends on the output shape.
For the hidden layers, there are many types of activation functions, and the choice affects the accuracy.
Here are some examples (a keras sketch follows the list):
  • Linear
  • ReLU
  • Sigmoid
  • Leaky ReLU
  • Parametric ReLU
  • Maxout
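
With keras, for example, you set the activation per layer, either as an argument of the layer or as a separate Activation layer (a minimal sketch; the layer sizes are placeholders):

from keras.models import Sequential
from keras.layers import Dense, Activation

model = Sequential()
model.add(Dense(8, input_dim=4))
model.add(Activation('relu'))      # hidden layer: ReLU
model.add(Dense(1))
model.add(Activation('sigmoid'))   # output layer: depends on the output shape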

Optional elements

The main purpose of the optional elements is to prevent overfitting. A neural network with a huge number of layers and nodes can fit the input data (the training data) very closely, so generalization is very important for the model to handle unknown data well.
  • Dropout
  • Regularization

Dropout

If you set dropout on a layer, some fraction of its nodes is deactivated during each training step. By training with part of the nodes deactivated, overfitting can be eased.
The points to tune are where you place dropout and how large a fraction you deactivate. If you place dropout in too many places, or the deactivation rate is too high, training doesn't go well.
So you need to find a middle ground.
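
In keras, for example, dropout is a layer you place after the layer whose nodes you want to deactivate, and the argument is the deactivation rate (a minimal sketch; 0.2 is just a common starting point):

from keras.models import Sequential
from keras.layers import Dense, Dropout

model = Sequential()
model.add(Dense(8, input_dim=4, activation='relu'))
model.add(Dropout(0.2))   # deactivate 20% of this layer's nodes during training
model.add(Dense(1, activation='sigmoid'))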

Regularization

By adding a regularization term to the loss function, you can keep the model's parameters from growing too large, which mitigates overfitting. For example, L2 regularization adds λ * Σ w² to the loss, and L1 regularization adds λ * Σ |w|.
There are several types of regularization:
  • L1 regularization
  • L2 regularization
You need to choose a regularization method and fix its hyperparameters.
When the hyperparameters are too strong, training doesn't go well.
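
In keras, for example, the regularization term is attached to a layer's weights, and its strength is the hyperparameter to tune (a minimal sketch using the Keras 2 API; the strength 0.01 is just a placeholder):

from keras.models import Sequential
from keras.layers import Dense
from keras.regularizers import l2

model = Sequential()
# adds 0.01 * Σ w² to the loss for this layer's weights (use l1 for Σ |w|)
model.add(Dense(8, input_dim=4, kernel_regularizer=l2(0.01)))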

Procedures to make Neural network model

The points in making a neural network model are as follows.
  • It should have enough scale to fit the data
  • Overfitting should not happen even at a large scale
The model needs enough accuracy to fulfill its purpose, and enough generalization to keep that accuracy even on unknown data.
To attain both, the following steps are important.
  1. Make a small model with few layers and nodes
  2. Increase the number of layers and nodes, trying to decrease the loss
  3. If overfitting appears while you decrease the loss, try to resolve it
So at first, you try to build a model with good accuracy using the necessary elements. After that, fix any overfitting with the optional elements.
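
One practical way to watch for overfitting during step 3 is to hold out validation data and compare the two loss curves (a sketch; X and y are placeholders for your training data):

# hold out 20% of the data for validation during training
history = model.fit(X, y, epochs=100, validation_split=0.2)

# training loss falling while validation loss rises signals overfitting
print(history.history['loss'][-1], history.history['val_loss'][-1])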

Example with keras

For example, with keras you can build a model that has these elements like this (shown here with the Keras 2 API, where the older W_regularizer argument became kernel_regularizer):

from keras.models import Sequential
from keras.layers import Dense, Activation, Dropout
from keras.regularizers import l1_l2

model = Sequential()
model.add(Dense(8, input_dim=4, kernel_regularizer=l1_l2(l1=0.01, l2=0.01)))
model.add(Activation('relu'))
model.add(Dropout(0.2))
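
To actually train it, you would add an output layer, compile, and fit as usual (a sketch; the output layer, loss, and optimizer depend on your task, and X, y are placeholders):

model.add(Dense(1, activation='sigmoid'))               # assumed binary output
model.compile(loss='binary_crossentropy', optimizer='adam')
model.fit(X, y, epochs=50, validation_split=0.2)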