Wednesday, May 30, 2018

Convolutional Neural Network with Julia: Flux

Abstract

Here, I'll build a convolutional neural network model with Flux in Julia. In the article Deep learning with Julia: introduction to Flux, I made a simple neural network with Flux. Neural networks, especially convolutional neural networks, are quite effective in the image classification area. So, this time, I'll make a convolutional neural network model for image classification.



Flux

Flux is one of the deep learning packages for Julia. When we tackle a deep learning task, we have several choices of libraries. This time, because I read the reddit post, Julia and “deep learning”, and Flux sounded great, I'll try Flux.
For the details of Flux, I recommend reading the official document. Like Keras's documentation, it is very informative and helpful even for grasping the basics of deep learning itself.

GitHub

Official document

Code

This time, I'll make the following model. With Flux, we need to write the model architecture, the loss function and the optimizer. At this point, I'm curious about the following three things.
  1. Because the Dense layer needs the input and output sizes, we need to calculate the input size by hand. Can this be made easier? (See the shape note after the model code below.)
  2. With Keras, for example, the functional API lets us build complex networks such as branched ones. How can we do that with Flux? (See the sketch just after this list.)
  3. When setting up the optimizer, we specify the weights to be trained. This mechanism probably makes fine-tuning possible, e.g. by passing only a subset of the parameters so that the rest stay frozen.
I don't know the answers to these three points well yet. When I figure them out, I'll write other articles about them.
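On point 2, here is a hedged sketch of my current understanding; this is my own guess, not an official recipe, and it assumes the same using Flux as in the code below. Because Flux layers are plain Julia functions, a branched network seems to be just ordinary function composition:

branchA = Dense(10, 5, relu)
branchB = Dense(10, 5, relu)
# run both branches on the same input and concatenate the results
merged  = x -> vcat(branchA(x), branchB(x))
model2  = Chain(merged, Dense(10, 2), softmax)
model2(randn(10, 3))   # a batch of 3 ten-dimensional samples

One caveat: since branchA and branchB are hidden inside a closure, their weights may need to be handed to the optimizer explicitly.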

using Flux
using Flux: crossentropy
using Flux: @epochs

model = Chain(
    # first convolution: 1 input channel -> 16 output channels
    Conv((2,2), 1=>16, relu),
    x -> maxpool(x, (2,2)),
    # second convolution: 16 -> 8 channels
    Conv((2,2), 16=>8, relu),
    x -> maxpool(x, (2,2)),
    # flatten everything except the batch dimension
    x -> reshape(x, :, size(x, 4)),
    Dense(288, 10),
    softmax)

loss(x, y) = crossentropy(model(x), y)
opt = ADAM(params(model))
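For reference, the 288 in the Dense layer comes from tracing the shapes by hand: a 28×28×1 input becomes 27×27×16 after the first Conv((2,2)), 13×13×16 after the first maxpool, 12×12×8 after the second Conv, and 6×6×8 after the second maxpool, and 6 × 6 × 8 = 288. On point 1 above, one workaround I can think of (my own sketch, not an official Flux feature) is to push a dummy batch through the convolutional part and read the flattened size off the result:

convPart = Chain(
    Conv((2,2), 1=>16, relu),
    x -> maxpool(x, (2,2)),
    Conv((2,2), 16=>8, relu),
    x -> maxpool(x, (2,2)),
    x -> reshape(x, :, size(x, 4)))

dummy = randn(28, 28, 1, 1)    # one fake 28×28 grayscale image
size(convPart(dummy), 1)       # 288: the input size the Dense layer needs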

Anyway, the model is now defined. To know the form of the data the model will receive, let's check the documentation of the Conv layer.

@doc(Conv)
Conv(size, in=>out)
Conv(size, in=>out, relu)
Standard convolutional layer. size should be a tuple like (2, 2). in and out specify the number of input and output channels respectively.

Data should be stored in WHCN order. In other words, a 100×100 RGB image would be a 100×100×3 array, and a batch of 50 would be a 100×100×3×50 array.

Takes the keyword arguments pad and stride.

As the documentation says, the data should be stored in WHCN (width, height, channels, number of samples) order. We need to arrange the data into this form.
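As a tiny illustration, using a random stand-in image since MNIST is loaded just below:

img = rand(28, 28)              # stand-in for one grayscale image
x = reshape(img, 28, 28, 1, 1)  # width × height × 1 channel × 1 sample
size(x)                         # (28, 28, 1, 1): WHCN order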
In this article, I'll use the MNIST data set, one of the most standard and popular data sets. The Flux package itself contains this data set, but here I'll use the one from MLDatasets.

We need to pay attention to the form of the training data: an array of tuples, each holding the explanatory variables and the labels. One tuple corresponds to one mini-batch in training, so here each tuple will be a 28×28×1×50 array paired with a 10×50 one-hot matrix.

using MLDatasets
using Random: randperm    # randperm lives in Random on Julia >= 0.7
using Base.Iterators: partition
using Flux: onehotbatch

train_x, train_y = MLDatasets.MNIST.traindata()
test_x,  test_y  = MLDatasets.MNIST.testdata()

# randomly sample 2,000 training images to keep training light
trainIndex = randperm(size(train_x)[end])[1:2000]

trainData = []
for batch in partition(trainIndex, 50)
    trainXFloat = Float64.(train_x[:, :, batch])
    # insert the channel dimension: (28, 28, 50) -> (28, 28, 1, 50)
    trainXReshaped = reshape(trainXFloat, (28, 28, 1, length(batch)))
    # one-hot encode the digit labels 0-9
    trainY = onehotbatch(train_y[batch], 0:9)
    push!(trainData, (trainXReshaped, trainY))
end
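It's worth checking that the batches have the shape the model expects:

size(trainData[1][1])   # (28, 28, 1, 50): WHCN, as required
size(trainData[1][2])   # (10, 50): one-hot labels for the 50 samples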

Training updates the weights; here I run 100 epochs over the 40 mini-batches.

@epochs 100 Flux.train!(loss, trainData, opt)
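As a side note, Flux.train! also accepts a cb keyword for callbacks. Here is a sketch for watching the loss during training; the 10-second throttle interval is my arbitrary choice.

using Flux: throttle

# print the loss on the first mini-batch at most once every 10 seconds
evalcb = () -> @show(loss(trainData[1][1], trainData[1][2]))
@epochs 100 Flux.train!(loss, trainData, opt, cb = throttle(evalcb, 10))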

With the trained model, the following code makes predictions and checks the accuracy. Julia's indices start from 1, not 0, so we need to subtract 1 from the argmax value to recover the digit labels; for example, a maximum at index 3 means the digit 2.

using Flux: argmax        # renamed onecold in later Flux versions
using Statistics: mean    # mean lives in Statistics on Julia >= 0.7

# reshape the whole test set into WHCN form and predict in one go
predicted = model(reshape(Float64.(test_x), (28, 28, 1, 10000)))

# argmax picks the most probable class per column; subtract 1 for digits 0-9
accuracy = mean(argmax(predicted)-1 .== test_y)
println(accuracy)
0.8956
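Finally, a quick sanity check on a single image; this is a sketch, and argmax here is the column-wise version imported from Flux above.

x1 = reshape(Float64.(test_x[:, :, 1]), 28, 28, 1, 1)
argmax(model(x1))[1] - 1   # predicted digit for the first test image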