Abstract
Here, I'll make a convolutional neural network model with Flux in Julia. In the article Deep learning with Julia: introduction to Flux, I made a simple neural network with Flux. Neural networks, especially convolutional neural networks, are quite efficient in the area of image classification. So, this time, I'll make a convolutional neural network model for image classification.
Flux
Flux is one of the deep learning packages for Julia. When we tackle a deep learning task, we have several choices of libraries. This time, because I read the Reddit post Julia and “deep learning” and Flux sounded great, I'll try Flux. For the details of Flux, I recommend reading the official documentation. Actually, just like Keras's documentation, it is very informative and helpful even for grasping the basics of deep learning itself.
GitHub
Official document
Code
This time, I'll make the following model. With Flux, we need to write the model architecture, the loss function, and the optimizer. At this phase, I'm curious about the following points.
- Because the Dense layer needs the input and output sizes, we have to calculate the input size ourselves. Can we make this easier? (See the sketch after this list.)
- For example, with Keras, the functional API enables us to build complex networks, such as branched ones. With Flux, how can we do that?
- When setting the optimizer, we specify the weights to update. Probably, with this system, we can use the fine-tuning technique.
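About the first point, this is not from the original article, but one workaround I can sketch (assuming the same Flux version as the rest of this article) is to push a dummy array through the convolutional part of the network and read off the flattened size:
# Hypothetical sketch: infer the Dense layer's input size from a dummy forward pass.
convPart = Chain(
    Conv((2,2), 1=>16, relu),
    x -> maxpool(x, (2,2)),
    Conv((2,2), 16=>8, relu),
    x -> maxpool(x, (2,2)))
dummy = rand(28, 28, 1, 1)             # one 28x28 grayscale image in WHCN order
flatSize = length(convPart(dummy))     # 288 for this architecture
The value of flatSize could then be passed to the Dense layer instead of hard-coding 288.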
using Flux
using Flux: crossentropy
using Flux: @epochs
# Model: two convolution + max-pooling blocks, then a flattening step and a Dense layer.
model = Chain(
    Conv((2,2), 1=>16, relu),
    x -> maxpool(x, (2,2)),
    Conv((2,2), 16=>8, relu),
    x -> maxpool(x, (2,2)),
    x -> reshape(x, :, size(x, 4)),    # flatten to (features, batch)
    Dense(288, 10),
    softmax)
# Loss: cross-entropy between the model's output and the one-hot labels.
loss(x, y) = crossentropy(model(x), y)
# Optimizer: ADAM over the model's weights.
opt = ADAM(params(model))
Anyway, with the code above, the model is defined. To see what form of data the model expects, let's check the documentation of the Conv layer.
@doc(Conv)
Conv(size, in=>out)
Conv(size, in=>out, relu)
Standard convolutional layer. size should be a tuple like (2, 2). in and out specify the number of input and output channels respectively.
Data should be stored in WHCN order. In other words, a 100×100 RGB image would be a 100×100×3 array, and a batch of 50 would be a 100×100×3×50 array.
Takes the keyword arguments pad and stride.
As the documentation says, the data should be stored in WHCN (width, height, channels, number) order. We need to arrange the data into this form.
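For instance (a toy sketch with random data, not taken from the data set itself), a batch of 50 grayscale 28x28 images would be a 28x28x1x50 array:
imgs = rand(Float32, 28, 28, 50)        # 50 grayscale 28x28 images without a channel dimension
batch = reshape(imgs, 28, 28, 1, 50)    # insert the channel dimension -> WHCN order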
In this article, I'll use the MNIST data set, one of the most standard and popular data sets. The Flux package contains this data set, but here I'll use the one from MLDatasets.
We also need to care about the form of the training data. The Array contains Tuples, each of which holds the explanatory variables and the labels. One Tuple is equivalent to one batch in training.
using MLDatasets
using Random: randperm
using Base.Iterators: partition
using Flux: onehotbatch
train_x, train_y = MLDatasets.MNIST.traindata()
test_x, test_y = MLDatasets.MNIST.testdata()
# Randomly pick 2000 training samples.
trainIndex = randperm(size(train_x)[end])[1:2000]
trainData = []
# Split the selected samples into batches of 50 and store each batch as a Tuple.
for batch in partition(trainIndex, 50)
    trainXFloat = Float64.(train_x[:, :, batch])
    # Reshape into WHCN order: 28x28x1xbatch-size.
    trainXReshaped = reshape(trainXFloat, (28, 28, 1, length(batch)))
    # One-hot encode the labels 0..9.
    trainY = onehotbatch(train_y[batch], 0:9)
    push!(trainData, (trainXReshaped, trainY))
end
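Just to double-check (this check is my addition, not part of the original code), the first Tuple should contain a 28x28x1x50 array of images and a 10x50 one-hot label matrix:
size(trainData[1][1])    # (28, 28, 1, 50)
size(trainData[1][2])    # (10, 50)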
The training below updates the weights; with the @epochs macro, it loops over the data 100 times.
@epochs 100 Flux.train!(loss, trainData, opt)
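If you want to watch the loss go down during training, Flux.train! also accepts a cb keyword (a sketch assuming the same Flux version as above; throttle limits the printing to once every ten seconds):
using Flux: throttle
evalcb = () -> println(loss(trainData[1]...))    # report the loss on the first batch
@epochs 100 Flux.train!(loss, trainData, opt, cb = throttle(evalcb, 10))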
With the trained model, the following code makes predictions and checks the accuracy. Julia's indices start from 1, not from 0, so we need to subtract 1 from argmax's output to recover the digit labels.
using Statistics: mean
using Flux: argmax
# Reshape the whole test set into WHCN order and predict.
predicted = model(reshape(Float64.(test_x), (28, 28, 1, 10000)))
# argmax returns 1-based indices, so subtract 1 to recover the digit labels.
accuracy = mean(argmax(predicted) .- 1 .== test_y)
println(accuracy)
0.8956