Monday, June 12, 2017

A simple guide to TensorFlow

Overview


This article is for roughly understanding TensorFlow and making a simple model with it.
These days, if you are a machine learning-oriented person, you can't pass even a day without hearing the name TensorFlow. It is a very useful tool, but not so easily approachable.
Let's check what TensorFlow is and how you can use it.




What is TensorFlow?


TensorFlow is an open source software library released by Google. The official documentation is here.

These days we hear about TensorFlow in many places. Here, we will see what TensorFlow is and where it is positioned in the world of machine learning.


1. What is TensorFlow?

As a simple way to understand it, you can think of TensorFlow as a calculator that automatically updates the parameters of a neural network.


2. What kind of role does TensorFlow fulfill?

Roughly, the ways to use machine learning algorithms fall into 3 patterns:
  • With knowledge and understanding of mathematics and machine learning algorithms, you write the model yourself from scratch.
  • With a library such as sklearn, you build a model in just 2 or 3 lines.
  • With a machine learning-oriented calculation tool, you build the model yourself.

TensorFlow falls under the third pattern.

The situations where each applies are as follows.

If you want to know and understand the machine learning system well, including the algorithms and the mathematics behind them, it is good practice to write all the algorithms yourself from scratch. And sometimes you face a problem that is difficult to solve with existing libraries. In that kind of situation, it can be necessary to write a new algorithm yourself from scratch.

But in many situations, it is not pragmatic to write those algorithms yourself, because it takes much time and only increases the number of points you need to care about. You should not write buggy code in vain.

So we need to know how to use libraries and tackle problems with them. In many cases, those libraries are enough to solve the problem. Of course it depends on the workplace and situation, but in most cases you spend a long time working with those libraries.

The situation where you use a learning-oriented calculation tool is almost the same as the situation where you need a neural network.
This algorithm is very powerful, especially for high-dimensional data.
A neural network can have many layers, meaning the network architecture is too complex to express in 3 lines with a library, and the code becomes very complex if you write it from scratch.

A calculation library like TensorFlow solves the situation above, because TensorFlow is not a library for making a model in 3 lines but a library that helps users write the model themselves.

It means you at least need to know how to define a model and what kind of model works well.

How does TensorFlow calculate?


Let's look at how TensorFlow calculates.

First, I made TensorFlow solve the following calculation.

5 + 3

In TensorFlow, it looks like this.

import tensorflow as tf

# Define the graph: two constants and an add operation.
a = tf.constant(5)
b = tf.constant(3)
added = tf.add(a, b)

# The actual computation happens inside a session.
with tf.Session() as sess:
    print(sess.run(added))  # -> 8

Look at the variable 'added'.

Tensor("Add_1:0", shape=(), dtype=int32)

In this case, 'added' is not a concrete value but network information meaning "a plus b", and the actual calculation is done when the session runs.

This network information is called a graph in TensorFlow.

Variables of TensorFlow


TensorFlow has one more important point to care about: variables.

A machine learning-oriented library needs to deal with multiple dimensions. And as I wrote above, TensorFlow makes a graph first and calculates when the session runs.

In short, TensorFlow has constants, variables, and placeholders, which can be multi-dimensional, as follows.

# constant
a = tf.constant(3)

# variable
b = tf.Variable(0)

# placeholder
c = tf.placeholder(tf.float32)
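
Unlike a constant, a Variable holds state: it must be initialized and can then be updated inside a session. A minimal sketch (the counter example is mine, just for illustration):

import tensorflow as tf

# A Variable must be initialized before use and can be updated.
counter = tf.Variable(0)
increment = tf.assign(counter, counter + 1)

with tf.Session() as sess:
    sess.run(tf.global_variables_initializer())
    for _ in range(3):
        print(sess.run(increment))  # prints 1, 2, 3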

Concretely, let's calculate using placeholders.

# Placeholders stand in for data that will be fed in later.
a = tf.placeholder(tf.float32)
b = tf.placeholder(tf.float32)
added = tf.add(a, b)

with tf.Session() as sess:
    print(sess.run(added, feed_dict={a: 3.0, b: 5.0}))  # -> 8.0

In the example above, the first 3 lines define the graph information, and in the running phase concrete numbers are assigned through feed_dict.
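
Because no shape was given to the placeholders, the same graph also accepts inputs of other shapes, for example vectors. A small sketch reusing a, b, and added from above:

# The shape-less placeholders accept vectors too; tf.add works element-wise.
with tf.Session() as sess:
    print(sess.run(added, feed_dict={a: [1.0, 2.0], b: [3.0, 4.0]}))  # -> [4. 6.]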

Classification with TensorFlow


Using the iris data, let's try to make a classification model.

From here, I separate the whole process into two phases: making a blueprint and building.
Making a blueprint means deciding what kind of model and graph to make.
Building means assigning data to the blueprint and updating the parameters.

The code below is an example.

import pandas as pd
import tensorflow as tf
from sklearn.model_selection import train_test_split

# Load the iris data and drop the index column.
data = pd.read_csv('https://dl.dropboxusercontent.com/u/432512/20120210/data/iris.txt', sep="\t")
data = data.iloc[:, 1:]

# Split into train and test sets.
train_data, test_data, train_target, test_target = train_test_split(
    data.loc[:, ['Sepal.Length', 'Sepal.Width', 'Petal.Length', 'Petal.Width']],
    data.loc[:, ['Species']],
    test_size=0.4, random_state=0)

The code above gets the iris data and splits it into train and test data.
Using the train_data and train_target set, I will make the model.
Usually the test data should be used to check predictions made by the model. But this time I only make the model and don't use the test data. (Why did I separate the data then...... I don't know...)
Let's make the blueprint and do the build.

# make blueprint
# set placeholder
X = tf.placeholder(tf.float32, shape = [None, 4])
Y = tf.placeholder(tf.float32, shape = [None, 3])

# set parameters
W = tf.Variable(tf.random_normal([4, 3], stddev=0.35))

# activation function
y_ = tf.nn.softmax(tf.matmul(X, W))

# build
# loss function
cross_entropy = -tf.reduce_sum(Y * tf.log(y_))

# training
optimizer = tf.train.GradientDescentOptimizer(0.001)
train = optimizer.minimize(cross_entropy)

# execute
with tf.Session() as sess:
    sess.run(tf.global_variables_initializer())
    for i in range(1000):
        x = train_data
        y = pd.get_dummies(train_target)
        print(sess.run(W))

        sess.run(train, feed_dict = {X: x, Y: y})

    test = sess.run(W)

Let's look at it piece by piece.

The blueprint phase.

If you assign the explanatory-variable matrix to X, the target variables to Y, and the weights to W, the blueprint can be expressed as follows. (This time I don't use a bias term.)

X = [None, 4]
W = [4, 3]
Y = [None, 3]

[None, 4] * [4, 3] = [None, 3]

Let's check.

# set placeholder
X = tf.placeholder(tf.float32, shape = [None, 4])
Y = tf.placeholder(tf.float32, shape = [None, 3])

Setting placeholders. X and Y stand for the explanatory variables and the target variables of the data. Later, the data will be assigned to them.

The iris data has 4 explanatory variables and 3 target classes. The number of rows that will be assigned later is unknown, so it is expressed as None in the blueprint phase.

# set parameters
W = tf.Variable(tf.random_normal([4, 3], stddev=0.35))

Here the parameters, meaning the weights, are set. In the build phase, passing data updates them. tf.random_normal with stddev assigns random numbers drawn from a normal distribution as the initial values.
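
A quick sketch to see what the initializer produces (just for inspection, not part of the model):

# Inspect the initial weights: a 4x3 matrix drawn from a normal
# distribution with mean 0 and standard deviation 0.35.
with tf.Session() as sess:
    sess.run(tf.global_variables_initializer())
    print(sess.run(W))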

# activation function
y_ = tf.nn.softmax(tf.matmul(X, W))

Select the activation function.

Which activation function to use depends on the form of the output and on the layer (whether it is a hidden layer or the output layer). You need to choose accordingly; softmax is the usual choice for a multi-class output layer, as sketched below.
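
A small sketch of what softmax does: it turns a row of scores into probabilities that sum to 1 (the numbers here are arbitrary):

# Softmax turns scores into a probability distribution over the classes.
logits = tf.constant([[2.0, 1.0, 0.1]])
probs = tf.nn.softmax(logits)
with tf.Session() as sess:
    print(sess.run(probs))  # approx [[0.659, 0.242, 0.099]]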

This time, 'y_' is the final layer. The blueprint phase is done. From here, the build part starts.
TensorFlow does confusing calculations here, and we need to understand what kind of thing it does.

Put simply, what TensorFlow does is define a loss function and update the parameters in the direction that lessens the loss.


# loss function
cross_entropy = -tf.reduce_sum(Y * tf.log(y_))

Which loss function to use depends on what kind of task you want TensorFlow to solve.
It is worth checking what kinds of loss functions TensorFlow provides; one alternative for this task is sketched below.
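
For example, TensorFlow 1.x also offers tf.nn.softmax_cross_entropy_with_logits, which computes the same cross-entropy in a numerically more stable way. Note that it expects the raw scores (logits) before softmax, not the softmax output y_. A sketch:

# Alternative loss: feed the raw logits (before softmax), not y_.
logits = tf.matmul(X, W)
cross_entropy = tf.reduce_mean(
    tf.nn.softmax_cross_entropy_with_logits(labels=Y, logits=logits))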

# training
optimizer = tf.train.GradientDescentOptimizer(0.001)
train = optimizer.minimize(cross_entropy)

The optimizer defines how to update the parameters from the data.
Running 'train' updates the parameters. Conceptually, each run applies one gradient-descent step: W <- W - 0.001 * d(cross_entropy)/dW.

# execute
with tf.Session() as sess:
    sess.run(tf.global_variables_initializer())
    for i in range(1000):
        x = train_data
        y = pd.get_dummies(train_target)
        print(sess.run(W))

        sess.run(train, feed_dict = {X: x, Y: y})

Here the build is done. This time, all the data is treated as a single batch and fed 1000 times; a mini-batch variant is sketched below.
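
As a variation, one could sample a random mini-batch at each step instead of feeding the whole data set every time. A hypothetical sketch (batch_size and the numpy import are my additions; x, y, sess, and train are from the listing above):

import numpy as np

# Hypothetical variant: one random mini-batch per training step.
batch_size = 32
for i in range(1000):
    idx = np.random.choice(len(x), batch_size)
    sess.run(train, feed_dict={X: x.iloc[idx], Y: y.iloc[idx]})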

Back in the listing, the sess.run() part is where the actual data is passed to the placeholders.

This updates the parameters so as to lessen the loss.
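
Finally, although this post leaves the test set unused, here is a hypothetical sketch of how it could be evaluated inside the same with tf.Session() block, right after the training loop (the correct and accuracy nodes are my additions):

# Hypothetical addition after the training loop, inside the same session:
# compare predicted classes (argmax of the softmax output) with true classes.
correct = tf.equal(tf.argmax(y_, 1), tf.argmax(Y, 1))
accuracy = tf.reduce_mean(tf.cast(correct, tf.float32))

print(sess.run(accuracy,
               feed_dict={X: test_data, Y: pd.get_dummies(test_target)}))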