Wednesday, July 5, 2017

Perceptron from scratch

Overview


There are good libraries for building machine learning models, and usually they are enough to reach whatever goal you have set for your model.

It’s not necessary to write the algorithms yourself. To be precise, a model written from scratch tends to have more bugs than one from a well-established library, so you should use such libraries except when they don’t offer what you need.

But writing an existing algorithm yourself is a very good exercise for deepening your understanding of and knowledge about machine learning.
Here, I show how to write the perceptron algorithm.


Why perceptron


Although there are many machine learning algorithms, I chose the perceptron for my first from-scratch attempt. The perceptron is relatively easy to write, and its mechanics are fundamental to many other algorithms.

Many people think that writing a machine learning algorithm is mathematically difficult, takes a lot of time, and needs a lot of code. The perceptron does not: basic linear algebra is all you need. Just that.

So it is a very good choice of theme for a first from-scratch attempt.

What is perceptron


The perceptron is an algorithm that takes input data and outputs the predicted class the data belongs to. More precisely, the procedure is as follows:
  1. take the input and compute a linear combination of it with the weights
  2. pass the linear combination to the activation function

The drawing below shows how the data flows.

[Figure: data flow through the perceptron, from the inputs through the weighted sum to the activation function]

The only parameters we need to care about are the weights. During the training phase, the model updates the weights, trying to find a set of weights that separates the data well.

Here, φ denotes the activation function. It takes the linear combination of the weights and the data as its argument.
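
Written out, the two steps above look like this; here w is the weight vector, b the bias, and the threshold 0 and the 1/-1 outputs match the code below:

$$z = \mathbf{w}\cdot\mathbf{x} + b, \qquad \hat{y} = \phi(z) = \begin{cases} 1 & \text{if } z \ge 0 \\ -1 & \text{otherwise} \end{cases}$$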

Code

import numpy as np

class Perceptron:

    def __init__(self, eta=0.1, iter_num=100):
        self.eta = eta            # learning rate
        self.iter_num = iter_num  # number of passes (epochs) over the data

    @staticmethod
    def activate(linear_combination):
        # step function: 1 if the input is >= 0, otherwise -1
        return np.where(linear_combination >= 0, 1, -1)

    def predict(self, x):
        # weighted sum of the inputs plus the bias (weights[0])
        linear_combination = np.dot(x, self.weights[1:]) + self.weights[0]
        y_pred = Perceptron.activate(linear_combination)
        return y_pred

    def fit(self, X, Y):
        # one weight per feature, plus one for the bias
        self.weights = np.zeros(1 + X.shape[1])

        for _ in range(self.iter_num):
            self.error = 0
            for x, y in zip(X, Y):
                y_pred = self.predict(x)
                # non-zero only when the prediction is wrong
                update = self.eta * (y - y_pred)
                self.weights[1:] += update * x
                self.weights[0] += update
                self.error += int(update != 0.0)

            # misclassification rate for this epoch
            print(self.error / len(Y))

        return self


from sklearn import datasets

# prepare the data: Iris-setosa (class 0) vs. the other two classes
iris = datasets.load_iris()

features = iris.data
targets = np.where(iris.target == 0, -1, 1)

perceptron = Perceptron()
perceptron.fit(features, targets)

When you run this, the printed misclassification rate should fall to 0 within a few epochs, because Iris-setosa is linearly separable from the other two classes. Let’s check the code piece by piece.

def __init__(self, eta=0.1, iter_num=100):
    self.eta = eta
    self.iter_num = iter_num

In __init__(), eta and iter_num are set. eta is the learning rate, which controls how large each weight update is; iter_num is the number of iterations (epochs) over the training data.
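
For example, to take smaller steps over more epochs, you could construct the model like this (the numbers here are purely illustrative):

perceptron = Perceptron(eta=0.01, iter_num=200)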

@staticmethod
def activate(linear_combination):
    return np.where(linear_combination >= 0, 1, -1)

This is the activation function. It takes the linear combination and outputs 1 or -1; the threshold is 0.
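
Because np.where is vectorized, activate() handles a whole array of linear combinations at once. A quick check with made-up inputs:

Perceptron.activate(np.array([-0.5, 0.0, 2.0]))
# => array([-1,  1,  1])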

def predict(self, x):
    linear_combination = np.dot(x, self.weights[1:]) + self.weights[0]
    y_pred = Perceptron.activate(linear_combination)
    return y_pred

This function makes a prediction.
linear_combination is the dot product of the input data and the weights, plus self.weights[0], the bias term. y_pred is the predicted class.
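
As a quick sanity check, here is predict() with hypothetical weights for two features (three weights including the bias):

p = Perceptron()
p.weights = np.array([-1.0, 2.0, 0.5])  # [bias, w1, w2], made-up values
p.predict(np.array([1.0, 0.0]))  # 2.0*1.0 + 0.5*0.0 - 1.0 = 1.0 >= 0, so 1
p.predict(np.array([0.0, 1.0]))  # 2.0*0.0 + 0.5*1.0 - 1.0 = -0.5 < 0, so -1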

def fit(self, X, Y):
    self.weights = np.zeros(1 + X.shape[1])

    for _ in range(self.iter_num):
        self.error = 0
        for x, y in zip(X, Y):
            y_pred = self.predict(x)
            update = self.eta * (y - y_pred)
            self.weights[1:] += update * x
            self.weights[0] += update
            self.error += int(update != 0.0)

        print(self.error/len(Y))

    return self

This function trains the model on the data. On the first line, the weights are initialized to 0. The number of weights is the number of features in the data + 1.
This +1 is for the bias term.
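
Inside the loop, the classic perceptron learning rule is applied to every sample:

$$\Delta = \eta\,(y - \hat{y}), \qquad \mathbf{w} \leftarrow \mathbf{w} + \Delta\,\mathbf{x}, \qquad b \leftarrow b + \Delta$$

Since y and ŷ are both 1 or -1, y - ŷ is 0 for a correct prediction (so nothing changes) and ±2 for a wrong one, which nudges the weights toward classifying that sample correctly. self.error counts the non-zero updates, so the printed value is the misclassification rate for that epoch.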



Through the mechanism above, the model’s weights are updated epoch by epoch, and for linearly separable data such as the setosa-vs-rest task here, they eventually separate the classes completely.