Sunday, October 1, 2017

Perceptron by scikit-learn

Overview

I sometimes use the Perceptron, one of the classic machine learning algorithms, as practice for writing algorithms from scratch.

In most cases, though, using a machine learning library is strongly recommended. Although the Perceptron is rarely used in practice, it is still worth knowing how to build one with a library, specifically scikit-learn.

In this article, I’ll show how to build a Perceptron with scikit-learn.



Perceptron


In the articles below, I implemented the Perceptron algorithm from scratch in Python and Go. To learn what a Perceptron is, please read those.

Perceptron by Golang from scratch

I tried the perceptron, almost the "Hello world" of machine learning, in Golang. Go has matrix calculation libraries similar to numpy in Python, but this time I used only the built-in types. In machine learning, R and Python are the most commonly used languages, and almost all from-scratch machine learning code is written in those or in C++.

Perceptron from scratch

There are good libraries for building machine learning models, and usually they are enough to reach the goal you set for the model. It is not necessary to write the algorithm yourself. More precisely, writing and using a model built entirely from scratch tends to introduce more bugs than using a widely used library.

Data


First, we need to import the necessary libraries.

from sklearn import datasets
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler
from sklearn.linear_model import Perceptron
from sklearn.metrics import accuracy_score

I used the iris data set, one of the most popular data sets for experiments.

# prepare data
iris = datasets.load_iris()

x = iris.data
y = iris.target

This data set has four explanatory variables.

print(x[:10])
[[ 5.1  3.5  1.4  0.2]
 [ 4.9  3.   1.4  0.2]
 [ 4.7  3.2  1.3  0.2]
 [ 4.6  3.1  1.5  0.2]
 [ 5.   3.6  1.4  0.2]
 [ 5.4  3.9  1.7  0.4]
 [ 4.6  3.4  1.4  0.3]
 [ 5.   3.4  1.5  0.2]
 [ 4.4  2.9  1.4  0.2]
 [ 4.9  3.1  1.5  0.1]]
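
To see what those four columns mean, the loaded object can be inspected directly. This is a small sketch, assuming the Bunch object returned by load_iris() exposes feature_names and target_names as in current scikit-learn releases.

```python
from sklearn import datasets

iris = datasets.load_iris()

# Names of the four explanatory variables (the columns of iris.data)
print(iris.feature_names)

# Names of the three classes, encoded as 0, 1, 2 in iris.target
print(iris.target_names)

# Number of samples and features, and number of labels
print(iris.data.shape, iris.target.shape)
```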

For evaluation, I split the data into training and test sets with train_test_split.

# split data into train and test
x_train, x_test, y_train, y_test = train_test_split(x, y, train_size=0.7)
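
The split above is random, so the accuracy changes from run to run. As a possible variation (not what I used above), the sketch below fixes random_state and passes stratify=y so that the class proportions are kept in both sets. The seed value 0 is an arbitrary assumption.

```python
from sklearn import datasets
from sklearn.model_selection import train_test_split

iris = datasets.load_iris()
x, y = iris.data, iris.target

# random_state makes the split repeatable; stratify=y keeps the
# class balance the same in the train and test sets
x_train, x_test, y_train, y_test = train_test_split(
    x, y, train_size=0.7, random_state=0, stratify=y)

print(x_train.shape, x_test.shape)

# Running the same call again gives the same split
x_train2, _, _, _ = train_test_split(
    x, y, train_size=0.7, random_state=0, stratify=y)
print((x_train == x_train2).all())
```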

The Perceptron classifies better when the data is standardized. We can use StandardScaler(). The scaler is fitted on the training data only and then applied to both the training and test data.

# standardize data
ss = StandardScaler()
ss.fit(x_train)
ss_x_train = ss.transform(x_train)
ss_x_test = ss.transform(x_test)
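
To make clear what transform() does, here is a minimal sketch that reproduces it by hand: subtract the training mean and divide by the training standard deviation. The small arrays are made-up toy values standing in for x_train and x_test.

```python
import numpy as np
from sklearn.preprocessing import StandardScaler

# Toy data standing in for x_train / x_test (hypothetical values)
x_train = np.array([[1.0, 10.0], [2.0, 20.0], [3.0, 30.0]])
x_test = np.array([[4.0, 40.0]])

ss = StandardScaler()
ss.fit(x_train)

# Per-column statistics of the training data only
mean = x_train.mean(axis=0)
std = x_train.std(axis=0)  # population std (ddof=0), as StandardScaler uses

# The test data is scaled with the training statistics
manual_test = (x_test - mean) / std

print(np.allclose(ss.transform(x_test), manual_test))
```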

Make model


Data preparation and pre-processing are finished. With scikit-learn, the model can be built easily.

# perceptron
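# max_iter replaces n_iter, which newer scikit-learn versions no longer accept;
# tol=None keeps the old behavior of running all 100 epochs
pp = Perceptron(max_iter=100, tol=None, eta0=0.01, shuffle=True)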
pp.fit(ss_x_train, y_train)
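
After fitting, the learned weights can be inspected. The sketch below assumes a recent scikit-learn where the epoch count is given by max_iter, and it fits on the whole standardized data set only to keep the example short. Since iris has three classes, scikit-learn trains one weight vector per class (one-vs-all), so coef_ has one row per class.

```python
from sklearn import datasets
from sklearn.linear_model import Perceptron
from sklearn.preprocessing import StandardScaler

iris = datasets.load_iris()
ss_x = StandardScaler().fit_transform(iris.data)

# random_state=0 is an arbitrary seed so the shuffling is repeatable
pp = Perceptron(max_iter=100, eta0=0.01, shuffle=True, random_state=0)
pp.fit(ss_x, iris.target)

print(pp.classes_)          # class labels seen during fit
print(pp.coef_.shape)       # (number of classes, number of features)
print(pp.intercept_.shape)  # one bias term per class
```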

Evaluation


You can get the accuracy with accuracy_score().

# evaluate
predict = pp.predict(ss_x_test)
print(accuracy_score(y_test, predict))
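
Accuracy is a single number, so it does not show which classes get confused with each other. As an optional extra, the sketch below repeats the steps of this article end to end and prints a confusion matrix. It assumes a recent scikit-learn (max_iter instead of n_iter), and the seeds are arbitrary.

```python
from sklearn import datasets
from sklearn.linear_model import Perceptron
from sklearn.metrics import accuracy_score, confusion_matrix
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler

iris = datasets.load_iris()
x_train, x_test, y_train, y_test = train_test_split(
    iris.data, iris.target, train_size=0.7, random_state=0)

# fit the scaler on the training data only
ss = StandardScaler().fit(x_train)

pp = Perceptron(max_iter=100, eta0=0.01, shuffle=True, random_state=0)
pp.fit(ss.transform(x_train), y_train)

predict = pp.predict(ss.transform(x_test))

# rows are the true classes, columns are the predicted classes
cm = confusion_matrix(y_test, predict, labels=[0, 1, 2])
print(cm)
print(accuracy_score(y_test, predict))
```

The diagonal of the matrix counts the correct predictions, so the sum of the diagonal divided by the total equals the accuracy.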