Wednesday, January 3, 2018

Get started with golearn: Machine Learning with Go

Overview

In this article, I'll try the golearn package for Go.
These days, the typical environments for data science and machine learning are Python, R, RStudio and Jupyter, although the choice depends on the purpose and the phase of the work. Personally, I almost always use Python and Jupyter. Of course, other programming languages also have machine learning libraries, and they are sometimes used.
Here, I'll try golearn, a machine learning package for Go.
The official repository is github.com/sjwhitworth/golearn.
As a first step, I'll walk through the basic workflow using the kNN algorithm.



Follow the code

First, we need to import the libraries. Here, I imported them for the following purposes:
  • fmt: to print the results
  • golearn/base: to read the data
  • golearn/evaluation: to evaluate the model
  • golearn/knn: to build the model

package main

import (
    "fmt"
    "github.com/sjwhitworth/golearn/base"
    "github.com/sjwhitworth/golearn/evaluation"
    "github.com/sjwhitworth/golearn/knn"
)
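
If golearn is not installed yet, it can be fetched with go get. This note is my own addition; the command below follows the project README as I remember it and assumes a GOPATH-based setup, which was the usual layout at the time.

go get -t -u github.com/sjwhitworth/golearn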

From here on, all the code lives in the function main().
Roughly, I split the whole code into three parts: the data part, the model part and the evaluation part.
The code below is the data part.
The library ships with the iris dataset, and I used it here.
base.ParseCSVToInstances() reads the data and returns a DenseInstances. According to the comment in the source, DenseInstances is

// DenseInstances stores each Attribute value explicitly
// in a large grid.

With base.InstancesTrainTestSplit, I split the data into training data and test data.

// data path
filePath := "..../go/src/github.com/sjwhitworth/golearn/examples/datasets/iris.csv"

// read data
rawData, err := base.ParseCSVToInstances(filePath, false)

if err != nil {
    panic(err)
}

// split data into train and test
trainData, testData := base.InstancesTrainTestSplit(rawData, 0.8)
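
To sanity-check the parsed data and the split, we can print them out. This snippet is my own addition: it assumes that DenseInstances has a readable string form and that Size() reports the attribute count followed by the row count, as the interface comments describe.

// my own sanity check, not in the original walkthrough:
// printing a DenseInstances gives a grid-like summary of rows and attributes
fmt.Println(rawData)

// per the FixedDataGrid interface comments, Size() returns the number of
// attributes followed by the number of rows, which confirms the split ratio
_, trainRows := trainData.Size()
_, testRows := testData.Size()
fmt.Println(trainRows, testRows)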

Next, the model part.
In golearn, we can build the model in much the same manner as Python's sklearn. In this case, the classifier uses the three nearest points, although I don't really like using the word "model" for kNN.

// initialize the knn instance
cls := knn.NewKnnClassifier("euclidean", "linear", 3)

// training
cls.Fit(trainData)
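
As a side note of mine, the first two arguments select the distance function and the search algorithm. From the package source, I believe alternatives such as "manhattan" distance and a "kdtree" search are also accepted, though the exact options may vary between golearn versions.

// my example, not from the original walkthrough: Manhattan distance with
// a kd-tree search instead of a linear scan (verify the accepted strings
// against the knn package docs for your golearn version)
// cls := knn.NewKnnClassifier("manhattan", "kdtree", 3)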

Finally, the evaluation part.
With the model, we can make predictions on the test data. To evaluate the outcome, I used a confusion matrix.

// predict
predictions, err := cls.Predict(testData)

if err != nil {
    panic(err)
}

// evaluation
confusionMatrix, err := evaluation.GetConfusionMatrix(testData, predictions)

if err != nil {
    panic(err)
}

If we print the confusion matrix directly, the output looks like this. It is not user-friendly. (The "Optimisations are switched off" line appears to be printed by the kNN classifier during prediction, not by the confusion matrix itself.)

Optimisations are switched off
map[Iris-virginica:map[Iris-virginica:41 Iris-versicolor:1] Iris-setosa:map[Iris-setosa:44] Iris-versicolor:map[Iris-versicolor:36 Iris-virginica:5]]

The evaluation package has several functions. From the editor's autocomplete candidates, we can see that many of them take a ConfusionMatrix as an argument; a few of these are sketched below.

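For example, and as far as I can tell from the package (treat the exact function names as my assumption rather than a quote from the docs), the overall accuracy and per-class precision, recall and F1 score can be pulled out individually:

// my own sketch: metric accessors on the confusion matrix; the class
// name is passed as a string for the per-class metrics
fmt.Printf("accuracy: %.4f\n", evaluation.GetAccuracy(confusionMatrix))
fmt.Printf("precision (Iris-setosa): %.4f\n", evaluation.GetPrecision("Iris-setosa", confusionMatrix))
fmt.Printf("recall (Iris-setosa): %.4f\n", evaluation.GetRecall("Iris-setosa", confusionMatrix))
fmt.Printf("F1 score (Iris-setosa): %.4f\n", evaluation.GetF1Score("Iris-setosa", confusionMatrix))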

This time I just used evaluation.GetSummary.

fmt.Println(evaluation.GetSummary(confusionMatrix))

Optimisations are switched off
Reference Class    True Positives  False Positives  True Negatives  Precision  Recall  F1 Score
---------------    --------------  ---------------  --------------  ---------  ------  --------
Iris-virginica     41              5                80              0.8913     0.9762  0.9318
Iris-setosa        44              0                83              1.0000     1.0000  1.0000
Iris-versicolor    36              1                85              0.9730     0.8780  0.9231
Overall accuracy: 0.9528
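
As a sanity check of my own, these figures are consistent with the raw confusion matrix above: for Iris-virginica, precision is 41 / (41 + 5) ≈ 0.8913 (the 5 versicolor samples misclassified as virginica count as false positives), recall is 41 / (41 + 1) ≈ 0.9762 (the 1 virginica sample misclassified as versicolor counts as a false negative), and the overall accuracy is (41 + 44 + 36) / 127 ≈ 0.9528.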

Code

package main

import (
    "fmt"
    "github.com/sjwhitworth/golearn/base"
    "github.com/sjwhitworth/golearn/evaluation"
    "github.com/sjwhitworth/golearn/knn"
)

func main() {
    // data path
    filePath := "****/go/src/github.com/sjwhitworth/golearn/examples/datasets/iris.csv"

    // read data
    rawData, err := base.ParseCSVToInstances(filePath, false)

    if err != nil {
        panic(err)
    }

    // split data into train and test
    trainData, testData := base.InstancesTrainTestSplit(rawData, 0.8)

    // initialize the knn instance
    cls := knn.NewKnnClassifier("euclidean", "linear", 3)

    // training
    cls.Fit(trainData)
    // predict
    predictions, err := cls.Predict(testData)

    if err != nil {
        panic(err)
    }

    // evaluation
    confusionMatrix, err := evaluation.GetConfusionMatrix(testData, predictions)
    if err != nil {
        panic(err)
    }

    fmt.Println(evaluation.GetSummary(confusionMatrix))
}
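
As a closing note of mine: after adjusting filePath to point at iris.csv in your local golearn checkout, save the file (for example as main.go, a name of my choosing) and run it with the standard Go toolchain.

go run main.go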