Overview
In this article, I'll try the golearn package in Go. These days the typical environment for data science and machine learning is Python or R with RStudio and Jupyter, although it depends on the purpose and the phase. Personally, I always use Python and Jupyter. But other programming languages have machine learning libraries too, and they are sometimes used.
Here, I'll try golearn, which is a package for machine learning.
This is the official page.
As a first step, I'll follow the basic workflow of the package through the kNN algorithm.
Follow the code
First, we need to import the libraries. Here, I imported them for the following purposes.
- fmt: to print the outcome
- golearn/base: to read data
- golearn/evaluation: to evaluate the model
- golearn/knn: to make model
package main
import (
	"fmt"
	"github.com/sjwhitworth/golearn/base"
	"github.com/sjwhitworth/golearn/evaluation"
	"github.com/sjwhitworth/golearn/knn"
)
From here, all of the code lives in the main() function.
Roughly, I split the whole code into three parts: the data part, the model part, and the evaluation part.
The code below is the data part.
The library ships with the iris dataset, and I used it here.
base.ParseCSVToInstances() reads the data and returns DenseInstances. According to the comment in the code, DenseInstances is
// DenseInstances stores each Attribute value explicitly
// in a large grid.
With base.InstancesTrainTestSplit, I split the data into training data and test data.
// data path
filePath := "..../go/src/github.com/sjwhitworth/golearn/examples/datasets/iris.csv"
// read data
rawData, err := base.ParseCSVToInstances(filePath, false)
if err != nil {
	panic(err)
}
// split data into train and test
trainData, testData := base.InstancesTrainTestSplit(rawData, 0.8)
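By the way, if you want to check what was actually parsed, you can simply print the instances; as far as I know, DenseInstances prints a summary of the attributes and a preview of the rows. A minimal sketch:
// Print a summary of the parsed instances
// (attribute names and a preview of the rows).
fmt.Println(rawData)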
Next, the model part.
In the same manner as Python's scikit-learn, we can build the model in golearn. In this case, it uses the three nearest points, although I don't really like using the word "model" for kNN.
// initialize the knn instance
cls := knn.NewKnnClassifier("euclidean", "linear", 3)
// training
cls.Fit(trainData)
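As a side note, the first argument of knn.NewKnnClassifier is the distance function and the second is the search algorithm. If I read the package correctly, other options such as "manhattan" or "cosine" for the distance and "kdtree" for the search are also available; the snippet below is just a sketch under that assumption, keeping k = 3.
// A hypothetical alternative: kd-tree search with Manhattan distance.
// The option strings are assumptions; check the golearn docs.
clsAlt := knn.NewKnnClassifier("manhattan", "kdtree", 3)
clsAlt.Fit(trainData)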
Finally, the evaluation part.
With the trained model, we can make predictions on the test data. To evaluate the outcome, I used a confusion matrix.
// predict
predictions, err := cls.Predict(testData)
if err != nil {
	panic(err)
}
// evaluation
confusionMatrix, err := evaluation.GetConfusionMatrix(testData, predictions)
if err != nil {
	panic(err)
}
If we print the confusion matrix directly, the output looks like this, which is not very readable. (The "Optimisations are switched off" line is a log message from golearn's kNN implementation, not part of the matrix.)
Optimisations are switched off
map[Iris-virginica:map[Iris-virginica:41 Iris-versicolor:1] Iris-setosa:map[Iris-setosa:44] Iris-versicolor:map[Iris-versicolor:36 Iris-virginica:5]]
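It becomes easier to read once you know that the outer key is the reference (true) class and the inner key is the predicted class. If I remember correctly, ConfusionMatrix is just a map[string]map[string]int, so we can also walk it ourselves; a rough sketch:
// Walk the confusion matrix manually:
// outer key = true class, inner key = predicted class, value = count.
for actual, row := range confusionMatrix {
	for predicted, count := range row {
		fmt.Printf("true=%s predicted=%s count=%d\n", actual, predicted, count)
	}
}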
The evaluation package has several functions. From the completion candidates, we can see that many of them take a ConfusionMatrix as an argument.
This time I just used evaluation.GetSummary.
fmt.Println(evaluation.GetSummary(confusionMatrix))
Optimisations are switched off
Reference Class True Positives False Positives True Negatives Precision Recall F1 Score
--------------- -------------- --------------- -------------- --------- ------ --------
Iris-virginica 41 5 80 0.8913 0.9762 0.9318
Iris-setosa 44 0 83 1.0000 1.0000 1.0000
Iris-versicolor 36 1 85 0.9730 0.8780 0.9231
Overall accuracy: 0.9528
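If you want individual numbers instead of the whole table, the evaluation package also has per-metric functions; as far as I can tell, GetAccuracy takes only the ConfusionMatrix, while the per-class ones take the class label first. A sketch under that assumption:
// Overall accuracy and per-class metrics
// (signatures assumed; the class label goes first for the per-class ones).
fmt.Println(evaluation.GetAccuracy(confusionMatrix))
fmt.Println(evaluation.GetPrecision("Iris-setosa", confusionMatrix))
fmt.Println(evaluation.GetRecall("Iris-setosa", confusionMatrix))
fmt.Println(evaluation.GetF1Score("Iris-setosa", confusionMatrix))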
Code
package main

import (
	"fmt"
	"github.com/sjwhitworth/golearn/base"
	"github.com/sjwhitworth/golearn/evaluation"
	"github.com/sjwhitworth/golearn/knn"
)

func main() {
	// data path
	filePath := "****/go/src/github.com/sjwhitworth/golearn/examples/datasets/iris.csv"
	// read data
	rawData, err := base.ParseCSVToInstances(filePath, false)
	if err != nil {
		panic(err)
	}
	// split data into train and test
	trainData, testData := base.InstancesTrainTestSplit(rawData, 0.8)
	// initialize the knn instance
	cls := knn.NewKnnClassifier("euclidean", "linear", 3)
	// training
	cls.Fit(trainData)
	// predict
	predictions, err := cls.Predict(testData)
	if err != nil {
		panic(err)
	}
	// evaluation
	confusionMatrix, err := evaluation.GetConfusionMatrix(testData, predictions)
	if err != nil {
		panic(err)
	}
	fmt.Println(evaluation.GetSummary(confusionMatrix))
}