Monday, May 21, 2018

How to make HTTP server for prediction of machine learning model with Julia

Abstract

In this article, I'll try Julia's HTTP server. Concretely, the goal is to make an HTTP server that executes k-means prediction. In machine learning tasks, it is common to deploy a trained model behind an HTTP server and post data to it. So, as a first step toward that in Julia, I'll try it. The package used here is HTTP.jl.
Here, I'll just touch on the initial steps and won't follow best practices. When you build an HTTP server for a machine learning task, I strongly recommend reading the official documentation after this article.



Try sample

In Julia, we have some choices for an HTTP server. In this article, I'll use one of them, HTTP.jl.
At first, I'll check the server side. The following code is from the official site's sample code, although some lines are omitted. We can save this as server.jl.

using HTTP

HTTP.listen() do request::HTTP.Request
    try
        return HTTP.Response("Hello")
    catch e
        return HTTP.Response(404, "Error: $e")
    end
end

On your terminal, you can launch the server.

julia server.jl
I- Listening on: 127.0.0.1:8081

Next, the client side. As an initial step, I'll send a simple HTTP request message and check the response. With HTTP.request(), we can send GET and POST requests.
Let's use the GET method and check the response.

julia> using HTTP

julia> res = HTTP.request("GET", "http://localhost:8081")
HTTP.Messages.Response:
"""
HTTP/1.1 200 OK
Transfer-Encoding: chunked

Hello"""

The type of the response is HTTP.Messages.Response. It has five fields.

julia> typeof(res)
HTTP.Messages.Response

julia> fieldnames(res)
5-element Array{Symbol,1}:
 :version
 :status
 :headers
 :body
 :request

As a response, we expect the body to be "Hello". We can check by accessing the body field. The body is an Array of UInt8. By converting it with String(), we can see the expected output.

julia> res.body
5-element Array{UInt8,1}:
 0x48
 0x65
 0x6c
 0x6c
 0x6f

julia> String(res.body)
"Hello"

Embed function to calculate 2x

On the server side, I'll set an arbitrary function. As an example, I'll make it return two times the value sent from the client side.
The server-side code is below. There are probably several ways to parse the request message; it is worth checking the documentation for alternatives.

using HTTP

twoTimes = function(x)
        return 2x
    end

HTTP.listen() do request::HTTP.Request

    body = parse(Float64, String(request.body))
    try
        return HTTP.Response(string(twoTimes(body)))
    catch e
        return HTTP.Response(404, "Error: $e")
    end
end

After launching the server, post the value "3" from the client side. The response body is "6.0".

julia> res = HTTP.request("POST", "http://localhost:8081", [], "3")
HTTP.Messages.Response:
"""
HTTP/1.1 200 OK
Transfer-Encoding: chunked

6.0"""

julia> res.body
3-element Array{UInt8,1}:
 0x36
 0x2e
 0x30

julia> String(res.body)
"6.0"

Embed k-means

Finally, I'll embed the predict function of k-means into the server. First of all, we need to touch on the "prediction" of k-means. About k-means itself, please check the following article.
In k-means, as a consequence of the clustering, we get the centroids of the clusters. So, in the prediction phase, the nearest centroid to the incoming data point indicates the cluster that data point belongs to.
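The prediction step described above can be sketched in a few lines. This is a minimal illustration in current Julia syntax (the article's own code targets an older Julia, where argmin was called indmin), and the centroids here are hypothetical values, not the ones computed later in the article.

```julia
# Hypothetical centroids of three clusters in 2-D.
centroids = [[0.0, 0.0], [15.0, 0.0], [10.0, 10.0]]

# Euclidean distance between two points.
euclidean(a, b) = sqrt(sum((a .- b) .^ 2))

# Predict the cluster of a new point: the index of the nearest centroid.
predict(centroids, point) = argmin([euclidean(c, point) for c in centroids])

predict(centroids, [11.0, 9.0])  # returns 3 (nearest to [10.0, 10.0])
```

The index returned by predict is exactly the cluster label, because the labels assigned during clustering are the centroid indices.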

Roughly, if the server-side program has the centroid information, we can predict the cluster a data point belongs to by calculating the distances between the incoming data point and the centroids.
You can get the k-means code I'll use here from the following repository.
If you clone the code above, the preparation is done.
We will follow these steps.

  1. make data for k-means
  2. k-means clustering
  3. embed k-means information to server side
  4. post the new data point from client side

make data for k-means

The following code is to make the data and load the necessary packages.

include("./Clustering/src/kmeans.jl")

using DataFrames
using Distributions
using PyPlot

function makeData()
    groupOne = rand(MvNormal([10.0, 10.0], 10.0 * eye(2)), 100)
    groupTwo = rand(MvNormal([0.0, 0.0], 10 * eye(2)), 100)
    groupThree = rand(MvNormal([15.0, 0.0], 10.0 * eye(2)), 100)
    return hcat(groupOne, groupTwo, groupThree)'
end

data = makeData()
scatter(data[1:100, 1], data[1:100, 2], color="blue")
scatter(data[101:200, 1], data[101:200, 2], color="red")
scatter(data[201:300, 1], data[201:300, 2], color="green")
(Scatter plot of the three generated groups.)
As you can see, the data is composed of three distributions.

k-means clustering

We can do clustering by following code.

result = kMeans(DataFrame(data), 3)

The returned value is a composite type. With the fieldnames() function, we can check its fields.

fieldnames(result)
6-element Array{Symbol,1}:
 :x             
 :k             
 :estimatedClass
 :centroids     
 :iterCount     
 :costArray    

The outcome of the clustering is as follows. As you know, the data is composed of three groups: groupOne, groupTwo and groupThree. Here, the correspondence is as below.
  • groupOne: 3
  • groupTwo: 1
  • groupThree: 2
I didn't set the seed, so if you execute the code yourself, the outcome can be different.

println(result.estimatedClass)
[3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 2, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 1, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 1, 3, 3, 3, 3, 3, 3, 3, 3, 1, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 2, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 3, 1, 1, 1, 1, 1, 1, 3, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 2, 2, 2, 2, 2, 2, 3, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 3, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 3, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2]
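If you want reproducible draws, you can fix the random seed before generating the data. A minimal sketch in current Julia syntax (on the older Julia this article's code targets, the equivalent call is srand):

```julia
using Random

Random.seed!(42)   # fix the RNG state
a = rand(3)
Random.seed!(42)   # reset to the same state
b = rand(3)
a == b             # same seed, same draws
```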

The only information the server side needs is the centroids. By default, the centroids field keeps all the intermediate results over the iterations.

println(result.centroids)
Array[Array{Float64,1}[[2.39863, -3.66735], [14.4972, 1.66306], [4.40598, 6.32293]], Array{Float64,1}[[-0.278207, -1.37692], [14.5054, 2.35933], [6.62845, 8.77454]], Array{Float64,1}[[-0.339455, -0.343924], [14.5123, 1.29915], [9.11203, 10.33]], Array{Float64,1}[[-0.144399, 0.0686747], [14.5367, 0.318053], [10.3518, 10.4564]], Array{Float64,1}[[-0.0691016, 0.197866], [14.5357, -0.00769022], [10.6975, 10.3584]], Array{Float64,1}[[-0.0691016, 0.197866], [14.3853, -0.188593], [10.9187, 10.3307]], Array{Float64,1}[[-0.062882, 0.300898], [14.3853, -0.188593], [11.0214, 10.329]], Array{Float64,1}[[-0.062882, 0.300898], [14.3853, -0.188593], [11.0214, 10.329]]]
What we need is only the final result.

println(result.centroids[end])
Array{Float64,1}[[-0.062882, 0.300898], [14.3853, -0.188593], [11.0214, 10.329]]

embed k-means information to server side

In a practical situation, we should build an appropriate system for embedding the centroid information on the server side. But that is not the point here, so I'll write the centroid information directly into the code.

The server side calculates the distances between the incoming data point and the centroids. The index of the nearest centroid, which corresponds to the cluster the data point belongs to, is returned as the response.

using HTTP
include("./Clustering/src/kmeans.jl")

centroids = [[-0.062882, 0.300898], [14.3853, -0.188593], [11.0214, 10.329]]

function findNearestCentroid(centroids, dataPoint)
    distances = []
    for centroid in centroids
        push!(distances, calcDist(centroid, dataPoint))
    end
    return indmin(distances)
end

HTTP.listen() do request::HTTP.Request

    body = parse.(Float64, split(String(request.body), ","))
    try
        return HTTP.Response(string(findNearestCentroid(centroids, body)))
    catch e
        return HTTP.Response(404, "Error: $e")
    end
end
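As a quick check of the body parsing used in the server above: split() cuts the comma-separated body into substrings, and the broadcast parse.() converts each one to a Float64. This small sketch uses current Julia syntax with one of the sample payloads.

```julia
# A request body as the client sends it: comma-separated coordinates.
body = "8.60476,-4.64956"

# Split on commas, then parse each piece as a Float64 (broadcast with the dot).
values = parse.(Float64, split(body, ","))
# values is now the 2-element vector [8.60476, -4.64956]
```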

post the new data point from client side

The following three data points are for prediction. The variable names correspond to the expected clusters.

three = rand(MvNormal([10.0, 10.0], 10.0 * eye(2)), 1)
one = rand(MvNormal([0.0, 0.0], 10 * eye(2)), 1)
two = rand(MvNormal([15.0, 0.0], 10.0 * eye(2)), 1)

@show(one)
@show(two)
@show(three)
one = [-2.80662; -0.859332]
two = [8.60476; -4.64956]
three = [12.3454; 11.0062]

If we post the data points, the predictions are made on the server. From the responses, we can see that the predictions are done properly.

julia> res = HTTP.request("POST", "http://localhost:8081", [], "-2.80662,-0.859332")
HTTP.Messages.Response:
"""
HTTP/1.1 200 OK
Transfer-Encoding: chunked

1"""

julia> res = HTTP.request("POST", "http://localhost:8081", [], "8.60476,-4.64956")
HTTP.Messages.Response:
"""
HTTP/1.1 200 OK
Transfer-Encoding: chunked

2"""

julia> res = HTTP.request("POST", "http://localhost:8081", [], "12.3454,11.0062")
HTTP.Messages.Response:
"""
HTTP/1.1 200 OK
Transfer-Encoding: chunked

3"""