Monday, April 2, 2018

Notes on speeding up Julia code

Abstract

In this article, I'll rewrite some Julia code I wrote earlier to make it faster.
One of Julia's big advantages is its speed. But to take advantage of it, the code has to follow certain rules. So here I'll rewrite the code into a more efficient form.




Target Code

I'll improve the code from the following article.
I wrote that code as practice in Julia.

Speeding Up

Julia has a few well-known tips for writing fast code.
  • write code inside functions
  • give arrays a concrete element type
  • preallocate arrays
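
The first tip, writing code inside a function, has no example in this article, so here is a minimal sketch of the idea. The function name sum_of_squares and the sample data are made up for illustration and are not part of the original code.

```julia
# Julia compiles a function for the concrete types of its arguments.
# The same loop written at top level on an untyped global variable is slower,
# because the compiler cannot assume the type of a global stays fixed.
function sum_of_squares(samples::Vector{Float64})
    total = 0.0                  # Float64, same as the elements, so the type never changes
    for s in samples
        total += s * s
    end
    return total
end

samples = [1.0, 2.0, 3.0]        # hypothetical sample data
result = sum_of_squares(samples)
println(result)                  # prints 14.0
```

The predict function in this article already follows this tip, since all of the work happens inside the function body.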

set element type on array

In the following function, I created empty arrays first and appended to them inside the for loops.

function predict(data::KNN, testData::DataFrames.DataFrame, k=5)
    predictedLabels = []
    for i in 1:size(testData, 1)
        sourcePoint = Array(testData[i,:])
        distances = []
        for j in 1:size(data.x, 1)
            destPoint = Array(data.x[j,:])
            distance = calcDist(sourcePoint, destPoint)
            push!(distances, distance)
        end
        sortedIndex = sortperm(distances)
        targetCandidates = Array(data.y)[sortedIndex[1:k]]
        predictedLabel = extractTop(targetCandidates)
        push!(predictedLabels, predictedLabel)
    end
    return predictedLabels
end

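Here, giving the arrays a concrete element type makes the code faster. A bare [] creates a Vector{Any}, which can hold values of any type, so Julia cannot specialize the code that works on its elements.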

predictedLabels = String[]
distances = Float64[]
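
To see the difference between the two kinds of array, a small check like the following can help. The variable names untyped and typed are only for this illustration.

```julia
untyped = []           # Vector{Any}: elements are stored as references to boxed values
typed = Float64[]      # Vector{Float64}: elements are stored inline as Float64

push!(untyped, 1.5)
push!(typed, 1.5)

println(eltype(untyped))   # prints Any
println(eltype(typed))     # prints Float64
```

Both arrays hold the same value, but only the second one tells the compiler what type every element has.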

preallocate array

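Appending to an array inside a for loop is slower than writing into an array that already has its final length, because the array may have to be resized as it grows. So, allocate the array with its full length first and assign the values by index.
The code becomes the following.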

function predict(data::KNN, testData::DataFrames.DataFrame; k=5, method="euclidean")
    targetPointsNum = size(testData, 1)
    trainPointsNum = size(data.x, 1)
    # Preallocate one label per test point.
    # Julia 1.0 and later need `undef` here; Julia 0.6 accepted Array{String}(targetPointsNum).
    predictedLabels = Array{String}(undef, targetPointsNum)
    for i in 1:targetPointsNum
        sourcePoint = Array(testData[i,:])
        # Preallocate one distance per training point and fill it by index.
        distances = Array{Float64}(undef, trainPointsNum)
        for j in 1:trainPointsNum
            destPoint = Array(data.x[j,:])
            distances[j] = calcDist(sourcePoint, destPoint; method=method)
        end
        # Indices of the training points, nearest first.
        sortedIndex = sortperm(distances)
        targetCandidates = Array(data.y)[sortedIndex[1:k]]
        predictedLabels[i] = extractTop(targetCandidates)
    end
    return predictedLabels
end
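
Since the KNN type, calcDist and extractTop are defined in the earlier article and not shown here, the following is a self-contained sketch of the same preallocation pattern on plain matrices. Everything in it is an assumption made for illustration: euclidean and most_common are hypothetical stand-ins for calcDist and extractTop, predict_sketch is not the author's function, and the sample data is invented. It is written for Julia 1.0 or later. Unlike the function above, it also reuses a single distances buffer for every test point.

```julia
# Hypothetical stand-in for calcDist: plain Euclidean distance.
euclidean(a, b) = sqrt(sum((a .- b) .^ 2))

# Hypothetical stand-in for extractTop: the most frequent label.
function most_common(labels::Vector{String})
    counts = Dict{String,Int}()
    for label in labels
        counts[label] = get(counts, label, 0) + 1
    end
    best, bestCount = labels[1], 0
    for label in labels          # scan in order, so a tie goes to the nearer neighbour
        if counts[label] > bestCount
            best, bestCount = label, counts[label]
        end
    end
    return best
end

function predict_sketch(trainX::Matrix{Float64}, trainY::Vector{String},
                        testX::Matrix{Float64}; k=5)
    testNum = size(testX, 1)
    trainNum = size(trainX, 1)
    predictedLabels = Vector{String}(undef, testNum)   # preallocated result
    distances = Vector{Float64}(undef, trainNum)       # one buffer, reused per test point
    for i in 1:testNum
        for j in 1:trainNum
            distances[j] = euclidean(testX[i, :], trainX[j, :])
        end
        nearest = sortperm(distances)[1:k]             # indices of the k nearest points
        predictedLabels[i] = most_common(trainY[nearest])
    end
    return predictedLabels
end

# Invented sample data: two clusters labelled "a" and "b".
trainX = [0.0 0.0; 0.1 0.0; 5.0 5.0; 5.1 5.0]
trainY = ["a", "a", "b", "b"]
testX = [0.05 0.0; 5.05 5.0]
println(predict_sketch(trainX, trainY, testX; k=2))    # prints ["a", "b"]
```

Both output arrays are created once with their final length and filled by index, so no push! call is needed.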