Friday, June 29, 2018

An insight into AUC objective function from the viewpoint of evaluation

Abstract

This article examines a model built with an AUC objective function through several evaluation methods. In other words, as the title shows, it considers the AUC objective function from the viewpoint of evaluation.
In the article AUC as an objective function: with Julia and Optim.jl package, I made a model with an AUC objective function. The predicted scores were distributed in a really narrow range, because the AUC objective function is based only on the order of the predictions, without caring about their distance from the explained variable. Under some evaluation metrics, such a model does not look good.
 
To make this point concrete, I'll leave a simple experiment here.

Julia: version 0.6.3




Data

This is an easy and simple experiment, so I'll use the IRIS data set, which is easy to access and familiar to many people.

using RDatasets

iris = dataset("datasets", "iris")

# standardization
for col in names(iris)
    if col == :Species
        continue
    end
    iris[col] = (iris[col] - mean(iris[col]))/std(iris[col])
end
iris[:Species] = map(x->x=="versicolor"?1:0, Array(iris[:Species]))  # binary target: versicolor = 1, the others = 0

The task is to make a model to classify IRIS's versicolor and the others.
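As a quick sanity check on the class balance (versicolor is one of the three species, so one third of the rows are positive), something like the following can be run:

@show sum(iris[:Species])                            # 50 positives (versicolor)
@show length(iris[:Species]) - sum(iris[:Species])   # 100 negatives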

To compare

The purpose of this article is to check how good (or bad) the model with the AUC objective function is when it is evaluated with several evaluation methods. For comparison, I'll prepare another model with the GLM package.

using GLM

logit = glm(@formula(Species ~ SepalLength + SepalWidth + PetalLength + PetalWidth), iris, Binomial(), LogitLink())
lrPred = predict(logit, iris[[:SepalLength, :SepalWidth, :PetalLength, :PetalWidth]])

Strictly speaking, good practice would be to split the data into train and test sets. But here I don't: the model is trained on the whole IRIS data set and the prediction is done on the same data.
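For reference, a minimal hold-out split in Julia 0.6 could look like the sketch below; it is not used in the rest of this article.

srand(42)                         # fix the seed for reproducibility
idx = shuffle(collect(1:size(iris, 1)))
trainIdx = idx[1:100]             # 100 rows for training
testIdx  = idx[101:end]           # the remaining 50 rows for testing
irisTrain = iris[trainIdx, :]
irisTest  = iris[testIdx, :]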

AUC objective function

I'll make a model with a naive AUC objective function. For the details, please check the article mentioned in the abstract (AUC as an objective function: with Julia and Optim.jl package).

using Optim

global sepalLength = Array(iris[:SepalLength])
global sepalWidth = Array(iris[:SepalWidth])
global petalLength = Array(iris[:PetalLength])
global petalWidth = Array(iris[:PetalWidth])
global species = Array(iris[:Species])

function objective(w::Array{Float64})

    # logistic regression
    logistic(t) = (1 + e^(-t))^(-1)
    t = w[1] + w[2] * sepalLength + w[3] * sepalWidth + w[4] * petalLength + w[5] * petalWidth
    yPred = map(logistic, t)

    positiveIndex = find(1 .== species)
    negativeIndex = find(0 .== species)
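
    # opt() counts the positive/negative pairs where the positive example is
    # scored higher than the negative one (the unnormalized AUC); the objective
    # returns its negative so that minimizing it maximizes AUC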

    function opt()

        s = 0
        for pI in positiveIndex
            for nI in negativeIndex
                if yPred[pI] > yPred[nI]
                    s += 1
                end
            end
        end

        return s
    end

    return -opt()
end
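
# the objective is a step function of w (it only changes value when the
# ordering of the scores changes), so a derivative-free method is needed;
# optimize() without a gradient defaults to Nelder-Mead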

aucLr = optimize(objective, [0.0, 0.0, 0.0, 0.0, 0.0])

Here again, I trained the model on the whole IRIS data set and did the prediction on the same data.

w = aucLr.minimizer
logistic(t) = (1 + e^(-t))^(-1)
t = w[1] + w[2] * sepalLength + w[3] * sepalWidth + w[4] * petalLength + w[5] * petalWidth
aucPred = map(logistic, t)

By visualizing the score distributions of the two models, we can see the difference. Next, we will check what effect this difference has at the evaluation phase.

using PyPlot

plt[:hist](Float64.(lrPred), color="blue", alpha=0.5)
plt[:hist](aucPred, color="red", alpha=0.5)

[Figure: histograms of the predicted scores from the GLM model (blue) and the AUC-objective model (red)]
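The same point can also be checked numerically by looking at the ranges of the two score vectors:

@show extrema(Float64.(lrPred))
@show extrema(aucPred)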

Evaluate with some methods

The variables lrPred and aucPred hold the predicted scores. I'll evaluate those with several methods. Here, I'll adopt accuracy, AUC and the Brier score.

For AUC and the Brier score, you can use the functions from the link below.
For accuracy, because the models output scores in the range from 0 to 1, we need to choose a threshold. Going deep into this point would stray from the main theme of this article, so here I simply try many thresholds and adopt the highest resulting accuracy as "accuracy".
Anyway, I'll evaluate lrPred and aucPred with those evaluation methods.
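Since the utility file from that link isn't reproduced in this article, here is a rough sketch of what the two helpers, auc and brier, are expected to compute; the actual definitions used below are the ones included from MLUtils.jl.

# sketch: AUC as the fraction of positive/negative pairs ordered correctly
function auc(yTruth::Array{Int}, yPred::Array{Float64})
    positiveIndex = find(1 .== yTruth)
    negativeIndex = find(0 .== yTruth)
    s = 0.0
    for pI in positiveIndex
        for nI in negativeIndex
            if yPred[pI] > yPred[nI]
                s += 1.0
            elseif yPred[pI] == yPred[nI]
                s += 0.5
            end
        end
    end
    return s / (length(positiveIndex) * length(negativeIndex))
end

# sketch: the Brier score as the mean squared error of the forecast
function brier(yTruth::Array{Int}, yPred::Array{Float64})
    return mean((yPred .- yTruth).^2)
end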

include("./MLUtils.jl/src/utils.jl")
using MLBase

function accuracy(yTruth::Array{Int}, yPred::Array{Float64})
    accuracies = []
    # try thresholds from 0.0 to 0.999 in steps of 0.001
    for i in range(0, 0.001, 1000)
        yPredInt = map(x->x>=i?1:0, yPred)
        cr = correctrate(yTruth, yPredInt)
        push!(accuracies, cr)
    end
    # report the accuracy at the best threshold
    return maximum(accuracies)
end

@show auc(species, Float64.(lrPred))
@show auc(species, aucPred)
@show brier(species, Float64.(lrPred))
@show brier(species, aucPred)
@show accuracy(species, Float64.(lrPred))
@show accuracy(species, aucPred)
auc(species, Float64.(lrPred)) = 0.8258
auc(species, aucPred) = 0.8312
brier(species, Float64.(lrPred)) = 0.16281623291323666
brier(species, aucPred) = 0.24872485102698297
accuracy(species, Float64.(lrPred)) = 0.7666666666666667
accuracy(species, aucPred) = 0.76

For accuracy and AUC, there is no big difference between the two models. But we can see a clear difference in the Brier score.

The Brier score is, in effect, the mean squared error of the forecast, so a lower value is better. It can be expressed as follows:

BS = (1/N) * Σ_{t=1}^{N} (f_t - o_t)^2

  • f_t: the score that was forecast
  • o_t: the actual outcome of the event (0 or 1)

In this case, the usual GLM model is better than the one with the AUC objective function from the viewpoint of the Brier score. Although I'll avoid the mathematical details here, this outcome matches our intuition: the AUC objective function only uses the ordering of the positive and negative points, and unlike a likelihood function, it does not use the distance between the forecast score and the actual outcome.
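Finally, a tiny made-up example makes this last point concrete: the two score vectors below order the examples identically, so their AUC is identical, but the scores squeezed into a narrow band around 0.5 get a much worse Brier score (using the same auc and brier helpers as above).

yToy    = [0, 0, 1, 1]
pWide   = [0.1, 0.2, 0.8, 0.9]       # well-spread scores
pNarrow = [0.49, 0.5, 0.51, 0.52]    # same ordering, squeezed around 0.5

@show auc(yToy, pWide) == auc(yToy, pNarrow)   # true: both order every pair correctly
@show brier(yToy, pWide)     # 0.025
@show brier(yToy, pNarrow)   # roughly 0.24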