Thursday, September 7, 2017

Observation of Beta distribution

Overview


PRML (Pattern Recognition and Machine Learning) is one of the best textbooks on machine learning, and I review it from time to time.
As part of that review, I am writing down notes on some sections.
This article is about the Beta distribution, which is introduced in Chapter 2 of PRML.




Beta distribution


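Some basic characteristics of the Beta distribution, for $\mu \in [0, 1]$ and hyperparameters $a, b > 0$, are as follows.
  • Probability density function: $\mathrm{Beta}(\mu \mid a, b) = \frac{\Gamma(a+b)}{\Gamma(a)\Gamma(b)} \mu^{a-1} (1-\mu)^{b-1}$
  • Mean: $\mathbb{E}[\mu] = \frac{a}{a+b}$
  • Variance: $\mathrm{var}[\mu] = \frac{ab}{(a+b)^2 (a+b+1)}$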
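As a quick sanity check, the mean and variance can be compared against sample statistics. The following is a minimal sketch rather than anything from PRML: the helper names `beta_mean` and `beta_variance`, the seed, and the choice a=2, b=3 are assumptions made for illustration.

```python
import numpy as np


def beta_mean(a, b):
    # Closed-form mean of Beta(a, b): a / (a + b)
    return a / (a + b)


def beta_variance(a, b):
    # Closed-form variance of Beta(a, b): ab / ((a + b)^2 (a + b + 1))
    return a * b / ((a + b) ** 2 * (a + b + 1))


rng = np.random.default_rng(0)  # the seed is an arbitrary choice
a, b = 2, 3
samples = rng.beta(a, b, 100_000)

# Sample statistics should land close to the closed-form values.
print("mean:    ", samples.mean(), "vs", beta_mean(a, b))
print("variance:", samples.var(), "vs", beta_variance(a, b))
```

With this many samples, the two columns should agree to roughly two decimal places.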

Random numbers generated from the Beta distribution


In Python, you can generate random numbers that follow the Beta distribution with numpy.

import numpy as np

print(np.random.beta(1,1, 100))

For example, the code above generates 100 random numbers that follow the Beta distribution with hyperparameters a=1 and b=1.

[  5.34856926e-01   7.07345454e-01   5.61466576e-01   6.91701656e-01
   1.54850335e-01   1.28053789e-01   4.23788060e-01   2.01342509e-01
   4.87994055e-01   3.57029497e-01   7.38978586e-04   1.63660589e-01
   9.90037697e-01   8.85320239e-01   2.44763962e-01   2.61206714e-01
   4.82513684e-01   5.21427679e-03   9.83831172e-02   3.33494117e-01
   8.04820284e-01   3.37816238e-01   4.71796756e-01   2.85765170e-01
   1.54595224e-01   9.15739786e-01   8.02060768e-01   4.01245774e-01
   4.21217704e-01   1.14193305e-01   3.66476060e-01   1.58499684e-01
   5.20226412e-01   6.41653581e-01   6.94521597e-01   4.75622825e-01
   5.07729756e-01   9.73502745e-01   6.49962896e-01   7.84912916e-01
   7.95714239e-01   7.08042942e-01   6.86251509e-01   9.02642258e-01
   5.19977005e-01   1.75804558e-01   8.67964785e-01   1.50810249e-01
   7.23541197e-01   7.90211326e-01   5.31006768e-01   3.58911478e-01
   2.25396593e-01   6.04231553e-01   8.01812744e-01   2.85428115e-01
   1.37861181e-01   8.37206500e-01   4.85577760e-01   1.48057230e-01
   3.78959016e-01   6.08834111e-01   9.39663123e-01   2.37056929e-01
   8.04156806e-01   8.22368721e-01   1.66089581e-01   5.91654121e-01
   7.24688266e-01   4.63584556e-01   5.10722604e-01   6.61874693e-01
   1.00639938e-01   2.02431414e-02   5.95295899e-01   6.96220808e-01
   6.32929860e-01   1.76818310e-01   4.32867586e-01   9.54770764e-01
   5.39987359e-01   3.52098728e-01   1.40169119e-01   2.18503233e-01
   4.17898264e-01   9.17939592e-01   1.53984277e-01   2.89736212e-01
   3.95020553e-01   4.91880034e-01   2.57430439e-01   9.18114654e-01
   8.84602378e-01   5.35798499e-01   4.99493615e-01   7.84179023e-01
   7.20292707e-01   6.02798162e-01   2.91095549e-02   7.10361744e-01]
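The numbers differ on every run. If reproducible output is wanted, one option is numpy's `Generator` interface with a fixed seed. This is a small sketch, and the seed value 42 is an arbitrary assumption.

```python
import numpy as np

# A seeded generator returns the same draws on every run.
rng = np.random.default_rng(42)  # 42 is an arbitrary seed
samples = rng.beta(1, 1, 100)

print(samples.shape)
print(samples.min(), samples.max())  # every draw lies between 0 and 1

# Re-creating the generator with the same seed reproduces the draws.
again = np.random.default_rng(42).beta(1, 1, 100)
print(np.array_equal(samples, again))
```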

Let's try some of the hyperparameter settings shown in PRML. This time I generate random numbers following each distribution and plot their histograms.

%matplotlib inline
# The line above is a Jupyter magic; remove it when running as a plain script.
import matplotlib.pyplot as plt

# Hyperparameter pairs (a, b) shown in PRML.
for a, b in [(0.1, 0.1), (1, 1), (2, 3), (8, 4)]:
    plt.title('a={} b={}'.format(a, b))
    plt.hist(np.random.beta(a, b, 10000))
    plt.show()



PRML plots the probability density curves rather than histograms, but the frequencies of the random numbers follow almost the same shapes.
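The agreement between the sampled frequencies and the density can also be checked numerically. The sketch below normalizes a histogram so that it integrates to 1, then compares it with the Beta density at the bin centers. The helper `beta_pdf`, the bin count of 20, and the seed are assumptions for illustration; a=0.1, b=0.1 is left out because its density diverges near 0 and 1.

```python
import math

import numpy as np


def beta_pdf(x, a, b):
    # Beta(x | a, b) = Gamma(a+b) / (Gamma(a) Gamma(b)) * x^(a-1) * (1-x)^(b-1)
    log_norm = math.lgamma(a + b) - math.lgamma(a) - math.lgamma(b)
    return np.exp(log_norm) * x ** (a - 1) * (1 - x) ** (b - 1)


rng = np.random.default_rng(0)  # the seed is an arbitrary choice
max_gaps = {}
for a, b in [(1, 1), (2, 3), (8, 4)]:
    samples = rng.beta(a, b, 100_000)
    # density=True rescales the bin counts so the histogram integrates to 1.
    density, edges = np.histogram(samples, bins=20, range=(0.0, 1.0), density=True)
    centers = (edges[:-1] + edges[1:]) / 2
    # Largest absolute difference between the histogram and the density.
    max_gaps[(a, b)] = np.max(np.abs(density - beta_pdf(centers, a, b)))
    print("a={} b={}: largest gap = {:.3f}".format(a, b, max_gaps[(a, b)]))
```

The gaps should be small compared with the height of the density, and they shrink further as the sample size grows.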

To observe how the distribution changes with the hyperparameters a and b, the code below plots a grid of histograms in which a increases down the rows and b increases across the columns.

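# 10x10 grid of histograms: a increases down the rows, b across the columns.
fig = plt.figure(figsize=(50, 50))
i = 1  # subplot index, counted row by row
for a in range(1, 11):
    for b in range(1, 11):
        ax = fig.add_subplot(10, 10, i)
        ax.set_title('a={} b={}'.format(a, b))
        ax.hist(np.random.beta(a, b, 10000), alpha=0.5)
        i += 1
plt.show()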



Visually, you can see the following points.
  • The uniform distribution is a special case of the Beta distribution (a = b = 1).
  • When hyperparameter a is equal to b, the plot becomes symmetric about 0.5.
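Both observations can be confirmed directly from the density formula instead of from histograms. This is a small sketch: the helper `beta_pdf` and the evaluation grid are assumptions for illustration.

```python
import math

import numpy as np


def beta_pdf(x, a, b):
    # Beta(x | a, b) = Gamma(a+b) / (Gamma(a) Gamma(b)) * x^(a-1) * (1-x)^(b-1)
    log_norm = math.lgamma(a + b) - math.lgamma(a) - math.lgamma(b)
    return np.exp(log_norm) * x ** (a - 1) * (1 - x) ** (b - 1)


x = np.linspace(0.01, 0.99, 99)

# a = b = 1: the density is the constant 1, i.e. the uniform distribution.
uniform_ok = np.allclose(beta_pdf(x, 1, 1), 1.0)
print("a=b=1 is uniform:", uniform_ok)

# a = b: the density at x equals the density at 1 - x.
symmetric_ok = all(
    np.allclose(beta_pdf(x, a, a), beta_pdf(1 - x, a, a)) for a in (2, 5, 10)
)
print("a=b is symmetric:", symmetric_ok)

# a != b: the mirror-image check fails, e.g. for a=2, b=3.
asymmetric_case = np.allclose(beta_pdf(x, 2, 3), beta_pdf(1 - x, 2, 3))
print("a=2, b=3 is symmetric:", asymmetric_case)
```

The symmetry follows from the formula itself: swapping x and 1 - x swaps the roles of a and b, so the density is unchanged exactly when a equals b.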