Monday, October 23, 2017

How to complement missing values in data on Python

In data pre-processing, we frequently need to deal with missing values. There are several ways to handle them, and one of them is to complement the missing entries with representative values such as the column mean.

In Python, we can do this with scikit-learn.
I'll use the airquality data set to try it.

To prepare the data, execute the following code in an R console in your working directory.

write.csv(airquality, "airquality.csv", row.names=FALSE)
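
With the exported CSV in the working directory, a minimal sketch of mean imputation with scikit-learn could look like the following (the class is SimpleImputer in recent versions; older versions call it sklearn.preprocessing.Imputer):

# Fill missing values with each column's mean.
import pandas as pd
from sklearn.impute import SimpleImputer   # sklearn.preprocessing.Imputer in older versions

data = pd.read_csv("airquality.csv")       # "NA" entries are read as NaN
print(data.isnull().sum())                 # missing values per column

imputer = SimpleImputer(strategy="mean")   # replace NaN with the column mean
filled = pd.DataFrame(imputer.fit_transform(data), columns=data.columns)
print(filled.isnull().sum())               # zero missing values now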


Friday, October 20, 2017

TensorFlow Machine Learning Cookbook: book memo




Recently, the TensorFlow Machine Learning Cookbook was published in Japan and I picked up a copy.

Wednesday, October 18, 2017

Image generator of Keras: to make neural network with little data

Keras has an image generator which works well when we don't have enough data. I'll try it with a simple example.

Overview


To make a good neural network model for images, we need a large amount of data. In many cases, the shortage of data is one of the biggest obstacles to model quality.
Keras has an image generator, ImageDataGenerator, and it can ease the problem by augmenting the data we already have.
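
As a rough sketch of the idea (the augmentation parameters and the dummy data below are my own illustrative choices, not the ones from the post):

# A minimal sketch of data augmentation with ImageDataGenerator.
import numpy as np
from keras.preprocessing.image import ImageDataGenerator

datagen = ImageDataGenerator(
    rotation_range=20,        # random rotations up to 20 degrees
    width_shift_range=0.1,    # random horizontal shifts
    height_shift_range=0.1,   # random vertical shifts
    horizontal_flip=True)     # random horizontal flips

# Dummy images just to make the sketch runnable: (samples, height, width, channels).
x_train = np.random.rand(32, 64, 64, 3)
y_train = np.random.randint(0, 2, 32)

# Each call to the generator yields a freshly augmented batch.
x_batch, y_batch = next(datagen.flow(x_train, y_train, batch_size=16))
print(x_batch.shape)          # (16, 64, 64, 3)

# For training, pass the generator to a compiled model:
# model.fit_generator(datagen.flow(x_train, y_train, batch_size=16),
#                     steps_per_epoch=len(x_train) // 16, epochs=10)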

Monday, October 16, 2017

InceptionV3 Fine-tuning model: the architecture and how to make

Overview

InceptionV3 is one of the models for classifying images. We can easily use it from TensorFlow or Keras.
In this article, I'll check its architecture and try to make a fine-tuning model.

There are several image classification models we can use for fine-tuning.
Those models' weights are already trained, so with a few extra steps you can make models for your own data.

About fine-tuning itself, please check the following.

TensorFlow and Keras also have nice documentation on fine-tuning.

From TensorFlow
From Keras
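
As a rough sketch of the usual Keras pattern (the number of classes and the choice of which layers to freeze are illustrative assumptions, not details from the post):

# A minimal fine-tuning sketch with Keras' InceptionV3.
from keras.applications.inception_v3 import InceptionV3
from keras.layers import GlobalAveragePooling2D, Dense
from keras.models import Model

num_classes = 5   # placeholder; set it to the number of classes in your data

# Load InceptionV3 pre-trained on ImageNet, without the top classifier.
base = InceptionV3(weights="imagenet", include_top=False)

# Add a new classifier head for our own data.
x = GlobalAveragePooling2D()(base.output)
x = Dense(1024, activation="relu")(x)
predictions = Dense(num_classes, activation="softmax")(x)
model = Model(inputs=base.input, outputs=predictions)

# Freeze the pre-trained layers first and train only the new head.
for layer in base.layers:
    layer.trainable = False

model.compile(optimizer="rmsprop", loss="categorical_crossentropy",
              metrics=["accuracy"])
# model.fit(x_train, y_train, epochs=5)   # later, unfreeze some layers and retrain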

Sunday, October 15, 2017

How to interpret the summary of linear regression with log-transformed variable

How should we interpret the coefficients of linear regression when we use log-transformation?

In econometrics and data science, we sometimes use log-transformed variables in linear regression. Usually, one of the advantages of linear regression is that we can easily interpret the outcome. But with log-transformation, how should we interpret it?

Overview


In many cases, we adopt linear regression to analyze data. That lets us understand how influential each feature is.

So when we use it, to keep the interpretation easy, we want the features to be as simple as possible. If you transform the features, you need to adjust your interpretation accordingly.
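
As a reminder, the usual rules of thumb (my own summary, not text from the post) are: in a log-level model log(y) = b0 + b1*x, a one-unit increase in x multiplies y by exp(b1), which is roughly a 100*b1 % change when b1 is small; in a log-log model, b1 is read directly as an elasticity. A tiny numerical check:

# Numerical check of the log-level rule of thumb.
import numpy as np

b1 = 0.05                      # coefficient on x in log(y) = b0 + b1*x
print(np.exp(b1) - 1)          # about 0.051, i.e. roughly a 5.1% increase in y per unit of x

# In a log-log model log(y) = b0 + b1*log(x), b1 is an elasticity:
# a 1% increase in x goes with roughly a b1 % change in y.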


Friday, October 13, 2017

I got started with JupyterLab

I just got started with JupyterLab.

From the official page, JupyterLab is
An extensible environment for interactive and reproducible computing, based on the Jupyter Notebook and Architecture.

As you know, Jupyter is a very useful tool, and using it efficiently directly improves your work efficiency. JupyterLab showed me that possibility.

Thursday, October 12, 2017

Hierarchical Bayesian model's parameter Interpretation on Stan

Usually, a hierarchical Bayesian model has many parameters, so at first glance the interpretation of the sampled points' statistical information looks complex.

In the article below, I made a hierarchical Bayesian model for artificial data. Here, using almost the same but simpler data, I'll make a model and try to interpret it.

Hierarchical Bayesian model by Stan: Struggling

I'll try to make a hierarchical Bayesian model for artificial data with Stan. A hierarchical Bayesian model lets us write the model with a high degree of freedom.

Wednesday, October 11, 2017

Hierarchical Bayesian model by Stan: Struggling

I'll try to make a hierarchical Bayesian model for artificial data with Stan. A hierarchical Bayesian model lets us write the model with a high degree of freedom.

From Wikipedia,
Bayesian hierarchical modelling is a statistical model written in multiple levels (hierarchical form) that estimates the parameters of the posterior distribution using the Bayesian method.[1] The sub-models combine to form the hierarchical model, and the Bayes’ theorem is used to integrate them with the observed data, and account for all the uncertainty that is present. The result of this integration is the posterior distribution, also known as the updated probability estimate, as additional evidence on the prior distribution is acquired.
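
To make the idea concrete, here is a minimal sketch of a hierarchical model in Stan run from PyStan (my own toy example with group-level means, not the model from the post; the artificial data are generated on the spot):

# A minimal hierarchical model: group means drawn from a common distribution.
import numpy as np
import pystan

hierarchical_code = """
data {
  int<lower=1> N;              // number of observations
  int<lower=1> J;              // number of groups
  int<lower=1, upper=J> g[N];  // group index of each observation
  vector[N] y;
}
parameters {
  real mu;                     // population-level mean
  real<lower=0> tau;           // between-group standard deviation
  vector[J] a;                 // group-level means
  real<lower=0> sigma;         // within-group noise
}
model {
  a ~ normal(mu, tau);         // the hierarchy: groups share a common prior
  y ~ normal(a[g], sigma);
}
"""

# Artificial data: three groups with different means.
np.random.seed(0)
g = np.repeat([1, 2, 3], 30)
y = np.repeat([0.0, 2.0, 4.0], 30) + np.random.normal(0.0, 1.0, 90)

model = pystan.StanModel(model_code=hierarchical_code)
fit = model.sampling(data={"N": 90, "J": 3, "g": g, "y": y}, iter=1000, chains=2)
print(fit)                     # posterior summaries of mu, tau, a, sigma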

Tuesday, October 10, 2017

Bayesian modeling to data with heteroscedasticity by Stan

Previously, I wrote about data with heteroscedasticity.


What is heteroscedasticity and How to check it on R

Linear regression with OLS is a simple and strong method for analyzing data. From the coefficients, we can know the influence each variable has. Although linear regression with OLS looks easy to use because of its simplicity in terms of code and mathematics, it has some important conditions which should be satisfied to get proper coefficients and characteristics.

How to deal with heteroscedasticity

In the article below, I wrote about heteroscedasticity and how to deal with it.
This time, I’ll make the model again but with Python and Stan.
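
As a sketch of the idea (my illustration, not necessarily the post's exact model), one way to capture heteroscedasticity in Stan is to let the noise scale itself depend on x:

# A Stan model in which the noise scale grows or shrinks with x.
hetero_model_code = """
data {
  int<lower=1> N;
  vector[N] x;
  vector[N] y;
}
parameters {
  real alpha;
  real beta;
  real gamma0;
  real gamma1;
}
model {
  // the log of the noise scale is linear in x
  y ~ normal(alpha + beta * x, exp(gamma0 + gamma1 * x));
}
"""
# Compile and sample with PyStan in the usual way:
# fit = pystan.StanModel(model_code=hetero_model_code).sampling(
#     data={"N": len(x), "x": x, "y": y}, iter=1000, chains=2)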

Sunday, October 8, 2017

Bayesian multiple regression by Stan

Overview

In the article Simple Bayesian modeling by Stan, I made a simple linear regression with Stan and PyStan. So, as an extension of it, I made a multiple regression model in the same manner to show roughly how to do Bayesian modeling.
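
The Stan model extends naturally from one predictor to many by passing the predictors as a matrix; a minimal sketch (illustrative, not necessarily the exact code from the post) looks like this:

# Multiple regression in Stan: an N x K predictor matrix and a K-vector of coefficients.
multiple_regression_code = """
data {
  int<lower=1> N;      // number of observations
  int<lower=1> K;      // number of predictors
  matrix[N, K] X;
  vector[N] y;
}
parameters {
  real alpha;
  vector[K] beta;
  real<lower=0> sigma;
}
model {
  y ~ normal(alpha + X * beta, sigma);
}
"""
# fit = pystan.StanModel(model_code=multiple_regression_code).sampling(
#     data={"N": N, "K": K, "X": X, "y": y}, iter=1000, chains=2)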

Saturday, October 7, 2017

Simple Bayesian modeling by Stan

Overview

For Bayesian modeling, we can use several languages and tools: BUGS, PyMC, Stan. In this article, I made a simple regression model by using Stan from Python.
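
The basic workflow with PyStan looks roughly like this (a sketch on artificial data, not the exact code from the post):

# Simple linear regression with Stan, called from Python through PyStan.
import numpy as np
import pystan

regression_code = """
data {
  int<lower=1> N;
  vector[N] x;
  vector[N] y;
}
parameters {
  real alpha;
  real beta;
  real<lower=0> sigma;
}
model {
  y ~ normal(alpha + beta * x, sigma);
}
"""

# Artificial data: y = 1 + 2x plus noise.
np.random.seed(0)
x = np.random.uniform(0, 10, 100)
y = 1.0 + 2.0 * x + np.random.normal(0, 1.0, 100)

model = pystan.StanModel(model_code=regression_code)
fit = model.sampling(data={"N": len(x), "x": x, "y": y}, iter=1000, chains=2)
print(fit)   # posterior summaries for alpha, beta and sigma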


Tuesday, October 3, 2017

Similar image finder by CNN and Distance

Overview

In this article, I'll show one method of finding images similar to a specific target image.

Usually, when we try to make a system to find similar items, we have several choices and should choose one or more of them according to the purpose. Here, I'll adopt a distance-based method using a supervised learning model's predictions.

For example, when we try to find images similar to the leftmost image in the figure below, the other images are the ones picked up by this method.
[Figure: the target image (leftmost) and the similar images found]
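
A rough sketch of this kind of approach (my own illustration of the idea, with hypothetical file names, not the post's exact code) is to run every image through a trained CNN and rank candidates by the distance between the output vectors:

# Distance-based similar image search using a CNN's output vectors.
import numpy as np
from keras.applications.inception_v3 import InceptionV3, preprocess_input
from keras.preprocessing import image

model = InceptionV3(weights="imagenet")   # use the model's predictions as feature vectors

def feature_vector(path):
    # Load an image, preprocess it, and return the model's output vector.
    img = image.load_img(path, target_size=(299, 299))
    x = preprocess_input(np.expand_dims(image.img_to_array(img), axis=0))
    return model.predict(x)[0]

# Hypothetical file names, just for illustration.
target = feature_vector("target.jpg")
candidates = ["a.jpg", "b.jpg", "c.jpg"]
features = [feature_vector(p) for p in candidates]

# Rank candidates by Euclidean distance to the target: smaller means more similar.
distances = [np.linalg.norm(target - f) for f in features]
print(sorted(zip(distances, candidates)))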

Sunday, October 1, 2017

Perceptron by scikit-learn

Overview

I sometimes use the Perceptron, one of the machine learning algorithms, as practice for writing algorithms from scratch.

But in many cases, it is highly recommended to use a machine learning library. Although there are not many cases in practice where we use the Perceptron, it is not a waste to know how to write it with a library, concretely scikit-learn.

In this article, I'll show how to write a Perceptron with scikit-learn.
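
As a rough sketch of the scikit-learn side (the iris data set is my own choice for illustration, not necessarily what the post uses):

# scikit-learn's Perceptron on the iris data.
from sklearn.datasets import load_iris
from sklearn.linear_model import Perceptron
from sklearn.model_selection import train_test_split
from sklearn.metrics import accuracy_score

X, y = load_iris(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3, random_state=0)

# A linear classifier trained with the perceptron update rule.
clf = Perceptron(max_iter=100, random_state=0)
clf.fit(X_train, y_train)
print(accuracy_score(y_test, clf.predict(X_test)))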