Categorical Variables Dummy Coding

Converting categorical variables into numerical dummy coded variable is generally a requirement in machine learning libraries such as Scikit as they mostly work on numpy arrays.

In this blog, let’s look at how we can convert bunch of categorical variables into numerical dummy coded variables using two different methods-

  1. Scikit learn preprocessing LabelEncoder
  2.  Pandas getdummies

We will work with a dataset from IBM Watson blog as this has plenty of categorical variables. You can find the data here.  In this data, we are trying to build a model to predict “churn”, which has two levels “Yes” and “No”.

We will convert the dependent variable using Scikit LabelEncoder and the independent categorical variables using Pandas getdummies. Please note that LabelEncoder will not necessarily create additional columns, whereas the getdummies will create additional columns in the data. We will see that in the below example-

clf1clf2clf3clf4clf5clf6clf7

Cheers!

3 thoughts on “Categorical Variables Dummy Coding

  1. Pingback: Learn Python Step by Step | RP's Blog on data science

  2. Pingback: Logistic Regression using Scikit Python | RP's Blog on data science

  3. Pingback: Decision Tree using Python Scikit | RP's Blog on data science

Leave a Reply

Please log in using one of these methods to post your comment:

WordPress.com Logo

You are commenting using your WordPress.com account. Log Out / Change )

Twitter picture

You are commenting using your Twitter account. Log Out / Change )

Facebook photo

You are commenting using your Facebook account. Log Out / Change )

Google+ photo

You are commenting using your Google+ account. Log Out / Change )

Connecting to %s