Exploratory Data Analysis (EDA) Using the Pandas-Profiling Package

In this article, we will look at how to do simple, fast, yet powerful exploratory data analysis (EDA) to understand the patterns in your data before moving on to more elaborate work such as customized EDA or modeling.

The first thing we need to do is install the package, as it doesn’t come with the default installation. In this example, we will use Google Colab. Since we already have the package installed, the install command reports that all requirements are already satisfied. If you don’t have it installed, the same command will install it. For more on the package and its dependencies, please refer to the GitHub link below.

https://github.com/pandas-profiling/pandas-profiling
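As a rough sketch of the installation step, the snippet below checks whether the package can already be imported and otherwise prints the command to run. The helper name is_installed is only illustrative. In a Colab or Jupyter cell the install command is usually written with a leading exclamation mark, as noted in the comment.

```python
import importlib.util


def is_installed(module_name: str) -> bool:
    """Return True if `module_name` can be imported in the current environment."""
    return importlib.util.find_spec(module_name) is not None


# The import name uses an underscore; the name used with pip uses a hyphen.
if is_installed("pandas_profiling"):
    print("pandas-profiling is already installed")
else:
    # In a Colab or Jupyter cell, the install command would be:
    #   !pip install pandas-profiling
    print("pandas-profiling is missing; run: pip install pandas-profiling")
```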

Once the package is installed, we import it into the environment, import the ‘seaborn’ library, and load the Titanic dataset that is built into seaborn. Then we call head() to view the top 5 rows of the data.
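The loading step might look like the minimal sketch below. The function name load_titanic is an illustrative assumption, not something from the package. The seaborn import sits inside the function so that the library is only needed when the function is actually called.

```python
def load_titanic():
    """Return seaborn's built-in Titanic dataset as a pandas DataFrame."""
    # Imported inside the function so seaborn is only required when it is called.
    import seaborn as sns

    # load_dataset fetches the data from seaborn's online repository and
    # caches it, so the first call needs an internet connection.
    return sns.load_dataset("titanic")
```

In a notebook cell, running `df = load_titanic()` followed by `df.head()` displays the first five rows.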

Here is what the data looks like:

Now let’s do the EDA using the package that we just imported. We can either display the output in the notebook environment or save it to an HTML file that can be downloaded and shared with anyone. Here we will use the file download option from google.colab to download the file.
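A sketch of the report step, under a few assumptions: the function names, the report title and the output file name are all placeholders chosen for illustration, and the download helper only works when run inside Google Colab. The imports are placed inside the functions so that each package is only needed when that function is called.

```python
def build_report(df, title="Titanic EDA Report", output_file="titanic_report.html"):
    """Profile `df` with pandas-profiling and write the report to an HTML file."""
    # Imported lazily so the package is only needed when the function runs.
    from pandas_profiling import ProfileReport

    profile = ProfileReport(df, title=title)
    profile.to_file(output_file)  # writes a standalone HTML report
    return profile


def download_in_colab(path):
    """Trigger a browser download of `path`; only works inside Google Colab."""
    from google.colab import files

    files.download(path)
```

With the Titanic DataFrame in `df`, calling `profile = build_report(df)` writes the HTML file, `profile.to_notebook_iframe()` shows the report inline in the notebook, and `download_in_colab("titanic_report.html")` downloads the saved file.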

Here are a few snippets showing what the default output of this report looks like:

Thanks for reading. Please share with others.