Learn Data Science using Python Step by Step

Here is how you can learn Python step by step

  1. Setup Python environment
  2. How to start jupyter notebook
  3. Install and check Packages
  4. Arithmetic operations
  5. Comparison or logical operations
  6. Assignment and augmented assignment in Python
  7. Variables naming conventions
  8. Types of variables in Python and typecasting
  9. Python Functions
  10. Exception handling in Python
  11. String manipulation and indexing
  12. Conditional and loops in Python
  13. Python data structure and containers
  14. Introduction to Python Numpy
  15. Introduction to Python SciPy
  16. Introduction to Python Pandas
  17. Python pivot tables
  18. Pandas join tables
  19. Missing value treatment
  20. Dummy coding of categorical variables 
  21. Basic statistics and visualization
  22. Data standardization or normalization
  23. Linear Regression with scikit-learn (Machine Learning library)
  24. Logistic Regression with scikit-learn (Machine Learning library)
  25. Hierarchical clustering with Python
  26. K-means clustering with Scikit Python
  27. Decision trees using Scikit Python
  28. Principal Component Analysis (PCA) using Scikit Python - Dimension Reduction
  29. Linear Discriminant Analysis (LDA) using Scikit Python - Dimension Reduction and Classification
  30. Market Basket Analysis or Association Rules or Affinity Analysis or Apriori Algorithm
  31. Recommendation Engines using Scikit-Surprise
  32. Price Elasticity of Demand using Log-Log Ordinary Least Square (OLS) Model
  33. Timeseries Forecasting using Facebook Prophet Package
  34. Deep Learning- Introduction to deep learning and environment setup
  35. Deep Learning- Multilayer perceptron (MLP) in Python
  36. Other topics (coming soon)


Time-series Forecasting Using Facebook Prophet Package

Forecasting is a technique used for a variety of purposes, such as sales forecasting and operational and budget planning. Similarly, there is a variety of techniques, such as moving averages, smoothing, and ARIMA, for making a statistical forecast.

In this article we will talk about an open-source package from Facebook called “Prophet”, which takes away the complexity of other techniques without compromising the accuracy of the forecast. The guiding principle of this approach is Generalized Additive Models (GAMs). More on GAMs can be found here.

Let’s look at an example of how to deploy Prophet in Python.



Fundamentals of Deep Learning and Artificial Intelligence

Here are some good links that ought to give you a broader context of machine learning, deep learning, artificial intelligence, and related fields.

Is machine learning the same as deep learning?

HBR article explaining what is machine learning and deep learning

Opportunities and challenges in AI

Introduction to neural network and deep learning

Setup deep learning environment in Python

Deep learning with Keras

Free book

Tensorflow Playground


How to Start Jupyter Notebook From Anaconda Prompt

Jupyter Notebook can be started in many ways; the most common ones are-

  1. From the Windows or Mac search interface. Type “Jupyter Notebook” and it should show you the application to start
  2. From the Anaconda prompt, by typing “jupyter notebook” at the prompt
  3. For high-graphics display, such as with the plotly package, you are advised to start Jupyter Notebook using the following command- “jupyter notebook --NotebookApp.iopub_data_rate_limit=1e10”
Jupyter Notebook Start from Anaconda

Jupyter Notebook Start from Anaconda for High Resolution Graphics

Otherwise you can get an error message similar to the one shown below-

IOPub data rate exceeded. The notebook server will temporarily stop sending output to the client in order to avoid crashing it. To change this limit, set the config variable `--NotebookApp.iopub_data_rate_limit`.

Python Error Message for High Graphics Images


Markov Chains and Markov Model

Markov chains or Markov models are statistical sequential models that leverage probability concepts to predict outcomes. The model is named after the Russian mathematician Andrey Markov. The main guiding principle of a Markov chain is that the probability of a future event depends only on a limited number of the most recent events, not on the entire history. Markov models can be broken down into different categories-

  • First-order Markov model- the probability of the next event depends only on the current event
  • Second-order Markov model- the probability of the next event depends on the current and the previous event
  • A Hidden Markov Model (HMM) is one in which the underlying states are hidden or unobserved; only outputs that depend on those states are observed.

and so on…

Markov models have proven to be very effective in sequential modeling tasks such as-

  • Speech recognition
  • Handwriting recognition
  • Stock market forecasting
  • Online sales attribution to channels


Here is a pictorial depiction of HMM-



This link has a very good visual explanation of Markov models and their guiding principles.

R has a package called “ChannelAttribution” for solving online multi-channel attribution. This package also comes with an excellent explanation of the Markov model and a working example.

There are also Python libraries for building Markov models.
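The first-order case can be sketched in plain Python without any library, using a hypothetical two-state weather model (the states and probabilities below are illustrative, not from the article):

```python
import random

# First-order Markov chain: the next state depends only on the current state.
# Transition probabilities for a hypothetical two-state weather model.
transitions = {
    "sunny": {"sunny": 0.8, "rainy": 0.2},
    "rainy": {"sunny": 0.4, "rainy": 0.6},
}

def next_state(current, rng):
    # Sample the next state from the current state's transition row.
    states, probs = zip(*transitions[current].items())
    return rng.choices(states, weights=probs, k=1)[0]

def simulate(start, steps, seed=0):
    # Walk the chain for a given number of steps (seeded for reproducibility).
    rng = random.Random(seed)
    chain = [start]
    for _ in range(steps):
        chain.append(next_state(chain[-1], rng))
    return chain

print(simulate("sunny", 5))
```

A second-order model would simply key the transition table on the last two states instead of one.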


Price Elasticity of Demand

Price elasticity of demand (PED) is a measure used in econometrics to show how the demand for a product changes when its price is changed. More precisely, it measures the % change in demand for a product when its price changes by 1%.

It can be expressed as the following formula-

PED = (% change in quantity demanded) / (% change in price)
Let’s look at an example- say the demand for a particular Bluetooth headset decreases by 2% when the price is increased by 1%. In this case the PED = -2% / 1% = -2.

Now, let’s talk about how we interpret PED-

A PED greater than 1 (in absolute value) indicates a highly elastic product. In other words, a change in price causes a more than proportionate change in demand. This is generally the case with non-essential or luxury products, such as the headset in the example above. On the other hand, a PED of less than 1 indicates a relatively inelastic product, such as groceries and daily necessities. Furthermore, for most products PED is negative, i.e. when the price is increased, demand falls.

There are a few other practical aspects of PED that we should be aware of-

  • PED for a given product or product category can change over time, and hence it’s imperative to measure PED repeatedly over time
  • PED for a given product or product category can vary by customer segment. For example, low-income customers may have a higher PED for the same product
  • Pricing of a product should be optimized taking the PED into account. For example, if a product shows low price elasticity (inelasticity), its price can be increased to maximize revenue

Here is an article that gives some examples from the retail world.

Let’s now step into how we can estimate PED in Python. For this, we will be working with the beef price and demand data from the USDA Red Meat Yearbook-


You can download the data from here

We will build a log-log linear model to estimate PED. Please see here for a theoretical discussion of this topic. The slope coefficient of the log-log linear model is the PED estimate.

Let the Python show begin! In the below example the PED comes out to be -0.53. This shows that when the price of beef is increased by 1%, the demand for beef falls by 0.53%.
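The log-log approach can be sketched with plain numpy. The data below is synthetic with a true elasticity of -0.5 built in (it is not the USDA beef series, so the estimate here will differ from the -0.53 above):

```python
import numpy as np

rng = np.random.default_rng(42)

# Synthetic price/quantity data with a true elasticity of -0.5.
price = rng.uniform(2.0, 8.0, 200)
quantity = 100 * price ** -0.5 * np.exp(rng.normal(0, 0.02, 200))

# Log-log OLS: log(Q) = a + b*log(P); the slope b is the PED estimate.
slope, intercept = np.polyfit(np.log(price), np.log(quantity), 1)
print(f"Estimated PED: {slope:.2f}")
```

With real data you would take the logs of the observed price and demand columns and fit the same one-variable OLS; statsmodels' `OLS` gives the same slope plus standard errors.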





Recommender Engines

Recommendation engines or systems are all around us. A few common examples are-

  • Amazon- people who bought this also bought this, or who viewed this also viewed this
  • Facebook- friend recommendations
  • LinkedIn- jobs that match you, network recommendations, or who viewed this profile also viewed this profile
  • Netflix- movie recommendations
  • Google- news recommendations, YouTube video recommendations

and so on…

The main objectives of these recommendation systems are the following-

  • Customization or personalization
  • Cross sell
  • Up sell
  • Customer retention
  • Address the “Long Tail” phenomenon seen in Online stores vs Brick and Mortar stores


There are three main approaches to building a recommendation system-

  • Collaborative Filtering

A user-item matrix is built. Normally this matrix is sparse, i.e. most of the cells are empty. The goal of the recommendation system is to find similarities among users and items and to recommend items that have a high probability of being liked by a user, given those similarities.

Similarity between users or items can be assessed using several measures, such as correlation, cosine similarity, the Jaccard index, and Hamming distance. The most commonly used measures in recommendation engines are cosine similarity and the Jaccard index.

  • Content Based-

This type of recommendation engine focuses on finding the characteristics, attributes, tags, or features of the items and recommends other items that share some of the same features, such as recommending another action movie to a viewer who likes action movies.

  • Hybrid- 

These recommendation systems combine both of the above approaches.

Read more here
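The two most common similarity measures above can be sketched in plain numpy on a toy user-item matrix (illustrative data only):

```python
import numpy as np

# Toy user-item rating matrix (rows = users, cols = items, 0 = not rated).
ratings = np.array([
    [5, 4, 0, 1],
    [4, 5, 1, 0],
    [1, 0, 5, 4],
])

def cosine_similarity(u, v):
    # Angle-based similarity between two rating vectors.
    return np.dot(u, v) / (np.linalg.norm(u) * np.linalg.norm(v))

def jaccard_index(u, v):
    # For implicit data: compare the *sets* of items each user touched.
    a, b = set(np.nonzero(u)[0]), set(np.nonzero(v)[0])
    return len(a & b) / len(a | b)

print(cosine_similarity(ratings[0], ratings[1]))  # high: similar tastes
print(cosine_similarity(ratings[0], ratings[2]))  # low: different tastes
print(jaccard_index(ratings[0], ratings[1]))
```

A collaborative filter would use such pairwise similarities to weight neighbors' ratings when filling the empty cells of the matrix.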

Build a Recommendation System in Python Using “Scikit-Surprise”-

Now let’s switch gears and see how we can build recommendation engines in Python using a special Python library called Surprise.

This library offers all the necessary tools, such as different algorithms (SVD, kNN, matrix factorization), built-in datasets, similarity modules (cosine, MSD, Pearson), and sampling and model-evaluation modules.

Here is how you can get started-

  • Step 1- Switch to the Python 2.7 kernel. I couldn’t make it work in 3.6 and hence needed to install 2.7 as well in my Jupyter Notebook environment
  • Step 2- Make sure you have the Visual C++ compilers installed on your system, as this package requires Cython wheels. Here are a couple of links to help you in this effort

Please note that if you don’t do Step 2 correctly, you will get errors such as “Failed building wheel for scikit-surprise” or “Microsoft Visual C++ 14 is required”.

  • Step 3- Install scikit-surprise. Please make sure that you have NumPy installed before this

pip install numpy

pip install scikit-surprise

  • Step 4- Import scikit-surprise and make sure it loads correctly

from surprise import Dataset

  • Step 5- Follow along the below examples


Getting Started

Movie Example



Market Basket Analysis or Association Rules or Affinity Analysis or Apriori Algorithm

First of all, if you are not familiar with the concept of Market Basket Analysis (MBA), Association Rules or Affinity Analysis and related metrics such as Support, Confidence and Lift, please read this article first.

Here is how we can do it in Python. We will look at two examples-

Example 1-

Data used for this example can be found here Retail_Data.csv
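As a minimal illustration of the support, confidence, and lift metrics behind association rules, here is a pure-Python sketch on a toy set of transactions (illustrative data, not the Retail_Data.csv file):

```python
# Toy market baskets: each transaction is the set of items bought together.
transactions = [
    {"bread", "milk"},
    {"bread", "diapers", "beer"},
    {"milk", "diapers", "beer"},
    {"bread", "milk", "diapers"},
    {"bread", "milk", "beer"},
]
n = len(transactions)

def support(itemset):
    # Fraction of transactions containing every item in the itemset.
    return sum(itemset <= t for t in transactions) / n

def confidence(antecedent, consequent):
    # P(consequent | antecedent).
    return support(antecedent | consequent) / support(antecedent)

def lift(antecedent, consequent):
    # Confidence relative to the consequent's baseline popularity; >1 means
    # the antecedent genuinely raises the chance of the consequent.
    return confidence(antecedent, consequent) / support(consequent)

rule = ({"diapers"}, {"beer"})
print("support:", support(rule[0] | rule[1]))
print("confidence:", confidence(*rule))
print("lift:", lift(*rule))
```

The Apriori algorithm itself is an efficient search over itemsets using these same metrics, pruning any itemset whose support falls below a chosen threshold.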


Example 2-



Linear Discriminant Analysis (LDA) with Scikit

Linear Discriminant Analysis (LDA) is similar to Principal Component Analysis (PCA) in that it reduces dimensionality. However, there are certain nuances of LDA that we should be aware of-

  • LDA is supervised (it needs a categorical dependent variable); it finds the linear combinations of the original variables that provide the maximum separation among the different groups. PCA, on the other hand, is unsupervised
  • LDA can also be used for classification, whereas PCA is generally used for unsupervised dimension reduction
  • LDA doesn’t need the number of discriminants to be passed in ahead of time. Generally speaking, the number of discriminants will be the lower of the number of variables and the number of categories minus 1
  • LDA is more robust and can in certain cases be conducted without even standardizing or normalizing the variables
  • LDA is preferred for bigger data sets and machine learning

Let the action begin now-
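A minimal sketch on the classic iris data set, assuming scikit-learn is installed:

```python
from sklearn.datasets import load_iris
from sklearn.discriminant_analysis import LinearDiscriminantAnalysis

X, y = load_iris(return_X_y=True)

# 3 classes and 4 variables -> at most min(4, 3 - 1) = 2 discriminants,
# matching the "lower of variables and categories minus 1" rule above.
lda = LinearDiscriminantAnalysis(n_components=2)
X_lda = lda.fit_transform(X, y)

print(X_lda.shape)                    # (150, 2)
print(lda.explained_variance_ratio_)  # first discriminant dominates
print(lda.score(X, y))                # training accuracy as a classifier
```

Note that the same fitted object serves both roles discussed above: `transform` for dimension reduction and `predict`/`score` for classification.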



Principal Component Analysis (PCA) using Scikit

Principal Component Analysis (PCA) is generally used as an unsupervised algorithm for reducing data dimensionality to address the curse of dimensionality, detect outliers, and remove noise, in areas such as speech recognition.

The underlying algorithm in PCA is generally a linear algebra technique called Singular Value Decomposition (SVD). PCA takes the original data and creates orthogonal (uncorrelated) components that capture the information contained in the original data with a significantly smaller number of components.

Either the components themselves or the key loadings of the components can be plugged into any further modeling work in place of the original data, to minimize information redundancy and noise.

There are three main ways to select the right number of components-

  1. The selected components should explain at least 80% of the original data variance or information [preferred one]
  2. The eigenvalue of each retained component should be greater than or equal to 1. This means the component expresses at least one variable’s worth of information
  3. Elbow or scree method- look for the elbow in the percentage of variance explained by each component, and keep the components before the elbow or kink

You can use any one of the above, or a combination, to select the right number of components. It is critical to standardize or normalize the data before conducting PCA.

In the below case study we will use the first criterion shown above, i.e. the selected components should explain 80% or more of the original data variance.
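The 80%-variance criterion can be sketched with scikit-learn on the iris data set (an illustrative stand-in for the case-study data):

```python
import numpy as np
from sklearn.datasets import load_iris
from sklearn.decomposition import PCA
from sklearn.preprocessing import StandardScaler

X, _ = load_iris(return_X_y=True)

# Standardize first -- PCA is sensitive to the scale of the variables.
X_std = StandardScaler().fit_transform(X)

# Fit with all components to inspect the cumulative explained variance.
pca = PCA().fit(X_std)
cum_var = np.cumsum(pca.explained_variance_ratio_)
print(cum_var)

# Smallest number of components explaining at least 80% of the variance.
n_components = int(np.argmax(cum_var >= 0.80)) + 1
print("components to keep:", n_components)

X_reduced = PCA(n_components=n_components).fit_transform(X_std)
print(X_reduced.shape)
```

The same `cum_var` array also supports the scree method: plotting it and looking for the kink gives criterion 3 above.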