Price elasticity of demand (PED) is a measure used in econometrics to show how demand for a product changes when its price changes. More precisely, it measures the percentage change in quantity demanded when the price changes by 1%.
It can be expressed as the following formula- PED = (% change in quantity demanded) / (% change in price)
Let’s look at an example- say that demand for a particular Bluetooth headset decreases by 2% when its price is increased by 1%. In this case, PED = -2% / 1% = -2.
Now, let’s talk about how we interpret PED-
A PED greater than 1 in absolute value indicates a highly elastic product. In other words, a change in price causes a more than proportionate change in demand. This is generally the case with non-essential or luxury products such as the headset in the example above. On the other hand, a PED of less than 1 in absolute value indicates a relatively inelastic product, such as groceries and daily necessities. Furthermore, for most products PED is negative, i.e. demand falls when the price is increased.
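To make the interpretation concrete, here is a minimal sketch that computes PED from two percentage changes and labels the result. The function names are made up for this illustration, and the "unit elastic" label for an absolute value of exactly 1 is standard terminology that the text above does not mention.

```python
def price_elasticity(pct_change_demand, pct_change_price):
    """PED = % change in quantity demanded / % change in price."""
    return pct_change_demand / pct_change_price


def classify(ped):
    """Label a PED value by its absolute size."""
    magnitude = abs(ped)
    if magnitude > 1:
        return "elastic"
    if magnitude < 1:
        return "inelastic"
    return "unit elastic"


# The Bluetooth headset example: demand falls 2% when price rises 1%
headset_ped = price_elasticity(-2.0, 1.0)
print(headset_ped, classify(headset_ped))
```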
There are a few other practical points about PED that we should be aware of-
PED for a given product or product category can change over time, so it’s important to re-measure PED periodically.
PED for a given product or product category can vary by customer segment. For example, low-income customers may have a higher PED (in absolute value) for the same product.
Pricing of a product should be optimized taking PED into account. For example, if a product is price inelastic (absolute PED below 1), its price can be increased to raise revenue, because the drop in quantity demanded is proportionally smaller than the price increase.
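The pricing point can be checked with a little arithmetic. The sketch below assumes that PED stays constant and applies it linearly to the price change, which is only a rough approximation for larger price moves. The PED values of -0.5 and -2.0 are hypothetical.

```python
def revenue_change(ped, price_change):
    """Fractional revenue change for a fractional price change.

    Assumes a constant PED applied linearly:
    quantity change = PED * price change.
    """
    quantity_change = ped * price_change
    return (1 + price_change) * (1 + quantity_change) - 1


# A 10% price increase for an inelastic and an elastic product
for ped in (-0.5, -2.0):
    change = revenue_change(ped, 0.10)
    print(f"PED {ped}: revenue changes by {change:+.1%}")
```

With these assumed numbers, the inelastic product gains about 4.5% in revenue while the elastic product loses about 12%.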
Here is an article that gives some examples from the retail world.
Let’s now step into how we can estimate PED in Python. For this, we will be working with the beef price and demand data from the USDA Red Meat Yearbook-
We will build a log-log linear model to estimate PED. Please see here for the theoretical discussion on this topic. In a log-log linear model, the coefficient on log(price) is the PED: it gives the percentage change in demand for a 1% change in price.
Let the Python show begin! In the example below, PED comes out to be -0.53, which means that when the price of beef is increased by 1%, the demand for beef falls by 0.53%.
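For readers who want to see the mechanics, here is a minimal sketch of the log-log regression in plain Python. It does not use the USDA data: the prices and quantities are synthetic, generated with an assumed true elasticity of -0.5, so it will not reproduce the -0.53 figure. The function name `estimate_ped` is made up for this illustration.

```python
import math
import random


def estimate_ped(prices, quantities):
    """Slope of the least-squares fit of log(quantity) on log(price)."""
    x = [math.log(p) for p in prices]
    y = [math.log(q) for q in quantities]
    x_mean = sum(x) / len(x)
    y_mean = sum(y) / len(y)
    sxy = sum((xi - x_mean) * (yi - y_mean) for xi, yi in zip(x, y))
    sxx = sum((xi - x_mean) ** 2 for xi in x)
    return sxy / sxx


# Synthetic data (an assumption, not the USDA series):
# quantity = 100 * price^(-0.5) with a little multiplicative noise
random.seed(0)
prices = [1.0 + 0.1 * i for i in range(50)]
quantities = [100 * p ** -0.5 * math.exp(random.gauss(0, 0.02)) for p in prices]

ped = estimate_ped(prices, quantities)
print(f"Estimated PED: {ped:.2f}")
```

In practice you would more likely fit this with a regression library so that you also get standard errors and diagnostics, but the slope is computed the same way.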
The main objective of these recommendation systems is to do the following-
Customization or personalization
Address the “Long Tail” phenomenon seen in online stores vs. brick-and-mortar stores
60% of video watch time on YouTube is driven by the recommendation engine.
How do we build a Recommendation Engine?
There are three main approaches for building any recommendation system-
In collaborative filtering, a user-item matrix is built. Normally this matrix is sparse, i.e. most of the cells are empty, so some form of matrix factorization (such as SVD) is used to reduce its dimensions. More on matrix factorization will be discussed later in this article.
The goal of these recommendation systems is to find similarities among users and items and to recommend items that a user is likely to enjoy, given those similarities.
Similarities between user and item embeddings can be assessed using several similarity measures such as correlation, cosine similarity, Jaccard index and Hamming distance. In a recommendation engine, the most commonly used measures are the dot product, cosine similarity and the Jaccard index.
These algorithms don’t require any domain expertise (unlike content-based models): they need only a user-item matrix with the related ratings or feedback, so they can recommend an item to a user as long as they can identify similar users and items in the matrix.
The flip side of these algorithms is that they may not be suitable for making recommendations about a new item that was not in the user-item matrix on which the model was trained (often called the cold-start problem).
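The similarity measures mentioned above are easy to sketch in plain Python. The rating vectors and liked-item sets below are hypothetical and exist only to show the calculations.

```python
import math


def dot_product(u, v):
    """Sum of element-wise products of two equal-length vectors."""
    return sum(a * b for a, b in zip(u, v))


def cosine_similarity(u, v):
    """Dot product divided by the product of the vector lengths."""
    norm = math.sqrt(dot_product(u, u)) * math.sqrt(dot_product(v, v))
    return dot_product(u, v) / norm if norm else 0.0


def jaccard_index(a, b):
    """Size of the intersection divided by the size of the union."""
    a, b = set(a), set(b)
    union = a | b
    return len(a & b) / len(union) if union else 0.0


# Hypothetical ratings of the same four items by two users (0 = not rated)
alice = [5, 3, 0, 1]
bob = [4, 0, 0, 1]
print(dot_product(alice, bob))
print(round(cosine_similarity(alice, bob), 3))

# Hypothetical sets of items each user liked
print(round(jaccard_index({"A", "B", "D"}, {"A", "D"}), 3))
```

Cosine similarity and the dot product work on numeric ratings or embeddings, while the Jaccard index is a natural fit for binary feedback such as "liked" or "purchased".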
A content-based recommendation engine focuses on finding the characteristics, attributes, tags or features of items and recommends other items that share some of those features. For example, it would recommend another action movie to a viewer who likes action movies.
Since this algorithm uses features of a product or service to make recommendations, it has the advantage of being able to recommend unique or niche items, and it can be scaled to make recommendations for a wide array of users. On the other hand, defining product features accurately is key to the success of these algorithms.
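Here is a minimal sketch of the feature-matching idea, assuming each item is described by a set of tags and that tag overlap (the Jaccard index) is a reasonable similarity score. The catalog, the titles and the function name are all hypothetical.

```python
def recommend_similar(liked_title, catalog, top_n=2):
    """Rank the other items by tag overlap with the liked item."""
    liked_tags = catalog[liked_title]
    scored = []
    for title, tags in catalog.items():
        if title == liked_title:
            continue
        union = liked_tags | tags
        overlap = len(liked_tags & tags) / len(union) if union else 0.0
        scored.append((overlap, title))
    # Highest overlap first; ties broken alphabetically by title
    scored.sort(key=lambda pair: (-pair[0], pair[1]))
    return [title for _, title in scored[:top_n]]


# Hypothetical catalog of movies and their tags
catalog = {
    "Movie A": {"action", "thriller"},
    "Movie B": {"action", "sci-fi"},
    "Movie C": {"romance", "comedy"},
    "Movie D": {"action", "thriller", "crime"},
}
print(recommend_similar("Movie A", catalog))
```

Notice that no ratings from other users are needed, which is why this approach can surface niche items as long as their tags are accurate.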
These recommendation systems combine both of the above approaches.
Build a Recommendation System in Python using “Scikit-Surprise”-
Now let’s switch gears and see how we can build recommendation engines in Python using a dedicated library called Surprise. In this exercise, we will build a collaborative filtering algorithm that uses Singular Value Decomposition (SVD) to reduce the dimensions of a large, sparse user-item matrix, giving more robust recommendations while keeping computation manageable.
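Before installing anything, it may help to see the factorization idea in miniature. The sketch below is a hand-rolled, simplified matrix factorization trained with stochastic gradient descent on the observed ratings only. As far as I know this is close in spirit to the SVD model in Surprise, but it is not that library's code: it leaves out the user and item bias terms, and the ratings, function names and hyperparameters are all assumptions made for this illustration.

```python
import math
import random


def train_factors(ratings, n_factors=2, n_epochs=500, lr=0.01, reg=0.02, seed=0):
    """Learn user and item factors by SGD and return a predict function."""
    rng = random.Random(seed)
    users = sorted({u for u, _, _ in ratings})
    items = sorted({i for _, i, _ in ratings})
    mean = sum(r for _, _, r in ratings) / len(ratings)
    p = {u: [rng.gauss(0, 0.1) for _ in range(n_factors)] for u in users}
    q = {i: [rng.gauss(0, 0.1) for _ in range(n_factors)] for i in items}

    def predict(user, item):
        return mean + sum(a * b for a, b in zip(p[user], q[item]))

    for _ in range(n_epochs):
        for user, item, rating in ratings:
            err = rating - predict(user, item)
            for f in range(n_factors):
                pu, qi = p[user][f], q[item][f]
                # Gradient step with L2 regularization
                p[user][f] += lr * (err * qi - reg * pu)
                q[item][f] += lr * (err * pu - reg * qi)
    return predict


def rmse(predict, ratings):
    """Root mean squared error of predictions on the given ratings."""
    squared = sum((r - predict(u, i)) ** 2 for u, i, r in ratings)
    return math.sqrt(squared / len(ratings))


# Hypothetical ratings as (user, item, rating); most user-item cells are empty
ratings = [
    ("u1", "A", 5), ("u1", "B", 4), ("u1", "C", 1),
    ("u2", "A", 4), ("u2", "B", 5), ("u2", "D", 1),
    ("u3", "C", 5), ("u3", "D", 4), ("u3", "A", 1),
    ("u4", "C", 4), ("u4", "D", 5), ("u4", "B", 2),
]
predict = train_factors(ratings)
print("Training RMSE:", round(rmse(predict, ratings), 3))
print("Predicted rating of item D by u1:", round(predict("u1", "D"), 2))
```

The point of the low-dimensional factors is that the model can fill in cells it has never seen, such as u1's rating for item D.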
Here is how you can get started
Step 1- Please make sure that Anaconda and other packages such as NumPy are up to date
Step 2- Make sure you have the Visual C++ compilers installed on your system, as this package contains Cython extensions that have to be compiled during installation. Here are a couple of links to help you with this-
Please note that if you don’t do Step 2 correctly, you will get errors such as “Failed building wheel for Scikit-surprise” or “Microsoft Visual C++ 14 is required”
Step 3- Install Scikit-Surprise. Please make sure that you have NumPy installed before this
pip install numpy
pip install scikit-surprise
Step 4- Import scikit-surprise and make sure it’s correctly loaded
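One way to do this check is sketched below. Note that the package is installed as scikit-surprise but imported as surprise. The helper name `is_installed` is made up for this example, and the check uses only the standard library so that it fails gracefully when the package is missing.

```python
import importlib.metadata
import importlib.util


def is_installed(module_name):
    """Return True if a top-level module can be found without importing it."""
    return importlib.util.find_spec(module_name) is not None


if is_installed("surprise"):
    import surprise  # the import name differs from the pip package name

    print("Surprise version:", importlib.metadata.version("scikit-surprise"))
else:
    print("Surprise is not installed; run: pip install scikit-surprise")
```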
For the sake of simplicity, you can also use Google Colab to work on the example below-
Let’s import the MovieLens small dataset to build a couple of recommendation engines using the KNN and SVD algorithms. Please note that the Surprise package offers many more algorithms to choose from. The data can be found at https://grouplens.org/datasets/movielens/
Download the zip file and you will see the following files that you can import in Python to explore. However, for the purpose of collaborative filtering (CF) models, we only need the ratings.csv file.
Here are the key steps that we will follow to build a recommendation engine for this data-
Install the Scikit-Surprise and Pandas Profiling packages
Import necessary packages
Type the magic command to print multiple statements on the same line
First of all, if you are not familiar with the concept of Market Basket Analysis (MBA), Association Rules or Affinity Analysis and related metrics such as Support, Confidence and Lift, please read this article first.
Here is how we can do it in Python. We will look at two examples-
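As a warm-up, here is a minimal sketch of how Support, Confidence and Lift are computed for a single rule. The baskets are hypothetical, and the helper functions are written for this illustration rather than taken from any library. The sketch also assumes the antecedent and consequent each appear in at least one basket, otherwise the divisions would fail.

```python
def support(transactions, items):
    """Share of transactions that contain every item in `items`."""
    items = set(items)
    return sum(items <= basket for basket in transactions) / len(transactions)


def confidence(transactions, antecedent, consequent):
    """Support of both sides together divided by support of the antecedent."""
    both = set(antecedent) | set(consequent)
    return support(transactions, both) / support(transactions, antecedent)


def lift(transactions, antecedent, consequent):
    """Confidence of the rule divided by support of the consequent."""
    return confidence(transactions, antecedent, consequent) / support(transactions, consequent)


# Hypothetical baskets
transactions = [
    {"bread", "milk"},
    {"bread", "butter"},
    {"bread", "milk", "butter"},
    {"milk"},
    {"bread", "milk", "eggs"},
]
print("Support(bread, milk):", support(transactions, {"bread", "milk"}))
print("Confidence(bread -> milk):", confidence(transactions, {"bread"}, {"milk"}))
print("Lift(bread -> milk):", lift(transactions, {"bread"}, {"milk"}))
```

A lift below 1, as in this toy data, suggests that the two items appear together slightly less often than they would if they were bought independently.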
Linear Discriminant Analysis (LDA) is similar to Principal Component Analysis (PCA) in that both reduce dimensionality. However, there are certain nuances with LDA that we should be aware of-
LDA is supervised: it needs a categorical dependent variable and finds the linear combinations of the original variables that give the maximum separation among the groups. PCA, on the other hand, is unsupervised
LDA can also be used for classification, whereas PCA is generally used for unsupervised learning
LDA doesn’t need the number of discriminants to be passed ahead of time. Generally speaking, the number of discriminants will be the lower of the number of variables and the number of categories minus 1.
LDA is more robust and can be conducted without even standardizing or normalizing the variables in certain cases
LDA is preferred for bigger data sets and machine learning
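To see what "maximum separation among the groups" means, here is a hand-rolled sketch of Fisher's two-class discriminant for two variables. It is an illustration on hypothetical points, not the routine a library would run. With two variables and two categories, the rule above gives min(2, 2 - 1) = 1 discriminant, which is the single direction computed here.

```python
def mean_vector(rows):
    """Column means of a list of 2-D points."""
    n = len(rows)
    return [sum(r[j] for r in rows) / n for j in range(2)]


def scatter_matrix(rows, mean):
    """2x2 sum of outer products of deviations from the class mean."""
    s = [[0.0, 0.0], [0.0, 0.0]]
    for r in rows:
        d = [r[0] - mean[0], r[1] - mean[1]]
        for a in range(2):
            for b in range(2):
                s[a][b] += d[a] * d[b]
    return s


def fisher_direction(class0, class1):
    """w = inverse(within-class scatter) * (mean1 - mean0)."""
    m0, m1 = mean_vector(class0), mean_vector(class1)
    s0, s1 = scatter_matrix(class0, m0), scatter_matrix(class1, m1)
    sw = [[s0[a][b] + s1[a][b] for b in range(2)] for a in range(2)]
    det = sw[0][0] * sw[1][1] - sw[0][1] * sw[1][0]
    inv = [[sw[1][1] / det, -sw[0][1] / det],
           [-sw[1][0] / det, sw[0][0] / det]]
    diff = [m1[0] - m0[0], m1[1] - m0[1]]
    return [inv[0][0] * diff[0] + inv[0][1] * diff[1],
            inv[1][0] * diff[0] + inv[1][1] * diff[1]]


def project(point, w):
    """Score of a point along the discriminant direction."""
    return point[0] * w[0] + point[1] * w[1]


# Hypothetical labelled points for two groups
class0 = [(1, 2), (2, 3), (3, 3), (2, 1)]
class1 = [(6, 5), (7, 8), (8, 6), (7, 7)]
w = fisher_direction(class0, class1)
print("Scores for class 0:", [round(project(p, w), 2) for p in class0])
print("Scores for class 1:", [round(project(p, w), 2) for p in class1])
```

The class labels are what make this supervised: the direction is chosen using the group means and within-group scatter, which PCA never looks at.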
Principal Component Analysis (PCA) is generally used as an unsupervised algorithm for reducing data dimensions to address the curse of dimensionality. It is also applied to detecting outliers, removing noise, speech recognition and other such areas.
The underlying algorithm in PCA is generally a linear algebra technique called Singular Value Decomposition (SVD). PCA takes the original data and creates orthogonal (uncorrelated) components that capture the information contained in the original data with significantly fewer components.
Either the components themselves or the key loadings of the components can be plugged into further modeling work in place of the original data, to minimize information redundancy and noise.
There are three main ways to select the right number of components-
The selected components should together explain at least 80% of the original data variance or information [Preferred One]
The eigenvalue of each retained PCA component should be greater than or equal to 1. With standardized data, this means that each component should express at least one variable’s worth of information
Elbow or scree method- look for the elbow in the percentage of variance explained by each component and keep the components up to the point where an elbow or kink is visible.
You can use any one of the above or a combination of them to select the right number of components. It is critical to standardize or normalize the data before conducting PCA.
In the below case study we will use the first criterion shown above, i.e. 80% or more of the original data variance should be explained by the selected number of components.
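The first two selection rules are simple enough to sketch directly. The eigenvalues below are hypothetical (five standardized variables, so they sum to 5), and the function names are made up for this illustration.

```python
def components_for_variance(eigenvalues, threshold=0.80):
    """Smallest number of components whose cumulative variance share reaches the threshold."""
    total = sum(eigenvalues)
    cumulative = 0.0
    for k, value in enumerate(sorted(eigenvalues, reverse=True), start=1):
        cumulative += value / total
        if cumulative >= threshold:
            return k
    return len(eigenvalues)


def components_by_eigenvalue(eigenvalues):
    """Number of components with an eigenvalue of at least 1."""
    return sum(1 for value in eigenvalues if value >= 1.0)


# Hypothetical eigenvalues from a PCA on five standardized variables
eigenvalues = [3.0, 1.2, 0.5, 0.2, 0.1]
print("Components for 80% variance:", components_for_variance(eigenvalues))
print("Components with eigenvalue >= 1:", components_by_eigenvalue(eigenvalues))
```

Here both rules agree on two components, but on real data they can disagree, which is why it helps to fix one criterion up front.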
If you are not familiar with logistic regression, please read this article first. Moreover, if you are not familiar with the sklearn machine learning model building process, please read this article also.
Converting categorical variables into numerical dummy-coded variables is generally a requirement in machine learning libraries such as scikit-learn, as they mostly work on NumPy arrays.
In this blog, let’s look at how we can convert a bunch of categorical variables into numerical dummy-coded variables using four different methods-
scikit-learn preprocessing LabelEncoder
We will work with a dataset from the IBM Watson blog, as it has plenty of categorical variables. You can find the data here. With this data, we are trying to build a model to predict “churn”, which has two levels, “Yes” and “No”.
We will convert the dependent variable using scikit-learn’s LabelEncoder and the independent categorical variables using pandas get_dummies. Please note that LabelEncoder replaces each category with an integer code and does not create additional columns, whereas get_dummies creates one additional column per category. We will see that in the example below-
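To show the difference between the two encodings without any dependencies, here is a plain-Python sketch of what each one produces. It mirrors the idea behind LabelEncoder and get_dummies rather than calling them, and the column values are hypothetical, not taken from the IBM dataset.

```python
def label_encode(values):
    """Map each category to an integer code; returns one list, no new columns."""
    classes = sorted(set(values))
    index = {label: code for code, label in enumerate(classes)}
    return [index[v] for v in values], classes


def dummy_code(values, prefix):
    """Create one 0/1 column per category, named prefix_category."""
    classes = sorted(set(values))
    return {f"{prefix}_{label}": [int(v == label) for v in values]
            for label in classes}


# Hypothetical dependent and independent variables
churn = ["Yes", "No", "No", "Yes"]
contract = ["Monthly", "Yearly", "Monthly", "Two-year"]

codes, classes = label_encode(churn)
print(classes, codes)

for column, flags in dummy_code(contract, "contract").items():
    print(column, flags)
```

The label-encoded target stays a single column of integers, while the three contract types expand into three 0/1 columns.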
Here are a few other ways to do dummy coding-
Here is an excellent Kaggle Kernel for detailed feature engineering.
As highlighted in the article, clustering and segmentation play an instrumental role in data science. In this blog, we will show you how to build a hierarchical clustering model in Python.
For this purpose, we will work with an R dataset called “Cheese”. Please install the package called “bayesm” in R and export this dataset in CSV format so that it can be imported into Python. More on this dataset can be found here.
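Before turning to the Cheese data, the core idea of agglomerative (bottom-up) hierarchical clustering can be sketched in a few lines. This is a naive single-linkage version written for illustration on hypothetical points. A real analysis would normally use a library implementation and inspect a dendrogram to choose the number of clusters.

```python
import math


def single_linkage(points, n_clusters):
    """Repeatedly merge the two closest clusters until n_clusters remain."""
    clusters = [[p] for p in points]
    while len(clusters) > n_clusters:
        best = None
        for i in range(len(clusters)):
            for j in range(i + 1, len(clusters)):
                # Single linkage: distance between the closest pair of members
                d = min(math.dist(a, b) for a in clusters[i] for b in clusters[j])
                if best is None or d < best[0]:
                    best = (d, i, j)
        _, i, j = best
        clusters[i].extend(clusters[j])
        del clusters[j]
    return clusters


# Hypothetical 2-D points
points = [(1, 1), (1.5, 1), (5, 5), (5, 5.5), (9, 1)]
for cluster in single_linkage(points, n_clusters=3):
    print(cluster)
```

Every point starts as its own cluster, and the order of the merges is what a dendrogram visualizes.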