Learn Data Science using Python Step by Step

Here is how you can learn Python step by step

  1. Setup Python environment
  2. How to start jupyter notebook
  3. Install and check Packages
  4. Arithmetic operations
  5. Comparison or logical operations
  6. Assignment and augmented assignment in Python
  7. Variables naming conventions
  8. Types of variables in Python and typecasting
  9. Python Functions
  10. Exception handling in Python
  11. String manipulation and indexing
  12. Conditional and loops in Python
  13. Python data structure and containers
  14. Introduction to Python Numpy
  15. Introduction to Python SciPy
  16. Introduction to Python Pandas
  17. Python pivot tables
  18. Pandas join tables
  19. Missing value treatment
  20. Dummy coding of categorical variables 
  21. Basic statistics and visualization
  22. Data standardization or normalization
  23. Linear Regression with scikit-learn (Machine Learning library)
  24. Logistic Regression with scikit-learn (Machine Learning library)
  25. Hierarchical clustering with Python
  26. K-means clustering with Scikit Python
  27. Decision trees using Scikit Python
  28. Principal Component Analysis (PCA) using Scikit Python- Dimension Reduction
  29. Linear Discriminant Analysis (LDA) using Scikit Python- Dimension Reduction and Classification
  30. Market Basket Analysis or Association Rules or Affinity Analysis or Apriori Algorithm
  31. Recommendation Engines using Scikit-Surprise
  32. Price Elasticity of Demand using Log-Log Ordinary Least Square (OLS) Model
  33. Deep Learning- Introduction to deep learning and environment setup
  34. Deep Learning- Multilayer perceptron (MLP) in Python
  35. Other topics (coming soon)


Pandas Join Tables

Pandas supports several types of joins, such as inner, outer, left, and right. Let’s work through an example. More details on our example can be found here


Left join: use keys from the left frame only


Right join: use keys from the right frame only


Outer join: use the union of keys from both frames


Inner join: use the intersection of keys from both frames
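The four key-handling options above can be sketched with pandas.merge and its how argument. The two small frames and their column names (emp_id, name, salary) are made up for illustration and are not the data from the original example.

```python
import pandas as pd

# Hypothetical data, invented for illustration only.
left = pd.DataFrame({"emp_id": [1, 2, 3], "name": ["Ann", "Bob", "Cal"]})
right = pd.DataFrame({"emp_id": [2, 3, 4], "salary": [5000, 6000, 7000]})

# how="left": use keys from the left frame only
left_join = pd.merge(left, right, on="emp_id", how="left")

# how="right": use keys from the right frame only
right_join = pd.merge(left, right, on="emp_id", how="right")

# how="outer": use the union of keys from both frames
outer_join = pd.merge(left, right, on="emp_id", how="outer")

# how="inner": use the intersection of keys from both frames
inner_join = pd.merge(left, right, on="emp_id", how="inner")

for label, frame in [("left", left_join), ("right", right_join),
                     ("outer", outer_join), ("inner", inner_join)]:
    print(label)
    print(frame)
```

Rows whose key is missing on one side get NaN in the columns that come from that side.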



Introduction to Python Pandas

Pandas is an open source Python library that provides DataFrames, which are similar to Excel tables, and it plays an instrumental role in data manipulation and data munging in data science projects. Generally speaking, the underlying data values in pandas are stored in NumPy array format, as you will see shortly.

Let’s look at some examples.

First, let’s import a file (using read_csv) to work on. Then we will begin data exploration. In particular, the example below does the following:

  • Import pandas and numpy
  • Import csv file
  • Check type, shape, index and values of the dataframe
  • Display top 5 and bottom 5 rows of the data using head() and tail()
  • Generate descriptive statistics such as mean, median, percentile etc
  • Transpose dataframe
  • Sort data frame by rows and columns
  • Indexing, slicing and dicing using loc and iloc. More on this is here
  • Adding new columns
  • Boolean indexing
  • Inserting date time in the data frame
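The steps listed above can be sketched as follows. The CSV content, the column names (name, age, income) and the derived columns are hypothetical stand-ins, since the file used in the original post is not available here; read_csv accepts a file path in the same way it accepts the in-memory text used below.

```python
import io

import numpy as np
import pandas as pd

# Hypothetical CSV content standing in for the original file.
csv_text = """name,age,income
Ann,34,52000
Bob,45,61000
Cal,29,48000
Dee,52,75000
Eve,41,58000
Fay,38,67000
"""

# Import the csv data
df = pd.read_csv(io.StringIO(csv_text))

# Check type, shape, index and values of the dataframe
print(type(df))
print(df.shape)
print(df.index)
print(isinstance(df.values, np.ndarray))  # values come back as a NumPy array

# Top 5 and bottom 5 rows
print(df.head())
print(df.tail())

# Descriptive statistics (count, mean, std, min, percentiles, max)
print(df.describe())

# Transpose
print(df.T)

# Sort rows by a column's values, and sort columns by their labels
by_income = df.sort_values(by="income", ascending=False)
by_col_label = df.sort_index(axis=1)

# loc is label based (end label included); iloc is position based (end excluded)
first_two = df.loc[0:1, ["name", "income"]]
same_rows = df.iloc[0:2, [0, 2]]

# Add a new column
df["income_k"] = df["income"] / 1000

# Boolean indexing
over_40 = df[df["age"] > 40]

# Insert a date time column
df["joined"] = pd.date_range("2020-01-01", periods=len(df), freq="D")
print(df)
```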







Introduction to Python SciPy

SciPy is an open source Python package used for scientific computing across many domains, such as engineering, mathematics, and the sciences. Here are some examples of SciPy in use.

Let’s say that the income of a company’s employees is normally distributed with a mean of 10,000 USD and a standard deviation of 1,000 USD. Approximately what percent of the employees earn a salary of 11,000 USD or less?

This can be easily accomplished using SciPy. The answer is about 84.1% of employees, since 11,000 USD is one standard deviation above the mean.
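A minimal sketch of this calculation, assuming the cumulative distribution function from scipy.stats.norm is the tool used (the original code is not shown here, and the variable name is illustrative):

```python
from scipy.stats import norm

# P(X <= 11000) for X ~ Normal(mean=10000, standard deviation=1000)
share_at_or_below = norm.cdf(11000, loc=10000, scale=1000)
print(round(share_at_or_below * 100, 1))  # 84.1
```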


We can also say that 100% - 84.1% = 15.9%, or roughly 16% of employees, earn more than 11,000 USD.
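The upper tail can also be computed directly. This sketch assumes the survival function norm.sf, which equals 1 minus the CDF; the variable name is illustrative.

```python
from scipy.stats import norm

# P(X > 11000) for X ~ Normal(mean=10000, standard deviation=1000)
share_above = norm.sf(11000, loc=10000, scale=1000)
print(round(share_above * 100, 1))  # 15.9
```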


Here is another example, showing how we can draw a random sample from a particular normal distribution.
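One way to draw such a sample is norm.rvs. The distribution parameters reuse the salary example, while the sample size and seed are arbitrary choices made for this sketch.

```python
from scipy.stats import norm

# Draw 5 values from Normal(mean=10000, standard deviation=1000).
# random_state fixes the seed so that repeated runs give the same draw.
sample = norm.rvs(loc=10000, scale=1000, size=5, random_state=42)
print(sample)
```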



Introduction to Python Numpy

NumPy is an open source Python package that supports efficient numerical computing in Python through its N-dimensional array. It forms the foundation of other data munging and manipulation packages such as Pandas.

Let’s look at why NumPy is needed. Assume that we want to add the members of two lists element by element, as shown in the below example.
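A small sketch of the difference, using made-up values:

```python
import numpy as np

a = [1, 2, 3]
b = [4, 5, 6]

# The + operator on plain lists concatenates them
print(a + b)  # [1, 2, 3, 4, 5, 6]

# Element-wise addition with plain lists needs a loop or a comprehension
print([x + y for x, y in zip(a, b)])  # [5, 7, 9]

# NumPy arrays add element by element
total = np.array(a) + np.array(b)
print(total)  # [5 7 9]
```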


As you can see from the above example, NumPy makes element-wise numerical computing in Python much more convenient than plain lists.

Let’s dig deeper into other aspects of NumPy.
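The original examples are not reproduced here, so the following is only a sketch of a few commonly used array features (shape, indexing, aggregation along an axis, and arithmetic with a scalar) on a made-up array.

```python
import numpy as np

# A 3 x 4 array holding the values 0..11
arr = np.arange(12).reshape(3, 4)

print(arr.ndim)   # 2
print(arr.shape)  # (3, 4)
print(arr.dtype)  # integer type, platform dependent

# Indexing: row 1, column 2
print(arr[1, 2])  # 6

# Slicing: the whole first column
first_col = arr[:, 0]

# Aggregation along an axis
col_sums = arr.sum(axis=0)    # one sum per column
row_means = arr.mean(axis=1)  # one mean per row

# Arithmetic with a scalar applies to every element
doubled = arr * 2
```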







Python Data Structure and Containers

Python has several built-in data containers to facilitate efficient data storage and retrieval. Some key ones are-

  • List
  • Tuple
  • Dictionary

Let’s look at the above types one by one.

List- Lists are mutable (can be edited) and iterable data containers that can hold homogeneous or heterogeneous data. This is one of the most commonly used data structures in Python. A list is denoted by square brackets, “ [ ] ”.

Let’s look at some examples of list operations-
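A short sketch of common list operations, with made-up values:

```python
# A list can hold mixed types
items = [10, "apple", 3.5]

items.append("pear")       # add to the end
items.insert(0, "first")   # add at a given position
items[1] = 20              # lists are mutable: replace an element in place
removed = items.pop()      # remove and return the last element
items.remove(3.5)          # remove the first element equal to 3.5

print(items)       # ['first', 20, 'apple']
print(removed)     # pear
print(len(items))  # 3
```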





Next, let’s do slicing and dicing of the list. This follows the same zero-based indexing as strings.
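For instance, with a made-up list of six numbers:

```python
numbers = [10, 20, 30, 40, 50, 60]

print(numbers[0])     # 10, the first element
print(numbers[-1])    # 60, the last element
print(numbers[1:4])   # [20, 30, 40], the end index is excluded
print(numbers[:3])    # [10, 20, 30]
print(numbers[::2])   # [10, 30, 50], every second element
print(numbers[::-1])  # [60, 50, 40, 30, 20, 10], reversed copy
```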




Tuple- Tuple operations can be somewhat faster than list operations, and tuples use slightly less memory; however, tuples are immutable (cannot be edited after creation). Tuples are well suited to write-once, read-many jobs such as big data operations. Similar to a list, a tuple can store heterogeneous data.

They are defined by “ ( ) ”. Let’s look at some examples of tuple operations-
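A short sketch of tuple operations, with made-up values:

```python
point = (3, "x", 4.5)

print(point[0])   # 3
print(point[1:])  # ('x', 4.5)

# Unpack the tuple into separate variables
a, b, c = point

# Tuples are immutable: item assignment raises TypeError
try:
    point[0] = 99
    mutation_failed = False
except TypeError as err:
    mutation_failed = True
    print("Cannot modify a tuple:", err)

print(point.count(3))    # 1
print(point.index("x"))  # 1
```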


Dictionary- A dictionary is made of “Key-Value” combinations. Values are generally retrieved by providing the keys. Looking up a value by its key is typically much faster than searching for an item in a list, because dictionaries are implemented as hash tables.

Dictionaries are defined by “ { } ”. Let’s look at some examples of dictionary operations-
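A short sketch of dictionary operations, with made-up keys and values:

```python
prices = {"apple": 1.2, "pear": 0.8}

prices["kiwi"] = 0.5    # add a new key-value pair
prices["apple"] = 1.5   # update an existing value

apple_price = prices["apple"]       # retrieve a value by its key
mango_price = prices.get("mango")   # get() returns None for a missing key

del prices["pear"]      # remove a key-value pair

# Iterate over the key-value pairs
for fruit, price in prices.items():
    print(fruit, price)

print(sorted(prices))     # ['apple', 'kiwi']
print("kiwi" in prices)   # True
```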



You can find much more information on the above objects in Python Official Documentation.


Conditional and Loops in Python

Conditional and loop statements are tools for executing code when a certain condition is met, or repeatedly for as long as a condition remains true. There are many types of conditionals and loops in Python. Some key ones are the ‘if’ statement, the ‘for’ statement, and the ‘while’ statement. Here are a few examples.
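A minimal sketch of the three statements. The function name and the values are invented for illustration.

```python
def describe(n):
    """Classify a number using if / elif / else."""
    if n < 0:
        return "negative"
    elif n == 0:
        return "zero"
    else:
        return "positive"

# for loop: visit each element of a sequence
labels = []
for value in [-2, 0, 5]:
    labels.append(describe(value))
print(labels)  # ['negative', 'zero', 'positive']

# while loop: keep adding 1, 2, 3, ... until the total reaches 10
count = 0
total = 0
while total < 10:
    count += 1
    total += count
print(count, total)  # 4 10
```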






String Manipulation and Python Indexing

In Python, strings are created by enclosing text in either single quotes or double quotes. Furthermore, Python indexing begins from 0 when going from left to right and from -1 when going from right to left. We can use indexing in many different ways. In the below example, we create two strings and then do slicing, dicing, and other string manipulation.
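A short sketch with two made-up strings:

```python
first = "Data"      # double quotes
second = 'Science'  # single quotes

combined = first + " " + second
print(combined)        # Data Science
print(len(combined))   # 12

# Indexing from the left starts at 0, from the right at -1
print(combined[0])     # D
print(combined[-1])    # e

# Slicing: the end index is excluded
print(combined[0:4])   # Data
print(combined[5:])    # Science
print(combined[::-1])  # ecneicS ataD

# Other string manipulation
print(combined.upper())                    # DATA SCIENCE
print(combined.replace("Data", "Python"))  # Python Science
print(combined.split(" "))                 # ['Data', 'Science']
```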





Exception Handling in Python

Exception handling is a graceful way to manage the execution of a program and to display user-friendly information when errors occur while the program runs.

Let’s look at the below example to get a better understanding. The user provides a number to the program, and the reverse (reciprocal, 1 divided by the number) is returned.

If the user provides a numeric input, the “try block” runs without error and then the “finally block” is executed.


However, if the user provides a non-numeric input, a “ValueError” exception is raised, and the code executes the first “except” block followed by the “finally block”.


Similarly, if the user provides ‘zero’ as the input, a “ZeroDivisionError” exception is raised, and the code executes the second “except” block followed by the “finally block”.
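The three cases above can be sketched as follows. This assumes that the reverse of the number means its reciprocal (1 divided by the number), which is consistent with the ZeroDivisionError case. The function name and messages are hypothetical, and the user input is passed in as text rather than read with input() so that the sketch runs without interaction.

```python
def reciprocal_message(raw_text):
    """Return a message describing 1/number for the text a user typed."""
    try:
        number = float(raw_text)  # raises ValueError for non-numeric text
        result = 1 / number       # raises ZeroDivisionError for zero
        message = f"The reciprocal of {number} is {result}"
    except ValueError:
        # First except block: the input was not a number
        message = "Please enter a numeric value."
    except ZeroDivisionError:
        # Second except block: the input was zero
        message = "Zero has no reciprocal."
    finally:
        # Runs in every case, whether or not an exception occurred
        print("Finished processing the input.")
    return message

print(reciprocal_message("4"))
print(reciprocal_message("abc"))
print(reciprocal_message("0"))
```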