Here is a simple and straightforward walkthrough of how to use the Tweepy package in Python to pull tweets. This article also demonstrates how to perform several Natural Language Processing (NLP) tasks on this data afterwards.
Here is what you will need to have before we begin-
1. Install the “tweepy” package: http://docs.tweepy.org/en/v3.4.0/install.html
2. Set up a Twitter developer account and copy the keys and access tokens to be used later on: https://developer.twitter.com/
Here is what we will do in the article-
1. Import necessary packages for pulling tweets and doing text mining by using Natural Language Processing (NLP) packages in Python.
2. Explore how to pull different attributes of tweets (friends, followers, tweet content, etc.) and how to post tweets from Python, by establishing a connection to the Twitter developer account from within Python
3. Pull tweets in a variety of ways: using hashtags (#), searching by Twitter handle (@), searching by words, etc.
4. Clean up the text using NLP libraries
5. Perform wordcloud and sentiment analysis on the text
Pull the latest tweets on any topic by using the tweepy Cursor method. Here we also exclude retweets. Then, using a list comprehension, we display the content of all the tweets-
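Here is a minimal sketch of this step, assuming tweepy 3.x (as in the install link above); consumer_key, consumer_secret, access_token and access_token_secret are placeholders for your own developer credentials, and the search topic is only an example-

import tweepy

# Authenticate with the keys copied from your Twitter developer account
# (placeholder variable names; substitute your own credentials)
auth = tweepy.OAuthHandler(consumer_key, consumer_secret)
auth.set_access_token(access_token, access_token_secret)
api = tweepy.API(auth, wait_on_rate_limit=True)

# Pull the latest tweets on a topic; "-filter:retweets" excludes retweets
search_query = "climate change -filter:retweets"   # illustrative topic
tweets = tweepy.Cursor(api.search, q=search_query, lang="en").items(100)

# List comprehension to collect the text of every tweet
all_tweets = [tweet.text for tweet in tweets]
for text in all_tweets:
    print(text)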
Next, let’s explore other features of the tweets, such as counting the number of tweets and displaying a particular tweet.
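For example, building on the all_tweets list created above-

# Count the tweets fetched above and display a particular one
print(len(all_tweets))   # number of tweets pulled
print(all_tweets[0])     # content of the first tweet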
Next, let’s try to pull tweets by search words-
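A sketch of a plain-word search, reusing the api object created earlier; the query words are only illustrative-

import tweepy

# Search by plain words
word_query = "electric vehicles -filter:retweets"
word_tweets = [t.text for t in tweepy.Cursor(api.search, q=word_query, lang="en").items(50)]
print(word_tweets[:5])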
Let’s try to pull tweets by hashtags-
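The query string simply starts with the hashtag; for example-

import tweepy

# Search by hashtag, again reusing the api object
hashtag_query = "#python -filter:retweets"
hashtag_tweets = [t.text for t in tweepy.Cursor(api.search, q=hashtag_query, lang="en").items(50)]
print(hashtag_tweets[:5])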
Finding the names and locations of the users who are tweeting on a particular topic-
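Each tweet object carries a user attribute with the screen name and self-reported location; a sketch with an illustrative query-

import tweepy

# Screen name and location of each user tweeting on the topic
users = [(t.user.screen_name, t.user.location)
         for t in tweepy.Cursor(api.search, q="#python -filter:retweets", lang="en").items(50)]
print(users[:10])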
Let’s now do more detailed NLP on 1000 tweets fetched on the subject of “india economy”-
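A sketch of that fetch, assuming the same api object; tweet_mode="extended" is used so the full, untruncated text is returned-

import tweepy

# Fetch 1000 tweets on "india economy", excluding retweets
query = "india economy -filter:retweets"
cursor = tweepy.Cursor(api.search, q=query, lang="en", tweet_mode="extended").items(1000)
tweet_texts = [tweet.full_text for tweet in cursor]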
Clean up the text before conducting detailed text analysis. We will do the following (a sketch of the cleaning step follows this list)-
1. Remove stop words such as “will”, “and”, “I” etc.
2. Remove URLs, Twitter handles, emojis, special characters, etc.
3. Bring all text to lower case
4. Stem words to their original form, such as “Running” to “Run”, and so on
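Here is a sketch of such a cleaning step using NLTK (stop words and the Porter stemmer) and regular expressions; the function name clean_tweet and the regex patterns are illustrative choices-

import re
import nltk
from nltk.corpus import stopwords
from nltk.stem import PorterStemmer

nltk.download('stopwords')                      # one-time download of the stop word list
stop_words = set(stopwords.words('english'))
stemmer = PorterStemmer()

def clean_tweet(text):
    """Return a list of cleaned, stemmed words from one tweet."""
    text = re.sub(r'http\S+|www\.\S+', '', text)   # remove URLs
    text = re.sub(r'@\w+', '', text)               # remove Twitter handles
    text = re.sub(r'[^A-Za-z\s]', '', text)        # drop emojis, digits, special characters
    words = text.lower().split()                   # lower-case and tokenise
    return [stemmer.stem(w) for w in words if w not in stop_words]

# Apply the cleaner to the 1000 tweets pulled above
clean_tweets = [clean_tweet(t) for t in tweet_texts]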
This is how the tweets look before and after cleaning-
Before
After
Let’s do some analysis to understand which words are used most frequently in the tweets and how often they appear.
Leverage the collections and itertools packages to iterate over the list and count the frequency of the 20 most commonly used words-
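A sketch, building on the clean_tweets list from the previous step-

import itertools
import collections

# Flatten the list of word lists and count the 20 most common words
all_words = list(itertools.chain(*clean_tweets))
word_counts = collections.Counter(all_words)
print(word_counts.most_common(20))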
Remove words that were used in the search query, or that otherwise need to be excluded, and redo the above analysis-
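For the “india economy” query the stemmed query words would be excluded; the exact exclusion list below is illustrative and can be extended-

import collections

# Exclude the (stemmed) search-query words themselves, then recount
query_words = {'india', 'economi'}
filtered_words = [w for w in all_words if w not in query_words]
word_counts = collections.Counter(filtered_words)
print(word_counts.most_common(20))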
Form a dataframe with the 20 most commonly used words-
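A sketch using pandas, with illustrative column names-

import pandas as pd

# Put the 20 most common words and their counts into a dataframe
common_words = pd.DataFrame(word_counts.most_common(20), columns=['word', 'count'])
print(common_words)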
Plot the most commonly used words-
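One way to plot them, assuming matplotlib and the common_words dataframe above-

import matplotlib.pyplot as plt

# Horizontal bar chart of the 20 most common words
common_words.sort_values('count').plot.barh(x='word', y='count', figsize=(10, 8))
plt.title('20 most common words in the tweets')
plt.show()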
Build a wordcloud from the most commonly used words-
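A sketch using the wordcloud package; the size and colour settings are arbitrary-

import matplotlib.pyplot as plt
from wordcloud import WordCloud

# Generate a wordcloud from the cleaned, filtered words
cloud = WordCloud(width=800, height=500, background_color='white').generate(' '.join(filtered_words))
plt.figure(figsize=(10, 8))
plt.imshow(cloud, interpolation='bilinear')
plt.axis('off')
plt.show()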
The wordcloud looks like this-
Next, we will find the sentiment of each tweet using TextBlob-
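A sketch, applied to the raw tweet_texts pulled earlier-

from textblob import TextBlob

# Polarity ranges from -1 (most negative) to +1 (most positive)
sentiments = [TextBlob(text).sentiment.polarity for text in tweet_texts]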
Plot the sentiment of the tweets in the form of a histogram-
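For example, with matplotlib and the sentiments list above-

import matplotlib.pyplot as plt

# Histogram of tweet sentiment polarity
plt.figure(figsize=(8, 5))
plt.hist(sentiments, bins=20)
plt.title('Sentiment of tweets on "india economy"')
plt.xlabel('Polarity (-1 = negative, +1 = positive)')
plt.ylabel('Number of tweets')
plt.show()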
The tweet sentiments look like this-
Thanks for reading!