Simple WordCloud Using nltk Library in Python

In this article, we will build a wordcloud to show the relative importance of the words in a text. It is a helpful tool for visualizing textual data such as customer comments, articles and employee feedback, and it saves the valuable time otherwise spent manually reading through thousands or millions of lines of text.

We will use a popular Python text processing library called “nltk” in this work. More on this library and how to use it can be found at the link below-

https://www.nltk.org/book/

So let’s get started with the code to generate our first wordcloud.

Import Necessary packages

Define our text. You can either type the text manually or copy it from a web page such as a Wikipedia article. Please note that multi-line text has to be enclosed in triple quotes.
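As a small illustration of the triple-quote rule, the text can be defined like this. The sentences are placeholder text written for this sketch, and the variable name text is an assumption; replace the content with whatever you want to visualize.

```python
# Placeholder text for illustration only; paste your own text here.
# Triple quotes let the string span several lines.
text = """Python is a popular programming language for working with text.
A wordcloud draws frequent words in a larger font,
so the main themes of the text stand out at a glance."""

print(len(text.splitlines()), "lines of text")
```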

Create the wordcloud object and remove the English stopwords. Stopwords are everyday words (such as “the”, “is” and “and”) that add nothing meaningful to the explanation, so we remove them to keep them from crowding out the words that matter. To know more about the stopwords, take a look at the section below the wordcloud.

Plot the wordcloud-

Here is how the wordcloud looks-
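The size of each word in the picture reflects how often it appears in the text once the stopwords are removed. As a rough text-only check of what the image should emphasize, the counts can be inspected with the standard library alone. This is a sketch, not the wordcloud package's own tokenizer: the sample text is made up, and stop_words is a tiny hand-picked subset of the full list.

```python
import re
from collections import Counter

# Made-up placeholder text; substitute your own.
text = """Python is a popular language for text processing.
Text processing in Python is simple, and Python has many libraries."""

# Tiny hand-picked subset of the English stopwords, for illustration only.
stop_words = {"is", "a", "for", "in", "and", "has", "many"}

# Lowercase the text and split it into words.
words = re.findall(r"[a-z']+", text.lower())
counts = Counter(w for w in words if w not in stop_words)

# The most frequent words are the ones drawn largest in the wordcloud.
top_words = counts.most_common(3)
print(top_words)
```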

English stopwords are-

{'on', "mightn't", 'yours', 't', 'myself', 'did', 'as', 'more', 'haven', 'very', 'are', 'again', 'both', 'against', "wasn't", 'some', 'does', "didn't", 'above', 'for', 'having', 'by', 'during', 'same', 'further', 'under', 'when', 'wasn', 'they', 'be', 'm', "doesn't", 'each', 'd', 'of', "that'll", 'you', 'herself', 'ours', 'doesn', "isn't", 'all', "won't", 'himself', "couldn't", 'which', 'have', 'that', 'before', 'into', 'so', 'your', 'who', 'these', 'then', 'she', 'didn', 'do', 'we', 'my', 'to', 'after', 'most', 'should', 'me', "haven't", 'them', 'once', 'with', 'has', "don't", 'other', "mustn't", 'were', 'from', "shouldn't", 'aren', 'this', "weren't", 'doing', 'mightn', "you'd", "you'll", 'he', 'or', 'between', "you've", 'own', 'weren', 'am', 'if', 'ourselves', 'about', 'no', 'ma', 'but', "she's", 'our', 'any', 'was', 'been', 'their', 'will', 'needn', 've', 'is', 'll', 'wouldn', 'through', 'what', 'shan', "shan't", 'hadn', "it's", 'why', 'won', 'itself', 'in', 'those', 'off', 'her', "should've", 'mustn', 'here', 'at', 'until', 'ain', 'nor', 'yourselves', 's', 'don', 'its', 'i', 'a', 'it', 'out', 'such', 'just', 're', "hadn't", 'themselves', 'had', 'an', 'shouldn', "needn't", 'y', 'yourself', 'than', 'not', 'the', "hasn't", 'where', 'up', 'hers', "you're", 'because', 'down', 'can', 'and', "aren't", 'his', 'now', 'theirs', 'while', 'being', 'only', 'below', 'hasn', 'o', 'whom', "wouldn't", 'isn', 'how', 'few', 'there', 'couldn', 'too', 'him', 'over'}

That’s it. Now you can generate a wordcloud from any text of your choice with a few lines of simple Python code. Thanks for reading!