Word clouds in Python from wine reviews


Prerequisites

Install WordCloud

In [2]:
pip install wordcloud
Requirement already satisfied: wordcloud in /usr/local/lib/python3.6/dist-packages (1.5.0)
Requirement already satisfied: numpy>=1.6.1 in /usr/local/lib/python3.6/dist-packages (from wordcloud) (1.16.3)
Requirement already satisfied: pillow in /usr/local/lib/python3.6/dist-packages (from wordcloud) (4.3.0)
Requirement already satisfied: olefile in /usr/local/lib/python3.6/dist-packages (from pillow->wordcloud) (0.46)

Import all the necessary libraries

In [0]:
# Start with loading all necessary libraries
import numpy as np
import pandas as pd
from os import path
from PIL import Image
from wordcloud import WordCloud, STOPWORDS, ImageColorGenerator

import matplotlib.pyplot as plt
%matplotlib inline

For this example I'm using the wine review dataset from Kaggle.

You need to download the dataset from Kaggle, and then upload it to Google BigQuery.

First, authenticate with Google so the notebook can query BigQuery.

In [4]:
from google.colab import auth
auth.authenticate_user()
print('Authenticated')
Authenticated

Understand the data structure

It's helpful to inspect the schema and a sample of the data we're working with.

In [6]:
%%bigquery --project <your gcp project> df
SELECT * FROM
  `thunderbeardesign.wine_reviews.winemag_data_130k_v2`
Out[6]:
id country description designation points price province region_1 region_2 taster_name taster_twitter_handle title variety winery
0 172 Chile "Dry, spicy aromas of tobacco, cedar, vanilla ... The 7th Generation Gran Reserva Estate Bottled 91 20.0 Loncomilla Valley Michael Schachner @wineschach G7 2012 The 7th Generation Gran Reserva Estate... Cabernet Sauvignon G7
1 834 Chile "Briary, spicy, mildly herbal berry aromas giv... Reserva Estate Bottled 86 10.0 Loncomilla Valley Michael Schachner @wineschach G7 2012 Reserva Estate Bottled Cabernet Sauvig... Cabernet Sauvignon G7
2 2772 Chile "Woody, creamy aromas of black fruits precede ... The 7th Generation Reserva 84 15.0 Loncomilla Valley Michael Schachner @wineschach G7 2013 The 7th Generation Reserva Carmenère (... Carmenère G7
3 53782 Chile "Dry, spicy aromas of tobacco, cedar, vanilla ... The 7th Generation Gran Reserva Estate Bottled 91 20.0 Loncomilla Valley Michael Schachner @wineschach G7 2012 The 7th Generation Gran Reserva Estate... Cabernet Sauvignon G7
4 92373 Chile Rubbery blackberry aromas are dense and horsey... The 7th Generation 84 8.0 Central Valley Michael Schachner @wineschach G7 2015 The 7th Generation Cabernet Sauvignon ... Cabernet Sauvignon G7
5 108565 Chile "Aromas of papaya and melon are hit with a sli... The 7th Generation Estate Bottled 84 8.0 Central Valley Michael Schachner @wineschach G7 2015 The 7th Generation Estate Bottled Sauv... Sauvignon Blanc G7
6 108580 Chile "Scents of tropical of mango and banana are of... The 7th Generation 83 10.0 Central Valley Michael Schachner @wineschach G7 2015 The 7th Generation Chardonnay (Central... Chardonnay G7
7 57854 Germany "Forward, fruity swathes of peach and apple ve... Erik's Refreshing 86 14.0 Nahe Anna Lee C. Iijima RR 2013 Erik's Refreshing Riesling (Nahe) Riesling RR
8 10641 New Zealand "A light-bodied, delicate style of Pinot Noir,... 87 15.0 Marlborough Joe Czerwinski @JoeCz Ara 2009 Pinot Noir (Marlborough) Pinot Noir Ara
9 22588 New Zealand "Ready to drink, this Pinot is already losing ... Composite 86 22.0 Marlborough Joe Czerwinski @JoeCz Ara 2008 Composite Pinot Noir (Marlborough) Pinot Noir Ara
10 22589 New Zealand "This is slightly bigger and earthier than Ara... Pathway 86 15.0 Marlborough Joe Czerwinski @JoeCz Ara 2009 Pathway Pinot Noir (Marlborough) Pinot Noir Ara
11 45115 New Zealand "Sparkling Sauvignon Blanc—what more need be s... One Estate Brut 85 15.0 Marlborough Joe Czerwinski @JoeCz Ara NV One Estate Brut Sauvignon (Marlborough) Sauvignon Ara
12 62009 New Zealand "The Ara project has yet to realize its consid... Select Blocks Single Estate 86 30.0 Marlborough Joe Czerwinski @JoeCz Ara 2011 Select Blocks Single Estate Pinot Noi... Pinot Noir Ara
13 63360 New Zealand "This is a crisp, tightly focused offering, bu... Pathway Single Estate 88 13.0 Marlborough Joe Czerwinski @JoeCz Ara 2013 Pathway Single Estate Sauvignon Blanc... Sauvignon Blanc Ara
14 91280 New Zealand "Sparkling Sauvignon Blanc—what more need be s... One Estate Brut 85 15.0 Marlborough Joe Czerwinski @JoeCz Ara NV One Estate Brut Sauvignon (Marlborough) Sauvignon Ara
15 92030 New Zealand "Influenced by the herbal, savory side of Marl... Resolute 85 32.0 Marlborough Joe Czerwinski @JoeCz Ara 2007 Resolute Pinot Noir (Marlborough) Pinot Noir Ara
16 102021 New Zealand "Riper in style than Ara's Composite bottling,... Resolute 86 26.0 Marlborough Joe Czerwinski @JoeCz Ara 2009 Resolute Sauvignon Blanc (Marlborough) Sauvignon Blanc Ara
17 129954 New Zealand "One of the more characterful Pinot Gris for t... Single Estate 90 15.0 Marlborough Joe Czerwinski @JoeCz Ara 2013 Single Estate Pinot Gris (Marlborough) Pinot Gris Ara
18 903 Germany "Whiffs of smoke and nuts mingle into tart tan... 87 10.0 Nahe Anna Lee C. Iijima Bex 2014 Riesling (Nahe) Riesling Bex
19 57973 Germany "Blossomy and fresh, this off-dry, light-foote... 87 10.0 Nahe Anna Lee C. Iijima Bex 2013 Riesling (Nahe) Riesling Bex
20 47553 Bulgaria "A blend of Cabernet Sauvignon, Syrah and Rege... 88 16.0 Thracian Valley Jeff Jenssen @worldwineguys F2F 2014 Red (Thracian Valley) Red Blend F2F
21 98090 Bulgaria "This straw-colored Chardonnay has aromas of a... 89 16.0 Thracian Valley Jeff Jenssen @worldwineguys F2F 2015 Chardonnay (Thracian Valley) Chardonnay F2F
22 118120 Germany "Deeply savory tones of bramble and earth are ... 88 10.0 Landwein Rhein Anna Lee C. Iijima HXM NV Riesling (Landwein Rhein) Riesling HXM
23 10397 Austria "This is a tight, mineral-textured wine that s... Chevalier Reserve 91 45.0 Burgenland Roger Voss @vossroger Iby 2009 Chevalier Reserve Blaufränkisch (Burg... Blaufränkisch Iby
24 35631 Austria "Shimmering with the palest salmon pink, this ... 91 18.0 Burgenland Anne Krebiehl MW @AnneInVino Iby 2015 Rosé (Burgenland) Rosé Iby
25 70187 Austria "Shimmering with the palest salmon pink, this ... 91 18.0 Burgenland Anne Krebiehl MW @AnneInVino Iby 2015 Rosé (Burgenland) Rosé Iby
26 72925 Austria Pure blueberry fruit is given extra lift by sp... Classic 91 18.0 Burgenland Anne Krebiehl MW @AnneInVino Iby 2014 Classic Blaufränkisch (Burgenland) Blaufränkisch Iby
27 90103 Austria "This is dry and firm—its tannins immediately ... Chevalier Reserve 89 45.0 Burgenland Roger Voss @vossroger Iby 2008 Chevalier Reserve Blaufränkisch (Burg... Blaufränkisch Iby
28 99423 Austria "Fleshy, almost overripe notes of cherry are p... Classic 88 15.0 Burgenland Anne Krebiehl MW @AnneInVino Iby 2015 Classic Blaufränkisch (Burgenland) Blaufränkisch Iby
29 101081 Austria "If you expect coquettish flirting you must go... Chevalier 91 35.0 Mittelburgenland Anne Krebiehl MW @AnneInVino Iby 2011 Chevalier Blaufränkisch (Mittelburgen... Blaufränkisch Iby
... ... ... ... ... ... ... ... ... ... ... ... ... ... ...
129941 81374 Italy "Aromas that recall jasmine, green apple and h... Casal di Serra 91 17.0 Central Italy Verdicchio dei Castelli di Jesi Classico Super... Kerin O’Keefe @kerinokeefe Umani Ronchi 2015 Casal di Serra (Verdicchio ... Verdicchio Umani Ronchi
129942 81456 Italy "This is a fabulous take on Verdicchio from 30... Casal di Serra Vecchie Vigne 89 22.0 Central Italy Verdicchio dei Castelli di Jesi Classico Super... Umani Ronchi 2004 Casal di Serra Vecchie Vigne... Verdicchio Umani Ronchi
129943 92811 Italy "Casal di Serra is a fantastic Italian white t... Casal di Serra 87 17.0 Central Italy Verdicchio dei Castelli di Jesi Classico Super... Umani Ronchi 2009 Casal di Serra (Verdicchio ... Verdicchio Umani Ronchi
129944 101080 Italy "Notes of hawthorne, acacia and tropical fruit... Casal di Serra 91 17.0 Central Italy Verdicchio dei Castelli di Jesi Classico Super... Kerin O’Keefe @kerinokeefe Umani Ronchi 2012 Casal di Serra (Verdicchio ... Verdicchio Umani Ronchi
129945 101519 Italy "Using fruit sourced from old vines, Umani Ron... Casal di Serra Vecchie Vigne 90 25.0 Central Italy Verdicchio dei Castelli di Jesi Classico Super... Umani Ronchi 2009 Casal di Serra Vecchie Vigne... Verdicchio Umani Ronchi
129946 103152 Italy "Acacia flower, hawthorne and beeswax aromas a... Casal di Serra 88 17.0 Central Italy Verdicchio dei Castelli di Jesi Classico Super... Kerin O’Keefe @kerinokeefe Umani Ronchi 2013 Casal di Serra (Verdicchio ... Verdicchio Umani Ronchi
129947 112544 Italy "Made from 40-year-old vines, this structured ... Vecchie Vigne 88 35.0 Central Italy Verdicchio dei Castelli di Jesi Classico Super... Kerin O’Keefe @kerinokeefe Umani Ronchi 2012 Vecchie Vigne (Verdicchio d... Verdicchio Umani Ronchi
129948 115725 Italy "One third of Umani Ronchi's celebrated Plenio... Plenio Riserva 89 22.0 Central Italy Verdicchio dei Castelli di Jesi Classico Super... Umani Ronchi 2003 Plenio Riserva (Verdicchio ... Verdicchio Umani Ronchi
129949 126681 Italy "Fruit for Casal di Serra is selected from the... Casal di Serra 85 14.0 Central Italy Verdicchio dei Castelli di Jesi Classico Super... Umani Ronchi 2006 Casal di Serra (Verdicchio ... Verdicchio Umani Ronchi
129950 119010 Italy "The nose is inexpressive while the lean, rath... Il Priore 85 24.0 Central Italy Verdicchio dei Castelli di Jesi Classico Super... Kerin O’Keefe @kerinokeefe Frati Bianchi 2014 Il Priore (Verdicchio dei ... Verdicchio Frati Bianchi
129951 122389 Italy "The nose is inexpressive while the lean, rath... Il Priore 85 24.0 Central Italy Verdicchio dei Castelli di Jesi Classico Super... Kerin O’Keefe @kerinokeefe Frati Bianchi 2014 Il Priore (Verdicchio dei ... Verdicchio Frati Bianchi
129952 97677 Italy "This opens with aromas that recall acacia flo... Luzano 87 15.0 Central Italy Verdicchio dei Castelli di Jesi Classico Super... Kerin O’Keefe @kerinokeefe Marotti Campi 2014 Luzano (Verdicchio dei Cas... Verdicchio Marotti Campi
129953 116648 Italy "This opens with aromas that recall acacia flo... Luzano 87 15.0 Central Italy Verdicchio dei Castelli di Jesi Classico Super... Kerin O’Keefe @kerinokeefe Marotti Campi 2014 Luzano (Verdicchio dei Cas... Verdicchio Marotti Campi
129954 90000 Italy "This opens with pretty floral aromas of white... Pallio di San Floriano 87 18.0 Central Italy Verdicchio dei Castelli di Jesi Classico Super... Kerin O’Keefe @kerinokeefe Monte Schiavo 2012 Pallio di San Floriano (Ve... Verdicchio Monte Schiavo
129955 5577 Italy "Italy's first cru Verdicchio, Le Moie offers ... Le Moie 87 20.0 Central Italy Verdicchio dei Castelli di Jesi Classico Super... Fazi Battaglia 2005 Le Moie (Verdicchio dei C... Verdicchio Fazi Battaglia
129956 9823 Italy "This is one of the producer's top wines and i... Massaccio 89 30.0 Central Italy Verdicchio dei Castelli di Jesi Classico Super... Fazi Battaglia 2004 Massaccio (Verdicchio dei... Verdicchio Fazi Battaglia
129957 81470 Italy "This is one of the producer's top wines and i... Massaccio 89 30.0 Central Italy Verdicchio dei Castelli di Jesi Classico Super... Fazi Battaglia 2004 Massaccio (Verdicchio dei... Verdicchio Fazi Battaglia
129958 88142 Italy "Fresh, zippy and vibrant in the mouth, Bucci'... 86 26.0 Central Italy Verdicchio dei Castelli di Jesi Classico Super... Fratelli Bucci 2007 Verdicchio dei Castelli d... Verdicchio Fratelli Bucci
129959 20795 Italy "Here's a straightforward Verdicchio offering ... Tosius 86 14.0 Central Italy Verdicchio dei Castelli di Jesi Classico Super... Kerin O’Keefe @kerinokeefe Cantina Ma.Ri.Ca. 2012 Tosius (Verdicchio dei... Verdicchio Cantina Ma.Ri.Ca.
129960 97672 Italy "This opens with subdued aromas of citrus, nut... 87 24.0 Central Italy Verdicchio dei Castelli di Jesi Classico Super... Kerin O’Keefe @kerinokeefe Stefano Antonucci 2014 Verdicchio dei Castell... Verdicchio Stefano Antonucci
129961 25090 Italy "Aromas of toasted oak, butterscotch and hazel... Gaiospino Fumé 87 38.0 Central Italy Verdicchio dei Castelli di Jesi Classico Super... Kerin O’Keefe @kerinokeefe Fattoria Coroncino 2012 Gaiospino Fumé (Verdi... White Blend Fattoria Coroncino
129962 28711 Italy "This unusual, deeply golden colored wine does... Gaiospino 87 30.0 Central Italy Verdicchio dei Castelli di Jesi Classico Super... Kerin O’Keefe @kerinokeefe Fattoria Coroncino 2013 Gaiospino (Verdicchio... White Blend Fattoria Coroncino
129963 46511 Italy "Deeply colored gold, this opens with scents s... Il Bacco 86 16.0 Central Italy Verdicchio dei Castelli di Jesi Classico Super... Kerin O’Keefe @kerinokeefe Fattoria Coroncino 2015 Il Bacco (Verdicchio ... Verdicchio Fattoria Coroncino
129964 46512 Italy "The nose is rather shy but eventually reveals... 86 15.0 Central Italy Verdicchio dei Castelli di Jesi Classico Super... Kerin O’Keefe @kerinokeefe Fattoria La Vialla 2014 Verdicchio dei Castel... Verdicchio Fattoria La Vialla
129965 55432 Italy "Aromas of yellow stone fruit and Spanish broo... Villa Torre 89 14.0 Central Italy Verdicchio dei Castelli di Jesi Classico Super... Kerin O’Keefe @kerinokeefe Tenuta di Tavignano 2015 Villa Torre (Verdicc... Verdicchio Tenuta di Tavignano
129966 77672 Italy "This opens up with a heady fragrance of apric... Villa Torre 88 14.0 Central Italy Verdicchio dei Castelli di Jesi Classico Super... Kerin O’Keefe @kerinokeefe Tenuta di Tavignano 2012 Villa Torre (Verdicc... Verdicchio Tenuta di Tavignano
129967 89756 Italy "Aromas of yellow stone fruit and Spanish broo... Villa Torre 89 14.0 Central Italy Verdicchio dei Castelli di Jesi Classico Super... Kerin O’Keefe @kerinokeefe Tenuta di Tavignano 2015 Villa Torre (Verdicc... Verdicchio Tenuta di Tavignano
129968 99961 Italy "This lovely Verdicchio opens with an alluring... Misco 89 18.0 Central Italy Verdicchio dei Castelli di Jesi Classico Super... Kerin O’Keefe @kerinokeefe Tenuta di Tavignano 2012 Misco (Verdicchio de... Verdicchio Tenuta di Tavignano
129969 110461 Italy "This opens up with a heady fragrance of apric... Villa Torre 88 14.0 Central Italy Verdicchio dei Castelli di Jesi Classico Super... Kerin O’Keefe @kerinokeefe Tenuta di Tavignano 2012 Villa Torre (Verdicc... Verdicchio Tenuta di Tavignano
129970 68546 Spain "The palate on this Tempranillo has smacking a... Viñas Viejas 86 12.0 Northern Spain Vino de la Tierra Ribera del Gállego-Cinco Villas Michael Schachner @wineschach Evohé 2011 Viñas Viejas Tempranillo (Vino de l... Tempranillo Evohé

129971 rows × 14 columns

Taking a look at the first 5 rows of the dataset

In [7]:
# Looking at first 5 rows of the dataset
df.head()
Out[7]:
id country description designation points price province region_1 region_2 taster_name taster_twitter_handle title variety winery
0 172 Chile "Dry, spicy aromas of tobacco, cedar, vanilla ... The 7th Generation Gran Reserva Estate Bottled 91 20.0 Loncomilla Valley Michael Schachner @wineschach G7 2012 The 7th Generation Gran Reserva Estate... Cabernet Sauvignon G7
1 834 Chile "Briary, spicy, mildly herbal berry aromas giv... Reserva Estate Bottled 86 10.0 Loncomilla Valley Michael Schachner @wineschach G7 2012 Reserva Estate Bottled Cabernet Sauvig... Cabernet Sauvignon G7
2 2772 Chile "Woody, creamy aromas of black fruits precede ... The 7th Generation Reserva 84 15.0 Loncomilla Valley Michael Schachner @wineschach G7 2013 The 7th Generation Reserva Carmenère (... Carmenère G7
3 53782 Chile "Dry, spicy aromas of tobacco, cedar, vanilla ... The 7th Generation Gran Reserva Estate Bottled 91 20.0 Loncomilla Valley Michael Schachner @wineschach G7 2012 The 7th Generation Gran Reserva Estate... Cabernet Sauvignon G7
4 92373 Chile Rubbery blackberry aromas are dense and horsey... The 7th Generation 84 8.0 Central Valley Michael Schachner @wineschach G7 2015 The 7th Generation Cabernet Sauvignon ... Cabernet Sauvignon G7

Basic information about the dataset

In [8]:
print("There are {} observations and {} features in this dataset. \n".format(df.shape[0],df.shape[1]))

print("There are {} types of wine in this dataset such as {}... \n".format(len(df.variety.unique()),
                                                                           ", ".join(df.variety.unique()[0:5])))

print("There are {} countries producing wine in this dataset such as {}... \n".format(len(df.country.unique()),
                                                                                      ", ".join(df.country.unique()[0:5])))
There are 129971 observations and 14 features in this dataset. 

There are 708 types of wine in this dataset such as Cabernet Sauvignon, Carmenère, Sauvignon Blanc, Chardonnay, Riesling... 

There are 44 countries producing wine in this dataset such as Chile, Germany, New Zealand, Bulgaria, Austria... 
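
One caveat with `len(df.variety.unique())`: `unique()` keeps a missing value as its own entry, while `nunique()` skips missing values by default, so the two counts can differ by one. A minimal sketch on a made-up frame (the rows below are hypothetical, not taken from the dataset):

```python
import pandas as pd

# Hypothetical toy data; one variety is missing on purpose.
toy = pd.DataFrame({
    "country": ["Chile", "Chile", "Germany", "Austria"],
    "variety": ["Carmenère", "Chardonnay", "Riesling", None],
})

# unique() returns the missing value as one of the entries
with_missing = len(toy["variety"].unique())

# nunique() ignores missing values (dropna=True is the default)
without_missing = toy["variety"].nunique()

print(with_missing, without_missing)
```

If the real column contains missing values, `nunique()` is probably the count you want to report.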

This selects the country, description and points columns and shows the first 5 rows

In [9]:
df[["country", "description","points"]].head()
Out[9]:
country description points
0 Chile "Dry, spicy aromas of tobacco, cedar, vanilla ... 91
1 Chile "Briary, spicy, mildly herbal berry aromas giv... 86
2 Chile "Woody, creamy aromas of black fruits precede ... 84
3 Chile "Dry, spicy aromas of tobacco, cedar, vanilla ... 91
4 Chile Rubbery blackberry aromas are dense and horsey... 84

Summary statistics of id, points and price for the first 5 countries (in alphabetical order)

In [10]:
# Groupby by country
country = df.groupby("country")

# Summary statistic of all countries
country.describe().head()
Out[10]:
id points price
count mean std min 25% 50% 75% max count mean ... 75% max count mean std min 25% 50% 75% max
country
63.0 66115.222222 39339.428129 913.0 37176.00 60678.0 99332.00 129900.0 63.0 88.634921 ... 90.00 92.0 59.0 24.593220 9.084125 6.0 19.00 25.0 30.00 50.0
Argentina 3800.0 65075.723158 38080.749866 16.0 30912.25 65171.5 99149.25 129948.0 3800.0 86.710263 ... 89.00 97.0 3756.0 24.510117 23.430122 4.0 12.00 17.0 25.00 230.0
Armenia 2.0 37158.000000 40995.222746 8170.0 22664.00 37158.0 51652.00 66146.0 2.0 87.500000 ... 87.75 88.0 2.0 14.500000 0.707107 14.0 14.25 14.5 14.75 15.0
Australia 2329.0 65037.539287 37271.655310 77.0 34120.00 64984.0 96027.00 129726.0 2329.0 88.580507 ... 91.00 100.0 2294.0 35.437663 49.049458 5.0 15.00 21.0 38.00 850.0
Austria 3345.0 65614.296861 37455.323032 93.0 32623.00 67701.0 98385.00 129939.0 3345.0 90.101345 ... 92.00 98.0 2799.0 30.762772 27.224797 7.0 18.00 25.0 36.50 1100.0

5 rows × 24 columns

This selects the 5 countries with the highest average points among all 44 countries

In [11]:
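# numeric_only=True keeps text columns out of the average (newer pandas versions require it)
country.mean(numeric_only=True).sort_values(by="points", ascending=False).head()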
Out[11]:
id points price
country
England 72597.756757 91.581081 51.681159
India 68964.000000 90.222222 13.333333
Austria 65614.296861 90.101345 30.762772
Germany 65787.590762 89.851732 42.257547
Canada 70582.365759 89.369650 35.712598
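
The same ranking can be restricted to the points column, so that id and price are not averaged along the way. A small sketch on hypothetical rows (the values are illustrative only):

```python
import pandas as pd

# Hypothetical toy data standing in for the wine reviews.
toy = pd.DataFrame({
    "country": ["England", "England", "Chile", "Chile", "Austria"],
    "points": [92, 91, 84, 86, 90],
})

# Average points per country, highest first
avg_points = (
    toy.groupby("country")["points"]
       .mean()
       .sort_values(ascending=False)
)

print(avg_points.head())
```

Selecting the column before calling `mean()` returns a Series rather than a DataFrame, which is why `sort_values` needs no `by` argument here.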

Plot the number of wines by country

In [12]:
plt.figure(figsize=(15,10))
country.size().sort_values(ascending=False).plot.bar()
plt.xticks(rotation=50)
plt.xlabel("Country of Origin")
plt.ylabel("Number of Wines")
plt.show()

Quantity over quality?

In [13]:
plt.figure(figsize=(15,10))
# Select the points column first so only that column is aggregated
country["points"].max().sort_values(ascending=False).plot.bar()
plt.xticks(rotation=50)
plt.xlabel("Country of Origin")
plt.ylabel("Highest Points per Country")
plt.show()
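
To put quantity and quality side by side in one table, the count, mean and maximum of points can be computed in a single pass with `agg`. A sketch on made-up rows (countries and scores are illustrative, not dataset values):

```python
import pandas as pd

# Hypothetical toy data standing in for the wine reviews.
toy = pd.DataFrame({
    "country": ["US", "US", "US", "France", "France", "India"],
    "points": [88, 100, 85, 95, 90, 93],
})

# One row per country: how many wines, their average and their best score
summary = (
    toy.groupby("country")["points"]
       .agg(["count", "mean", "max"])
       .sort_values(by="count", ascending=False)
)

print(summary)
```

A country with many reviews has more chances to reach a high maximum, so reading `max` next to `count` and `mean` gives a fairer picture than the maximum alone.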

Getting info about the wordcloud library

In [0]:
?WordCloud

Word cloud from the description of a single review

In [15]:
# Start with one review:
text = df.description[0]

# Create and generate a word cloud image:
wordcloud = WordCloud().generate(text)

# Display the generated image:
plt.imshow(wordcloud, interpolation='bilinear')
plt.axis("off")
plt.show()

Wordcloud formatting

In [16]:
# Lower max_font_size, change the maximum number of words and lighten the background:
wordcloud = WordCloud(max_font_size=50, max_words=100, background_color="white").generate(text)
plt.figure()
plt.imshow(wordcloud, interpolation="bilinear")
plt.axis("off")
plt.show()
In [17]:
# Save the image to the current working directory:
wordcloud.to_file("first_review.png")
Out[17]:
<wordcloud.wordcloud.WordCloud at 0x7fe58e357d30>

How many characters in all reviews

In [18]:
# Join every review into one string; len() of a string counts characters, not words
text = " ".join(review for review in df.description)
print("There are {} characters in the combination of all reviews.".format(len(text)))
There are 31914049 characters in the combination of all reviews.
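
Note that `len(text)` on a string returns the number of characters, spaces included. A rough word count can be obtained by splitting on whitespace first. A minimal sketch, using two short hypothetical reviews in place of `df.description`:

```python
# Hypothetical stand-in for df.description
reviews = [
    "Dry, spicy aromas of tobacco and cedar.",
    "Ready to drink now.",
]

text = " ".join(reviews)

n_chars = len(text)          # characters, including spaces and punctuation
n_words = len(text.split())  # whitespace-separated tokens

print("{} characters, {} words".format(n_chars, n_words))
```

Splitting on whitespace leaves punctuation attached to words, which is fine for a count but not for building a vocabulary.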

Stopword List

In [20]:
# Create stopword list:
stopwords = set(STOPWORDS)
stopwords.update(["drink", "now", "wine", "flavor", "flavors", "finish"])

# Generate a word cloud image
# Note: contour_width and contour_color only take effect when a mask is passed
wordcloud = WordCloud(
    width=400,
    height=200,
    background_color="white",
    max_words=1000,
    stopwords=stopwords,
    contour_width=3,
    contour_color='firebrick',
).generate(text)

# Display the generated image:
# the matplotlib way:
plt.figure(figsize=[20,10])
plt.imshow(wordcloud, interpolation='bilinear')
plt.axis("off")
plt.show()
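
Conceptually, the stopword set removes words before their frequencies decide the font sizes. The sketch below imitates that step with `collections.Counter`. It is a simplified stand-in, not WordCloud's actual tokenizer, and the tiny stopword set and the sample text are assumptions made for illustration:

```python
import re
from collections import Counter

# Hypothetical mini stopword set; WordCloud's STOPWORDS is much larger.
stopwords = {"the", "and", "of", "this", "is", "a"}
stopwords.update(["drink", "now", "wine", "flavor", "flavors", "finish"])

# Hypothetical sample text standing in for the joined reviews
sample = "This wine is ripe and fruity. Drink now. Ripe fruit flavors and a soft finish."

# Lowercase, split into alphabetic tokens, then drop the stopwords
tokens = re.findall(r"[a-z']+", sample.lower())
kept = [t for t in tokens if t not in stopwords]

# The remaining counts are what would drive the word sizes
frequencies = Counter(kept)
print(frequencies.most_common(3))
```

Printing the most common words before and after extending the stopword set is a quick way to decide which domain words, such as "wine" or "flavors", are crowding out the more informative ones.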