Torbjorn Zetterlund

Sun 05 2019

Wine Regions of the world with Google Data Studio

by bernt & torsten

What is better to do on a frisky spring day than learning more about Google Data Studio and Colab notebooks? I set out to find a dataset that would be interesting to report on, and to write up a notebook with a word cloud.

To find a dataset to work on, I visited Kaggle and found a wine dataset posted by the Kaggle user zackthoutt. In the description of the Wine Review dataset, zackthoutt explains that he wanted to create a predictive model to identify wines through blind tasting, as a master sommelier would.

I thought this was an interesting dataset to work on, both to tell a story with a Jupyter notebook and to build a Data Studio report.

I got started by downloading the dataset file winemag-data-130k-v2.csv. My plan was to create a table in Google BigQuery from it, but I ran into a limitation: when I tried to create the table, BigQuery reported that the file was too big. BigQuery only allowed up to 10 MB when uploading a .csv file directly.

I had to change my approach. I uploaded the file to Google Cloud Storage, and then used the Cloud Storage bucket where the file was stored as the source to create the table in BigQuery.
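For readers who prefer to script this step rather than click through the console, here is a minimal sketch using the google-cloud-bigquery Python client. It is not the exact procedure I followed. The bucket and table names in the comments are hypothetical, and the sketch assumes the CSV has a header row and that schema auto-detection is good enough for this file.

```python
def gcs_uri(bucket: str, blob_name: str) -> str:
    """Build the gs:// URI that BigQuery expects for a Cloud Storage object."""
    return f"gs://{bucket}/{blob_name}"


def load_csv_from_gcs(uri: str, table_id: str) -> int:
    """Load a CSV from Cloud Storage into a BigQuery table and return its row count."""
    # Imported here so gcs_uri() can be used without the client library installed.
    from google.cloud import bigquery

    client = bigquery.Client()  # relies on application default credentials
    job_config = bigquery.LoadJobConfig(
        source_format=bigquery.SourceFormat.CSV,
        skip_leading_rows=1,         # assumption: the first row is a header
        autodetect=True,             # assumption: let BigQuery infer the schema
        allow_quoted_newlines=True,  # free-text fields may contain line breaks
    )
    load_job = client.load_table_from_uri(uri, table_id, job_config=job_config)
    load_job.result()  # block until the load job finishes (raises on failure)
    return client.get_table(table_id).num_rows


# Hypothetical usage (bucket, project, dataset and table names are placeholders):
# uri = gcs_uri("my-wine-bucket", "winemag-data-130k-v2.csv")
# rows = load_csv_from_gcs(uri, "my-project.wine.reviews")
```

Loading from a gs:// URI avoids the direct-upload size limit, because BigQuery reads the file from Cloud Storage itself.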

Google Data Studio

I created a Google Data Studio report and connected Data Studio to BigQuery. I also tried the Kaggle connector in Data Studio, where I ran into another size limitation: winemag-data-130k-v2.csv was too big, because the Data Studio Kaggle connector only allows files of up to 20 MB. Kaggle user zackthoutt offered another file on Kaggle that was under 20 MB, so I used that file, winemag-data_first150k.csv, to make the Data Studio to Kaggle connection work.

After that, I wrote my Jupyter Notebook, which is very self-explanatory. See the attached notebook at the end of this article.
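To give an idea of what sits behind a word cloud, here is a minimal sketch of the word counting step using only the Python standard library. It is not the code from my notebook. The stop-word list is a tiny hand-picked assumption, and the two sample descriptions are made up for illustration rather than taken from the dataset.

```python
import re
from collections import Counter

# Assumption: a very small stop-word list, just enough for the example.
STOP_WORDS = {"a", "an", "and", "the", "of", "with", "is", "in", "this", "to", "it"}


def word_frequencies(descriptions, top_n=10):
    """Count how often each non-stop word appears across all descriptions."""
    counts = Counter()
    for text in descriptions:
        # Lowercase the text and keep runs of letters (and apostrophes) as words.
        words = re.findall(r"[a-z']+", text.lower())
        counts.update(w for w in words if w not in STOP_WORDS)
    return counts.most_common(top_n)


# Made-up sample descriptions, not rows from the dataset.
sample = [
    "Ripe cherry and plum with a hint of oak.",
    "Bright cherry fruit, crisp acidity and a long finish.",
]
top_words = word_frequencies(sample, top_n=3)
print(top_words)
```

A word cloud library can then size each word by its count. The wordcloud package, for example, accepts a dictionary of frequencies through generate_from_frequencies.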

The Dataset Details

The Kaggle user zackthoutt generated the dataset by scraping the WineEnthusiast website during the week of November 22nd, 2017. The code zackthoutt used for the scraper can be found here.

What’s Next

Some other weekend, if the weather is not nice, I will add sentiment analysis using the Google Natural Language API. I may also create an AutoML model that can identify the variety, winery, and location of a wine based on its description.

Jupyter Notebook
