Torbjorn Zetterlund

Wed 02 2022
Image

#datapipeline: YouTube Analytics to BigQuery with Airbyte

by bernt & torsten

You cannot simply rely on YouTube to derive maximum value for your business. You need a combination of YouTube and Google’s BigQuery to make more robust analytical insights and, in turn, drive increased revenue and growth to your YouTube channel.

This data pipeline will demonstrate how you can easily leverage Airbyte to do the job.

Why centralize YouTube data to a data warehouse?

Let’s look at why you might want to centralize your YouTube data into a data warehouse.

Out-of-the-box reporting is not sufficient 

YouTube has restrictions on reporting and dashboard capabilities, making it inflexible to use for different scenarios.

Why use Airbyte to extract YouTube data

Writing custom ETL scripts will slow down your project’s velocity. If these scripts are not well written, the risk is that constant care and time are required to devote to keeping these scripts running. 

Selecting a tool like Airbyte, you create automation, which allows you to ditch the complex hardcoded scripts that handle data wrangling and scheduling, enabling your teams to work efficiently. 

The Airbyte connectors are open-source and easily customizable, which gives you the advantage of seamlessly integrating your data that is residing across many business apps and databases in your data warehouse.

With Airbyte, you own your data, and no middleman service your data goes through, in Airbyte, your data is deduplicated and can be transformed on the fly based on custom business logic rules with SQL.

Other cool features of Airbyte are that Airbyte has built-in scheduling, orchestration, and monitoring., you can Airbyte scheduler enable data replication either fully or in an incremental fashion.

To get started with Airbyte

It would be best if you got started on extracting your YouTube data to your Google BigQuery.

  1. To deploy Airbyte, follow the simple instructions in the documentation here.
  2. You will need a YouTube Channel account.
  3. You also need a Google Cloud Platform to account with the BigQuery service enabled.
  4. Enable YouTube API in Google Cloud Platform

Enable YouTube API in Google Cloud Platform

To use the YouTube API you need to enable them in your Google Cloud Platform Project. I enabled YouTube Reporting API

Create credentials for OAuth client ID

To provide the credentials for the YouTube API, you would need to create credentials, you need to do this as Airbyte input for YoutTube requires OAuth Client Id, OAuth Secret ID and Refresh Token. here are the steps

Select Web Application as the application type.

To get the Refresh Token you use the Google Developers OAuth Playground Tool 

Follow the instructions in the tool, and enable the API you want as the scope in your Oauth.

Enter your personal Oauth credentials by clicking on the gear icon top right corner, and check to use your own OAuth credentials. Fukk in the Oauth Client ID and Secret you created in the Google Cloud Platform credentials.

When you have done all that – then click Authorize API, and you will get this screen.

Select the Account that you have associated with YouTube. Complete all the authentication steps and you will be back on this screen.

The first time you use this, you need to click on the button Exchange authorization code for tokens. What we want to use later in the Airbyte setup is the Refresh token. First, we will set up BigQuery.

Set up Google BigQuery

Create a BigQuery dataset and generate a credentials JSON file required to configure Airbyte. Login to your GCP account and go to BigQuery. 

Select Create a Dataset and give and add the required information. 

Next, go to the Service Account page and Create a new Service account. Enter the required information. Grant the service account the BigQuery Data Owner role.

Once the service account is created, go to Keys > Add Key and create a new key. You will be prompted to download a credentials file. Download the JSON version which will be used later.

Set up Youtube as your Airbyte source

Set up the connection for the source, which will be your Youtube details we created in a previous step. Under client_id, enter. Under client_secret, enter and under refresh_token, enter the refresh token obtained at the end of Step 1.

Set up BigQuery as your Airbyte destination

Next,  set up Airbyte to use BigQuery as the destination for the data replication. Enter your Client ID and Client Secret that you created when setting up your Oauth Client ID, and now it is time to use the Refresh Token that you previously created.

Create a YouTube to BigQuery connection

Once configured, a list of YouTube streams that data can be backed up.

Scroll through and select the Leads stream that contains the sample data.  

Note: You need to have billing enabled in your BigQuery service for normalization to work without errors.

Once configured, you can manually trigger a sync. Once complete, your data will be backed up to BigQuery.

Now you can see that many tables have been created in your dataset. You can preview data or make SQL queries on your YouTube tables in BigQuery.

Conclusion

Now that you have extracted your YouTube data to Google BigQuery, you can leverage the rich analytical capabilities of BigQuery to extract more insights from this data. You can also connect your BgQuery tables to Google Data Studio to make reports for your audience.

Share: