
How to build a RESTful API data pipeline with a Firebase Cloud Function and BigQuery

In this article, I will show how I built a data pipeline with a Google Cloud Function. My use case was to read data from an external system through a RESTful API, create new entries from that data, and store them in BigQuery.

Setting up


Let’s get into it. First, we have to create a Firebase project in the [Firebase Console](https://console.firebase.google.com).

You must have the Firebase CLI installed. If you don’t have it, install it with

 npm install -g firebase-tools

and then configure it with

firebase login

Set up your project by running

firebase use --add

and select the Firebase project you created.

Install dependencies in the functions directory

cd functions
npm install

I have written my Cloud Function in JavaScript on Node.js; the function reads an external RESTful API. To call the RESTful API I’m using Axios, and I also have to install the Firebase libraries and the BigQuery client library. Here are the libraries I used (the https module used later in the code is built into Node.js, so it does not need to be installed):

npm install axios
npm install firebase-functions
npm install firebase-admin
npm install @google-cloud/bigquery

My package.json file looks like this; you can copy it and run npm install.

{
  "name": "gpi-noonreport-to-bigquery",
  "description": "This is a function that reads Emily API for noonreport table and store in BigQuery",
  "scripts": {
    "serve": "firebase serve --only functions",
    "shell": "firebase functions:shell",
    "start": "npm run shell",
    "deploy": "firebase deploy --only functions",
    "logs": "firebase functions:log",
    "build": "echo 'build script executed'",
    "lint": "eslint.js"
  },
  "dependencies": {
    "@google-cloud/bigquery": "^4.2.1",
    "axios": "^0.19.0",
    "firebase-admin": "^8.5.0",
    "firebase-functions": "^3.2.0"
  },
  "private": true,
  "engines": {
    "node": "8"
  },
  "devDependencies": {
    "eslint": "^6.6.0"
  }
}

This is the full code of my function; you can just copy it.

'use strict';

// Request Data From A URL
var axios = require('axios');
var https = require('https');

// Var Firebase Functions
var functions = require('firebase-functions');
const admin = require('firebase-admin');

// Initialise the app
admin.initializeApp();

// Imports the Google Cloud client library
const {BigQuery} = require('@google-cloud/bigquery');

// Setting Timeout in Seconds - default is 1 minute
// The maximum value for timeoutSeconds is 540, or 9 minutes. Valid values for memory are:
// 128MB, 256MB, 512MB, 1GB, 2GB

const runtimeOpts = {
  timeoutSeconds: 300,
  memory: '512MB'
}
/**
 * Read a RESTFul API and create new entries in a BigQuery table
 */
exports.getRESTFULApitoBigQuery = functions
  .runWith(runtimeOpts)
  .region('europe-west1')
  .https.onRequest(async(req, res) => {
    try {
      // Read the API and insert the results into BigQuery
      await createMessage();
      res.status(200).send("OK");
    }
    catch (error) {
      // Handle the error
      console.log(error);
      res.status(500).send(error);
    };
    // Read the RESTful API and build rows for BigQuery
    async function createMessage() {
      // The API Key
      let apikey = '<Your API Key>';

      // Table to get data from
      var apitable = '<your API function>';

      // Get the current date 
      var today = new Date();
      var dd = String(today.getDate()).padStart(2, '0');
      var mm = String(today.getMonth() + 1).padStart(2, '0'); //January is 0!
      var yyyy = today.getFullYear();

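      // Use 0000-00-00 to fetch the whole dataset on the first run;
      // switch to the commented-out line below for incremental syncs.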
      today = '0000' + '-' + '00' + '-' + '00';
//      today = yyyy + '-' + mm + '-' + dd;

      var url = 'https://test.com/api/' + apitable + '?apikey=' + apikey + '&syncdate=' + today;

      // At request level
      const agent = new https.Agent({  
        rejectUnauthorized: false
      });

      await axios.get(url, { httpsAgent: agent })
        .then(async resp => {
          try {
            for (let artno in resp.data.noon_report) {
              console.log(JSON.stringify(resp.data));
              // Create the BigQuery row from the matching noon_report entry
              var rows = {
                local_time: resp.data.noon_report[artno].local_time || resp.data.report_date,
                status: resp.data.noon_report[artno].status,
                location: resp.data.noon_report[artno].location,
                country: resp.data.noon_report[artno].country
              };
              // Wait for the insert so the function does not return early
              await insertBigQuery(rows);
            }
            console.log("For loop end");
          }
          catch (error) {
            // Log and rethrow so the outer handler sends the 500 response
            console.log(error);
            throw error;
          }
        })
    };

    // Insert a row into the BigQuery table
    async function insertBigQuery(rows) {
        // Create a client
        const bigqueryClient = new BigQuery();
        // The dataset to write to
        const datasetId = 'my-dataset';
        // The table to write to
        const tableId = 'my-table';
        console.log("ROW TO INSERT " + JSON.stringify(rows));
        await bigqueryClient
        .dataset(datasetId)
        .table(tableId)
        .insert(rows)
        .catch(err => {
          if (err && err.name === 'PartialFailureError') {
              if (err.errors && err.errors.length > 0) {
                  console.log('Insert errors:');
                  err.errors.forEach(err => console.error(err));
              }
          } else {
              console.error('ERROR:', err);
          }
        });
        console.log("Inserted rows");
    };
});
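Before deploying, you can exercise the function locally with the serve script defined in the package.json above; the Firebase CLI prints a local URL you can open in the browser:

firebase serve --only functions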

URL Request and SSL handshake

A few notes about the code. The URL used here is for a specific system, so you will need to change the following code to the URL you are going to use.

var url = 'https://test.com/api/' + apitable + '?apikey=' + apikey + '&syncdate=' + today;
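As a side note, if you would rather not concatenate the query string by hand, Node’s built-in url module can build it for you. Here is a minimal sketch; the placeholder values stand in for the variables used in the function:

// Build the same query string with Node's built-in URLSearchParams
const { URLSearchParams } = require('url');

// Placeholders matching the variables used in the function
var apikey = '<Your API Key>';
var apitable = '<your API function>';
var today = '0000-00-00';

const params = new URLSearchParams({
  apikey: apikey,    // your API key
  syncdate: today    // the sync date
});
var url = 'https://test.com/api/' + apitable + '?' + params.toString();
console.log(url);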

In my URL I’m requesting data from an https:// site. I did not write any code to handle the SSL certificate handshake; instead, I created an https agent that ignores SSL certificate validation. Keep in mind that rejectUnauthorized: false disables certificate checking entirely, so only use it with endpoints you trust. This is the code snippet.

// At request level
const agent = new https.Agent({  
    rejectUnauthorized: false
});

Node.js server-specific settings

Depending on the size of the data you are requesting, you may need to raise the timeout value (the default is 1 minute). You can also alter the amount of memory your function runs with. This is the code where you set both.

// Setting Timeout in Seconds - default is 1 minute
// The maximum value for timeoutSeconds is 540, or 9 minutes. Valid values for memory are:
// 128MB, 256MB, 512MB, 1GB, 2GB

const runtimeOpts = {
  timeoutSeconds: 300,
  memory: '512MB'
}

Specify Region

In the code you can specify the region you want your function to run in; this is done in the code below.

exports.getRESTFULApitoBigQuery = functions
  .runWith(runtimeOpts)
  .region('europe-west1')
  .https.onRequest(async (req, res) => { ... });

Deploy your project using

firebase deploy

If you have many Firebase Cloud Functions, specify the name of the function when you deploy, like this:

firebase deploy --only functions:getRESTFULApitoBigQuery

BigQuery

After you deploy your function, and before you run it, you will need to create a BigQuery dataset and a table. You can read the [BigQuery datasets documentation](https://cloud.google.com/bigquery/docs/datasets) to learn how to set them up.
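If you prefer to create the dataset and table from code rather than the console, the same Node client can do it. The sketch below is only an illustration under my assumptions: 'my-dataset' and 'my-table' are placeholders (note that real BigQuery IDs may only contain letters, numbers, and underscores), and the schema simply mirrors the four fields the function inserts.

// One-off setup sketch (run separately, not part of the function).
// 'my-dataset' and 'my-table' are placeholder names.
const {BigQuery} = require('@google-cloud/bigquery');

async function setupBigQuery() {
  const bigqueryClient = new BigQuery();

  // Create the dataset (skip this step if it already exists)
  await bigqueryClient.createDataset('my-dataset');

  // Create the table with a schema matching the row object in the function
  await bigqueryClient.dataset('my-dataset').createTable('my-table', {
    schema: {
      fields: [
        {name: 'local_time', type: 'STRING'},
        {name: 'status', type: 'STRING'},
        {name: 'location', type: 'STRING'},
        {name: 'country', type: 'STRING'}
      ]
    }
  });
}

setupBigQuery().catch(console.error);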

In the Cloud Function code, you will need to change the following to the names of your dataset and table.

        // The dataset to write to
        const datasetId = 'my-dataset';
        // The table to write to
        const tableId = 'my-table';

To create a BigQuery row, you will need to alter this code to match your URL response, depending on the data structure returned by the API call you make.

              // Create the BigQuery row from the matching noon_report entry
              var rows = {
                local_time: resp.data.noon_report[artno].local_time || resp.data.report_date,
                status: resp.data.noon_report[artno].status,
                location: resp.data.noon_report[artno].location,
                country: resp.data.noon_report[artno].country
              };
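For reference, the mapping above assumes the API responds with a body shaped roughly like this. The field names are taken from the row mapping and the values are illustrative; your actual API contract will differ.

{
  "report_date": "2019-10-01",
  "noon_report": {
    "1": {
      "local_time": "2019-10-01 12:00",
      "status": "at sea",
      "location": "North Sea",
      "country": "DE"
    }
  }
}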

If you run the function for the first time and want the whole dataset from the RESTful API, set the URL date accordingly; in my case I set it to 0000-00-00.

Because inserts into BigQuery append new records, you can create duplicates when testing your function. If that happens, you may want to empty your BigQuery table. There are many ways to do it; you can read How to empty a table in BigQuery to learn how I do it.
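As a quick illustration, one way to empty a table from Node is to run a DML DELETE through the same client. This is a sketch with placeholder names, not necessarily the method from the linked article; note that rows recently added through streaming inserts may not be deletable until the streaming buffer clears.

// Sketch: remove all rows from the table with a DML DELETE.
// 'my-dataset' and 'my-table' are placeholder names.
const {BigQuery} = require('@google-cloud/bigquery');

async function emptyTable() {
  const bigqueryClient = new BigQuery();
  await bigqueryClient.query('DELETE FROM `my-dataset.my-table` WHERE true');
}

emptyTable().catch(console.error);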

To Run

As this is a function triggered by an HTTP request, go to the Firebase console and select Functions from the left-side menu. It will list your function along with its https URL, which you can copy and paste into another browser tab to run the function. After the function runs, go to BigQuery and run a query to check that your data is there.
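For example, assuming a project ID of my-project, the URL follows the standard Cloud Functions pattern for the region set in the code, and you can also trigger it with curl:

curl https://europe-west1-my-project.cloudfunctions.net/getRESTFULApitoBigQuery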

