
Creating a Google Cloud Dataprep recipe for a JSON file that is not newline delimited

This is a follow-up to my previous article, Detecting non newline delimited JSON file with Google Cloud Dataprep. In this article I will show you how to create a Cloud Dataprep recipe that transforms a JSON file that is not newline delimited.

If you read the previous article, the data that I have to work with looks like this:

Creating a Cloud Dataprep flow

The first step is to create a Cloud Dataprep flow. This is going to be a simple flow: we use the imported dataset, then transform the data by creating a recipe and adding a few steps, so that we end up with structured data that we will store in Google BigQuery. This is what we are doing:

Creating a Cloud Dataprep recipe

When you have imported your dataset, click on its icon and select Add New Recipe. When that is done, a new icon appears; click on it and select Edit Recipe.

Split rows in Cloud Dataprep

The first step in your recipe should be splitrows. The important part is that it has to be the very first step in the recipe. The documentation for the Splitrows Transform covers the details.
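To make the idea of splitting rows concrete outside of Cloud Dataprep, here is a minimal Python sketch. It assumes the file contains JSON objects written back to back with no newline between them, which may not match your data exactly. The function name split_concatenated_json and the sample records are made up for illustration; this is not how Dataprep implements splitrows.

```python
import json


def split_concatenated_json(text):
    """Yield each top-level JSON value found back to back in `text`."""
    decoder = json.JSONDecoder()
    pos = 0
    length = len(text)
    while pos < length:
        # Skip any whitespace that sits between two JSON values.
        while pos < length and text[pos].isspace():
            pos += 1
        if pos >= length:
            break
        # raw_decode parses one value and returns the index where it ended.
        obj, pos = decoder.raw_decode(text, pos)
        yield obj


# Hypothetical sample: three objects with no newline delimiter.
sample = '{"id": 1, "name": "a"}{"id": 2, "name": "b"} {"id": 3, "name": "c"}'
rows = list(split_concatenated_json(sample))
```

Each element of rows is now one record, which is roughly the state you want to reach before adding further transformation steps.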

After you have added the first step, you can go about transforming your data. Cloud Dataprep is very easy to work with: when you highlight a data item, it suggests transformations you could apply, which is very handy.

Finishing up the Cloud Dataprep recipe

You need to go through your dataset and create the steps that transform the data. In the end, you have a number of transformation steps as part of your recipe, like this one.

With everything transformed, the last step is to decide how you want to publish the data. Click the Run Job button in the top right corner.

Cloud Dataprep Run Job

You have two options you can use to publish your data – you can publish as a file to Cloud Storage or you can publish the data to a BigQuery Dataset table.

It’s very easy to set up. If you use BigQuery, just make sure that you have a dataset defined; Cloud Dataprep takes care of the rest. When publishing to BigQuery, you can create a new table and choose whether to replace the table on every run or to append to an existing table.
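The difference between replacing and appending can be sketched with a local file standing in for the destination table. This is only an analogy under stated assumptions: publish_rows, its mode parameter, and the sample rows are hypothetical and are not part of the Cloud Dataprep or BigQuery APIs. The sketch writes newline-delimited JSON, the layout BigQuery expects when loading JSON files.

```python
import json


def publish_rows(rows, path, mode="replace"):
    """Write rows as newline-delimited JSON to a local file.

    mode="replace" overwrites the file on every run,
    mode="append" adds the rows to whatever is already there.
    """
    if mode not in ("replace", "append"):
        raise ValueError("mode must be 'replace' or 'append'")
    file_mode = "w" if mode == "replace" else "a"
    with open(path, file_mode, encoding="utf-8") as handle:
        for row in rows:
            # One JSON object per line.
            handle.write(json.dumps(row) + "\n")


def read_rows(path):
    """Read the newline-delimited JSON file back into a list of dicts."""
    with open(path, encoding="utf-8") as handle:
        return [json.loads(line) for line in handle if line.strip()]
```

Running publish_rows twice with mode="append" leaves both batches in the file, while mode="replace" keeps only the latest batch, which mirrors the choice you make for the BigQuery table.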

Conclusion

This is a simple example of how you can use Cloud Dataprep; there is much more to it than what is covered here. Thank you for reading. Comments are appreciated.

