
Store sensor data in Google Cloud Storage

By Amogh Kulkarni

Store sensor data in Google Cloud Storage using SAP Data Hub, trial edition.

You will learn

  • How to store sensor data in Google Cloud Storage
  • How to use the GCS Consumer and GCS Producer operators

Please note that this tutorial is similar to the Store sensor data in HDFS tutorial from the SAP Data Hub, developer edition tutorial group.


Step 1: Collect GCS Details

The SAP Data Hub, trial edition is deployed on Google Cloud Platform. Therefore we will use Google Cloud Storage for storing sensor data. For this purpose we need the following:

  • GCS Bucket Details
  • GCS JSON Key

If you don’t already have the JSON key, refer to the Getting Started with SAP Data Hub, trial edition guide, which contains step-by-step instructions for downloading the key and retrieving the bucket details.
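
For orientation, a downloaded service account key follows the standard GCP layout sketched below. All values here are redacted placeholders, and a few additional fields (auth_uri, token_uri, and so on) are omitted. Note that project_id is the second attribute in the file, which matters for the operator configuration later on:

    {
      "type": "service_account",
      "project_id": "xxx-xx-x-xxxx-xx",
      "private_key_id": "<redacted>",
      "private_key": "-----BEGIN PRIVATE KEY-----\n<redacted>\n-----END PRIVATE KEY-----\n",
      "client_email": "<service-account>@<project>.iam.gserviceaccount.com",
      "client_id": "<redacted>"
    }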

Step 2: Edit GCS Consumer and Producer Operators

In this pipeline, we use the GCS Consumer and GCS Producer operators. For them to communicate with Google Cloud Storage, we must also provide the JSON key. We do this by uploading the key as an auxiliary file to both operators.
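
The key serves the same purpose as the credential any standalone GCS client would use. As a point of reference only (this is not what the operators run internally), a minimal Python sketch with the google-cloud-storage client library looks like this; the bucket name is a placeholder:

    # A minimal sketch, for reference only: authenticate against GCS with the
    # same service account key (key.json) that is uploaded to the operators.
    # Requires the google-cloud-storage library (pip install google-cloud-storage).
    from google.cloud import storage

    # Build a client directly from the downloaded key file.
    client = storage.Client.from_service_account_json("key.json")

    # "my-datahub-bucket" is a placeholder; substitute your own bucket name.
    bucket = client.bucket("my-datahub-bucket")
    print(bucket.name)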

From the Operators tab in the left menu pane, search for GCS Producer, then right-click the operator and click Edit.

On the Operator Edit page, click the Upload button (1). In the file upload dialog box, click Browse (2), select the key file (key.json) discussed in Step 1, and click Send (3). Once the upload succeeds, close the dialog box and return to the graph.

Similarly, from the Operators tab in the left menu pane, search for GCS Consumer, then right-click the operator and click Edit.

Follow the above steps again, this time uploading the JSON key to the GCS Consumer operator.

Step 3: Add and configure GCS Producer

Open the pipeline which you created in the previous tutorial (test.myFirstPipeline) in the modeling environment (https://vhcalruntime/app/pipeline-modeler).

As the above URL is local, it is accessible only if you have been following the tutorials and have already configured the hosts file. If not, please refer to the Getting Started with SAP Data Hub, trial edition guide.

Remove the connection between the Kafka Consumer 2 operator and the ToString Converter operator. Now drag and drop the GCS Producer onto the existing graph and connect the message output port of the Kafka Consumer 2 operator to the inFile input port of the GCS Producer.

Configure the GCS Producer operator by maintaining the following properties:

Field Name   Value
projectID    Value of project_id from the key.json file, without the quotes (it is the second attribute in the file). Example: xxx-xx-x-xxxx-xx
bucket       Name of the bucket we noted down earlier
path         sensordata/file_<counter>.txt

The GCS Producer will write the received data to files in the sensordata/ directory of the specified GCS bucket. The files follow the naming scheme file_<counter>.txt (where counter is an incrementing integer).
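
The operator's internals are not exposed, but its effect is roughly equivalent to the following Python sketch (google-cloud-storage client library; the bucket name and messages are placeholders, not the operator's actual implementation):

    # Sketch of the effect of the GCS Producer: write each incoming message
    # to sensordata/file_<counter>.txt in the configured bucket.
    from google.cloud import storage

    client = storage.Client.from_service_account_json("key.json")
    bucket = client.bucket("my-datahub-bucket")  # placeholder bucket name

    counter = 0
    for payload in ["sensor reading 1", "sensor reading 2"]:  # stand-in messages
        # Each message becomes its own object, named with the running counter.
        blob = bucket.blob("sensordata/file_{}.txt".format(counter))
        blob.upload_from_string(payload)
        counter += 1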

Step 4: Add and configure GCS Consumer

Now drag and drop the GCS Consumer onto the existing graph (test.myFirstPipeline). Then connect the outFile output port of the GCS Consumer to the inMessage input port of the ToString Converter.

Configure the GCS Consumer operator by maintaining the following properties:

Field Name        Value
projectID         Value of project_id from the key.json file, without the quotes (it is the second attribute in the file). Example: xxx-xx-x-xxxx-xx
bucket            Name of the bucket we noted down earlier
path              sensordata/
onlyReadOnChange  true

We specify only the path, without a file name, because the Consumer operator listens for new files in the given path and reads each one as soon as it appears. If an existing file changes, the Consumer operator reads it again; this behavior is enabled by the onlyReadOnChange property set above.
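
Conceptually, the consumer behaves like the polling loop sketched below (again an approximation with the Python client, not the operator's actual code): it lists the path and re-reads a file whenever its update timestamp changes.

    # Sketch of the GCS Consumer's behavior with onlyReadOnChange = true:
    # poll the sensordata/ prefix and emit a file only when it is new or
    # its update timestamp has changed. Placeholder bucket name.
    import time
    from google.cloud import storage

    client = storage.Client.from_service_account_json("key.json")
    bucket = client.bucket("my-datahub-bucket")  # placeholder bucket name

    seen = {}  # object name -> last observed update timestamp
    while True:
        for blob in bucket.list_blobs(prefix="sensordata/"):
            if seen.get(blob.name) != blob.updated:  # new or modified object
                seen[blob.name] = blob.updated
                print(blob.name, blob.download_as_text())
        time.sleep(5)  # poll interval in seconds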

Afterwards click Save.

Step 5: Execute the data pipeline

Click Run to execute the pipeline.

When the Status tab indicates that the pipeline is running, use the context menu Open UI of the Terminal operator to see the generated sensor data.

In contrast to the previous tutorial, this time the generated sensor data is not sent directly from the Kafka Consumer 2 operator to the Terminal operator, but travels via GCS. Hence the Terminal also shows you information about the created files.

Open https://console.cloud.google.com and navigate to GCP Left menu > Storage > Browser > Your bucket name > sensordata folder. The longer the pipeline runs, the more files you will find there.
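
If you have the Google Cloud SDK installed, you can also list the generated files from the command line, for example with gsutil ls gs://<your-bucket>/sensordata/ (the bucket name here is a placeholder for your own).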

Stop the pipeline by clicking Stop.


Updated: 06/04/2018 | Time to complete: 30 mins | Level: Beginner
