You will learn
You will learn how to load tables with parent-child relationship between data items, and then create and query hierarchies.
Hierarchical data structures define a parent-child relationship between different data items, providing an abstraction that makes it possible to perform complex computations on different levels of data.
You will learn how to load tables with parent-child relationship between data items, and then create and query hierarchies.
SAP Vora introduces support for hierarchies by enhancing Spark SQL to provide missing functionality. Extensions to Spark SQL support hierarchical queries that make it possible to define a hierarchical DataFrame and perform custom hierarchical UDFs on it.
For this tutorial the SAP Vora, developer edition, has already sample files preloaded into HDFS. You can see them by executing following statements in the host’s operating system as user vora
.
hdfs dfs -ls officers.csv addresses.csv
Similarly to loading sample CSV files you will use Zeppelin with predefined notebook here as well.
Once Zeppelin opens up in a new browser window, check it is Connected and if yes, then click on 3_Hierarchies
notebook.
Firstly load a table OFFICERS
with parent-child relationship between data items.
You can execute them as well from SQL Editor.
CREATE TABLE OFFICERS (id int, pred int, ord int, rank string)
USING com.sap.spark.vora
OPTIONS (
tableName "OFFICERS", paths "/user/vora/officers.csv");
SELECT * FROM OFFICERS
Secondly load a master data table ADDRESSES
that you will use later to join with the hierarchy.
You can do it as well from SQL Editor.
CREATE TABLE ADDRESSES (rank string, address string)
USING com.sap.spark.vora
OPTIONS (tableName "ADDRESSES", paths "/user/vora/addresses.csv");
SELECT * FROM ADDRESSES
Create a hierarchy view using an SQL statement. To create a hierarchy HV
, you use a table OFFICERS
that specifies the relations between the predecessors and successors of the hierarchy.
You can do it as well from SQL Editor.
CREATE VIEW HV AS SELECT * FROM HIERARCHY (
USING OFFICERS AS child
JOIN PARENT par ON child.pred = par.id
SEARCH BY ord ASC
START WHERE pred=0
SET node) AS H;
SELECT * FROM HV
Join the ADDRESSES
tables and HV
hierarchy view.
SAP Vora provides user-defined functions (UDFs) to be used with hierarchies.
Run the query that returns the rank of the descendants of the root. It uses is_parent(u,v)
and is_root(u)
hierarchy functions.
Return the address and the rank for the officers from level 2.