Working with Hierarchies using Apache Zeppelin

By Vitaliy Rudnytskiy

SAP Vora 1.4: Hierarchical data structures define a parent-child relationship between different data items, providing an abstraction that makes it possible to perform complex computations on different levels of data.

You will learn

You will learn how to load tables whose data items have a parent-child relationship, and then how to create and query hierarchies.

Step 1: Hierarchies

Hierarchical data structures define a parent-child or level relationship between different data items, providing an abstraction that makes it possible to perform complex computations on different levels of data.

An organization, for example, is a hierarchy in which the connections between nodes (for example, a manager and a developer) are determined by the organization's reporting lines.

Because hierarchical data is very difficult to analyze with standard SQL, Spark SQL has been extended to provide the missing hierarchy functionality. These extensions support hierarchical queries, so you can define a hierarchical DataFrame and run custom hierarchical UDFs on it. This allows you, for example, to define an organization’s hierarchy and perform complex aggregations, such as calculating the average age of all second-level managers or the aggregate salaries of different departments.
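To make the idea concrete, here is a minimal sketch in plain Python (not SAP Vora or Spark SQL syntax) of the kind of computation a hierarchy makes possible: deriving each node's level from parent-child rows and then averaging the age of all second-level managers. The `employees` table, the names in it, and the `node_levels` helper are hypothetical, and numbering the root as level 1 is an assumption made for illustration.

```python
from collections import defaultdict

# Hypothetical organization table: (name, manager, age).
# A manager of None marks the root of the hierarchy.
employees = [
    ("Ann", None, 58),
    ("Bob", "Ann", 45),
    ("Cid", "Ann", 51),
    ("Dee", "Bob", 30),
    ("Eli", "Bob", 34),
    ("Fay", "Cid", 28),
]

def node_levels(rows):
    """Return {name: level}, with the root at level 1 (an assumption)."""
    children = defaultdict(list)
    for name, manager, _age in rows:
        children[manager].append(name)
    levels = {}
    frontier = [(root, 1) for root in children[None]]
    while frontier:
        name, level = frontier.pop()
        levels[name] = level
        # Each child sits one level below its parent.
        frontier.extend((child, level + 1) for child in children[name])
    return levels

levels = node_levels(employees)
ages = {name: age for name, _manager, age in employees}

# Aggregate over one hierarchy level: the average age of second-level nodes.
second_level = sorted(name for name, lvl in levels.items() if lvl == 2)
average_age = sum(ages[name] for name in second_level) / len(second_level)
print(second_level, average_age)
```

In standard SQL the level lookup would require a recursive query or one self-join per level, which is the difficulty the hierarchy extensions are meant to remove.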

Because the SAP Vora 1.4 execution engine supports hierarchies, hierarchical queries can be pushed down to SAP Vora through the data source implementation.

Step 2: Running notebook 2_Hierarchies

Start by selecting the 2_Hierarchies notebook. If your developer edition is missing this notebook, then proceed to the next tutorial.
Hierarchies notebook

Run the following paragraphs.
First paragraphs in Hierarchies notebook

Take a look at the hierarchy structure. Then create a new fact table.
Fact table

Use the fact table to join with a hierarchy.
Join with hierarchy
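Conceptually, joining a fact table with a hierarchy lets each fact value be attributed to a node and to all of that node's ancestors. The following plain-Python sketch illustrates this roll-up. It is not the notebook's SQL; the `parent_of` mapping, the `facts` rows, and the `ancestors_and_self` helper are all hypothetical.

```python
# Hypothetical hierarchy stored as child -> parent; None marks the root.
parent_of = {
    "World": None,
    "EMEA": "World",
    "APJ": "World",
    "Germany": "EMEA",
    "France": "EMEA",
    "Japan": "APJ",
}

# Hypothetical fact table: (node, revenue).
facts = [("Germany", 100), ("France", 70), ("Japan", 90), ("EMEA", 10)]

def ancestors_and_self(node):
    """Yield the node itself and then each ancestor up to the root."""
    while node is not None:
        yield node
        node = parent_of[node]

# Join each fact row to the hierarchy and credit its value
# to every node on the path from that node to the root.
rollup = {}
for node, revenue in facts:
    for target in ancestors_and_self(node):
        rollup[target] = rollup.get(target, 0) + revenue

print(rollup["EMEA"], rollup["World"])
```

Here `EMEA` receives its own fact value plus those of `Germany` and `France`, and the root accumulates every row.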

Step 3: Level Hierarchies

Scroll down the notebook until you reach the “Create Level Hierarchy” paragraph.

An alternative way of creating a hierarchy is by mapping hierarchy levels to source table columns. This type of hierarchy creation is particularly useful when the source table is actually a flattened hierarchy where all hierarchy paths are encoded as rows.

You can work with the resulting hierarchy in the same way as with adjacency-list hierarchies.
Level hierarchy
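The relationship between the two representations can be sketched in plain Python: each row of a flattened table encodes one root-to-leaf path, and reading adjacent level columns pairwise recovers the parent-child edges of an adjacency list. The `rows` data and the `level_rows_to_edges` helper are hypothetical, and the sketch assumes that node names are unique across the whole hierarchy.

```python
# Hypothetical flattened source table: one row per root-to-leaf path,
# with one column per hierarchy level.
rows = [
    ("World", "EMEA", "Germany"),
    ("World", "EMEA", "France"),
    ("World", "APJ", "Japan"),
]

def level_rows_to_edges(rows):
    """Return the distinct (parent, child) pairs; the root's parent is None."""
    edges = set()
    for row in rows:
        parent = None
        for value in row:
            # The value in each level column is a child of the previous column's value.
            edges.add((parent, value))
            parent = value
    # Sort for stable output; the root edge (parent None) sorts first.
    return sorted(edges, key=lambda edge: (edge[0] or "", edge[1]))

for parent, child in level_rows_to_edges(rows):
    print(parent, "->", child)
```

Shared path prefixes such as `World -> EMEA` appear in several rows but yield only one edge, which is why the result behaves like an ordinary adjacency-list hierarchy.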

Select data by hierarchy level (4 in this example).
Select by level
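As a rough illustration of what selecting by level means, the plain-Python sketch below counts the steps from each node up to the root and keeps the nodes whose level is 4. The `parent_of` data and both helper functions are hypothetical, and numbering the root as level 1 is an assumption, not a statement about how SAP Vora numbers levels.

```python
# Hypothetical four-level hierarchy stored as child -> parent; None marks the root.
parent_of = {
    "A1": None,
    "B1": "A1", "B2": "A1",
    "C1": "B1", "C2": "B2",
    "D1": "C1", "D2": "C1", "D3": "C2",
}

def level(node):
    """Return the node's level, counting the root as level 1 (an assumption)."""
    depth = 1
    while parent_of[node] is not None:
        node = parent_of[node]
        depth += 1
    return depth

def nodes_at_level(n):
    """Return all nodes whose level equals n, sorted by name."""
    return sorted(node for node in parent_of if level(node) == n)

print(nodes_at_level(4))
```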
