CX Works

CX Works brings the most relevant leading practices to you.
It is a single portal of curated, field-tested, and SAP-verified expertise for SAP Customer Experience solutions.

Get the Most out of Your Cloud Hot Folders and Azure Blob Storage


Hot Folders is a feature that allows you to integrate files into SAP Commerce quickly and efficiently. Traditionally, this has required a local or shared directory to which you push your files via SFTP (Secure File Transfer Protocol). With SAP Commerce Cloud in the Public Cloud, Cloud Hot Folders replace the classic Hot Folders, and Azure Blob Storage replaces local or shared directories: you push blobs to Cloud Hot Folders instead of files. For more on the architecture of Cloud Hot Folders, see the "Cloud Hot Folders" section in SAP Commerce Cloud Architecture as well as the product documentation.

In this article, we will explain how you can migrate your connectivity from Hot Folders to Cloud Hot Folders and describe the different ways to connect and push files/blobs to Cloud Hot Folders. We will also explain how to emulate Azure Storage locally, which can be very useful for developers. Finally, we will explain how to upload product media/images to SAP Commerce Cloud using Cloud Hot Folders.


Migrate the Connectivity from On-Premise to Cloud Hot Folders

To push/read data to/from the On-Premise Hot Folders, the classical options are:

  • Use FTP/SFTP to transfer files 
  • Use an NFS driver

Since Cloud Hot Folders use Azure Blob Storage, the options above are no longer available.

In order to migrate the connectivity from On-Premise to the Azure Blob Storage, the new options are:

  1. Explorers for Blob Storage
  2. AzCopy Command Line Tool
  3. Blob Services REST API
  4. Blob Storage SDK
  5. Blobfuse Virtual System Driver
  6. Azure CLI

However, it's always possible to create a bridge system that receives files through SFTP and transfers them to Azure Blob Storage. If this is critical, there are third-party systems that provide paid services for this feature (for example, https://docevent.io/ ). For the purposes of this article, we will not consider these third-party options. It is also possible to build such a bridge yourself using Azure CLI scripts.
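To illustrate the bridge idea, a minimal Python sketch could watch a landing directory and shell out to the Azure CLI. This is not a supported SAP component; the landing directory, file pattern, and blob-path convention below are assumptions for illustration only:

```python
import subprocess
from pathlib import Path

LANDING_DIR = Path("/srv/sftp/landing")  # hypothetical SFTP landing directory
CONTAINER = "hybris"                     # Cloud Hot Folders container name

def upload_command(path: Path, container: str = CONTAINER) -> list:
    """Build the az CLI command that pushes one file into the hot-folder path."""
    # hybris/master/hotfolder is the naming convention described in this article
    blob_name = "master/hotfolder/" + path.name
    return [
        "az", "storage", "blob", "upload",
        "--container-name", container,
        "--file", str(path),
        "--name", blob_name,
    ]

def push_new_files(directory: Path = LANDING_DIR) -> None:
    """Upload every CSV in the landing directory, then remove the local copy."""
    for path in sorted(directory.glob("*.csv")):
        subprocess.run(upload_command(path), check=True)
        path.unlink()  # remove only after a successful upload
```

AZURE_STORAGE_ACCOUNT and AZURE_STORAGE_KEY would still need to be set in the environment, as described in the Azure CLI section below.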

We will now go through all of the options above and describe the pros and cons of each.

Connect to Azure Cloud Hot Folders

Explorers for Blob Storage

Similar to SFTP clients like FileZilla, there are several clients/explorers for Azure Blob Storage.

To manage blobs, Microsoft proposes:

  • A web portal: Microsoft Azure Portal (not in scope, because access isn't available for SAP Commerce Cloud customers)
  • A client: Microsoft Azure Storage Explorer

The screenshot below shows Microsoft's Azure Storage Explorer. The export folder (on the left side) contains some zip files (on the right side).


For more information about the Client Tools, please read: https://docs.microsoft.com/en-us/azure/storage/common/storage-explorers


Pros:

  • Usability. The Client Tool is very easy to use for manual operations.

Cons:

  • Automation. Pushing or reading blobs automatically is not possible, as the client is a GUI (Graphical User Interface) tool.


AzCopy Command Line Tool

AzCopy is a command-line tool designed to copy data to and from Azure Blob Storage and File Storage, in either direction.

The AzCopy command is used as follows:

azcopy copy "[source]" "[destination]?sv=SAStoken" --recursive=true [Options]

Source is the source file or folder/container; it can be File Storage or Blob Storage.

Destination is the target file or folder/container; it can be File Storage or Blob Storage.

For more information about using AzCopy, see the CX Works Migrate Media with AzCopy article. In addition, you can learn about the supported options and command details for AzCopy by reviewing the documentation: https://docs.microsoft.com/en-us/azure/storage/common/storage-use-azcopy-linux.


Pros:

  • Usability. Simple commands to copy blobs from/to Azure Storage.
  • Usability. Simpler than Azure CLI, if it meets the use case needs.
  • Compatibility. AzCopy is available for Linux, macOS, and Windows.

Cons:

  • Automation. Pushing or reading blobs automatically is not possible (or a custom tool must be implemented).
  • Upgrade. You must keep the executable in sync each time a new version is released.

Blob Services REST API

The Azure Blob service offers several REST operations over the HTTP protocol. Below are some examples of the offered operations:

  • List Containers
  • Create Container
  • Delete Container
  • List Blobs
  • Get Blob
  • Create Blob
  • Delete Blob
  • and more

For more details, please read https://docs.microsoft.com/en-us/rest/api/storageservices/blob-service-rest-api.
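As a sketch of how these operations map onto plain HTTP, the snippet below builds the URL for a List Blobs request. The account name, container, and SAS token are hypothetical placeholders; a real request must carry valid credentials (a SAS token or a signed Authorization header):

```python
# Build the URL for the Blob service "List Blobs" REST operation.
# Account, container, and SAS token below are hypothetical placeholders.

def list_blobs_url(account: str, container: str, sas_token: str) -> str:
    base = f"https://{account}.blob.core.windows.net/{container}"
    # restype=container&comp=list selects the List Blobs operation
    return f"{base}?restype=container&comp=list&{sas_token}"

url = list_blobs_url("abcd12345", "hybris", "sv=2019-12-12&sig=EXAMPLE")
# An HTTP GET on this URL returns an XML listing of the container's blobs.
print(url)
```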


Pros:

  • Compatibility. HTTP is a standard protocol, compatible across systems/platforms.
  • Support. No upgrade needed; the latest API version is used.

Cons:

  • Synchronous. Operations are synchronous only.
  • Usability. Not very practical to use; a custom tool must be implemented to consume the REST operations.


Blob Storage SDK

The other option Azure proposes is the SDK, or client API. SDKs are available for several programming languages, including .NET, Java, Python, JavaScript, and Go.


Pros:

  • Asynchronous. The Java SDK provides asynchronous operations.
  • Integration. Integrates with SAP Commerce Cloud, which already uses the Java SDK.

Cons:

  • Upgrade. Each time a new version is released, an upgrade is needed.
  • Usability. To perform basic operations (add blob, remove blob), custom code is needed.

Blobfuse Virtual System Driver

Blobfuse is a virtual file system driver that gives you access to Azure Blobs through the Linux file system. Blobfuse uses the Blob service REST APIs to translate the basic operations (read, write, list, and more).

First, install Blobfuse. Then, your Blob container can be mounted in the folder of your choice.

For more information on how to install Blobfuse, mount a container, and perform read/write operations, please refer to: https://github.com/Azure/azure-storage-fuse.


For performance reasons, files are cached in the Blobfuse temporary directory. If a blob is modified in Azure Storage, Blobfuse waits for the cache timeout to expire before downloading the latest version.

Pros:

  • Usability. Blob containers can be accessed as a directory. Standard file system operations like ls and cp can be used.

Cons:

  • Reading Delay. Files are cached. If a blob is modified in Azure Storage, the latest version is only downloaded to Blobfuse after a timeout.
  • Non-Optimized Update. To update a blob, Blobfuse downloads the entire file to the local cache, modifies it, and then uploads it to Azure Storage.
  • Concurrent Write. If multiple nodes try to write the same file, the last writer wins.
  • Limitations. Some operations are not supported by Blobfuse: symbolic links, permissions, synchronization (readlink, symlink, link, chmod, chown, fsync, lock).


Azure CLI

Azure CLI is a command-line interface for managing your Azure subscription, including Azure Storage. Install the latest Azure CLI for your operating system (see Install the Azure CLI). Azure CLI scripts can be a powerful way to connect to Cloud Hot Folders and automate ETL integrations between Cloud Hot Folders and other interfacing systems.

Two environment variables, AZURE_STORAGE_ACCOUNT and AZURE_STORAGE_KEY, need to be set in your script (or shell environment, in interactive mode) to access Cloud Hot Folders. You can obtain those values from the Commerce Cloud Portal (only authorized users can view them).

The following operations are allowed in script and/or interactive mode via the Azure CLI. Please refer to https://docs.microsoft.com/en-us/azure/storage/common/storage-azure-cli for all applicable command details. Since Commerce Cloud in the Public Cloud only provides role-based access to Azure Blob Storage (for authorized users), and not to the Azure subscription or other Azure Storage types, only those commands/accesses are applicable.

  • Create/Manage/List blobs and/or container
  • Upload/Download blobs in container 
  • Copy/Delete blobs
  • Set content type
  • and many more

Example Azure CLI script.

The script below first creates a new container in your storage account, then uploads an existing file (as a blob) to that container. It then lists all blobs in the container and, finally, downloads the file to a destination on the server/computer where you run the Azure CLI. Please replace <placeholder text> with appropriate content.

#!/bin/bash
# A simple Azure Storage example script

export AZURE_STORAGE_ACCOUNT=<storage_account_name>
export AZURE_STORAGE_KEY=<storage_account_key>

export container_name=<container_name>
export blob_name=<blob_name>
export file_to_upload=<file_to_upload>
export destination_file=<destination_file>

echo "Creating the container..."
az storage container create --name $container_name

echo "Uploading the file..."
az storage blob upload --container-name $container_name --file $file_to_upload --name $blob_name

echo "Listing the blobs..."
az storage blob list --container-name $container_name --output table

echo "Downloading the file..."
az storage blob download --container-name $container_name --name $blob_name --file $destination_file --output table

echo "Done"


To check that you have successfully connected to Cloud Hot Folders, issue the command az storage blob list --output table -c hybris. You should see tabular output of your blob storage, as below.

ShellCommandPrompt$ az storage blob list --output table -c hybris


ShellCommandPrompt$


Note: If you create a new container with a name other than hybris, it will be outside of the Cloud Hot Folder processes. All security best practices and governance need to be followed for script automation to be secure. Follow the naming conventions in Azure Blob Storage (hybris/master/hotfolder) for commands to be meaningful for Commerce in the Public Cloud.


Pros:

  • Usability. More robust commands and options are available than with AzCopy alone. Different and unique automation use cases can be realized.
  • Automation. Use cases are limitless with shell scripting, within the bounds that Azure subscription login is not available.
  • Automation. Monitoring of other systems can be scripted and coupled with Azure Blob Storage management and Cloud Hot Folders.
  • Compatibility. Azure CLI is available for Linux, macOS, and Windows, and can also be run in Docker containers.

Cons:

  • Usability. For interactive one-time operations with Azure Blob Storage, AzCopy is easier to use.
  • Automation. Scripts need to be written, tested/retested, and secured to realize business value.
  • Limitation. Not all Azure CLI commands are available because Azure subscription login is not available. If a customer has Azure subscriptions outside of Commerce Cloud in the Public Cloud, the same limitation may prevent some unique automation use cases.
  • Upgrade. Each time a new version of Azure CLI is released, an upgrade is needed.


Run Blob Storage Locally

Running Blob Storage locally can be very useful, especially during development. Reading/writing data to Blob Storage during feature implementation is easier when you have a dedicated Blob Storage, so you don't run into conflicts. That said, a dedicated Azure Blob Storage instance for each developer is not possible with SAP Commerce Cloud. You could, however, create and support your own Blob Storage separate from your SAP Commerce Cloud instance. We have found that running a local Blob Storage is a good compromise for development. For more details on using Azurite for local testing, see the product documentation. This section provides a general overview and examples that build on the documentation.

You can also use the Microsoft Blob Storage emulator; however, it is compatible with Microsoft Windows only: https://docs.microsoft.com/en-us/azure/storage/common/storage-use-emulator.


  1. Install and Run the Emulator as per the product documentation.

  2. Connect to Local Blob Storage. On the left navigation panel, right click on "Storage Accounts".

  3. Select "Attach to local emulator" and click "Next".
  4. Under "Storage Accounts", a "local" storage should appear.

Troubleshooting Cloud Hot Folders

Once you have your Cloud Hot Folders set up and have deployed your SAP Commerce Cloud solution, you may be faced with a situation where Cloud Hot Folders does not process a file dropped into a Cloud Hot Folder. There may not always be enough information to go on, so what do you do next? This section walks you through troubleshooting steps to get your Cloud Hot Folders processing files. Most of the information here focuses on troubleshooting in your SAP Commerce Cloud environment; however, it can also apply to a local development environment.

Which Node is Cloud Hot Folders Running On?

Cloud Hot Folders should be configured to run only on the backgroundProcessing node. See Cloud Hot Folders > Technical Details for more information, along with Service Control.

connection-string

One of the first things to check is whether your Commerce Cloud instance can communicate with Cloud Hot Folders. This is configured by the azure.hotfolder.storage.account properties, specifically the connection-string. You can set the connection-string property in local.properties; however, it gets overridden by the build process.

local.properties - azure.hotfolder.storage.account
azure.hotfolder.storage.account.connection-string=DefaultEndpointsProtocol=https;AccountName=abcd12345;AccountKey=123456gfedcba==;BlobEndpoint=https://abcd12345.blob.core.windows.net;
azure.hotfolder.storage.account.name=abcd12345

The Cloud deployment process sets the EndpointSuffix, which may not work. Cloud Hot Folders cannot establish a connection to the blob container with this value; it needs to have the BlobEndpoint set. When you check in your Commerce Cloud environment's hybris Administration Console (hAC), it has EndpointSuffix instead of BlobEndpoint:

hac
azure.hotfolder.storage.account.connection-string=DefaultEndpointsProtocol=https;AccountName=wxyz1234;AccountKey=7654321hgfdsa==;EndpointSuffix=core.windows.net

When SAP Commerce Cloud starts up, it creates the blob container if it doesn't already exist; if the container never appears, the connection cannot be established. You can use your local workspace to verify the connection-string: change the connection-string in your local.properties and start your local Commerce instance. You can also change the container name to make the result obvious; if the connection-string is valid, the container will be created on startup. For example, set:

  • azure.hotfolder.storage.container.name=local-wsp-to-remote

If it cannot connect it should throw a StorageException and/or an IOException.
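To make the difference concrete, here is a quick Python sketch, using dummy account values, that parses a connection-string and flags a missing BlobEndpoint:

```python
def parse_connection_string(conn: str) -> dict:
    """Split an Azure storage connection-string into its key/value parts."""
    pairs = (part.split("=", 1) for part in conn.strip(";").split(";"))
    return {key: value for key, value in pairs}

# Deployment-generated form: has EndpointSuffix but no BlobEndpoint.
generated = ("DefaultEndpointsProtocol=https;AccountName=wxyz1234;"
             "AccountKey=7654321hgfdsa==;EndpointSuffix=core.windows.net")

settings = parse_connection_string(generated)
if "BlobEndpoint" not in settings:
    # Cloud Hot Folders needs an explicit BlobEndpoint, for example:
    blob_endpoint = f"https://{settings['AccountName']}.blob.core.windows.net"
    print(f"Missing BlobEndpoint; expected something like {blob_endpoint}")
```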

To override the connection-string property in your Commerce Cloud environment, you will need to use the Cloud Portal and set the property in Services as follows:

  • Commerce Cloud Portal > Environments > d1 > Services

    This needs to be done in each of your cloud environments

  • Select "Background Processing"
  • Select Properties
  • Set properties here
  • Press Save
  • When the changes are applied, the Pod(s) will restart.
  • Use hAC to check that the property value was set.

cluster.node.groups

The build manifest file (manifest.json) should have the property cluster.node.groups=integration,yHotfolderCandidate on the backgroundProcessing node (aspect):

  • integration - Controls the startup of the Spring integration thread pools.
  • yHotfolderCandidate - Starts the leadership contention. If the node is elected, it starts SmartLifecycles with a role of yHotfolderServices.

The property cluster.node.groups=integration,yHotfolderCandidate should only be set in backgroundProcessing in manifest.json.

manifest.json - cluster.node.groups
"aspects": [
	{
		"name": "backgroundProcessing",
			"properties": [
				{
					"key": "cluster.node.groups",
					"value": "integration,yHotfolderCandidate"
				}
			],
	}
]

Double check in hac on each node. This is important since it will initiate the startup process for these beans and services.

azureChannelAdapterTaskExecuter

Make sure that you are using ThreadPoolExecutor$CallerRunsPolicy for the rejectedExecutionHandler in the azureChannelAdapterTaskExecutor bean (see azurecloudhotfolder-spring.xml). If you use ThreadPoolExecutor$DiscardPolicy, your threads will be discarded and therefore unable to process Cloud Hot Folders.

Do NOT use ThreadPoolExecutor$DiscardPolicy, this may result in the thread(s) for polling Cloud Hot Folders being discarded.  Make sure to use ThreadPoolExecutor$CallerRunsPolicy.

bean - azureChannelAdapterTaskExecutor
<bean id="azureChannelAdapterTaskExecutor" 
   class="de.hybris.platform.cloud.commons.scheduling.HybrisAwareThreadPoolTaskExecutor">
   <property name="waitForTasksToCompleteOnShutdown" value="true"/>
   <property name="threadNamePrefix" value="AzureIntegrationTaskExecutorThread-${tenantId}-"/>
   <property name="threadGroupName" value="AzureIntegrationTaskExecutor-${tenantId}"/>
   <property name="corePoolSize" value="${azure.hotfolder.storage.polling.core-pool-size}"/>
   <property name="maxPoolSize" value="${azure.hotfolder.storage.polling.max-pool-size}"/>
   <property name="queueCapacity" value="-1"/>
   <property name="keepAliveSeconds" value="60"/>
   <property name="rejectedExecutionHandler">
      <bean class="java.util.concurrent.ThreadPoolExecutor$CallerRunsPolicy"/>
   </property>
   <property name="role" value="integration"/>
   <property name="autoStartup" value="false"/>
   <property name="phase" value="10"/>
   <property name="awaitTerminationSeconds" value="60"/>
</bean>

Logging

On startup, there isn't much information logged unless you change the log level. You can search for azureChannelAdapterTaskExecutor:

log - azureChannelAdapterTaskExecutor
INFO | jvm 1 | main | 2020/03/05 08:52:21.649 | INFO [localhost-startStop-16] [HybrisAwareThreadPoolTaskExecutor] Initializing ExecutorService 'azureChannelAdapterTaskExecutor'

This doesn't tell us much, though, other than that the bean is initializing.

Thread

If the thread for polling Cloud Hot Folders is not present then files can't be processed.

bean - azureChannelAdapterTaskExecutor
<bean id="azureChannelAdapterTaskExecutor" 
   class="de.hybris.platform.cloud.commons.scheduling.HybrisAwareThreadPoolTaskExecutor">
   <property name="waitForTasksToCompleteOnShutdown" value="true"/>
   <property name="threadNamePrefix" value="AzureIntegrationTaskExecutorThread-${tenantId}-"/>
   ...etc...
</bean>

To check that the thread is present on the backgroundProcessing node take a thread dump and search for "AzureIntegrationTaskExecutorThread":

  • Go to the backgroundProcessing node
  • hac > Monitoring > Thread Dump
  • Search for "AzureIntegrationTaskExecutorThread"
  • If it is not present then files can't be processed.
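The same check can be scripted against a saved thread dump. A minimal Python sketch, where the dump text is a fabricated two-line sample:

```python
def has_hotfolder_thread(thread_dump: str) -> bool:
    """True if the Cloud Hot Folders polling thread appears in the dump."""
    # The thread name comes from the bean's threadNamePrefix shown above.
    return "AzureIntegrationTaskExecutorThread" in thread_dump

# Fabricated sample: one hot-folder polling thread plus an unrelated one.
sample_dump = """\
"AzureIntegrationTaskExecutorThread-master-1" #123 prio=5 RUNNABLE
"hybrisHTTP42" #456 prio=5 WAITING
"""

print(has_hotfolder_thread(sample_dump))
```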

ApplicationResourceLock

We have seen a situation where an old ApplicationResourceLock in the database prevented the thread from being created on startup, because the lock could not be acquired.

To check for an existing ApplicationResourceLock:

  • hac > Console
  • FlexibleSearch > Flexible Query 
    • select * from {ApplicationResourceLock}
    • Result: In some cases, the Flexible Query does not return correct results; if it does not find any rows, execute a SQL query to be sure.
  • FlexibleSearch > SQL Query
    • SELECT * FROM applicationresourcelock
    • Result: 1 row, key (example '7c67706a-d0d1-3e8a-b38e-1693a2906c22')
    • Save the key value for the next step.

To remove the old ApplicationResourceLock:

  • hac > Console
  • FlexibleSearch > SQL Query
    • DELETE FROM applicationresourcelock WHERE p_lockkey = '7c67706a-d0d1-3e8a-b38e-1693a2906c22'
    • Click the Rollback button to set Commit to true
    • Execute the query

After the ApplicationResourceLock has been deleted, azureChannelAdapterTaskExecutor should be able to acquire the lock right away (without a restart) and create the thread. Once the thread has been created, Cloud Hot Folders will be able to start processing files.

Zero byte file

If you upload a zero-byte file, Cloud Hot Folders will attempt to read it and throw a StorageException. The exception handling in the method (AzureBlobInboundSynchronizer.transferFilesFromRemoteToLocal()) catches the error and does not process the remaining files in that folder.

  • If you are using an FTP client to upload files to Blob Storage, it may be creating a zero-byte marker file.
    • Delete the zero-byte file from the Cloud Hot Folder.
  • You can also change azure.hotfolder.storage.container.match.pattern to only process files that have a valid file extension.
    • The default value for azure.hotfolder.storage.container.match.pattern is ^((?!ignore).)*$
      • azure.hotfolder.storage.container.match.pattern=^((?!ignore).)*$
    • You could change this to ^.*\.csv$ to only process .csv files.
      • azure.hotfolder.storage.container.match.pattern=^.*\.csv$
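The effect of these two patterns can be verified with a quick Python sketch (the file names are illustrative):

```python
import re

default_pattern = r"^((?!ignore).)*$"  # process anything not containing "ignore"
csv_only_pattern = r"^.*\.csv$"        # process only .csv files

def is_processed(blob_name: str, pattern: str) -> bool:
    """True if the blob name matches the hot-folder match pattern."""
    return re.match(pattern, blob_name) is not None

# The default pattern skips names containing "ignore".
assert is_processed("products.csv", default_pattern)
assert not is_processed("marker.ignore", default_pattern)

# The stricter pattern skips anything that isn't a .csv file,
# including zero-byte marker files left by some FTP clients.
assert is_processed("products.csv", csv_only_pattern)
assert not is_processed("upload.tmp", csv_only_pattern)
```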


Conclusion

In this article, we explained how to push/read blobs to/from Cloud Hot Folders, including the possible options. We explained how to run an Azure Storage emulator locally, which is useful for local development. Finally, we demonstrated how to upload product media/pictures using Cloud Hot Folders. Now, you should be able to perform all these operations and get the most out of Cloud Hot Folders.

For more details about the Cloud Hot Folder, you can consult the Cloud Hot Folders section in the product documentation.
