
Search and Navigation in SAP Commerce Cloud - Indexing

Indexing

SAP Commerce Cloud uses relational databases for storing information like orders, carts, customers, and more. For a product to be visible as part of a search result, it first needs to be included in a Solr index through the process of indexing. In this article, we cover the fundamentals of indexing as well as provide recommendations on how to improve your indexing processes.

This article is one of several in a larger in-depth series on search and navigation.

Background

Information stored in the database is normalized: data is split and linked across multiple tables to improve resource usage and performance. Solr works differently, storing information in documents instead of tables. Composing these documents requires denormalization and transformation; for example, a product, its prices, and its stock levels are flattened into a single document. The information is then indexed to ensure fast access and searching. This section covers the process of indexing data from SAP Commerce Cloud into Solr documents that can be used by your Solr searches.
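
As a simple illustration (plain Java, no SAP Commerce Cloud APIs; all names here are made up for this example), denormalization amounts to flattening related rows into a single keyed document:

    import java.util.HashMap;
    import java.util.Map;

    public class DenormalizationSketch {

        // Flattens a product row and its currency-keyed price rows into one
        // document, echoing fields like "priceValue_usd_double" in a real Solr document.
        public static Map<String, Object> toDocument(final String productCode,
                final Map<String, Double> pricesByCurrency) {
            final Map<String, Object> document = new HashMap<>();
            document.put("code", productCode);
            for (final Map.Entry<String, Double> price : pricesByCurrency.entrySet()) {
                document.put("priceValue_" + price.getKey() + "_double", price.getValue());
            }
            return document;
        }

        public static void main(final String[] args) {
            final Map<String, Double> prices = new HashMap<>();
            prices.put("usd", 610.88);
            prices.put("jpy", 52040.0);
            System.out.println(toDocument("1934398", prices));
        }
    }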

How Solr indexing works is covered in the first few minutes of this webinar.

Indexing Strategies

The indexing process starts by exporting data from SAP Commerce Cloud to Solr, where the actual indexing takes place. Index operations take time and should therefore run on the master node (if you are using a standalone Solr cluster). Once complete, the index is replicated to the registered Solr slaves. For Solr Cloud, careful planning of shards and replicas is highly recommended (see the Solr Infrastructure section). The index can be built or updated using the following supported indexing strategies:

  • Full: This stops replication, deletes all current documents, and rebuilds the index completely from scratch. This operation can take some time to complete for large data sets, so it usually runs once a day, after the catalog synchronization finishes. It supports two commit strategies:
    • Direct mode adds or updates every document in the index; if the operation fails, previous entries that were successfully committed remain available. Index replication is disabled before this operation begins and resumes after it finishes. We recommend this mode for a performance environment.
    • Two-phase mode works atomically: in the case of failure, everything is rolled back to the initial state. To accomplish this, Solr creates an additional core, which by default uses the same name as the main core with an added suffix. This core stores documents during the indexing operation. Once indexing is done, a swap operation is performed, and the new index comes online in the master/leader node, ready for replication.
  • Update: Only indexes documents that have changed in a given period of time. This operation does not require replication to stop and is usually executed more frequently, since it runs faster by targeting only changed documents. If needed, hot updates on specific documents can be triggered manually or programmatically through the API (see the sketch at the end of this section). Updates can also happen partially, covering only specific attributes of a document. See here for more details on creating update jobs.
  • Delete: This simply removes documents from the index. Keep in mind that a full index operation excludes deleted documents on its own, since it rebuilds the index from scratch. Depending on how often (and how long) you run a full index job, it can sometimes be more efficient to run delete jobs periodically to maintain accurate data.

For solutions using Solr Cloud, two-phase is the recommended commit strategy, especially when running a full index. As mentioned above, a full index deletes all current documents; because all Solr Cloud nodes are kept in sync, direct indexing would propagate that deletion to every node immediately. Additional information about these strategies can be found here.
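
The indexing operations above can also be requested programmatically. The following is a minimal sketch using the IndexerService and FacetSearchConfigService from the solrfacetsearch extension; verify the exact method signatures against your SAP Commerce Cloud version, and wire the beans through Spring as usual:

    import de.hybris.platform.solrfacetsearch.config.FacetSearchConfig;
    import de.hybris.platform.solrfacetsearch.config.FacetSearchConfigService;
    import de.hybris.platform.solrfacetsearch.config.exceptions.FacetConfigServiceException;
    import de.hybris.platform.solrfacetsearch.indexer.IndexerService;
    import de.hybris.platform.solrfacetsearch.indexer.exceptions.IndexerException;

    public class IndexOperationTrigger {

        private final FacetSearchConfigService facetSearchConfigService;
        private final IndexerService indexerService;

        public IndexOperationTrigger(final FacetSearchConfigService facetSearchConfigService,
                final IndexerService indexerService) {
            this.facetSearchConfigService = facetSearchConfigService;
            this.indexerService = indexerService;
        }

        // Full: deletes all documents and rebuilds the index from scratch.
        public void fullIndex(final String configName) throws FacetConfigServiceException, IndexerException {
            final FacetSearchConfig config = facetSearchConfigService.getConfiguration(configName);
            indexerService.performFullIndex(config);
        }

        // Update: re-indexes only the items returned by the update indexer queries.
        public void updateIndex(final String configName) throws FacetConfigServiceException, IndexerException {
            final FacetSearchConfig config = facetSearchConfigService.getConfiguration(configName);
            indexerService.updateIndex(config);
        }
    }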


Indexed Types

Each SAP Commerce Cloud Composed Type can become a Solr Indexed Type, making it eligible for indexing and searching. SAP Commerce Cloud already provides basic types such as Products, but additional ones can be configured using the Solr Item Type Management, available either in the Backoffice or through ImpEx. Each indexed type defines how Solr should handle each property and how to look for items to index, using FlexibleSearch queries for the indexer jobs. We recommend creating a separate indexed type for Backoffice search (even for the same Composed Type), since the Backoffice search use case differs from the storefront one.



These queries can also be found and configured using the Solr Item Type Management, under Facet Search Config > Indexed Types > Indexed Type Details, in the region labeled "Indexer Queries". Once item types are properly set, index operations can be requested through the HMC or the Backoffice.



SAP Commerce Cloud already provides cronjobs that can perform all of the indexing operations described previously (full, update, and delete). Items retrieved by the indexer queries are processed and then indexed. This process also relies heavily on a component called a value provider, covered later in this article.



Partial and Update Indexing

Update indexing performs an indexing operation for a specific document. There are two ways to modify the content of an existing Solr document.

  • Default Update Indexing: This indexing strategy modifies all the attributes of a Solr document. For example, if a document has 30 Solr index properties, then this approach will update all 30 Solr index properties.
  • Partial Indexing: This approach modifies only specific attributes of a Solr document, which is called a "partial update". For example, a partial update can be configured to only update price and stock Solr index properties. During a partial update indexing process, only stock and price properties of the Solr document will be modified. That means the remaining 28 properties will remain unchanged and unprocessed.

Benefit

Solr partial update is preferable when a few specific attributes change frequently. Recreating a full Solr document merely to change a few attributes is wasteful; it's better to modify only the attributes of that particular document that have actually changed.

Additionally, a Solr document consists of values from different attributes, which are created through value providers. Each value provider requires its own data processing, which adds to the total indexing time; depending on the business case, some value providers take longer than others to process data. By configuring a partial update for only the fields you need, you reduce CPU time, the number of database queries, and overall indexing load.

Implementation

To perform a partial update, a cronjob needs to be created; this can be done in the Backoffice. See Managing Solr Partial Updates (Backoffice) for more information. A programmatic alternative is sketched below.
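
If you need to trigger a partial update from code instead, the IndexerService exposes a partial update operation. The sketch below assumes the updatePartialTypeIndex signature available in recent versions; verify it against your release before relying on it:

    import de.hybris.platform.core.PK;
    import de.hybris.platform.solrfacetsearch.config.FacetSearchConfig;
    import de.hybris.platform.solrfacetsearch.config.IndexedProperty;
    import de.hybris.platform.solrfacetsearch.config.IndexedType;
    import de.hybris.platform.solrfacetsearch.indexer.IndexerService;
    import de.hybris.platform.solrfacetsearch.indexer.exceptions.IndexerException;

    import java.util.Collection;
    import java.util.List;

    public class PartialUpdateTrigger {

        private final IndexerService indexerService;

        public PartialUpdateTrigger(final IndexerService indexerService) {
            this.indexerService = indexerService;
        }

        // Re-indexes only the given properties (for example, price and stock)
        // of the documents identified by the given item PKs. Verify this
        // method's signature against your SAP Commerce Cloud version.
        public void partialUpdate(final FacetSearchConfig config, final IndexedType indexedType,
                final Collection<IndexedProperty> properties, final List<PK> pks) throws IndexerException {
            indexerService.updatePartialTypeIndex(config, indexedType, properties, pks);
        }
    }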

Consideration

Partial update indexing is designed to update a portion of the Solr document, so the Solr index properties need to be chosen carefully. In a Solr document, there are many properties that don't change often, such as classification data. Other properties, like pricing, can change as often as every hour. These types of frequently changing properties are good candidates for partial update indexing.

Another consideration is the frequency of the indexing job, which depends on how often data gets updated. Your requirements should document how much stale (dirty) data is acceptable, and you should configure the frequency of your partial updates to stay within these parameters.

The FlexibleSearch query that collects data for partial update indexing should also be designed carefully: it must find the changes for exactly the Solr properties being updated.

Because partial update indexing doesn't recreate the full document, we recommend being extra careful with Solr properties whose data can be removed completely. For example, if an out-of-the-box implementation is used to index the Solr property "price", it is saved like this in the Solr document:


{
        "indexOperationId_long": 12816,
        "id": "electronicsProductCatalog/Staged/1934398",
        "pk": 8796098297857,
        "catalogId": "electronicsProductCatalog",
        "catalogVersion": "Staged",
        ....................................
        "price_jpy_string": "¥50,000-¥99,999",
        "price_usd_string": "$500-$999.99",
        "priceValue_jpy_double": 52040,
        "priceValue_usd_double": 610.88
        ..................................
 }


Now, if the price for "JPY" is removed completely, the value provider most likely won't create values for the properties "price_jpy_string" and "priceValue_jpy_double". This means the old values will remain unchanged in the Solr document and will still be visible on the storefront. To tackle this type of issue, there are a couple of options:

  • Don't remove the value; instead, set it to 0.
  • Invalidate the data instead of deleting it. For example, for a price, set its end date. The value provider should then create a "null" entry so that the field is emptied, like this: "price_jpy_string={set=null}" (see the sketch below).
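
At the Solr level, such a "set null" entry is an atomic update that clears the field. Purely as an illustration of what the indexer sends on your behalf (you would not normally do this by hand, and the core URL and field names below are examples), this is how it looks with SolrJ:

    import org.apache.solr.client.solrj.SolrClient;
    import org.apache.solr.client.solrj.impl.HttpSolrClient;
    import org.apache.solr.common.SolrInputDocument;

    import java.util.Collections;

    public class AtomicClearExample {

        public static void main(final String[] args) throws Exception {
            // Example core URL; adjust to your environment.
            try (SolrClient solr = new HttpSolrClient.Builder(
                    "http://localhost:8983/solr/master_electronics_Product").build()) {
                final SolrInputDocument doc = new SolrInputDocument();
                doc.addField("id", "electronicsProductCatalog/Staged/1934398");
                // {"set": null} clears the field without touching the other fields.
                doc.addField("price_jpy_string", Collections.singletonMap("set", null));
                doc.addField("priceValue_jpy_double", Collections.singletonMap("set", null));
                solr.add(doc);
                solr.commit();
            }
        }
    }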

Limitation

  • Properties must be declared with stored="true" in the Solr schema.xml.
  • Partial updates "should not be used on attributes that are used for spellchecking / suggestions" (see "Limitations of the PARTIAL_UPDATE operation" here).

Value Providers, Identity Provider and Results Converter

Value providers handle the conversion between SAP Commerce Cloud database entries and Solr document values. It's their responsibility to retrieve, denormalize, and complete the data for an indexed type or its properties. Multiple value providers are already available for the most commonly used types, and custom ones can be created if needed. Keep in mind that value providers are invoked extensively during indexing operations; they should be carefully designed and unit tested to avoid performance and behavioral issues.
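
As an illustration, a custom value provider typically implements the FieldValueProvider interface from the solrfacetsearch extension. The sketch below indexes a product's EAN; package names and the available base classes may vary slightly between versions, so check them against your installation:

    import de.hybris.platform.core.model.product.ProductModel;
    import de.hybris.platform.solrfacetsearch.config.IndexConfig;
    import de.hybris.platform.solrfacetsearch.config.IndexedProperty;
    import de.hybris.platform.solrfacetsearch.config.exceptions.FieldValueProviderException;
    import de.hybris.platform.solrfacetsearch.provider.FieldNameProvider;
    import de.hybris.platform.solrfacetsearch.provider.FieldValue;
    import de.hybris.platform.solrfacetsearch.provider.FieldValueProvider;

    import java.util.ArrayList;
    import java.util.Collection;
    import java.util.Collections;
    import java.util.List;

    public class EanValueProvider implements FieldValueProvider {

        private FieldNameProvider fieldNameProvider;

        @Override
        public Collection<FieldValue> getFieldValues(final IndexConfig indexConfig,
                final IndexedProperty indexedProperty, final Object model) throws FieldValueProviderException {
            if (!(model instanceof ProductModel)) {
                return Collections.emptyList();
            }
            final String ean = ((ProductModel) model).getEan();

            // One indexed property can map to several Solr fields (per language,
            // currency, and so on), so resolve all field names first.
            final List<FieldValue> fieldValues = new ArrayList<>();
            for (final String fieldName : fieldNameProvider.getFieldNames(indexedProperty, null)) {
                fieldValues.add(new FieldValue(fieldName, ean));
            }
            return fieldValues;
        }

        public void setFieldNameProvider(final FieldNameProvider fieldNameProvider) {
            this.fieldNameProvider = fieldNameProvider;
        }
    }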

Identity providers determine how a document is uniquely identified in the Solr index. The out-of-the-box implementation works for most product item type use cases. However, with the introduction of Solr Cloud support in version 6.2, specific use cases (for products or other indexed types) may require customizing the identity provider.
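
A custom identity provider can be sketched as follows; the IdentityProvider interface shape shown here mirrors the default product implementation and should be verified against your version:

    import de.hybris.platform.core.model.product.ProductModel;
    import de.hybris.platform.solrfacetsearch.config.IndexConfig;
    import de.hybris.platform.solrfacetsearch.provider.IdentityProvider;

    public class CatalogAwareProductIdentityProvider implements IdentityProvider<ProductModel> {

        @Override
        public String getIdentifier(final IndexConfig indexConfig, final ProductModel product) {
            // Mirrors the "id" format of the sample document shown earlier:
            // <catalogId>/<catalogVersion>/<productCode>
            return product.getCatalogVersion().getCatalog().getId() + '/'
                    + product.getCatalogVersion().getVersion() + '/'
                    + product.getCode();
        }
    }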

The results converter is similar to the service layer converter concept: it transforms the Solr documents in the search results into corresponding DTOs (data transfer objects) to be used in the storefront.
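
Conceptually, the conversion looks like the following sketch, where a plain map stands in for the version-specific search result value holder:

    import de.hybris.platform.commercefacades.product.data.ProductData;

    import java.util.Map;

    public class SearchResultToProductDataConverter {

        public ProductData convert(final Map<String, Object> documentValues) {
            final ProductData productData = new ProductData();
            // Keys match the stored Solr fields configured for the indexed type.
            productData.setCode((String) documentValues.get("code"));
            productData.setName((String) documentValues.get("name"));
            return productData;
        }
    }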

Recommended Practices

Cronjobs

  • Full indexing operations should be scheduled to run after the catalog synchronization finishes, and as few times a day as possible.
  • Consider running the Solr update index job every 5-15 minutes and updating its Solr indexer query to include only the changes that have occurred since the last index (for example, {p:modifiedtime} >= ?lastIndexTime). This keeps your key data up to date without the need to run a full index. Note that an update does not remove documents from the index.

API

  • FlexibleSearch queries for indexer jobs can be executed as specific users. Following SAP Commerce Cloud authorization capabilities, these users can have restricted access to information. We recommend running the index jobs as admin and using search restrictions on the storefront to ensure customers see only what they should see.
  • Value providers, especially custom ones, should be used sparingly, as they are invoked many times during indexing operations. Non-performant providers can drastically increase the time needed to complete a full index job. Carefully design your FlexibleSearch queries and minimize loops. Also, consider adopting performance strategies such as caching and database indexes.
  • Consider whether a custom value provider is really required. Some requirements can be achieved with SpEL value providers, for example:


    (#item instanceof T(de.hybris.platform.variants.model.VariantProductModel)) ? baseProduct.code : null
  • Direct index mode should be used for servers running in standalone mode (typically performance, pre-production, and production environments). During indexing, SAP Commerce Cloud disables replication before indexing starts, then re-enables it when indexing is complete so the Solr slaves can pull the updated index.

Conclusion

You should now have a deeper understanding of the Solr indexing process. Once your data is indexed in Solr, the next step is to understand Solr queries so you can tune your search results more effectively.