Search and Navigation in SAP Commerce Cloud - Solr Infrastructure
SAP Commerce Cloud uses Apache Solr for the search and navigation functionalities, both for storefront and in different perspectives of the Backoffice. A reliable Solr infrastructure needs to be well planned to guarantee a great user experience. This section covers the various considerations when setting up the infrastructure to host Solr for your search and navigation requirements.
It is one of many articles in a larger in-depth series on search and navigation.
SAP Commerce Cloud on public infrastructure comes with Solr Cloud. This article is aimed mostly at those hosting their own SAP Commerce installation and at developers. If you're using SAP Commerce Cloud on public infrastructure, you can skip to the Solr Cloud section below to learn more about what is happening behind the scenes.
Embedded versus Standalone versus Cloud Solr Server
There are three ways to set up a Solr installation to work in conjunction with SAP Commerce, depending on the version.
- Solr Cloud - For SAP Commerce Cloud on public infrastructure, Solr Cloud is the default. Support for Solr Cloud was introduced in SAP Commerce v6.2 as a new way to set up Solr. It complements the standalone cluster mode as a production option for scalability and availability. Solr Cloud leverages Apache ZooKeeper, index sharding, and replicas to scale large indexes with ease. For SAP Commerce (on-premise), it is not enabled by default.
- Solr Standalone - Standalone mode is the commonly recommended setup. Solr runs in its own JVM (and usually on its own machine), which makes it much easier to set up, monitor, and scale independently of SAP Commerce. To further improve Solr's reliability and scalability, such environments should always use a Solr cluster, which consists of multiple Solr instances running in standalone mode. Further information about Solr clusters is provided in the Solr Standalone Cluster section below. See the note below.
- Embedded - Solr runs inside the same JVM as your SAP Commerce process. While this mode can be suitable for development, it's not recommended for performance-critical environments such as production: a crash would be fatal for both Solr and SAP Commerce, and it would be harder to set up, monitor, and scale them individually. See the note below.
NOTE: Depending on which version of SAP Commerce you're using, a different setup is used by default:
- Versions earlier than 5.7 ship with an embedded Solr server, with standalone mode also supported
- Versions 5.7 and above ship only with the standalone setup, which is the default
- From version 6.2, you can set up Solr Cloud as an alternative to a Solr standalone cluster
Solr Cloud provides scalability and availability of search and navigation. Instead of configuring the usual master and slave standalone nodes, multiple Solr Cloud nodes are used, containing shards/replicas of the index/collection. This cluster/cloud is managed by Apache ZooKeeper, which is also responsible for request routing and load balancing Solr nodes, for indexing across shards, for distributing queries across replicas, and for managing nodes.
The increased scalability and availability for read, write and sharing capabilities brought by this approach requires careful infrastructure planning, configuration, and testing. It also requires configuring the sharding strategy and the appropriate number of shards/replicas, along with managing the appropriate ZooKeeper configuration.
Following is a sample infrastructure that is representative of a simple Solr Cloud configuration. The load balancer is optional, as the actual routing is done by the ZooKeeper ensemble. ZooKeeper can also be deployed on the same machines as the Solr Cloud nodes, running in separate JVMs, for production and non-production environments. For development and testing, you can have a minimum of one ZooKeeper node (embedded or separate) and two Solr Cloud nodes, all deployed on a single machine.
For a reliable ZooKeeper service, the ZooKeeper ensemble has to maintain a quorum. This avoids split-brain situations in which the connection between the leader and the followers is interrupted. When the leader goes down, the ensemble automatically elects one of the follower nodes as the new leader, provided a quorum of nodes is still available.
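To determine the number of ZooKeeper nodes needed for a quorum, use the following formula:
Q = (N + 1) / 2
where N is the number of ZooKeeper nodes in the ensemble (assumed to be odd). For example, an ensemble of five nodes needs three nodes for a quorum.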
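As a worked example of the quorum formula, here is a minimal Python sketch. The function names are illustrative and not part of any SAP or ZooKeeper API. The integer form `N // 2 + 1` gives the same result as (N + 1) / 2 for odd ensemble sizes; using it for even sizes as well is an assumption based on the majority rule.

```python
def zookeeper_quorum(ensemble_size: int) -> int:
    """Smallest number of ZooKeeper nodes that forms a majority."""
    if ensemble_size < 1:
        raise ValueError("ensemble_size must be at least 1")
    # Integer majority; equals (N + 1) / 2 whenever N is odd.
    return ensemble_size // 2 + 1


def tolerated_zookeeper_failures(ensemble_size: int) -> int:
    """Number of nodes that can be lost while a quorum is still reachable."""
    return ensemble_size - zookeeper_quorum(ensemble_size)


# Print quorum size and tolerated failures for a few odd ensemble sizes.
for nodes in (1, 3, 5, 7):
    print(nodes, zookeeper_quorum(nodes), tolerated_zookeeper_failures(nodes))
```

Note that a four-node ensemble tolerates only one failure, the same as a three-node ensemble, which is one reason odd ensemble sizes are usually preferred.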
Solr Cloud Nodes
To maximize the advantages of Solr Cloud, the minimum number of nodes required is four. This allows you to leverage distributed indexing with at least two shards, each having two replicas. You can also apply the following formula for the tolerable number of failed nodes:
Tolerable number of failed Solr nodes = round down (N/2) - 1
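The rule of thumb above can be checked with a short Python sketch. The function name is hypothetical, and clamping the result at zero for very small clusters is an assumption added here so that the function never returns a negative count.

```python
def tolerable_failed_solr_nodes(node_count: int) -> int:
    """Apply the rule of thumb: round down (N / 2) - 1."""
    if node_count < 1:
        raise ValueError("node_count must be at least 1")
    # Clamp at zero (assumption) so tiny clusters do not yield a negative value.
    return max(node_count // 2 - 1, 0)


# Show the tolerated failures for a few cluster sizes.
for nodes in (2, 4, 6, 8):
    print(nodes, tolerable_failed_solr_nodes(nodes))
```

With the recommended minimum of four nodes, this gives one tolerable node failure.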
For comparable infrastructure, Solr Cloud usually delivers better indexing throughput (both full and partial) and better query throughput than a Solr standalone cluster. However, query performance during indexing can be slightly lower than on a comparable standalone cluster (assuming the master is not used for querying at all), because in Solr Cloud indexing and querying typically occur on the same nodes.
If you intend to use Solr for Backoffice product search (available since version 6.0), preferably host it on the same Solr infrastructure. For ease of maintenance and end-to-end troubleshooting of search and navigation, we recommend using a single scalability approach (Solr Cloud or Solr standalone cluster) rather than mixing the two.
Solr Cloud Configuration
When creating the Solr collection in a Solr Cloud configuration, the Solr indexer configuration in the Backoffice provides the ability to specify the replication factor and the number of shards:
If the collection was already created with a previous configuration, you will need to remove the collection (for example, through the Solr admin on any node). Then, trigger a full index from the Backoffice again, which will re-create the collection based on the new settings above.
The number of shards determines how many pieces the collection is split into. It should be less than or equal to the number of Solr nodes. For example, if the number of shards equals the number of Solr nodes, ZooKeeper will create one shard per node.
The replication factor is the number of copies of each shard. A replication factor of one means just the primary shard (no replication), a replication factor of two means one extra replica of the primary shard, and so on. ZooKeeper decides where to create the replica and will not create it on the same node as the primary shard.
Here is an example of what a Solr Cloud cluster looks like in the Solr admin console:
The Backoffice will not allow you to create a configuration in which any live Solr node would hold more than one shard (either a primary or a replica). For example, if you want your Solr collection to be distributed across three shards with one replica each, you will specify a configuration of 3x2 in the above. You will get an error message when trying to create this on anything less than a six-node setup:
In the above scenario, a 3x2 setup was attempted on a three-node Solr Cloud setup. In short, it is indicating that you haven't reached true redundancy because there are only three live nodes while six nodes in total are needed to house each shard.
You can still work around this in the Backoffice by creating a 3x1 setup and then in Solr's admin of any one node, create the replicas manually. But again, it is Solr's implicit way of telling you that you need more nodes for the number of shards and replication factor you are looking for.
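The sizing rule described in this section (shards multiplied by replication factor must not exceed the number of live nodes) can be sketched as follows. This is only an illustration of the arithmetic; the function names are hypothetical and this is not the Backoffice's actual validation code.

```python
def required_nodes(num_shards: int, replication_factor: int) -> int:
    """Nodes needed so that every shard copy lives on its own node."""
    if num_shards < 1 or replication_factor < 1:
        raise ValueError("num_shards and replication_factor must be at least 1")
    return num_shards * replication_factor


def layout_fits(num_shards: int, replication_factor: int, live_nodes: int) -> bool:
    """True if no live node would have to hold more than one shard copy."""
    return required_nodes(num_shards, replication_factor) <= live_nodes


# The 3x2 layout from the example needs six nodes, so it fails on three.
print(required_nodes(3, 2))
print(layout_fits(3, 2, live_nodes=3))
print(layout_fits(3, 1, live_nodes=3))
```

Running such a check before changing the indexer configuration can help you decide whether to add nodes or reduce the number of shards or replicas.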
Solr Standalone Cluster
If you're using an on-premise version of SAP Commerce, then creating a Solr standalone cluster is a possibility. A Solr cluster can be created with multiple machines running a Solr search server, commonly configured to play two basic roles, master and slave. A master node is responsible for performing indexing operations, which consist of retrieving data from a given source (like a database), processing, denormalizing, and converting it into Solr documents, and making it available to searches. Once the index is generated, it is replicated from the master node to all slaves in the cluster. Using this index, slaves can receive queries and respond to search requests.
Even though it's possible to use the master for search too, we recommend using it only for indexing, which provides increased flexibility in the solution. These are some benefits:
- It allows for a separate Solr configuration to be specified for the master node. This allows removing front-end query processing. For example, on the slave (query) nodes, it's useful to warm up the caches by registering a "firstSearcher" event listener to execute a query when the searcher is being prepared. This is not needed on the master.
- JVM configurations can be tailored for master and slave configurations.
- More consistent query and index times. The server load on the master is unaffected by the query loads, and similarly, queries are not slowed during indexing time.
- For high availability requirements, the availability of the master node is not as critical since indexing occurs in the background. If the master node goes down, searches can continue, using data as recent as the last replication from the master. Additional slaves can be added as needed to cover any slaves that fail or that are in high demand.
- On the downside, large indexes may cause garbage collection issues and can run into a physical limit on index size. Very large indexes can also suffer from stale data when a full or partial indexing run takes too long to finish.
Once the cluster is configured, SAP Commerce will internally load-balance requests between slave nodes, using a round-robin algorithm.
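To illustrate what round-robin selection means in practice, here is a minimal Python sketch. It is not SAP Commerce's implementation; the class name and the slave URLs are hypothetical and serve only to show how requests rotate evenly across the slave nodes.

```python
from itertools import cycle


class RoundRobinSelector:
    """Hands out endpoints one after another, wrapping around at the end."""

    def __init__(self, endpoints):
        endpoints = list(endpoints)
        if not endpoints:
            raise ValueError("at least one endpoint is required")
        self._endpoints = cycle(endpoints)

    def next_endpoint(self) -> str:
        return next(self._endpoints)


# Hypothetical slave node URLs, for illustration only.
slaves = [
    "http://solr-slave-1:8983/solr",
    "http://solr-slave-2:8983/solr",
    "http://solr-slave-3:8983/solr",
]
selector = RoundRobinSelector(slaves)

# Five consecutive requests visit slaves 1, 2, 3 and then wrap to 1, 2.
picked = [selector.next_endpoint() for _ in range(5)]
print(picked)
```

Because each slave receives the same share of requests, this approach works best when all slave nodes are sized identically.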
The next diagram presents a basic infrastructure, following the recommendations above.
Some scenarios might require the capacity to handle a greater query load. Therefore a larger farm of Solr slaves is required to handle this workload. A tiered approach is recommended for such a situation, where Solr nodes are configured as both slaves and masters, enabling multi-tiered Solr replication for a more efficient index distribution.
There are three ways to configure the Solr cluster in SAP Commerce:
- Releases 5.X and 6.0 can use the Hybris Management Console to set up Solr, under "Facet Search Config".
- In Release 6.0, it's also possible to find the "Facet Search Config" module in the Backoffice Administration Cockpit.
- ImpEx files can be used to migrate configurations among multiple environments or for backup purposes (see solr.impex files in the accelerator for examples).
For detailed steps on how to configure SAP Commerce Cloud to communicate with Solr, please refer to Managing Solr Search Configuration.
Solr Cloud or Solr Standalone Cluster?
As stated earlier, this option is only available to SAP Commerce on-premise installations. Starting with SAP Commerce version 6.2, you can choose between Solr Cloud or Solr standalone cluster for production deployment for scalability and availability. The following are important considerations to take into account.
- Solr Cloud is not a traditional cloud. It can be deployed on physical hardware, on IaaS/virtualization or in a Docker container.
- Solr Cloud can use distributed indexing (shard > 1) and/or querying (replica > 1). Care should be taken for function queries to return good results as well as to avoid distributed deadlock.
- Solr Cloud can manage a very large index by distributing documents between various shards and by distributing queries among replicas thus improving both indexing and querying performance.
- Solr Cloud can be used to scale horizontally very easily and dynamically, especially with virtualized environments.
- Solr Cloud will generally require more servers to support ZooKeeper and for redundancy for each shard. This will increase the infrastructure cost.
- Since indexing creation/update can be distributed, any eventual indexing failure requires careful configuration as well as deliberate error resolution. Developing a backup/restore strategy is highly recommended for quick disaster recovery.
- For high availability, an odd number of nodes (JVMs, physical machines, or VMs) is recommended for the ZooKeeper ensemble. As long as more than 50% of the ensemble nodes are up, the site will function without an outage.
- ZooKeeper can run on the same node as the Solr Cloud node or on a separate node. Keep in mind high availability, especially in production.
- Management of Solr Cloud is different as roles of nodes can be dynamic, depending on the current state of infrastructure and its history since startup.
General Performance Considerations
Based on our testing, the following points affect performance and should be considered carefully when designing the Solr infrastructure:
- Ensure that the indexer configuration in SAP Commerce is set as follows to reduce indexing time:
- Allocate as many threads as possible to indexing.
- Set the Solr commit mode to AFTER_INDEX. This avoids creating a bottleneck by having to commit changes to the index after each batch of updates when using AFTER_BATCH mode.
- If using the standalone version of Solr, consider using Solr Cloud if indexing time needs to be reduced further.
- Solr Cloud is much faster at querying large indexes than the standalone version. It also scales better to large numbers of concurrent users.
- For smaller catalogs (fewer than 50k SKUs), Solr Cloud is not recommended. For 50k-300k SKUs, either approach can work, so choose judiciously based on your requirements. Above 300k SKUs, Solr Cloud is the better choice in terms of scalability and availability.
- Ensure the servers running Solr are running the latest supported version of Java.
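The SKU-count guidance in the list above can be expressed as a small decision helper. This is a sketch only: the function name and return labels are made up for illustration, the thresholds are the ones quoted above, and real decisions should also weigh query load, availability needs, and infrastructure cost.

```python
def suggest_solr_topology(sku_count: int) -> str:
    """Map a catalog size to the scaling approach suggested above."""
    if sku_count < 0:
        raise ValueError("sku_count must not be negative")
    if sku_count < 50_000:
        # Small catalogs: Solr Cloud is not recommended.
        return "standalone cluster"
    if sku_count <= 300_000:
        # Mid-sized catalogs: either approach can work.
        return "either"
    # Large catalogs: Solr Cloud scales better.
    return "solr cloud"


for skus in (10_000, 150_000, 1_000_000):
    print(skus, suggest_solr_topology(skus))
```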
Solr nodes should not be exposed to the outside world. If external access to search is required, SAP Commerce can expose the search API through Omni-Commerce Connect (for external systems, such as a mobile app).
- SAP Commerce doesn't support common security strategies available on Solr, such as User Authentication. But internal network restriction is still possible through whitelisting Solr and SAP Commerce IP addresses. This can be achieved through Jetty's IPAccessHandler component. Be aware that access to the Solr admin interface would require additional IP entries or direct access to allowed machines.
- Protect your Solr endpoint URLs by Access Control Lists (ACL) and firewall rules. Even if it is not exposed outside your network, you run the risk of an employee inadvertently dropping a core.
- Only allow incoming HTTP requests from your app servers to the Solr servers.
- Do not allow your lower environment app servers to make HTTP requests to production Solr servers. Because the Solr configuration is stored in the database, refreshing your staging database with production data would otherwise cause huge problems.
- For troubleshooting purposes and access to Solr admin console, you could also allow access from a designated jump box where only the operations team have access.
- If using Solr Cloud, we recommend securing the ZooKeeper ensemble using internal network restrictions.
- Even though it's not mandatory to have different machine setups for master and slaves, master nodes should have higher CPU (and can have smaller memory). Slaves should have higher memory (and can have lower CPU).
- Solr nodes benefit from 2-4 CPU cores (slave/master) at the very least. Their JVM can be initially set to use half of the machine's RAM.
- SSDs are better for indexing but are not necessary.
- A commonly accepted approach for memory is to take the index size on disk plus the amount of memory required to run Solr in the JVM, and set the JVM memory to the sum of those numbers.
Example: A 4 GB index on disk plus 4 GB of memory to run Solr results in a JVM using 8 GB of memory. Because the JVM should use about half of the machine's RAM to properly accommodate the OS and its cache, you would be looking at a machine with 16 GB of RAM.
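The arithmetic in this example can be written as a short Python sketch. The function name and parameter names are hypothetical; the calculation simply follows the rule of thumb described above (JVM memory as the sum of index size and working memory, machine RAM as twice the JVM memory) and should be validated against your own data set.

```python
def size_solr_memory(index_size_gb: float, solr_working_memory_gb: float):
    """Return (jvm_memory_gb, machine_ram_gb) using the rule of thumb above."""
    if index_size_gb < 0 or solr_working_memory_gb < 0:
        raise ValueError("sizes must not be negative")
    # JVM memory: index size on disk plus memory needed to run Solr.
    jvm_memory_gb = index_size_gb + solr_working_memory_gb
    # The JVM should use about half of the machine's RAM.
    machine_ram_gb = 2 * jvm_memory_gb
    return jvm_memory_gb, machine_ram_gb


jvm_gb, ram_gb = size_solr_memory(index_size_gb=4, solr_working_memory_gb=4)
print(jvm_gb, ram_gb)
```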
- We do not recommend giving more than 16 GB of memory to the JVM.
- Sample numbers (baseline that should be scaled up or down based on testing your data set):
- 2-4 cores; 8-16 GB RAM
- 50-100 million documents of approximately 1000kb
- 10 queries per second
- Results in under 0.5 seconds, for small/medium queries
- For Solr Cloud, all nodes should typically be identically sized. The primary decisions to be made are the number of shards and replicas that can be easily managed in a given cluster and the level of availability needed.
- Also, it is preferable to set up the ZooKeeper ensemble on separate nodes for better availability.
- Starting with Solr version 5, which comes with SAP Commerce 5.7, deploying Solr as a WAR in servlet containers like Tomcat is not supported by Solr. From a performance standpoint, there are no benefits of running Solr on Tomcat instead of Jetty even for an older version of Solr (Solr 4 and Solr 3). We recommend you run Solr on Jetty even for these lower Solr versions.
- Tune the JVM settings and garbage collection settings for Solr. By default, the Solr startup scripts use only 512 MB of memory for the heap. Some recommendations on Solr tuning and startup scripts can be found at this URL: https://wiki.apache.org/solr/ShawnHeisey
- The recommendation to not use G1GC is no longer relevant.
- The UseLargePages option can be enabled with caution. If the HugePages allocation is not sized correctly, it will trigger Solr's Out-Of-Memory (OOM) script.
- Disable the compound file format, if enabled (solrconfig.xml). Also, increase the open file limit to 15000 if not using the compound file format.
- Tune the Solr queries
- Use fq as much as possible to optimize caching
- Avoid setting the similarity parameter too low for fuzzy search
- Use indexed=false, stored=true (with "use for search" disabled in the Backoffice) for fields that are used for display only, so they are not loaded into memory (schema.xml)
- Use stop words to reduce the number of indexed words
- Tune Solr caches, with an emphasis on filterCache (solrconfig.xml)
- For large number of items that are being indexed, we recommend to change
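As an illustration of the fq advice in the query tuning list above, the following Python sketch builds a Solr select query string in which each restriction is sent as its own fq parameter, so that Solr can cache each filter separately in the filterCache. The function name and the field names are hypothetical and do not come from an actual SAP Commerce schema.

```python
from urllib.parse import urlencode


def build_select_query(text: str, filters, rows: int = 20) -> str:
    """Build a query string with the main query in q and one fq per filter."""
    params = [("q", text), ("rows", str(rows))]
    # One fq parameter per restriction, so each filter is cached independently.
    params.extend(("fq", filter_query) for filter_query in filters)
    return urlencode(params)


# Hypothetical field names, for illustration only.
query_string = build_select_query(
    "camera",
    ["category_string_mv:cameras", "inStockFlag_boolean:true"],
)
print(query_string)
```

Keeping frequently reused restrictions (such as category or stock status) in fq rather than in q means repeated searches can reuse the cached filter results.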
You should now be able to determine which Solr option to use and some general guidelines on how it should be structured. Additionally, you should now have an idea of some of the recommended practices to set up the hardware. Once your Solr infrastructure is set up, your next step is to look at tuning your indexing process.