Search and Navigation in SAP Commerce Cloud - Understanding Solr Queries
Relevant search results can lead to higher conversions. We've covered the functionality available to business users and the importance of indexing the right content, but it all comes together in the Solr search query. If you haven't been able to get the right search results, you may want to take a deeper look at how the query process works to determine whether you need to customize it further.
Table of Contents
- Search for the Relevant Search Result
- Analyzing the Query Terms
- Breakdown of a Solr Query
- Solr Query Decrypted
- Optimizing the Query
- Changing the Query in SAP Commerce Cloud
- Solr Schema
- Caching and Cache Configuration
- Solr Tuning
- Recommended Practices
Search for the Relevant Search Result
As stated above, relevant search results are critical for an exceptional customer experience. Getting there is typically an iterative process of analyzing, modifying, and reviewing the updated results. Check out this video, which covers many of the opportunities available for tweaking search results and is a good place to start.
Analyzing the Query Terms
Before you analyze how a query is broken down to determine relevance, it is important to understand how Solr translates the terms you search for before it executes the search query. In Solr's schema.xml, you can define various field types. Each field type defines the tokenizer and filters that Solr applies to the text. Here's an example of "text_spell_en":
You can see that this "fieldType" is associated with the spellcheck_en field:
For any SAP Commerce Cloud property mapped to this Solr field, you can expect Solr to execute the defined tokenizer and filters. Each of these fields is associated with a SolrIndexedType's type field:
The easiest way to visualize how the tokenizers and filters act is through the Solr admin console's "Analysis" screen (for example, http://localhost:8983/solr/#/master_powertools_Product/analysis). In the following image, you can see how a query of "BLACK power drills" passes through each of the filters. (Note: the processing continues beyond the portion shown in the image.)
At the bottom, you see the resulting tokens that are passed into the Solr query.
Breakdown of a Solr Query
SAP Commerce Cloud uses Solr's HTTP API to query Solr documents from the index. These documents are then parsed and displayed on the search result page or category list page. To optimize, extend, change, or debug the Solr query, you need to understand how a Solr query works and how SAP Commerce Cloud builds it. For this purpose, we start with a simple example query sent to Solr by SAP Commerce Cloud. It is a single-term search for "camera" from the accelerator's electronics storefront and was extracted from the Solr log:
Review the article on Solr troubleshooting for help resolving Solr-related issues.
Let's have a look at the relevant query parameters.
|fq||'fq' filters out documents before scoring, similar to a "where" clause in SQL. It is a fast way to reduce the result set that needs to be scored, because Solr caches filter query results separately in its filterCache.|
|rows||Number of search results requested. Corresponds to the page size for a paged search. The parameter 'start' offsets the search result to later pages.|
|q||'q' is the query itself, using standard Solr query syntax.|
For details on the remaining parameters, see https://cwiki.apache.org/confluence/display/Solr/Query+Syntax+and+Parsing.
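The following minimal Python sketch makes these parameters concrete. It assembles q, fq, rows, and start for one page of results. The field names (name_text_en, catalogId, catalogVersion) and filter values are assumptions for illustration. This is not the exact query SAP Commerce Cloud sends.

```python
from urllib.parse import urlencode


def build_select_params(q, filter_queries, page, page_size):
    """Assemble Solr /select parameters for one page of results.

    'start' is the zero-based offset of the first result, so the
    zero-based page N begins at N * page_size.
    """
    params = [("q", q)]
    # Each filter is sent as its own fq parameter; Solr caches each fq
    # separately in its filterCache.
    params += [("fq", fq) for fq in filter_queries]
    params += [("rows", str(page_size)), ("start", str(page * page_size))]
    return params


# Hypothetical example: the third page (page=2) of 20 results.
params = build_select_params(
    q="name_text_en:(camera)",
    filter_queries=['catalogId:"electronicsProductCatalog"', 'catalogVersion:"Online"'],
    page=2,
    page_size=20,
)
query_string = urlencode(params)  # URL-encoded, e.g. ':' becomes %3A
```

Putting the catalog restriction in fq rather than in q keeps it out of the relevance score. It also lets Solr reuse the cached filter result across searches.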
As you can see in the query above, SAP Commerce Cloud searches for the term "camera" in many fields, such as ean_string, code_string, and more. The fields to search against are defined in commerceservices-spring-solrfacetsearch.xml as part of the defaultCommerceSearchTextPopulator Spring bean. Each search field has a field-boost and a QueryBuilder. The field-boost influences the relevancy score that Solr calculates for each document. This score is the default sort attribute in the storefront, labeled "Relevance". Do not confuse the field-boost with the term-boost, which can be set at runtime in the Backoffice to boost certain search terms rather than fields.
In addition to choosing the fields to search, you can combine various search types for each search field. For example, "camera^200.0+OR+camera*^100.0" combines an exact search (camera^200.0) with a wildcard search (camera*^100.0). The two use different boost values (^200 and ^100, respectively) so that an exact match scores higher than a wildcard match. These search sub-strings are built by an implementation of FreeTextQueryBuilder. QueryBuilders are covered later in this document.
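The following Python sketch shows how such a sub-string could be assembled per field, with the field clauses ORed together. It is a simplified illustration, not SAP's FreeTextQueryBuilder. The function names and the code_string boosts are hypothetical.

```python
def exact_and_wildcard(term, exact_boost, wildcard_boost):
    # The exact match gets the higher boost so it outranks a prefix (wildcard) match.
    return f"{term}^{exact_boost} OR {term}*^{wildcard_boost}"


def build_text_query(term, fields):
    """fields maps a Solr field name to (exact_boost, wildcard_boost)."""
    clauses = [
        f"{field}:({exact_and_wildcard(term, exact, wildcard)})"
        for field, (exact, wildcard) in fields.items()
    ]
    return " OR ".join(clauses)


q = build_text_query(
    "camera",
    {"name_text_en": (200.0, 100.0), "code_string": (90.0, 45.0)},
)
```

In real code, user input should be escaped before it is inserted into a clause. Characters such as ':' and '(' have special meaning in Solr query syntax.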
Here is a quick overview of the search features you can use in a Solr query:
|^||Boost on search term||Boosting allows you to control the relevance of a document by boosting its term.|
|*||Wildcard search||The wildcard '*' at the end of the search term matches words that start with the term.|
|~||Fuzzy search||Fuzzy search matches terms that are similar to, but not identical with, the search term. For example, a search for roam~ will match terms like roams, foam, and foams. It will also match the word "roam" itself.|
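Fuzzy matching is based on edit distance. As a rough illustration, this Python sketch computes the classic Levenshtein distance. It keeps candidates within two edits, which is the default maximum for a bare "~" in Solr. Lucene's implementation also counts a transposition of two adjacent characters as a single edit by default. Treat this as an approximation rather than Solr's exact matching logic.

```python
def levenshtein(a, b):
    """Minimum single-character insertions, deletions, and substitutions turning a into b."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # substitution (or match)
        prev = curr
    return prev[-1]


def fuzzy_matches(term, candidates, max_edits=2):
    """Keep candidates whose edit distance to the term is within max_edits."""
    return [c for c in candidates if levenshtein(term, c) <= max_edits]


matches = fuzzy_matches("roam", ["roam", "roams", "foam", "foams", "drill"])
```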
Solr Query Decrypted
name_text_en:(camera^100.0 OR camera*^50.0 OR camera~^25.0)
An exact match on the word ‘camera’ will be weighted at 100 points.
A wildcard match on ‘camera*’ (for example, ‘cameras’, ‘cameraman’) will be weighted at 50 points.
A fuzzy match on ‘camera~’ (for example, ‘kamera’, ‘camra’), based on the Levenshtein distance algorithm, will be weighted at 25 points.
An exact match of the words 'camera' and 'battery', where the two words are no more than five words apart, will be weighted at 100 points.
Consider using your own implementation of a FreeTextQueryBuilder to add extra boosting for multi-word phrases when the words appear close to each other in the original text.
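A custom builder could add a phrase clause with slop, as in this minimal Python sketch. It uses standard Lucene syntax: "camera battery"~5 matches documents where the words appear within five positions of each other. The function names and the default boost are hypothetical.

```python
def proximity_clause(field, text, slop=5, boost=100.0):
    """Return a phrase clause matching all words within `slop` positions, or None."""
    # Remove double quotes from the input so they cannot break the phrase syntax.
    words = text.replace('"', " ").split()
    if len(words) < 2:
        return None  # proximity only makes sense for multi-word input
    phrase = " ".join(words)
    return f'{field}:("{phrase}"~{slop}^{boost})'


def with_proximity(base_query, field, text):
    """OR an extra proximity clause onto an existing query when it applies."""
    clause = proximity_clause(field, text)
    return base_query if clause is None else f"{base_query} OR {clause}"


q = with_proximity("name_text_en:(camera^100.0 OR battery^100.0)", "name_text_en", "camera battery")
```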
Optimizing the Query
Often the standard SAP Commerce Cloud search does not behave the way business users expect. Irrelevant products show up as top search hits and the relevant products are often not even shown on the first page. To optimize your query so the products show up in the right order, you need to do two things:
- Understand why Solr scores these irrelevant products so high.
- Once you determine this, you can then explore ways to either reduce the score of the irrelevant products or to boost the score of the relevant products.
It's important to note that there is no 'perfect' query, and the most suitable solution is often a compromise. Consider also that optimizing one particular query can have a negative impact on other queries. While tuning the Solr query, it is easy to focus too much on a particular query or combination of queries and lose sight of the bigger picture.
Extracting the Query
The fastest way to see how changes to the query affect the search result is to take the query SAP Commerce Cloud executes and run it directly on Solr through the Solr admin console. This gives you immediate feedback on your query changes. The query can be extracted from the Solr log. Alternatively, you can write the query to the SAP Commerce Cloud logs by adding the SolrQueryDebuggingListener through the Backoffice. If you're on a version prior to 5.7, you can set the log level of the class DefaultFacetSearchService to debug instead. The raw query looks like the following:
Note that this query is from SAP Commerce version 6.6. It can be executed directly against Solr using the Solr admin console (for example, h):
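When you iterate on an extracted query, a small script can save copy-and-paste work. This Python sketch assumes you copied the URL-encoded parameter string from the Solr log; in a typical request log line, this is the part inside params={...}. The sketch applies overrides, turns on debugQuery, and builds a /select URL. The core name master_electronics_Product and the sample parameters are assumptions.

```python
from urllib.parse import parse_qsl, urlencode


def to_debug_url(raw_params, core, overrides=None, base_url="http://localhost:8983/solr"):
    """Rebuild a /select URL from logged parameters, with score explanations enabled."""
    params = parse_qsl(raw_params, keep_blank_values=True)
    overrides = dict(overrides or {})
    overrides["debugQuery"] = "true"  # ask Solr to explain each document's score
    # Keep every logged parameter (fq may repeat) except the overridden ones.
    params = [(key, value) for key, value in params if key not in overrides]
    params.extend(overrides.items())
    return f"{base_url}/{core}/select?{urlencode(params)}"


raw = (
    "q=name_text_en%3A%28camera%5E100.0+OR+camera*%5E50.0%29"
    "&fq=catalogVersion%3AOnline&fq=inStockFlag_boolean%3Atrue&rows=20"
)
url = to_debug_url(raw, "master_electronics_Product", overrides={"rows": "5"})
```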
Changing the Query in SAP Commerce Cloud
Add a New Search Field
As mentioned earlier, the Spring bean defaultCommerceSearchTextPopulator is used to build the query. To add a field (the field 'description' in our example), extend the list in a bean of your own. The new bean, 'customCommerceSearchTextPopulator', replaces the default one by using the same alias. Note: It is important to set merge="true" so that the list is merged with the original one from the parent bean. In the example below, the property "description" is given a boost of 30. A match in the "description" field of the Solr document therefore raises that document's score.
By changing the field-boost, we can influence the score depending on the field the search term has been found in. As an example, we want to score documents higher with a match in the field categoryName to ensure a search for "camera" shows products in the "camera" category higher than products outside of this category. This can be achieved by increasing the field-boost on the field categoryName. Unfortunately, the entire bean customCommerceSearchTextPopulator has to be redeclared to achieve this change:
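The following deliberately simplified Python model shows why raising one field-boost reorders results. It adds up the boosts of every field that contains the term. Solr's real relevance score also depends on term frequency and index statistics, so this only illustrates the effect of relative field boosts. The products and boost values are made up.

```python
def toy_score(doc, term, field_boosts):
    """Sum the boosts of all fields whose text contains the term as a whole word."""
    return sum(
        boost
        for field, boost in field_boosts.items()
        if term in doc.get(field, "").lower().split()
    )


def rank(docs, term, field_boosts):
    """Return product codes ordered by descending toy score."""
    ordered = sorted(docs, key=lambda d: toy_score(d, term, field_boosts), reverse=True)
    return [d["code"] for d in ordered]


docs = [
    {"code": "ACC-1", "name": "camera bag", "description": "fits any camera", "categoryName": "accessories"},
    {"code": "CAM-1", "name": "compact 20mp", "description": "pocket camera", "categoryName": "camera"},
]
before = rank(docs, "camera", {"name": 100, "description": 30, "categoryName": 20})
after = rank(docs, "camera", {"name": 100, "description": 30, "categoryName": 150})
```

With the lower categoryName boost, the camera bag wins on its name match. Raising the categoryName boost lets the product in the "camera" category overtake it.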
Change the QueryBuilder Type
The default implementation is DefaultFreeTextQueryBuilder, which performs an exact search, a wildcard search, and a fuzzy search for all text fields. Alternatively, you can configure the NonFuzzyFreeTextQueryBuilder, which only searches for an exact match. You can also write your own implementation of the FreeTextQueryBuilder.
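Swapping builders can be sketched in Python as two interchangeable strategies. These classes are hypothetical analogues, not SAP's Java API. The wildcard and fuzzy boosts (half and a quarter of the exact boost) are assumptions chosen to reproduce the decrypted query shown earlier.

```python
from abc import ABC, abstractmethod


class TermClauseBuilder(ABC):
    """Hypothetical analogue of a free-text query builder for one field."""

    @abstractmethod
    def clauses(self, term, boost):
        """Return the boosted sub-clauses for a single search term."""

    def build(self, field, term, boost):
        return f"{field}:({' OR '.join(self.clauses(term, boost))})"


class ExactWildcardFuzzy(TermClauseBuilder):
    # Exact outranks wildcard, which outranks fuzzy.
    def clauses(self, term, boost):
        return [f"{term}^{boost}", f"{term}*^{boost / 2}", f"{term}~^{boost / 4}"]


class ExactOnly(TermClauseBuilder):
    # Comparable in spirit to a non-fuzzy builder: exact matches only.
    def clauses(self, term, boost):
        return [f"{term}^{boost}"]


default_q = ExactWildcardFuzzy().build("name_text_en", "camera", 100.0)
exact_q = ExactOnly().build("name_text_en", "camera", 100.0)
```

Dropping the fuzzy clause avoids loose matches such as 'camra', at the cost of no longer catching genuine typos.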
Solr Schema
A Solr schema defines how values are stored and queried. It contains information about field types, fields (regular and dynamic), and field analyzers (optional char filters, one tokenizer, and a series of filters). To customize the Solr schema in SAP Commerce Cloud, see the product documentation.
Analyzers reformat the queried terms for processing. For example:
- The MappingCharFilter replaces characters according to a mapping file, for example, mapping letters with diacritic marks to their plain equivalents.
- The WhitespaceTokenizerFactory splits the queried string into individual terms at whitespace.
- The StopFilterFactory removes stopwords from the query.
- The LowerCaseFilterFactory converts uppercase letters to lowercase.
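The chain above can be approximated in a few lines of Python. This sketch assumes a small stopword list and applies lowercasing before stopword removal. In Solr, filters run in exactly the order the schema declares them, and StopFilterFactory can also be configured with ignoreCase. Unicode NFKD decomposition stands in for a mapping file here.

```python
import unicodedata

STOPWORDS = {"a", "an", "and", "for", "of", "the"}  # assumed stopword list


def map_chars(text):
    """Char filter step: replace accented letters with their base letters."""
    decomposed = unicodedata.normalize("NFKD", text)
    return "".join(ch for ch in decomposed if not unicodedata.combining(ch))


def analyze(text, stopwords=STOPWORDS):
    tokens = map_chars(text).split()                   # whitespace tokenizer
    tokens = [t.lower() for t in tokens]               # lowercase filter
    return [t for t in tokens if t not in stopwords]   # stop filter


tokens = analyze("BLACK power drills")
```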
Handlers process the requests that send information to Solr and retrieve information from it.
- The Request Handler processes search requests (select, query, and get). It uses a query parser to interpret search terms and query parameters, and a response writer to format the output as XML, JSON, and other formats.
- The Update Handler receives information from external sources (like a relational database), transforms it into a document, and then executes the indexing operation.
- SAP Commerce Cloud adds Synonym and Stopword Handlers on top of these, allowing administration of these features through the Backoffice. See the article for business users for more details.
If you want to see the effect your schema has on your index and queries, you can use the analyzer built into Solr's admin console.
Caching and Cache Configuration
EhCache is the default implementation for the SAP Commerce Cloud cache regions. The cache works implicitly every time the API is accessed. Data is cached in the JVM's memory, and invalidations are sent through JGroups over UDP by default. The cache is divided into regions, including:
- Type System
- Facet Search Configuration
- Any custom regions you define
The Facet Search Configuration cache exists to improve performance. The entire cache is cleared whenever any model object of a type related to FacetSearchConfig data changes.
Solr has its own cache regions, which are independent of the SAP Commerce Cloud cache regions:
These regions can (and should) be tuned:
- class – specifies the eviction policy
- size – the maximum number of entries in the cache
- initialSize – the initial capacity (number of entries) of the cache
- autowarmCount – the number of entries to prepopulate from an old cache
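Here is a toy LRU cache in Python that illustrates size and autowarmCount. Solr's actual cache classes differ, and initialSize (the starting capacity) is not modeled. After a commit, a new searcher gets a fresh cache. For the query-based caches, autowarming regenerates the most recently used entries against that searcher instead of copying stale values. The recompute callback stands in for that step.

```python
from collections import OrderedDict


class LruCache:
    """Toy LRU cache illustrating the size and autowarmCount settings."""

    def __init__(self, size):
        self.size = size
        self.entries = OrderedDict()

    def get(self, key):
        if key not in self.entries:
            return None
        self.entries.move_to_end(key)  # mark as most recently used
        return self.entries[key]

    def put(self, key, value):
        self.entries[key] = value
        self.entries.move_to_end(key)
        if len(self.entries) > self.size:
            self.entries.popitem(last=False)  # evict the least recently used entry

    def warmed_copy(self, autowarm_count, recompute):
        """Fresh cache prepopulated by recomputing the most recently used keys."""
        new = LruCache(self.size)
        recent = list(self.entries)[-autowarm_count:] if autowarm_count > 0 else []
        for key in recent:
            new.put(key, recompute(key))
        return new


cache = LruCache(size=3)
for key in ["a", "b", "c"]:
    cache.put(key, key.upper())
cache.get("a")       # "a" becomes the most recently used entry
cache.put("d", "D")  # exceeds size 3, so "b" (least recently used) is evicted
warmed = cache.warmed_copy(autowarm_count=2, recompute=lambda key: key * 2)
```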
The regions of interest for SAP Commerce Cloud:
- filterCache – Stores an unordered set of document IDs that match the key for queries. In particular, the results of any filter queries. This cache should have as high a hit ratio as possible.
- documentCache – Holds the underlying document objects. If the documents are small in size and certain documents are frequently accessed, this cache size should be increased.
- queryResultCache – Caches the results of searches. This is an ordered list of document IDs based on a query, sort, and range of requested documents.
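Solr reports lookups and hits for each cache region, and the hit ratio is simply hits divided by lookups. This Python sketch flags regions that fall below a threshold. The numbers and the 0.8 threshold are made-up assumptions, not recommended values.

```python
def low_hit_ratio_caches(stats, threshold=0.8):
    """Return (name, ratio) for each cache whose hits/lookups ratio is below threshold."""
    flagged = []
    for name, counters in stats.items():
        lookups = counters["lookups"]
        if lookups == 0:
            continue  # no traffic yet, so the ratio is meaningless
        ratio = counters["hits"] / lookups
        if ratio < threshold:
            flagged.append((name, round(ratio, 2)))
    return flagged


sample_stats = {
    "filterCache": {"lookups": 10000, "hits": 9500},
    "queryResultCache": {"lookups": 1000, "hits": 300},
    "documentCache": {"lookups": 0, "hits": 0},
}
flagged = low_hit_ratio_caches(sample_stats)
```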
The process of tuning these regions is not SAP Commerce Cloud specific. The following documentation is a good resource: https://lucene.apache.org/solr/guide/7_7/query-settings-in-solrconfig.html#QuerySettingsinSolrConfig-Caches
To customize the Solr configuration in SAP Commerce Cloud, see the product documentation.
Additional Tuning - solrconfig.xml
- AutoCommit – Automatically performs a hard commit when certain conditions are met.
  - maxDocs is the number of documents added since the last commit that triggers a commit.
  - maxTime is the time in ms that can pass after a document is added before a commit is triggered.
- AutoSoftCommit – Makes changes visible to searches without syncing them to disk.
  - maxTime is the time in ms allowed between soft commits.
- QueryResultWindowSize – The block of documents collected and cached for each query, so that requests for nearby pages can be served from the cache.
- QueryResultMaxDocsCached – The maximum number of documents to cache for any entry in the queryResultCache.
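The interaction between paging, queryResultWindowSize, and queryResultMaxDocsCached can be modeled roughly as follows. This Python sketch is a simplified model. It assumes the collected block always starts at the first hit and is rounded up to a multiple of the window size. The values used are arbitrary.

```python
def cached_window(start, rows, window_size, max_docs_cached):
    """Return (documents collected, whether the result fits in the cache)."""
    requested = start + rows
    if requested <= window_size:
        superset = window_size
    else:
        # Round up to the next multiple of the window size (ceiling division).
        superset = -(-requested // window_size) * window_size
    return superset, superset <= max_docs_cached


first_page = cached_window(start=0, rows=10, window_size=20, max_docs_cached=200)
second_page = cached_window(start=10, rows=10, window_size=20, max_docs_cached=200)
```

Under this model, with a window of 20 and 10 rows per page, the first two pages fall into the same cached block.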
Additional Tuning - local.properties
Recommended Practices
- Define a set of queries with clearly defined search result expectations. Run them after any change to the Solr query configuration to validate that none of these baseline queries 'breaks'.
- Use the Solr admin console's Query screen to execute a raw query with the parameter debugQuery=true to understand why results are scored the way they are.
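The baseline check in the first practice can be automated. In this Python sketch, "search" is any function that returns ranked product codes for a query; in practice it would call Solr. The queries, product codes, and top-N cut-off below are hypothetical.

```python
def check_baselines(baselines, search, top_n=10):
    """Map each failing query to the expected product codes missing from its top N."""
    failures = {}
    for query, expected in baselines.items():
        top = search(query)[:top_n]
        missing = [code for code in expected if code not in top]
        if missing:
            failures[query] = missing
    return failures


# Stand-in for real Solr results, keyed by query.
fake_results = {"camera": ["CAM-1", "CAM-2", "ACC-1"], "tripod": ["ACC-9", "ACC-1"]}
baselines = {"camera": ["CAM-1", "CAM-2"], "tripod": ["ACC-3"]}
failures = check_baselines(baselines, lambda q: fake_results.get(q, []), top_n=3)
```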
Understanding the structure of the Solr query and the various options for configuring it can help you identify and iterate through changes until you arrive at your desired search results. If you're still running into issues, you may want to look at our article on Troubleshooting Solr.