Infrastructure Considerations for On-Prem SAP Commerce
Although we covered Go Live Readiness in a different article, running SAP Commerce on your own infrastructure brings many additional considerations that must be accounted for as part of your initial go-live. You will also have to account for the ongoing maintenance of your solution. Proper hardware and OS monitoring and an established error process should be in place. Instrumenting the application itself is highly recommended: it lets you monitor the application and proactively work on potential bottlenecks before they affect the production environment.
For the SAP Commerce application server configuration, you need to make sure that you have separated the storefront and back-office nodes. Also, from a configuration management perspective, there are many items which need to be configured in different files of the SAP Commerce platform. For example: local.properties, server.xml (Tomcat), web.xml (Tomcat). Some of the configuration settings need to be applied before the final compilation of the SAP Commerce Software Package, and some need to be applied to a specific environment or node.
Table of Contents
- General Considerations
- Application Cluster
- Load Balancer (LB)
- Web Server
- Web Server Security Hardening
- Application Servers
- Application Servers OS Settings
- Application Server Configuration Settings
- Garbage Collector
General Considerations
For all servers in the SAP Commerce environment, regardless of the layer they are part of, our recommendations include:
- You should have a non-root partition for the application install location and log storage location. This is needed to prevent the root partition from running out of space.
- Increase the open files limit to 65536 to reduce the risk of getting “too many open files” errors under load.
- Turn off unneeded services such as FTP, SMTP and telnet. Only services that are core to the OS and required applications supporting SAP Commerce should be running on the node.
- Make sure to use low latency network interfaces for all servers in SAP Commerce environments.
- Web-to-app server protocol. Communication between the web servers and app servers can be performed using HTTP or AJP. AJP has been proven to be better performing than HTTP in SAP Commerce implementations.
- Verify that the OS has the latest security patches installed and that there is a scheduled operational process to update to the latest security patches. Design these maintenance processes with the goal of minimizing downtime. For example, an OS patch may require a restart of the host. A Java update would require a restart of the SAP Commerce instances, and in some cases, it could require a restart of an ant server. Performing these types of activities on a monthly basis during the engineering phase ensures that you are ready for launch and have defined an OS maintenance process that is in line with the desired service-level agreements (SLAs) for the application.
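As a quick way to verify the open files recommendation from the list above, the following is a minimal sketch that reads the limit of the current process on a Unix-like host. The function name `nofile_limit_ok` and the constant are illustrative assumptions; the value 65536 is the one recommended above. Run it as the user that starts SAP Commerce, since limits are per user and per process.

```python
import resource

RECOMMENDED_NOFILE = 65536  # value recommended in the list above


def nofile_limit_ok(soft_limit: int, recommended: int = RECOMMENDED_NOFILE) -> bool:
    """Return True when the soft open-files limit meets the recommendation."""
    # RLIM_INFINITY means "no limit", which trivially satisfies the check.
    if soft_limit == resource.RLIM_INFINITY:
        return True
    return soft_limit >= recommended


if __name__ == "__main__":
    # Soft and hard limits of the process running this script.
    soft, hard = resource.getrlimit(resource.RLIMIT_NOFILE)
    status = "OK" if nofile_limit_ok(soft) else "TOO LOW"
    print(f"open files: soft={soft} hard={hard} -> {status}")
```

The check only reports the limit; raising it is done in the OS configuration and depends on your distribution.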
Prior to go-live and prior to completing development, plan a professional audit and penetration test to be performed by a third-party auditor against the SAP Commerce infrastructure. This will help to detect potential security issues with your code and configuration.
Application Cluster
In an SAP Commerce cluster, you also need a way to share or synchronize media between the nodes. You may have an admin node where the media (for example, product images) is created, but eventually the storefront needs to be able to access these files. A straightforward, good-practice method is to use an NFS share and configure the media folders of the SAP Commerce application servers to point to that share. An alternative is to use a scheduled rsync process from the authoring SAP Commerce node to the rest of the nodes. Please note that this approach may not fit your environment if you have multiple authoring nodes, or if you need immediate availability for media on the other nodes. Additional media storage strategies can be used and are supported by SAP Commerce. A few examples include Amazon S3, Windows Azure, and MongoDB.
We recommend separating traffic between the storefront and backoffice at both the web server and application tier. If using Solr standalone, you should have a Solr primary that is dedicated exclusively to indexing. Additionally, you should have at least two Solr replicas that serve search queries from SAP Commerce. With SolrCloud, you should use ZooKeeper to elect the leader node. SSL should be terminated at the load balancer for the storefront application. Although the traffic between the storefront and backoffice is separated, all the SAP Commerce nodes connect to the same database and are all part of the same SAP Commerce cluster.
Your system may look slightly different. Some projects do not use the SAP Commerce storefront and may only use the Omni Commerce Connect (OCC) layer. Deployments on public cloud infrastructure, such as Amazon Web Services (AWS), prefer to have an internal load balancer in order to take advantage of auto-scaling, among other services. You may use Nginx instead of Apache, in which case you will use HTTP instead of AJP. However, the key takeaways with this approach should be:
- Take time to document your setup in a diagram.
- Separate backoffice and storefront traffic.
- Plan to have redundancy at every layer.
- For Solr make sure the primary node does not serve search queries.
- Avoid sending SSL requests all the way to Tomcat. Try to terminate SSL at the load balancer or web server tier.
Load Balancer (LB)
- A common approach in most on-premise environments is to have a load balancer in front of the web servers, which in turn use a module, such as Apache's mod_proxy_balancer, to balance the application servers. mod_proxy_balancer needs to be configured to use sticky sessions. An example of how to do this with mod_proxy is shown at https://httpd.apache.org/docs/2.4/mod/mod_proxy_balancer.html.
- For other deployments, such as in AWS, it is typical to use an architecture with two elastic load balancers (ELB): One external in front of the web servers and the other one internal which balances the app servers. This helps customers take advantage of the Amazon auto-scaling feature. In this case, sticky sessions only need to be enabled on the internal load balancer. Also until recently, customers running in AWS needed the web server tier, since the ELB could not handle URL routing. Recently AWS introduced an Application Load Balancer service which can be used to block requests to specific paths (backoffice in our case), and potentially eliminate the need for the web tier if a good content delivery network (CDN) strategy is in place for asset caching.
- SSL termination should be done before Tomcat, on the load balancer or the web server layer.
- Consider terminating SSL at the load balancer. Usually, a hardware load balancer performs better with crypto operations, and you also benefit from maintaining the security certificate in only one place. You may end up with two certificates if you also run a passive load balancer for DR purposes.
- If you are using a CDN for asset edge caching, you should have your public site’s certificate installed on the CDN and a private certificate on the load balancer. Note that between the CDN and the load balancer, you would have to re-encrypt the original SSL traffic.
- You will also need to:
  - Set the x-forwarded-proto=’https’ header either on the CDN or the load balancer.
  - Configure the Tomcat RemoteIpValve so that it inspects this header and determines if the original request was secure or not.
  - Note that if you have a 172.16/12 network scheme, you also have to allowlist the internal IP of the load balancer in the RemoteIpValve's internalProxies attribute in your application server configuration (server.xml). See more about this in the application server section.
- The benefits of terminating SSL on the LB or CDN are:
  - The offloading of SSL processing.
  - The management of the SSL certificate in one location.
  - The reduction of load on the application and web server.
- Block access to backoffice URLs on the load balancer and/or at the web tier.
- Ensure that production SSL certificates have been installed, that they have the proper domain, and they have not expired. Set up reminders for a few months before expiration dates of SSL certificates.
- Protect your SAP Commerce environments with a Web Application Firewall (WAF) and intrusion detection systems (IDS). Monitor and update the WAF on a regular basis. There are many commercial options available, and some of the CDN providers offer WAF and IDS solutions. However, if you have budget constraints, you could try open source solutions such as Snort (https://www.snort.org/). It is a Payment Card Industry Data Security Standard (PCI DSS) requirement to have such systems in place for your commerce environment. However, this document is not meant to be a PCI audit process.
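The x-forwarded-proto handling described in the list above comes down to one rule: trust the header only when the request arrives from a known internal proxy. The sketch below models that rule. It is not Tomcat's implementation (the RemoteIpValve does this matching itself from its server.xml attributes); the network list, function names, and lowercase header keys are assumptions made for illustration.

```python
import ipaddress

# Hypothetical allowlist: the 172.16/12 range mentioned above, plus loopback.
INTERNAL_PROXIES = [
    ipaddress.ip_network("172.16.0.0/12"),
    ipaddress.ip_network("127.0.0.0/8"),
]


def is_internal_proxy(remote_addr: str) -> bool:
    """Return True when the peer address belongs to an allowlisted network."""
    addr = ipaddress.ip_address(remote_addr)
    return any(addr in net for net in INTERNAL_PROXIES)


def request_is_secure(remote_addr: str, headers: dict, connection_secure: bool = False) -> bool:
    """Decide whether the original client request used HTTPS.

    headers is assumed to use lowercase header names.
    """
    if connection_secure:
        return True  # TLS reached this hop directly
    if not is_internal_proxy(remote_addr):
        return False  # header from an unknown peer could be spoofed
    return headers.get("x-forwarded-proto", "").lower() == "https"
```

For example, `request_is_secure("172.16.5.10", {"x-forwarded-proto": "https"})` is true, while the same header sent from `203.0.113.7` is ignored. This is why the load balancer's internal IP has to be allowlisted when it falls in a range the valve does not trust by default.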
Attention on F5 and sticky sessions
While F5 -> Web -> Application Server has been the most common architecture for SAP Commerce deployments, there are other options as well. Please note that having the F5 in front of the SAP Commerce application server requires some special handling of connections to ensure sticky sessions.
- F5 -> Web servers -> F5 -> Application servers
  - This adds an extra layer of complexity and an extra hop between client and origin server, but the configuration may be easier to manage.
  - If you decide to choose this option, please make sure you use the F5 OneConnect profile.
- Akamai -> F5 -> Application servers
  - Make sure you use the OneConnect profile in F5.
- F5 -> Application servers
  - Make sure you enable the OneConnect profile in F5.
If the OneConnect profile is not enabled on the F5, you can experience a loss of stickiness in all cases presented above. For example, if the virtual server does not reference a OneConnect profile, and the BIG-IP system initially sends a client request to node A in pool A, the system inserts a cookie for node A. Then, within the same Transmission Control Protocol (TCP) connection, if the BIG-IP system receives a subsequent request that contains a cookie for node B in pool B, the system ignores the cookie information and incorrectly sends the request to node A instead. For each of the above cases, we may experience loss of stickiness:
- Subsequent requests on the same TCP connection from the web server to internal F5 will be routed to the initial node regardless of the sticky cookie.
- Subsequent requests from the same edge Akamai node that reuses the same TCP connection will be routed to the initial node regardless of the sticky cookie.
- Subsequent requests from the users coming behind the same corporate firewall may be routed to the initial node regardless of the sticky cookie.
To resolve these issues, the OneConnect profile needs to be enabled on the F5.
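The loss of stickiness described above can be illustrated with a toy model. This is not F5 code or configuration; it is a simplified sketch in which a balancer either decides once per TCP connection or re-evaluates the sticky cookie for every request. The node names and function names are hypothetical.

```python
from typing import List, Optional

NODES = ["A", "B"]  # hypothetical pool members


def pick_node(cookie: Optional[str], fallback: str) -> str:
    """Honor a valid sticky cookie, otherwise use the fallback node."""
    return cookie if cookie in NODES else fallback


def route_connection(cookies: List[Optional[str]], per_request: bool) -> List[str]:
    """Route every request that arrives on one reused TCP connection.

    cookies holds the sticky-cookie value of each request (None = no cookie).
    per_request=False models persistence evaluated once per connection;
    per_request=True models persistence evaluated for every request.
    """
    routed = []
    first_choice = None
    for cookie in cookies:
        if first_choice is None:
            # The first request on the connection always gets a real decision.
            first_choice = pick_node(cookie, fallback=NODES[0])
            routed.append(first_choice)
        elif per_request:
            routed.append(pick_node(cookie, fallback=first_choice))
        else:
            routed.append(first_choice)  # cookie ignored, stickiness lost
    return routed
```

With a reused connection that carries a cookie-less request followed by a request stuck to node B, `route_connection([None, "B"], per_request=False)` yields `["A", "A"]`, while `per_request=True` yields `["A", "B"]`. The second behavior is what the text above relies on OneConnect to provide.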
Out-of-Band Health Checks
If you have a load balancer in front of SAP Commerce that does out-of-band health checks, it is best to configure a filter at the top of the filter chain of your storefront extension that exposes a URL for the load balancer to ping and responds with a minimal page. This ensures that only a small amount of traffic is transferred over the network and no unnecessary sessions are created in the web application.
Web Server
- Generally, two or three web servers for the storefront should be enough, even if you serve static assets from the web tier and you have a large application server cluster. Please see Web Server Sizing for more information. Adding more web servers could also be problematic in certain situations, because most of the time the web server configuration lists the application servers in the same order. In this case, if you have X web servers, you may end up with the first X users on the same application server. This is especially seen in cases where you cache the static resources on the client side, or when you stick sessions at the load balancer layer (which you should not do; see the Load Balancer section). If you need more web servers for your storefront, we recommend adjusting the order of the application servers in the balancer directives so that the application server order varies across web servers.
- Protect your environment so it does not receive HTTP requests with an invalid Host header. If you receive requests without a valid Host header (not matching SAP Commerce CMS patterns), this can cause Apache to put the Tomcat node in an error state if you serve error pages from Apache (ProxyErrorOverride is set to on). Protecting your environment can be done using the following methods:
  - Ensure a dummy virtual host is defined as the first virtual host for:
    - Port 80 if the LB terminates the SSL and you send all requests through port 80 of Apache downstream.
    - Port 80 and 443 if:
      - SSL is terminated at the load balancer and you use 443 as the non-SSL but secure channel (when you don't use the x-forwarded-headers).
      - SSL is terminated at Apache and 443 is your secure port.
  - Configure your load balancer or firewall to drop any requests with invalid Host headers.
  - If you have a CDN, restrict access to your origin IP to only the CDN servers.
  All the above are valid options. However, using an Apache dummy host can protect you internally as well.
- Ensure sticky sessions are configured appropriately.
- Be aware of the mod_proxy_balancer behavior for Apache versions lower than 2.4.13. When you serve error pages from Apache (ProxyErrorOverride is set to on), if the SAP Commerce servers return a single HTTP 500 error and the request went through Apache, that server will be put in an error state for one minute, or for the value of the retry attribute in the BalancerMember directive (if you changed that from 60 to a different number). Ideally, you would ask the development team to make sure that the application server code does not throw 500 errors. All exceptions in your web applications should be handled, logged, and not bubbled up to the container. For Apache 2.4.13 and up, this issue has been fixed. Please see https://bz.apache.org/bugzilla/show_bug.cgi?id=56925.
- Tune the reverse proxy settings (keepalive, ping, timeout).
- Most customers leverage Apache HTTPD for the web tier. SAP Commerce recommends using the 2.4 version of Apache HTTPD over version 2.2. Depending on your OS choice, you may not have Apache 2.4 distributed as an rpm, so you may need to compile it from source. We recommend the 2.4 version over 2.2 for the following reasons:
  - Bug/performance fixes in mod_proxy, especially in mod_proxy_ajp.
  - Multi-Processing Module (MPM) event is supported in 2.4. In 2.2, MPM event is experimental and SHOULD NOT be used.
  - No need to plan for an Apache upgrade in the short-term.
- Depending on the usage and the expected incoming traffic, MPM prefork and MPM event or worker each have merits and drawbacks. For most B2C cases, ensure the event or worker MPM is used.
- If using MPM Worker in Apache, tune the number of threads and child processes for Apache. Coordinate MaxRequestWorkers with the number of available Tomcat threads. In your calculation, remember to include the number of SAP Commerce servers and the number of web servers. Also, increase the Apache threads if static content is served by Apache. As a rule of thumb, you could start with the following formula: MaxRequestWorkers = (number of Tomcat nodes / number of Apache nodes) * max Tomcat threads. Keep the number of servers low and the threads high. Make sure that MinSpareThreads >= StartServers * ThreadsPerChild, and make sure you leave enough of a difference between MinSpareThreads and MaxSpareThreads so that Apache does not work too hard on spawning and killing processes to maintain this balance.
- Typically, the web layer is where you would configure and define access restrictions to back-office web applications (cockpits). The external traffic should only be allowed to reach your storefront application and any public APIs. This restriction can also be defined in the load balancer, but it is best practice to define these restrictions on both layers. Depending on the size of your environment, you may have a separate web server that would front your back-office applications (Product Cockpit, WCMS cockpit, CS Cockpit, and others). Alternatively, you could use separate virtual hosts in Apache. You can define a separate virtual host for each back-office application for ease of use, bind that to the internal IP address of the web server and give it a friendly server name that would only be resolved in your internal network. Please note that if you have users working from remote locations, you should not expose any back-office applications to the public. Use solutions such as VPN or Citrix to provide these applications to remote locations.
- Define error pages on the web server for the following types of responses. Do not set the ProxyErrorOverride, since the 500 and 404 errors should be handled by the application servers.
  - HTTP 503
  - HTTP 504
  - HTTP 403 for any requests that are blocked to the application servers
- Configure the web server layer to perform HTTP compression.
- Consider serving static content from the web server. Even if you use a CDN, this will reduce the number of requests to the application server.
- Consider caching content on a CDN (Akamai or another edge cache provider). Start with static and non-personalized content. However, for sites that are expecting heavy loads, explore caching page fragments or page results. Careful testing and application tuning may be needed if you cache pages at the CDN.
- Define and configure maintenance pages to be served from the web server or load balancer. The maintenance page process should be easily managed via load balancer or web tier, and should be search engine optimization (SEO) friendly.
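The MaxRequestWorkers rule of thumb given in the MPM Worker item of the list above can be worked through as a small calculation. This is a minimal sketch of that starting formula only; the function names are illustrative, and rounding up to a whole number is an assumption, since the text does not say how to treat fractions.

```python
def max_request_workers(tomcat_nodes: int, apache_nodes: int, max_tomcat_threads: int) -> int:
    """Starting value: (Tomcat nodes / Apache nodes) * max Tomcat threads.

    Rounds up (assumption) so the web tier is never sized below the app tier.
    """
    if tomcat_nodes <= 0 or apache_nodes <= 0 or max_tomcat_threads <= 0:
        raise ValueError("all inputs must be positive")
    total_threads = tomcat_nodes * max_tomcat_threads
    # Integer ceiling division avoids floating-point rounding surprises.
    return (total_threads + apache_nodes - 1) // apache_nodes


def min_spare_threads_ok(min_spare_threads: int, start_servers: int, threads_per_child: int) -> bool:
    """Check the condition MinSpareThreads >= StartServers * ThreadsPerChild."""
    return min_spare_threads >= start_servers * threads_per_child
```

For example, six Tomcat nodes with 200 threads each behind two Apache nodes give `max_request_workers(6, 2, 200) == 600`. Treat the result as a starting point for load testing, and add headroom if Apache also serves static content.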
Web Server Security Hardening
- Turn off directory listing on your web server, and disable the auto-index module.
- Configure the web servers so they do not display server signature.
- Disable any unneeded Apache modules.
- Make sure Apache runs as a non-root user. The user account and group should be changed. They should not be "nobody" or "daemon", which are the defaults.
- Verify the Apache user account has an invalid shell, and lock the Apache account.
- Apache directories and files should be owned by root and have a group ID of root. The only exception should be the Apache web document root (htdocs), which could have a group ownership that allows it to be updated through a change management process.
- Restrict write access by other users to the Apache configuration files. Except for the web document root, which could be group-writable, all other Apache directories should have 755 permissions. Files should be similar, except where the executable permission does not apply.
- Secure the core dump directory or disable the core dump altogether. The core dump directory must not be in the web document root. It should be owned by root, have a group ownership of the Apache group as defined in the Group directive, and must have no read-write-search access permissions for other users.
- Restrict group write access for the Apache directories and files. An exception can be made to the document root.
- Make sure that the Apache group that is used to run the server does not have write access to any directories in the document root.
- Restrict Override for All Directories. Ensure that AllowOverride is set to None, and there are no AllowOverrideList directives present.
- Restrict Options for the OS Root Directory and document root. Set Options to None on the root directory tag, and document root Directory tags or any other directory in the Apache configuration file. Usually, in a reverse proxy setting, the Options directive should not be needed.
- Remove any default index.html or welcome page, and ensure the configuration files do not provide an index or welcome page. Ensure the Apache user manual is not installed.
- It is a best practice that server-status and balancer-manager handlers are defined. They should be protected with a password, and access should only be allowed from specific source IP.
- Comment out the reference to proxy-html.conf in the main Apache configuration file.
- Remove the Default CGI Content, printenv and test-cgi: rm cgi-bin/printenv cgi-bin/test-cgi
- Limit HTTP Request Methods to only allow GET, HEAD, POST and OPTIONS using LimitExcept directive. Please note that if you are using REST APIs you may also need to allow PUT and DELETE for the directory that serves these requests.
- Disable HTTP TRACE Method by setting TraceEnable to off in the Apache configuration file.
- Consider using mod_rewrite to restrict HTTP protocol versions lower than HTTP 1.1. This may disable some old browsers and tools used for monitoring or troubleshooting, such as wget and telnet. You may consider restricting this protocol after monitoring the logs and determining what type of traffic uses HTTP 1.0.
- Restrict Access to .ht* files.
- Configure the error log and set its level to notice. Consider adding the jsessionid to the access log.
- Configure log rotation. Make sure logs are not stored on the root partition. Also, retain logs for at least 13 weeks.
- Make sure the web server is at the latest patch level.
- If using SSL on Apache, protect your private keys so that they can only be read by root. Ideally, you will not need this, as the recommendation is to terminate SSL on the load balancer, and traffic from the load balancer to the web server would go through port 80. This assumes that the load balancer and web servers are within a secured environment. PCI/card data should never be stored within the environment as you would use an external payment gateway to process payments.
- If using SSL, disable weak ciphers and only allow FIPS-compliant ciphers.
- Make sure SSLInsecureRenegotiation is set to off.
- Ensure SSL compression is set to off.
- Consider setting the Timeout directive to 10 seconds or shorter in order to prevent DoS attacks. When setting this value, be aware that it could also affect requests that Tomcat is slower to respond to. This is in addition to, and not a replacement for, having a WAF or intrusion detection solutions.
- Make sure the KeepAlive directive is set to on and MaxKeepAliveRequests is set to 100.
- Set the KeepAliveTimeout to 15 to mitigate denial of service attacks.
- Enable mod_reqtimeout in the Apache configuration: RequestReadTimeout header=20-40,MinRate=500 body=20,MinRate=500
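Some of the hardening items above name a directive and an exact value, which makes them easy to check automatically. The following is a rough sketch of such a check, assuming a single flat configuration text. It is deliberately simplistic: it does not follow Include directives, ignores virtual host and directory context, and lets a later occurrence of a directive override an earlier one. The expected values are the ones stated in the list above; the function names are illustrative.

```python
# Directive names are compared case-insensitively; values come from the list above.
EXPECTED = {
    "traceenable": "off",
    "keepalive": "on",
    "maxkeepaliverequests": "100",
    "keepalivetimeout": "15",
}


def parse_directives(conf_text: str) -> dict:
    """Collect 'Directive value' pairs, skipping comments, blanks, and section tags."""
    found = {}
    for raw in conf_text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#") or line.startswith("<"):
            continue
        parts = line.split(None, 1)
        if len(parts) == 2:
            found[parts[0].lower()] = parts[1].strip().lower()
    return found


def audit(conf_text: str) -> list:
    """Return (directive, expected, actual) for each mismatch; actual is None if missing."""
    found = parse_directives(conf_text)
    return [
        (name, want, found.get(name))
        for name, want in EXPECTED.items()
        if found.get(name) != want
    ]
```

An empty list from `audit(...)` means the four checked directives match; it says nothing about the other hardening items, which still need a manual review.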
Application Servers
It is a best practice to run the back-office and storefront applications on different application servers. We recommend this separation because:
- Backoffice web applications are more resource intensive than the storefront.
- Backoffice nodes can run catalog synchronization jobs, imports, exports, or any other batch integration activities that could put a tremendous load on the system.
- Usually, the backoffice applications have different SLAs.
- Separation of concerns from the standpoint of monitoring and troubleshooting.
- Different tuning strategies may need to be set up specifically for your back-office applications (for example, cache size and eviction strategy, JVM settings, and so on).
Regardless of the size, SAP Commerce Expert Services recommends that you have at least one back-office node. For small customers, one back-office node can be used both for batch jobs and back-office web applications. For larger customers, these can be further separated into one back-office node that runs the batch jobs and multiple other nodes that are each dedicated to a specific area (Product Cockpit, Customer Service Cockpit, WCMS cockpit, and so on).
Application Servers OS Settings
- A couple of kernel settings need to be tuned for the SAP Commerce clustering mechanism to function efficiently. SAP Commerce clustering is used to communicate cache invalidation messages between application server nodes in a cluster. For more information on SAP Commerce clustering, please see Cluster. Best practice is to use the SAP Commerce clustering mechanism with JGroups UDP. Since version 5.0, this is the default clustering mechanism. For optimal JGroups communication, you may need to adjust the following kernel settings in the sysctl.conf file, depending on the JGroups library version you are using. Please check the startup log of the servers. If there are any JGroups warnings, the log may suggest applying the following:
  - net.core.rmem_max = 26214400
  - net.core.wmem_max = 655360
- The JVM should be configured to use from 6 to 8 GB of RAM in most cases if you use 4 vCPU machines. Allow enough physical RAM for flexible JVM tuning. Usually, you would allocate 10 to 14 GB of physical RAM.
- Typically, you would allocate 4 cores for each SAP Commerce node, but in larger clusters you may decide you want to reduce the cluster size and use 8 cores for each application server instance.
- Make sure you are using the most recent release of Java. Note that OpenJDK is not supported, so make sure you use the Oracle JDK. Check the third-party compatibility information that applies to your SAP Commerce version for details: Third-Party Compatibility.
Application Server Configuration Settings
- Ensure you have production licenses in place.
- SSL termination:
  - On the load balancer layer or Content Delivery Network (CDN, if you use one), it is necessary to set a header (x-forwarded-proto) for the requests that came in over the secure channel. An additional step on the application server layer is to configure Tomcat to interpret this header so that it knows when the request was originally secured.
  - Define the RemoteIpValve inside the Engine tag of the server.xml. If the internal IP address of the device that sets the x-forwarded-proto header is in the 172.16/12 range, you should also set the trustedProxies attribute to this IP address. For more info see https://tomcat.apache.org/tomcat-7.0-doc/api/org/apache/catalina/valves/RemoteIpValve.html.
- Session timeouts:
  - In most cases during site operational activities, the number of HTTP and Jalo sessions should be identical (or very close) for each web application. The SAP Commerce Administration Console provides a chart view of HTTP and Jalo sessions. In recent versions of SAP Commerce (5.1 and up), the SAP Commerce Administration Console will not display the number of HTTP sessions. An alternate way to monitor the number of sessions is using JMX. Tomcat exposes the number of HTTP sessions for each web application through the Manager/Active Sessions MBean, and SAP Commerce exposes the Jalo sessions as MBeans. These can be pulled and charted in your Application Performance Management tool (for example, Dynatrace, New Relic, and others). Please note that if you have monitoring scripts that periodically “HTTP GET” a URL on the application servers, you will get more HTTP sessions than Jalo sessions. Depending on the web session timeout, this number could be much higher. Ways to avoid unnecessary sessions due to HTTP monitoring include:
    - Your monitoring script is cookie-aware and retains the cookie from the first HTTP response on subsequent requests.
    - Your monitoring script is not cookie-aware, in which case you need to design a JSP page that does not create an HTTP session, and use that for your health checks.
- Tune the Tomcat connector settings (maxThreads, connectionTimeout, and more):
  - The connectionTimeout (defined in milliseconds in Tomcat) should be matched to the ttl attribute (defined in seconds) on the BalancerMember in the Apache HTTPD balancer settings.
  - Tomcat 7 (leveraged in SAP Commerce since version 5.1) made the NIO AJP connector available. You may be tempted to enable this, with the assumption that it would yield better performance. However, at the time of writing this document, we have noticed issues with using the NIO AJP connector. This is likely due to Servlet 2 specs still leveraging blocking operations. The recommendation is to stick with the BIO connector. However, if performance does not meet expectations, you could compile the native APR libraries to leverage the AJP APR connector.
  - Set the URIEncoding attribute in your AJP connector to “UTF-8”.
- Tune the cache:
- Adjust entity region and query cache region sizes and eviction policies. Some tests show the LRU eviction policy to work best (for most scenarios). Note that this should still be tested.
- Also, consider pre-warming the SAP Commerce cache before a new application server node is brought online. This should especially be done in the scenario where you want it to take a lot of traffic immediately. However, this could be part of continuous deployment strategy as well.
- Consider automatically loading the type system into cache.
- Access most commonly viewed pages.
- While not strictly related to SAP Commerce cache, you could also consider sending your more popular search queries against the node before coming up.
- SAP Commerce Expert Services has built a “Node Warmer” extension that you can leverage.
- Some of the above tasks could also be implemented using automated testing tools, such as Selenium or HTTPUnit.
- Additionally, you could consider setting a smaller load factor in the balancer-manager of Apache for a new node in the cluster, and increasing that value as the cache warms up naturally.
- If you did not purchase the SAP Commerce high-performance license module, the total cache size for all regions must be lower than 170000.
- Adjust the logging level in production environments to WARN or higher.
- Configure log rotation, but ensure it complies with your data retention policies.
- Ensure JGroups multicast is used for cluster communication (if not in a cloud setting). Also add the IPv4 settings to the JVM to avoid problems with IPv6: -Djava.net.preferIPv4Stack=true -Djava.net.preferIPv4Addresses=true. Verify the cluster works correctly by monitoring the cluster using HAC.
- If JGroups TCP is used, for example in environments hosted in AWS where multicast is not allowed, please note that you cannot use a JNDI data source for SAP Commerce database. You will need to specify all the database parameters in the local.properties file.
- Ensure the max-age headers for static assets and media resources are set appropriately.
- You can inject the cache-control headers for specific folders on Apache if media and static assets are served by the web tier.
- Remove any tenants in any environment that is higher than the integration development environment. Set installed tenants to nothing in local.properties file.
- Disable Lucene cronjobs. This has been disabled by default in newer SAP Commerce versions.
- Lock the system initialization screen and protect it by setting system.unlocking.disabled=true.
- Disable the tenant restart on a lost DB connection: tenant.restart.on.connection.error=false.
- Configure cronjobs to clean up instances of job logs, old carts or any instances of types that are meant to be short-lived.
- Set up custom error pages to be served by the application server:
- Define custom HTTP 500 response error page.
- Define custom HTTP 404 response error page.
- Ensure that secondary caches are being invalidated following syncs. Price caches, promotion caches, and droplet caches are examples of secondary caches.
- If you are using a case-insensitive DB collation, and are still on a version with HMC, we recommend disabling the HMC's own case-insensitive search handling through hmc.caseinsensitivestringsearch=false, because the collation already makes searches case-insensitive.
- Be sure to thoroughly test the impact that deployments/syncs have on the system, especially while the customer-facing site is under heavy load.
- Perform the Linpack, SQL, SQL Max, and overall tests while the system is idle in all environments, and compare the results between servers and environments. These tests can be executed from HAC->performance. Typically, the Linpack test should give a composite score of more than 1100 per CPU, and the SQL test an average response time under 0.7 ms. Comparing these results can reveal differences in infrastructure performance across environments.
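The warm-up steps from the list above (requesting commonly viewed pages and popular search queries before a node takes traffic) could be scripted along the following lines. This is a minimal sketch and not the Expert Services "Node Warmer" extension: the function names, the node URL, and the paths are hypothetical, and it assumes the node can be reached directly over HTTP, bypassing the load balancer.

```python
import urllib.error
import urllib.request
from typing import Callable, Dict, Iterable


def fetch_status(url: str, timeout: float = 30.0) -> int:
    """Request one URL and return its HTTP status code (0 if unreachable)."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as response:
            return response.status
    except urllib.error.HTTPError as err:
        return err.code  # the node answered, but with an error status
    except OSError:
        return 0  # connection refused, timeout, DNS failure, ...


def warm_node(base_url: str, paths: Iterable[str],
              fetch: Callable[[str], int] = fetch_status) -> Dict[str, int]:
    """Request every path once against a single node and record the status."""
    results = {}
    for path in paths:
        url = base_url.rstrip("/") + "/" + path.lstrip("/")
        results[path] = fetch(url)
    return results


def is_warm(results: Dict[str, int]) -> bool:
    """Treat the node as warmed only if every request returned HTTP 200."""
    return bool(results) and all(status == 200 for status in results.values())


# Hypothetical usage; host, port, and paths are placeholders:
# results = warm_node("http://app-node-1:9001", ["/", "/search?text=shoes"])
# if is_warm(results): add the node to the load balancer pool
```

Passing the fetch function as a parameter keeps the sketch testable without a running node; a browser-driven tool such as Selenium could be substituted where pages need JavaScript to render.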
Special care should be taken when configuring the JVM garbage collector (GC) for SAP Commerce projects. The GC is a fundamental part of the JVM's internal dynamics, and the right configuration is essential for sound memory management on SAP Commerce servers.
Recommended Practices for the GC Configuration of SAP Commerce Servers
The recommendation is to use the G1 garbage collector on your servers for the following reasons:
The Garbage-First (G1) collector is a server-style garbage collector targeted for multi-processor machines with large memories such as SAP Commerce servers. G1 meets garbage collection (GC) pause-time goals with a high probability while achieving high throughput. The G1 collector is designed for applications that:
- Can operate concurrently with application threads, like the CMS collector.
- Compact free space without lengthy GC induced pause-times.
- Need more predictable GC pause durations.
- Do not want to sacrifice a lot of throughput performance.
- Do not require a much larger Java heap.
G1 is a "compacting" collector. It compacts sufficiently to completely avoid the use of fine-grained free lists for allocation, and relies on regions instead. This considerably simplifies parts of the collector and mostly eliminates potential fragmentation issues.
G1 also offers more predictable garbage collection pauses than the CMS collector, and allows users to specify desired pause targets.
G1's main focus is to provide a solution for users running applications that require large heaps with limited GC latency. Specifically, this means heap sizes of 6 GB or larger, and a stable and predictable pause time below 0.5 seconds.
Applications running today with either the CMS or the ParallelOldGC garbage collector would benefit by switching to G1 if the application has one or more of the following traits:
- Full GC durations are too long or too frequent.
- The object allocation rate or promotion rate varies significantly.
- Garbage collection or compaction pauses are undesirably long (longer than 0.5 to 1 second).
Most of these traits are present on SAP Commerce instances simultaneously, and these conditions are sufficient argument to justify the use of the G1 Garbage collector on SAP Commerce servers.
Recommended GC settings for SAP Commerce server instances
SAP Commerce runs on the embedded Tomcat web container. As with any other Java application that runs on Tomcat, the GC settings are passed as JVM options; in SAP Commerce they should be configured in the tomcat.generaloptions property. The following table displays the recommended GC settings relevant to SAP Commerce instances:
|Purpose of the property||Property configuration||Setting should be present in Production?|
|Use the Garbage First (G1) collector|| || |
|Use the Concurrent Mark Sweep (CMS) collector|| ||No. SAP Commerce recommends using the G1GC.|
|Use the Parallel Collector||-XX:+UseParNewGC||No|
|Use the Serial Collector||-XX:+UseSerialGC||No|
| || ||These properties have been deprecated since Java 8. Remove if present.|
|Print GC Date stamps|| || |
|Print timestamps at garbage collection.||-XX:+PrintGCTimeStamps||Yes|
|Print messages at garbage collection. Manageable.||Also equivalent to -XX:-PrintGCDetails|| |
The G1 garbage collector has been fully supported since the Oracle JDK version 7 update 4.
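To illustrate how the surviving rows of the table could be applied, the sketch below reviews a JVM options string and reports flags the table marks as "No" for production, plus the logging flag it marks as "Yes". It is an assumption-laden example: the function name and the sample option string are hypothetical, and only flags that appear in the table above are checked.

```python
import shlex
from typing import List

# Flags the table above marks as "No" for production.
DISCOURAGED = {"-XX:+UseParNewGC", "-XX:+UseSerialGC"}
# Flag the table above marks as "Yes" for production.
EXPECTED = {"-XX:+PrintGCTimeStamps"}


def review_gc_options(options: str) -> List[str]:
    """Return human-readable findings for a JVM options string."""
    flags = set(shlex.split(options))
    findings = [f"remove {flag}" for flag in sorted(DISCOURAGED & flags)]
    findings += [f"add {flag}" for flag in sorted(EXPECTED - flags)]
    return findings


# Hypothetical usage with a made-up options string:
# for finding in review_gc_options("-Xmx8g -XX:+UseSerialGC"):
#     print(finding)
```

A check like this could run in a deployment pipeline against the configured tomcat.generaloptions value, so that a discouraged collector flag is caught before it reaches production.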
- Verify that the latest version of the current JVM is running (version 5.5.0 and lower are limited to Java 7, 5.5.1 and up should use Java 8, 1905 and up should use Java 11).
- Do not use OpenJDK. It is not supported. Instead, use the supported version of Java for your version as listed on the Third-Party Compatibility page.
- Verify that the -server flag is used, even if the machine is supposedly "server-class". Do not fully trust ergonomics in the JVM.
- Verify that -Xms and -Xmx are set to the same value.
- Make sure you use G1GC or CMS algorithm. As mentioned in the previous section, SAP Commerce Expert Services recommends G1 garbage collection as it requires minimal tuning from the default settings.
- Verify that garbage collection patterns look healthy during load tests and soak tests. Restarting instances periodically to alleviate memory issues is not acceptable.
- Ensure that -XX:+PrintGCDetails -XX:+PrintGCTimeStamps -verbose:gc -Xloggc:<file> is used. These options should stay enabled even in production.
- Verify that the objects in the heap look normal when the instance is under load. To check, generate a histogram using $JAVA_HOME/bin/jmap.
- Investigate the usage of -XX:+UseTLAB. Lower CPU utilization may occur.
- See the application server configuration section for a starting example of parameters for G1GC. Note that this does not leverage large pages.
- If you are using a parallel garbage collector for project-specific reasons, verify that the number of parallel GC threads (-XX:ParallelGCThreads) has been explicitly set, and experiment to find an optimal value. Typically, this defaults to the number of CPUs. If you have more than 8 CPUs, use 3 + ((5 * number of CPUs) / 8).
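The thread-count rule in the last item can be worked through as arithmetic. The sketch below assumes integer (floor) division, which the text does not state explicitly; the function name is hypothetical.

```python
def parallel_gc_threads(cpus: int) -> int:
    """Starting value for -XX:ParallelGCThreads based on the CPU count.

    Up to 8 CPUs: one GC thread per CPU.
    More than 8 CPUs: 3 + (5 * cpus) / 8, using floor division (assumption).
    """
    if cpus < 1:
        raise ValueError("cpus must be at least 1")
    if cpus <= 8:
        return cpus
    return 3 + (5 * cpus) // 8


# Worked example: 16 CPUs -> 3 + (5 * 16) // 8 = 3 + 10 = 13
# which would correspond to -XX:ParallelGCThreads=13 as a starting point.
```

The result is only a starting value; as the item above says, the optimal setting should still be found by experiment under load.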
- If you are still using a version with HMC and/or the cockpits, disable default passwords:
- For passwords, see Password Storage Strategies.
- Run a flex search query to find users with vulnerable passwords, and passwords in clear text:
- Using ImpEx, reset default accounts to your defined encoding method, or remove them (except for admin). This may need to be done after production initialization as part of the cutover activities. Make sure system updates do not revert these accounts to their defaults.
- Change the default salt for the media storage hash by redefining this property: media.default.storage.location.hash.salt. We recommend doing this early in the project to avoid any issues with media that has already been initialized.
- Verify the application does not render unauthorized resources or data.
- Verify that user account and checkout pages are only accessible through a secure SSL connection. Since 5.5.1, the default configuration in the accelerators is that all pages are secured. Review the spring security configuration to make sure the correct access rules are defined.
- Change the default symmetric key primary password during the initial stages of the project. Transparent Attribute Encryption (TAE) has encrypted all users' hashed passwords since SAP Commerce 5.7, and there is no way to rotate the primary password; you can only rotate the keys. It is therefore important that you change the primary password and generate a non-default key before importing or creating users.
- Disable or tighten up Cross-Origin Resource Sharing. See Cross-Origin Resource Sharing.
- Verify that the HttpOnly attribute is set on cookies.
- Secure your SAP Commerce installation against the OWASP Top 10 vulnerabilities (http://www.owasp.org).
- Disable anonymous login in Core.
- By default, in advanced.properties we have login.anonymous.always.disabled=true. This property ensures that User.isLoginDisabled will always return true for the anonymous customer, and User.checkPassword will always return false for the anonymous customer. Ensure this property has not been overridden in local.properties or project properties files.
- Secure the access to the cockpits:
- Make sure only employees in employeegroup can access the cockpits, and customers in customergroup can access the storefront. The cockpit spring configuration enforces this since v5.0 through the defaultCorePreAuthenticationChecks bean, defined in cockpit-spring-security.xml.
- Lock down any folder or directory in which non-public pages and files are stored. Please see Secure Media Access.
Lock the initialization button in HAC, and then prevent unlocking by setting: system.unlocking.disabled=true in local.properties. This will prevent users from inadvertently initializing or updating your system.
Please note that there is one lock for both initialization and update. Learn how to disable this option at runtime in order to be able to do system updates.
- Review installed extensions and make sure unnecessary and risky extensions are disabled (such as VJDBC). Additionally, verify that all loaded web modules (HAC->Platform->Extensions, sort by webroot column descending) are required to be there. Depending on how you created the extensions, you may have ended up with a web module for an extension that you did not intend to have. Adjust extensioninfo.xml to remove the web module entry for extensions that do not require a web extension.
- Secure JMX connections. Define the correct password and access in /config/tomcat/conf/jmxremote.password and config/tomcat/conf/jmxremote.access, and adjust authentication options in
- Verify that the SAP Commerce application server is not running as root. Typically, init scripts should take care of this. Alternatively, the tanuki wrapper.sh file uses an environment variable RUN_AS_USER. You could export this variable prior to starting the SAP Commerce server if you want to use the wrapper functionality to switch to the correct user.
- Secure Backoffice extensions from internet access by implementing proxy rules on the web server or load balancer layer. Test these rules against host header spoofing. Make sure you identify all webroots in Tomcat by checking the web extensions in HAC→platform/extensions.
Do not attempt to remove Backoffice extensions from localextensions.xml or by using myextensionname.webroot=<disabled>. You should have the same localextensions.xml loaded on all nodes, and the <disabled> feature will cause the HAC to stop working. You need HAC to verify granular cache metrics and other diagnostic features, such as clustering verification, that are not currently available through JMX.
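Several of the hardening items above come down to property values that can be checked mechanically. The following sketch parses the text of a local.properties file and reports deviations for three properties named in this section. It is illustrative only: the function names are hypothetical, and the parser handles only simple key=value lines, not the full Java properties syntax (no line continuations or ":" separators).

```python
from typing import Dict, List


def parse_properties(text: str) -> Dict[str, str]:
    """Parse simple key=value lines; comments and blank lines are skipped."""
    props = {}
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith(("#", "!")) or "=" not in line:
            continue
        key, _, value = line.partition("=")
        props[key.strip()] = value.strip()
    return props


def hardening_findings(props: Dict[str, str]) -> List[str]:
    """Compare local.properties values with the recommendations above."""
    findings = []
    # The initialization/update lock must not be unlockable.
    if props.get("system.unlocking.disabled") != "true":
        findings.append("system.unlocking.disabled should be true")
    # The platform default is true; only an explicit override is a problem.
    if props.get("login.anonymous.always.disabled", "true") != "true":
        findings.append("login.anonymous.always.disabled must not be overridden")
    # The media hash salt should be redefined rather than left at its default.
    if not props.get("media.default.storage.location.hash.salt"):
        findings.append("media.default.storage.location.hash.salt should be redefined")
    return findings


# Hypothetical usage; the path is a placeholder:
# with open("config/local.properties", encoding="utf-8") as handle:
#     print(hardening_findings(parse_properties(handle.read())))
```

Because project properties files can also override these values, a real check would need to look at the effective configuration of a running node as well, for example in HAC.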
Ensure that you have a qualified DBA on staff who will proactively monitor and tune the SAP Commerce database. The DBA should take part in the performance testing activities, and should identify possible improvements throughout the project. For information on how to configure these third-party databases with SAP Commerce, please see the following link: Third-Party Databases.
The SAP Commerce Administration Console provides some basic tools, under the /monitoring/performance path, to verify that the database is connected and working. You should run these tests against your production database:
The SQL Max test available in the HAC is intended to verify that the database is working, but it should not be used to confirm real-world performance. For most row-based databases, the test is a good indicator of the connection between the app server and database layers. A SQL Max test will not account for additional tuning that is done in SAP Commerce. In the specific case of HANA databases, the driver has been tuned to leverage prepared statement caching with SAP Commerce. This leads to significant performance improvements that you would not see with a plain SQL Max test. An in-memory database will always perform significantly better on reads, and since the SQL Max test focuses only on writes, you should expect its results to differ from what you will see in the real world.
- The SQL test should yield below a 0.7 ms average execution time.
- SQL maximum should be:
- Under 1 second for “Time to add 10000 rows”.
- Under 50 seconds for “Time to add 10000 rows using max() queries”.
- Under 2 seconds for “Time to add 10000 rows using max() queries and index”.
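The thresholds above lend themselves to an automated comparison across environments. The sketch below encodes them and lists which measurements miss their target. The metric keys and function name are hypothetical; the measured values would be read manually from the HAC performance page, since no HAC API is assumed here.

```python
from typing import Dict, List

# Upper limits taken from the list above (exclusive: "below" / "under").
THRESHOLDS = {
    "sql_avg_ms": 0.7,             # SQL test, average execution time in ms
    "add_rows_s": 1.0,             # time to add 10000 rows, in seconds
    "add_rows_max_s": 50.0,        # ... using max() queries
    "add_rows_max_index_s": 2.0,   # ... using max() queries and index
}


def failed_checks(measured: Dict[str, float]) -> List[str]:
    """Return the metrics that are missing or not below their threshold."""
    return [
        name
        for name, limit in THRESHOLDS.items()
        if name not in measured or measured[name] >= limit
    ]


# Hypothetical usage with made-up measurements:
# failed_checks({"sql_avg_ms": 0.9, "add_rows_s": 0.8,
#                "add_rows_max_s": 42.0, "add_rows_max_index_s": 1.5})
```

Running the same comparison for every environment makes it easier to spot the infrastructure differences that these tests are meant to reveal.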
Please note that having an active-active database cluster between 2 different data centers is not a supported configuration. This is due to the following reasons:
- From a cluster perspective, there is no guarantee that cache invalidation messages arrive after the data replication is complete if database replication is asynchronous.
- Latency is a concern if, for example, database replication uses synchronous commits.
SAP Commerce does not usually provide database expertise. However, here are some items that we think are important to mention:
- Traditionally, it was best practice to run the database on physical servers and not on virtual machines. However, there are instances of Oracle databases performing well on VMware when implemented correctly. Bare metal is still recommended as the first choice for the database. However, if you insist on using VMware, please see these links for additional information on running Oracle on VMware:
- SAP Commerce auto-generates tables and indexes specified in extensions’ xxxx-items.xml. Additional indexes may be required.
- Indexes and tablespaces still require inspection by a qualified DBA.
- Choose a CI (case-insensitive) collation when creating a database for SAP Commerce.
- Verify that the database can handle all of the connections that the application could create to it. By verifying the size of the connection pool on each application server, you can determine the maximum number of connections that the database should be able to support.
- Usually, you do not need to modify the default SAP Commerce settings for the connection pools unless your application server runs on more than 8 cores. If you reach the limit (90 concurrent connections in use per application server), it is usually a problem in the application code. Some common reasons for connections not being released to the pool immediately are:
- Long-running queries due to missing indexes.
- Older SAP Commerce versions that use an older version of the Apache commons libraries for the connection pool.
- Missing uniqueness on the pk attribute (initialization in certain SAP Commerce versions missed adding the uniqueness on the pk columns). Future upgrades will not fix this issue. It has to be fixed by a DDL script created by a DBA.
- Application code generating queries that are too complex due to bad data modeling or bad code.
- Using JNDI can cause problems with application servers not being able to recover after a short loss of network connectivity to the database. It is recommended to use the default SAP Commerce data source.
- Make sure that the production database account used by the SAP Commerce application servers does not have a password expiration, and has a strong, hard to guess password.
- Database backups should be scheduled nightly, according to database vendor best practices.
- Do not manipulate data using SQL. You should always use SAP Commerce utilities to manipulate data (ImpEx, Flexible Search, purging data through the maintenance strategies provided by SAP Commerce).
- Perform a database restore exercise before launch, and document the process.
- Depending on the database solution you choose, there may be different disaster recovery (DR) scenarios for the database. You will need to define policies and procedures around recovery. Here are a few disaster scenarios you may need to consider:
- RAC node failure (this should be handled transparently).
- Database storage failure. With RAC, the storage is a single point of failure. An additional option is to combine RAC with Data Guard, where a standby database is available and takes over in case of a disaster.
- Some customers choose to have a standby database along with a complete SAP Commerce standby environment in a separate data center. When disaster strikes, they will simply power up the standby web and application servers, alter the database to be the primary, and switch the DNS over to the standby datacenter.
- Ensure that there are database-level alerts in place to warn of problems (like blocking sessions).
- Make sure to clean up any remaining test data. For instance, test users/orders are often created during load testing.
- Ensure that the database(s) are supported per SAP Commerce supported environments. Ensure that the JDBC drivers are supported per SAP Commerce supported environments. Manually open the manifest files of the JDBC driver JAR file to verify.
- Check that the tablespaces are set to auto-extend, or set up alerts that fire when the tablespace size gets too close to a certain threshold.
- If there is a firewall between the database and the application servers, and the firewall is inspecting SQL*NET traffic, be sure that it can keep up. During periods of heavy site traffic, the firewall CPU can be maxed out by all of the SQL*NET traffic it has to inspect.
- Verify that NIC saturation is not too high. You could use nicstat http://www.brendangregg.com/K9Toolkit/nicstat.
- Check for excessive packet collisions (netstat -i 1). If output collision counts / output packets is > 10%, there is a problem.
- Gigabit (GB) networks are recommended. The best solution would be to use 10 GB networks for SAP Commerce.
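The connection-capacity check from the list above (summing the connection pool sizes of all application servers and comparing the total with what the database accepts) can be worked as simple arithmetic. This is a sketch under stated assumptions: the function names are hypothetical, and the optional reserve for DBA and monitoring sessions is an illustrative addition, not an SAP Commerce setting.

```python
from typing import Iterable


def required_db_connections(pool_sizes: Iterable[int], reserve: int = 0) -> int:
    """Worst case: every application server uses its whole connection pool."""
    return sum(pool_sizes) + reserve


def database_can_serve(pool_sizes: Iterable[int], db_max_connections: int,
                       reserve: int = 0) -> bool:
    """True if the database limit covers all pools plus the reserve."""
    return required_db_connections(pool_sizes, reserve) <= db_max_connections


# Worked example: 4 application servers with the limit of 90 connections
# each mentioned above -> 4 * 90 = 360 connections in the worst case.
```

If nodes are added to the cluster later, the same calculation shows whether the database connection limit needs to be raised first.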
Ensuring that your infrastructure is ready to support an SAP Commerce solution requires the assistance of many members of your team. Some of these configurations and checks can take a while to implement and test. Therefore, do not leave them to the last minute.