Data Maintenance and Cleanup
Like any complex software system, SAP Commerce Cloud generates transactional and temporary data. In this article, we outline ways for you to configure data retention and cleanup rules so that obsolete data is removed and does not degrade the performance of your system.
Table of Contents
- Data Maintenance and Cleanup in Custom Code
- Personal Data Retention in Custom Code
- Data Maintenance Setup
- One-time Clean Up
Data Maintenance and Cleanup in Custom Code
Let's start with a recommendation for your custom extensions: delete any temporary data your code creates as soon as it is no longer needed.
For example, assume you have a custom job that requires the use of a temporary media item to generate result files:
If you use the approach outlined above, you will accumulate unused media that:
- Increase the size of the media table
- Increase the number of files/blobs in your media storage
Both effects decrease the performance of your system over time. Instead of leaving the temporary media items in the system, remove them as soon as you are done with the processing:
The above pattern can be applied to any temporary resource, like temporary files on the file system or other items in SAP Commerce Cloud.
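The same pattern for a temporary file on the file system can be sketched as follows. This is a minimal illustration that uses only the Java standard library; the class name, method name, and the "processing" step are hypothetical placeholders, not platform APIs.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;

final class TempFileCleanupExample {

    /** Last temporary file used; kept only so the cleanup can be verified afterwards. */
    static Path lastTempFile;

    /**
     * Writes the payload to a temporary file, processes it, and always deletes the
     * file afterwards, even if the processing step throws an exception.
     */
    static int processWithTempFile(String payload) throws IOException {
        Path tempFile = Files.createTempFile("job-result-", ".tmp");
        lastTempFile = tempFile;
        try {
            Files.write(tempFile, payload.getBytes(StandardCharsets.UTF_8));
            // Stand-in for the real processing step: count the bytes that were written.
            return Files.readAllBytes(tempFile).length;
        } finally {
            // Runs on both the success path and the failure path.
            Files.deleteIfExists(tempFile);
        }
    }

    public static void main(String[] args) throws IOException {
        int size = processWithTempFile("result data");
        System.out.println("Processed " + size + " bytes; temp file still exists: "
                + Files.exists(lastTempFile));
    }
}
```

Putting the delete call in a finally block means the temporary resource is removed even when the job fails halfway, which is exactly the case in which leftovers tend to accumulate.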
Personal Data Retention in Custom Code
Another topic to consider is the data retention of personal information to comply with data protection regulations, such as the General Data Protection Regulation (GDPR) of the European Union.
If you introduce new custom item types that contain personal data linked to a customer, you need to ensure that this data is cleaned up properly once the data retention period is over or the customer asks for their personal data to be deleted.
SAP Commerce Cloud provides hooks to add custom cleanup logic to the personal data retention framework:
For details on how to implement and configure such a hook, refer to Personal Data Erasure, as well as Deletion. These articles provide an overview of all data retention rules available out-of-the-box. For more on data protection, see the article Data Protection in SAP Commerce Cloud.
Data Maintenance Setup
Now that we have covered recommendations regarding custom code, let's have a look at the platform and how you should configure data maintenance and cleanup for SAP Commerce Cloud.
This will be split into three parts:
- Generic Audit - generated by the Generic Audit feature of SAP Commerce Cloud
- Technical Data - generated by jobs, and more
- Transactional Data - generated by business logic or by your storefront
The platform provides two ways to clean up unused data:
- Maintenance Framework - the old way, requires implementation effort
- Data Retention Framework - new and improved, covers most use cases just with configuration
For the rest of this article, the focus will be on the Data Retention Framework, unless otherwise stated.
Generic Audit
The Generic Audit feature was introduced in version 6.6 and is enabled by default.
This feature tracks every change to a type and stores a before and after snapshot of the data in an audit table for the type. It can generate a lot of data very quickly, which will degrade the performance of your database and solution.
The default configuration enables the Generic Audit feature for a wide range of types. Therefore, we recommend that you review the default settings carefully and consider disabling any type you don't need (for example, requirements for audit logging on
You can enable/disable the auditing of a type in your properties. For example, if you wish to disable audit of changes to Product types, you can set the following in your local*.properties file(s):
Completely disable audit logging for local development (and maybe also on your Continuous Integration server) to speed up platform initialization and test run time by setting the following in your local.properties and local-dev.properties files:
You should also consider using the Change Log Filtering feature, which was introduced in 1905 and back-ported to earlier patch releases such as 1808.9 and 1811.5.
This feature allows you to conditionally include or exclude data from audit logging to reduce the amount of data generated and stored in the audit logging tables.
Technical Data
This section covers the technical data that the platform accumulates over time and how to properly clean it up.
The platform ships with cleanup capabilities. However, some of these capabilities require additional configuration before they are enabled. The usual areas of consideration for this category are:
- Cronjob Logs
- Cronjob History
- ImpEx Media
- Saved Values
- Stored HTTP Sessions
- Distributed ImpEx
All sample configuration in this section is provided on a best-effort basis. Make sure to verify it and adapt it to your project!
Cronjobs
Over time, many cronjob instances will accumulate in your SAP Commerce Cloud database.
The most frequent jobs are:
- ImpEx Imports/Exports
- Catalog Synchronization
- Solr Jobs
To clean those up, you can easily configure a retention job with an ImpEx script like the following:
A few notes regarding the above configuration:
- It aggressively cleans up all jobs older than two weeks, regardless of the cronjob result
- The 'where' clause on code restricts it to auto-generated jobs
- It only targets unscheduled cronjobs (= cronjobs without a trigger)
- Cleaning up is done once per day, at midnight
The good part about using retention rules is that they are easily configurable, as shown in the example above.
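To make the "older than two weeks" threshold from the notes above concrete, here is a minimal sketch of the underlying date arithmetic. It assumes that the retention time is expressed in seconds and that an item becomes eligible for cleanup once its end time lies before "now minus retention time"; the class and method names are hypothetical and not part of the platform.

```java
import java.time.Duration;
import java.time.Instant;

final class RetentionCutoffExample {

    /** Two weeks expressed in seconds (14 * 24 * 60 * 60 = 1,209,600). */
    static final long TWO_WEEKS_SECONDS = Duration.ofDays(14).getSeconds();

    /** Assumption: the cutoff is simply "now minus the retention time". */
    static Instant cutoff(Instant now, long retentionSeconds) {
        return now.minusSeconds(retentionSeconds);
    }

    /** An item is treated as expired when it ended strictly before the cutoff. */
    static boolean isExpired(Instant endTime, Instant now, long retentionSeconds) {
        return endTime.isBefore(cutoff(now, retentionSeconds));
    }

    public static void main(String[] args) {
        Instant now = Instant.parse("2024-03-15T00:00:00Z");
        System.out.println("Retention in seconds: " + TWO_WEEKS_SECONDS);
        System.out.println("Cutoff: " + cutoff(now, TWO_WEEKS_SECONDS));
        System.out.println("Job ended 2024-02-20 expired: "
                + isExpired(Instant.parse("2024-02-20T10:00:00Z"), now, TWO_WEEKS_SECONDS));
        System.out.println("Job ended 2024-03-10 expired: "
                + isExpired(Instant.parse("2024-03-10T10:00:00Z"), now, TWO_WEEKS_SECONDS));
    }
}
```

Changing the retention period then only means changing one number, which is what makes a configuration-based rule easy to adapt per project.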
An alternative way to clean up cronjobs is the CleanupCronJobStrategy of the legacy Maintenance Framework. However, that strategy requires customization if you want to change which cronjobs it processes.
Cronjob Logs
To actually clean up old cronjob log files as described in CronJob Logs Clean-up, ensure that you configure a cronjob and a trigger to delete the logs.
The platform does not clean up old log files out-of-the-box!
The following is a sample ImpEx script which can be used to generate and run a cleanup job:
If you have cronjobs that run very frequently (for example, every few minutes), you should schedule the log file cleanup even more frequently. Running the cleanup more frequently avoids building up too many log files that need to be deleted.
Cronjob History
SAP Commerce Cloud uses Cronjob Histories to track the progress of cronjobs. Similar to cronjob logs, they can accumulate quickly for frequently running jobs. Unlike logs, there is no cleanup for them available out-of-the-box. This is why a new retention rule is necessary, such as the one defined in the following ImpEx script:
ImpEx Media
Every ImpEx import or export generates at least one ImpexMedia. These media stay in the system: the platform does not delete them when it deletes the ImpEx jobs they belonged to, because ImpEx media can potentially be re-used for other ImpEx jobs (although that is rarely the case). To set up a retention job for ImpEx media, use the following sample ImpEx script:
The retention time should be the same as for the cronjob cleanup.
Saved Values
The Backoffice uses Saved Values to track item changes by business users (the "Last Changes" in the Administration tab). We recommend keeping as few history entries as possible. You can configure the number of entries per item through a property:
If your project has been running for a long time and/or was upgraded across multiple releases, you have most likely accumulated millions of SavedValueEntry records in the database. Additionally, each of those entries generates multiple rows in the props table, which slows down the overall system even further.
If you have access to the database, you can quickly delete all entries and free up considerable space by running the following SQL script directly in the database while SAP Commerce Cloud is offline:
Stored HTTP Sessions
Out-of-the-box, the HTTP Session Failover mechanism of the platform stores the sessions in the database. To avoid performance degradation, clean them up as soon as they are stale by using something like the following ImpEx script:
The configuration above uses a conservative retention period of one day. Depending on the traffic on your site, you may want to clean them up even more aggressively.
Workflows
If you use Workflows to coordinate the work between business users, make sure to think about proper retention rules for those too. Workflows are highly specific to your project, which is why we don't provide a one-size-fits-all solution to clean them up. Some items to consider include:
- How long do your processes usually take? When is a process considered abandoned if it is not finished?
- How long do you need to keep finished workflows? Do you need them to audit changes?
- Do you use Comments in your workflow? How long do you need to keep them?
Based on the answers to these questions, you can set up retention rules that fit your workflows.
ImpEx Distributed Mode
If your project uses ImpEx Distributed Mode to distribute workloads across the cluster, you may want to consider setting up retention rules for the following types:
Transactional Data
Now that we have covered most of the data generated by technical processes of SAP Commerce Cloud, let's look at the data that is generated by your customers when they interact with the storefront. For this kind of data, you also need to consider the regulatory requirements around how long it needs to be stored or when it needs to be deleted. For example, GDPR includes the "right to be forgotten" and you are required to delete any data you have for a person if requested to do so.
SAP Commerce Cloud covers this for Customers and Orders. See Personal Data Erasure for more details.
If your project began before these jobs were available out-of-the-box or if you don't import project data during the update process, you may need to import the jobs described in the link above into your system.
This leaves us with a few types in the system for which additional configuration is necessary:
- Carts
- Business Processes
Carts
For carts, there is a cleanup cronjob available; see Removing Old Carts with Cronjob. To enable the cleanup for your site, you need to modify the job and add your BaseSite to the configuration:
The cronjob is provided by the ycommercewebservices extension. Make sure it is included in your configuration if you want to use it. Alternatively, you can always configure your own retention rules (one for anonymous carts, one for the carts of registered users).
Business Processes
Most of the operations users perform in the Accelerator trigger Business Processes (for example, resetting a password, placing an order, or order fulfillment). Those processes accumulate over time and need to be removed regularly.
This configuration cleans up all succeeded processes older than two weeks. If you have a lot of processes in other states (for example, FAILED or ERROR), you may want to configure a second retention rule for those with a longer retention period. In general, you want to keep the errors around longer for analysis.
If you have customized the business processes, make sure to clean up any additional data related to them. An out-of-the-box example of this is EmailMessages. Those get automatically cleaned up at the end of a business process as long as they were successfully sent. Conversely, they remain in the database if they were not successfully sent.
One-time Clean Up
We have now covered most of the periodic cleanup necessary to ensure the performance of your solution remains high and doesn't degrade over time. However, is it possible to run a one-time cleanup of the system, for example, before a migration to the cloud?
It doesn't make sense to configure additional cronjobs for this task. To delete data, you can:
- Execute SQL statements directly.
- Execute scripts in the administration console.
- Generate ImpEx scripts to remove items.
SQL statements are generally the fastest option but also the most dangerous one. They are executed outside the type system, so none of the automated cleanup, delete interceptors, validation, or similar logic is performed. Use with caution!
Scripts give you the most freedom in your cleanup logic. However, they also have two disadvantages:
- Every script executed in the administration console runs inside a database transaction by default. Deleting a lot of data may fail because of this.
- You need to implement multi-threading if you want to speed up the deletion process.
Because of these two disadvantages, generating ImpEx scripts to delete data is usually faster than running a cleanup script:
- ImpEx is multi-threaded by default.
- You don't need to worry about transactions.
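If you still prefer a script, the two disadvantages of scripts mentioned above can be addressed by splitting the work into batches and processing the batches on several threads. The following is only a sketch of that idea in plain Java: the deleteBatch callback is a hypothetical placeholder for whatever removal logic (and per-batch transaction handling) your project uses, and none of the names are platform APIs.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.function.Consumer;

final class BatchedCleanupExample {

    /** Splits the keys into consecutive batches of at most batchSize elements. */
    static <T> List<List<T>> partition(List<T> keys, int batchSize) {
        if (batchSize <= 0) {
            throw new IllegalArgumentException("batchSize must be positive");
        }
        List<List<T>> batches = new ArrayList<>();
        for (int i = 0; i < keys.size(); i += batchSize) {
            int end = Math.min(i + batchSize, keys.size());
            batches.add(new ArrayList<>(keys.subList(i, end)));
        }
        return batches;
    }

    /**
     * Calls the hypothetical deleteBatch callback once per batch on a fixed-size
     * thread pool and waits until every batch has finished.
     */
    static <T> void deleteInParallel(List<T> keys, int batchSize, int threads,
                                     Consumer<List<T>> deleteBatch) throws Exception {
        ExecutorService pool = Executors.newFixedThreadPool(threads);
        try {
            List<Future<?>> pending = new ArrayList<>();
            for (List<T> batch : partition(keys, batchSize)) {
                pending.add(pool.submit(() -> deleteBatch.accept(batch)));
            }
            for (Future<?> future : pending) {
                // A failed batch surfaces here as an ExecutionException.
                future.get();
            }
        } finally {
            pool.shutdown();
        }
    }

    public static void main(String[] args) throws Exception {
        List<Long> keys = new ArrayList<>();
        for (long key = 1; key <= 10; key++) {
            keys.add(key);
        }
        // Placeholder "delete" that only reports which batch it received.
        deleteInParallel(keys, 4, 2, batch -> System.out.println("Deleting batch " + batch));
    }
}
```

Keeping each batch small limits how much work a single transaction has to hold, while the thread pool provides the parallelism that ImpEx would otherwise give you for free.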
Here is an example skeleton to generate a cleanup ImpEx script through ImpEx Export:
You can re-use the Flexible Search queries provided for the retention rules. You only need to replace the CALC_RETIREMENT_TIME query parameter with a date calculation specific to your database.
Perform the following steps to use this script:
- Export the items to remove (in HAC, go to Console -> ImpEx Export).
- Open Backoffice and go to System -> Tools -> Import.
- Upload the ZIP file generated in the first step and click "Create".
- Select importscript.impex as "Import file in ZIP".
- Make sure that "Allow code execution from within the file" is checked.
You can ignore any ImpEx errors when deleting the data.
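To illustrate the kind of removal script such an export is meant to produce, the following sketch builds one from a list of primary keys. It assumes that the items are identified by their pk and that the script uses an ImpEx REMOVE header with pk as the unique column; the class and method names are hypothetical, and the PK values are made up for the example.

```java
import java.util.Arrays;
import java.util.List;

final class RemoveScriptExample {

    /**
     * Builds an ImpEx script with a REMOVE header for the given type and one
     * data line per primary key.
     */
    static String buildRemoveScript(String typeCode, List<Long> pks) {
        StringBuilder script = new StringBuilder();
        script.append("REMOVE ").append(typeCode).append(";pk[unique=true]\n");
        for (Long pk : pks) {
            // Each data line starts with an empty first column, followed by the pk.
            script.append(';').append(pk).append('\n');
        }
        return script.toString();
    }

    public static void main(String[] args) {
        // Made-up primary keys, for illustration only.
        List<Long> pks = Arrays.asList(8796093054977L, 8796093087745L);
        System.out.print(buildRemoveScript("CronJob", pks));
    }
}
```

Review a generated script like this before importing it, because every line in it removes an item.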
Conclusion
This article covered various aspects regarding regular data maintenance and cleanup.
While the topic itself isn't the most glamorous thing to work on when delivering a project, setting it up correctly from the start ensures your SAP Commerce Cloud solution stays healthy and high-performing over time.
- Make sure you properly delete any temporary data in your custom code as soon as feasible.
- Set up proper retention rules for all the data generated by the platform and all business processes your implementation supports.