Running Data Store Garbage Collection
There are three ways of running data store garbage collection, depending on the data store setup on which AEM is running:
- Via Revision Cleanup - a garbage collection mechanism usually used for node store cleanup.
- Via Data Store Garbage Collection - a garbage collection mechanism specific to external data stores, available on the Operations Dashboard.
- Via the JMX Console.
If TarMK is being used as both the node store and data store, then Revision Cleanup can be used for garbage collection of both node store and data store.
However, if an external data store such as the File System Data Store is configured, data store garbage collection must be triggered explicitly, separately from Revision Cleanup.
Data store garbage collection can be triggered either via the Operations Dashboard or the JMX Console.
The below table shows the data store garbage collection type that needs to be used for all the supported data store deployments in AEM 6:
The built-in Weekly Maintenance Window, available via the Operations Dashboard, contains a built-in task to trigger the Data Store Garbage Collection at 1 am on Sundays.
If you need to run data store garbage collection outside of this time, it can be triggered manually via the Operations Dashboard.
Before running data store garbage collection you should check that no backups are running at the time.
Note:
The Data Store Garbage Collection task will only be visible if you have configured an external file data store. See Configuring node stores and data stores in AEM 6 for information on how to set up a file data store.
This section is about manually running data store garbage collection via the JMX Console. If your installation is set up without an external data store, then this does not apply to your installation. Instead see the instructions on how to run Revision cleanup under Maintaining the Repository.
Note:
If you are running TarMK with an external data store, you must run Revision Cleanup first for data store garbage collection to be effective.
Note:
The data store garbage collection will not collect files that have been deleted in the last 24 hours.
Note:
The data store garbage collection task will only start if you have configured an external file data store. If an external file data store has not been configured, invoking the task returns the message "Cannot perform operation: no service of type BlobGCMBean found". See Configuring node stores and data stores in AEM 6 for information on how to set up a file data store.
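When garbage collection is triggered from a script rather than interactively, it can help to check the response for the message quoted above. The following is a minimal sketch, assuming the message text appears in the response captured from the call; the names NO_BLOB_GC_MSG and gc_response_ok are hypothetical helpers, not part of AEM.

```shell
# Message text taken from the note above. That it appears verbatim in the
# captured response is an assumption; verify against your own instance.
NO_BLOB_GC_MSG='no service of type BlobGCMBean found'

# Hypothetical helper: returns 0 when the response text in $1 does NOT
# contain the message, and 1 when it does.
gc_response_ok() {
  case "$1" in
    *"$NO_BLOB_GC_MSG"*) return 1 ;;
    *) return 0 ;;
  esac
}

# Example use (response text would come from your own HTTP call):
# gc_response_ok "$response" || echo "No external data store configured" >&2
```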
If possible, data store garbage collection should be run when there is little load on the system, for example in the morning.
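If you trigger garbage collection from your own script, one way to respect a low-load period is to guard the call with a time-of-day check. The sketch below is an illustration only: the function name in_quiet_window and the window boundaries are assumptions, and the right hours depend on your own load profile.

```shell
# Hypothetical helper: succeeds when hour $1 falls in the window [$2, $3).
# Hours are 0-23; $1 may have a leading zero, as printed by `date +%H`.
in_quiet_window() {
  hour=${1#0}        # strip one leading zero ("08" becomes "8")
  hour=${hour:-0}    # a bare "0" becomes empty after stripping; restore it
  [ "$hour" -ge "$2" ] && [ "$hour" -lt "$3" ]
}

# Example use: only proceed between 01:00 and 06:00 local time.
# if in_quiet_window "$(date +%H)" 1 6; then
#   echo "Within the quiet window, safe to trigger garbage collection"
# fi
```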
The built-in Weekly Maintenance Window, available via the Operations Dashboard, contains a built-in task to trigger the Data Store Garbage Collection at 1 am on Sundays. You should also check that no backups are running at this time. The start of the maintenance window can be customized via the dashboard as necessary.
Note:
The reason for not running data store garbage collection concurrently with a backup is to ensure that old (and unused) data store files are also backed up. That way, if you need to roll back to an old revision, the binaries are still present in the backup.
If you don't wish to run data store garbage collection with the Weekly Maintenance Window in the Operations Dashboard, you can also automate it using the wget or curl HTTP clients. The following is an example of how to automate data store garbage collection by using curl:
Caution:
In the following example curl command, various parameters might need to be configured for your instance; for example, the hostname (localhost), port (4503), admin credentials (admin:admin), and the parameters for the actual data store garbage collection (markOnly).
curl -u admin:admin -X POST --data markOnly=true http://localhost:4503/system/console/jmx/org.apache.jackrabbit.oak"%"3Aname"%"3Drepository+manager"%"2Ctype"%"3DRepositoryManagement/op/startDataStoreGC/boolean
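For repeated or scheduled use, the curl call above can be wrapped in a small script so that the host, port, and credentials are set in one place. This is a sketch under stated assumptions: the variable names (AEM_HOST, AEM_PORT, AEM_USER, AEM_PASSWORD) and the functions gc_url and start_datastore_gc are hypothetical, and the defaults simply mirror the example values used above.

```shell
#!/bin/sh
# Illustrative defaults mirroring the example command; override them through
# the environment for your own instance. These variable names are assumptions.
AEM_HOST="${AEM_HOST:-localhost}"
AEM_PORT="${AEM_PORT:-4503}"
AEM_USER="${AEM_USER:-admin}"
AEM_PASSWORD="${AEM_PASSWORD:-admin}"

# URL-encoded MBean name, copied from the curl example in this document.
MBEAN='org.apache.jackrabbit.oak%3Aname%3Drepository+manager%2Ctype%3DRepositoryManagement'

# Print the URL of the startDataStoreGC operation for host $1 and port $2.
gc_url() {
  printf '%s\n' "http://$1:$2/system/console/jmx/${MBEAN}/op/startDataStoreGC/boolean"
}

# POST to the operation; $1 is passed through unchanged as the markOnly value.
start_datastore_gc() {
  curl -u "${AEM_USER}:${AEM_PASSWORD}" -X POST --data "markOnly=$1" \
    "$(gc_url "$AEM_HOST" "$AEM_PORT")"
}

# Example call, commented out so that loading this file sends no request:
# start_datastore_gc true
```

With the final call uncommented, a script like this could be run from a scheduler such as cron at a time of your choosing. Avoid times when backups are running, as described above.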