Data Store Garbage Collection AEM 6.x

Running Data Store Garbage Collection


There are three ways of running data store garbage collection, depending on the data store setup on which AEM is running:
  1. Via Revision Cleanup - a garbage collection mechanism usually used for node store cleanup.
  2. Via Data Store Garbage Collection - a garbage collection mechanism specific for external data stores, available on the Operations Dashboard.
  3. Via the JMX Console.
If TarMK is being used as both the node store and data store, then Revision Cleanup can be used for garbage collection of both node store and data store.
However, if an external data store such as the File System Data Store is configured, data store garbage collection must be triggered explicitly, separately from Revision Cleanup.
Data store garbage collection can be triggered either via the Operations Dashboard or the JMX Console.
The table below shows which garbage collection mechanism to use for each supported data store deployment in AEM 6:

Node Store | Data Store          | Garbage Collection Mechanism
TarMK      | TarMK               | Revision Cleanup (binaries are in-lined with the Segment Store)
TarMK      | External Filesystem | Data Store Garbage Collection task via the Operations Dashboard, or the JMX Console
MongoDB    | MongoDB             | Data Store Garbage Collection task via the Operations Dashboard, or the JMX Console
MongoDB    | External Filesystem | Data Store Garbage Collection task via the Operations Dashboard, or the JMX Console
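The table can also be read as a small decision rule, for example inside a provisioning script that decides which maintenance job to schedule. The POSIX shell sketch below encodes it; the function name gc_mechanism and the store labels (tarmk, mongodb, filesystem) are hypothetical names chosen for illustration, not AEM configuration values.

```shell
#!/bin/sh
# Hypothetical helper: map "<node store> <data store>" to the garbage
# collection mechanism from the table. Labels are illustrative only.
gc_mechanism() {
  case "$1/$2" in
    tarmk/tarmk)
      # Binaries are in-lined in the Segment Store.
      echo "revision-cleanup" ;;
    tarmk/filesystem|mongodb/mongodb|mongodb/filesystem)
      # Operations Dashboard task or JMX Console.
      echo "datastore-gc" ;;
    *)
      echo "unsupported"
      return 1 ;;
  esac
}
```

For instance, gc_mechanism tarmk filesystem prints datastore-gc, which corresponds to the explicit Data Store Garbage Collection described in the sections that follow.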

Running Data Store Garbage Collection via the Operations Dashboard


The built-in Weekly Maintenance Window, available via the Operations Dashboard, contains a built-in task to trigger the Data Store Garbage Collection at 1 am on Sundays.
If you need to run data store garbage collection outside of this time, it can be triggered manually via the Operations Dashboard.
Before running data store garbage collection you should check that no backups are running at the time.


  1. Open the Operations Dashboard by navigating to Tools -> Operations -> Maintenance.


  2. Click or tap the Weekly Maintenance Window.



  3. Select the Data Store Garbage Collection task and then click or tap the Run icon.



  4. Data store garbage collection runs and its status is displayed in the dashboard.

Note:
The Data Store Garbage Collection task will only be visible if you have configured an external file data store. See Configuring node stores and data stores in AEM 6 for information on how to set up a file data store.

Running Data Store Garbage Collection via the JMX Console


This section is about manually running data store garbage collection via the JMX Console. If your installation is set up without an external data store, this does not apply to your installation. Instead, see the instructions on how to run Revision Cleanup under Maintaining the Repository.
Note:
If you are running TarMK with an external data store, you must run Revision Cleanup first for garbage collection to be effective.

To run garbage collection:


  1. In the Apache Felix OSGi Management Console, open the Main tab and select JMX from the menu.


  2. Next, search for and click the Repository Manager MBean (or go to http://host:port/system/console/jmx/org.apache.jackrabbit.oak%3Aname%3Drepository+manager%2Ctype%3DRepositoryManagement).


  3. Click startDataStoreGC(boolean markOnly).


  4. Enter "true" for the markOnly parameter if required:

    Option           | Description
    boolean markOnly | Set to true to only mark references and not sweep in the mark and sweep operation. This mode is to be used when the underlying BlobStore is shared between multiple different repositories. For all other cases, set it to false to perform full garbage collection.


  5. Click Invoke. CRX runs the garbage collection and indicates when it has completed.
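The URL in step 2 is the Repository Manager MBean's ObjectName, org.apache.jackrabbit.oak:name=repository manager,type=RepositoryManagement, percent-encoded with spaces written as "+". The POSIX shell sketch below shows how that URL can be derived, which may help when locating other MBeans in the console. The helper names urlencode_objectname and mbean_url are hypothetical, and the encoding rule is an assumption inferred from the URL above.

```shell
#!/bin/sh
# Hypothetical helper: percent-encode an ASCII JMX ObjectName for the
# Felix JMX console URL. Unreserved characters are kept, a space becomes
# "+", and every other character becomes %XX (uppercase hex).
urlencode_objectname() {
  rest=$1
  out=
  while [ -n "$rest" ]; do
    tail=${rest#?}          # everything after the first character
    c=${rest%"$tail"}       # the first character itself
    rest=$tail
    case $c in
      [A-Za-z0-9._~-]) out=$out$c ;;
      ' ') out=$out+ ;;
      *) out=$out$(printf '%%%02X' "'$c") ;;  # "'c" yields the char code
    esac
  done
  printf '%s\n' "$out"
}

# Hypothetical helper: build the console URL for an MBean.
# $1 = base URL such as http://host:port, $2 = ObjectName.
mbean_url() {
  printf '%s/system/console/jmx/%s\n' "$1" "$(urlencode_objectname "$2")"
}
```

Calling mbean_url http://localhost:4502 'org.apache.jackrabbit.oak:name=repository manager,type=RepositoryManagement' reproduces the Repository Manager URL from step 2 with localhost:4502 as host and port.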
Note:
The data store garbage collection will not collect files that have been deleted in the last 24 hours.
Note:
The data store garbage collection task will only start if you have configured an external file data store. If no external file data store has been configured, invoking the task returns the message Cannot perform operation: no service of type BlobGCMBean found. See Configuring node stores and data stores in AEM 6 for information on how to set up a file data store.

Automating Data Store Garbage Collection


If possible, data store garbage collection should be run when there is little load on the system, for example in the morning.
The built-in Weekly Maintenance Window, available via the Operations Dashboard, contains a built-in task to trigger the Data Store Garbage Collection at 1 am on Sundays. You should also check that no backups are running at this time. The start of the maintenance window can be customized via the dashboard as necessary.
Note:
Data store garbage collection should not run concurrently with backups so that old (and unused) data store files are also backed up. If you need to roll back to an old revision, the binaries are then still available in the backup.

If you don't wish to run data store garbage collection with the Weekly Maintenance Window in the Operations Dashboard, it can also be automated using the wget or curl HTTP clients.
Caution:
In the following example curl command, several parameters might need to be adjusted for your instance: for example, the hostname (localhost), port (4503), admin credentials (admin:admin) and the parameters for the data store garbage collection itself, such as markOnly.

Here is an example curl command to invoke data store garbage collection via the command line:
curl -u admin:admin -X POST --data markOnly=true  http://localhost:4503/system/console/jmx/org.apache.jackrabbit.oak"%"3Aname"%"3Drepository+manager"%"2Ctype"%"3DRepositoryManagement/op/startDataStoreGC/boolean
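The command above can be wrapped in a small script for scheduling. The POSIX shell sketch below is hypothetical: the function name start_datastore_gc and the AEM_URL, AEM_USER, AEM_PASS and DRY_RUN variables are assumptions for illustration. It defaults markOnly to false (full mark and sweep), prints the command instead of running it when DRY_RUN=1, and treats the BlobGCMBean message described in the JMX Console section as a sign that no external data store is configured.

```shell
#!/bin/sh
# Encoded path of the Repository Manager MBean in the Felix JMX console.
REPO_MGR_PATH='/system/console/jmx/org.apache.jackrabbit.oak%3Aname%3Drepository+manager%2Ctype%3DRepositoryManagement'

# Hypothetical wrapper. $1 = markOnly value ("true" = mark only,
# default "false" = mark and sweep). Host and credentials come from
# AEM_URL, AEM_USER and AEM_PASS (assumed variable names).
start_datastore_gc() {
  dsgc_mark_only=${1:-false}
  dsgc_url="${AEM_URL:-http://localhost:4502}${REPO_MGR_PATH}/op/startDataStoreGC/boolean"
  set -- curl -s -u "${AEM_USER:-admin}:${AEM_PASS:-admin}" -X POST \
    --data "markOnly=$dsgc_mark_only" "$dsgc_url"

  if [ "${DRY_RUN:-0}" = 1 ]; then
    printf '%s\n' "$*"     # show the command instead of running it
    return 0
  fi

  # Without --fail, curl exits 0 even on HTTP errors, so only
  # connection-level failures are caught here.
  dsgc_response=$("$@") || { echo "curl request failed" >&2; return 1; }
  case $dsgc_response in
    *"no service of type BlobGCMBean found"*)
      echo "no external data store configured; use Revision Cleanup instead" >&2
      return 2 ;;
  esac
  printf '%s\n' "$dsgc_response"
}
```

As an example, a crontab entry such as 0 3 * * 0 sh -c '. /path/to/dsgc.sh && start_datastore_gc false' would run it at 3 am every Sunday, where /path/to/dsgc.sh is a placeholder for wherever you save the script. Choose a time with little load that does not overlap with backups.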

Checking Data Store Consistency in AEM 6.4



The data store consistency check will report any data store binaries that are missing but are still referenced. To start a consistency check, follow these steps:
  1. Go to the JMX console. For information on how to use the JMX console, see this article.
  2. Search for the BlobGarbageCollection MBean and click it.
  3. Click the checkConsistency() link.

After the consistency check is complete, a message shows the number of binaries reported as missing. If the number is greater than 0, check the error.log for more details on the missing binaries.
Below is an example of how missing binaries are reported in the logs:

    11:32:39.673 INFO [main] MarkSweepGarbageCollector.java:600 Consistency check found [1] missing blobs

    11:32:39.673 WARN [main] MarkSweepGarbageCollector.java:602 Consistency check failure in the the blob store : DataStore backed BlobStore [org.apache.jackrabbit.oak.plugins.blob.datastore.OakFileDataStore], check missing candidates in file /tmp/gcworkdir-1467352959243/gccand-1467352959243
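Checking the log can also be scripted. The POSIX shell sketch below relies only on the two message formats shown in the sample log; the function names and the default log location crx-quickstart/logs/error.log (relative to the AEM installation directory) are assumptions, so pass your instance's error.log path explicitly if it differs. It prints the most recent missing-blob count and the path of the file listing the missing candidates.

```shell
#!/bin/sh
# Hypothetical helper: print the count from the most recent
# "Consistency check found [N] missing blobs" line in the log.
missing_blob_count() {
  sed -n 's/.*Consistency check found \[\([0-9]*\)] missing blobs.*/\1/p' \
    "${1:-crx-quickstart/logs/error.log}" | tail -n 1
}

# Hypothetical helper: print the path from the most recent
# "check missing candidates in file <path>" message in the log.
missing_blob_candidates_file() {
  sed -n 's/.*check missing candidates in file \([^ ]*\).*/\1/p' \
    "${1:-crx-quickstart/logs/error.log}" | tail -n 1
}
```

An empty result from missing_blob_count means no consistency-check summary was found in that log file; a value greater than 0 means the candidates file is worth inspecting.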

Audit Log Maintenance in AEM 6

AEM events that qualify for audit logging generate a large amount of archived data. This data can grow quickly over time due to replications, asset uploads and other system activities.
Audit Log Maintenance provides functionality to automate the maintenance of audit logs according to specific policies.
It is implemented as a configurable weekly maintenance task and is accessible via the Operations Dashboard monitoring console.
For more information, refer to the Operations Dashboard Documentation.
There are three types of Audit Log Purge options:
  1. Page Audit Log Purging
  2. DAM Audit Log Purging
  3. Replication Audit Log Purging
Each can be configured by creating rules in the AEM Web Console. After they have been configured, you can trigger them by going to Tools - Operations - Maintenance - Weekly Maintenance Window and running the AuditLog Maintenance Task.

Configure Page Audit Log Purging

Follow these steps to configure Page Audit Log Purging:
  1. Go to the Web Console Admin by pointing your browser to http://localhost:4502/system/console/configMgr/
  2. Search for an item called Pages audit Log Purge rule and click it.
  3. Next, configure the purge scheduler according to your requirements. The available options are:
    • Rule name: the name of the audit policy rule;
    • Content path: the path of the content the rule will apply to;
    • Minimum age: the time in days the audit logs need to be kept;
    • Audit log type: the type of audit log that should be purged.
    Note:
    The content path only applies to children of the /var/audit/com.day.cq.wcm.core.page node in the repository.
  4. Save the rule.
  5. The rule you just created needs to be exposed in the Operations Dashboard before it can be executed. To do this, go to Tools - Operations - Maintenance from the AEM Welcome screen.
  6. Press the Weekly Maintenance Window card.
  7. You will find the maintenance task already present under the AuditLog Maintenance Task card.
  8. You can inspect the date of the next execution, configure the task, or execute it manually by pressing the play button.
In AEM 6.3, if the scheduled maintenance window closes before the Audit Log Purge task can complete, the task stops automatically. It resumes when the next maintenance window opens.
With AEM 6.4, you can manually stop a running Audit Log Purge Task by clicking the Stop icon. On the next execution, the task will safely resume.
Note:
Stopping the maintenance task suspends its execution without losing track of the job already in progress.

Configure DAM Audit Log Purging

  1. Navigate to the System Console at http://serveraddress:serverport/system/console/configMgr
  2. Search for DAM audit Log Purge rule and click the result.
  3. In the next window, configure your rule. The options are:
    • Rule name: the name of the audit policy rule;
    • Content path: the path of the content the rule will apply to;
    • Minimum age: the time in days the audit logs need to be kept;
    • Audit Log Dam event types: the types of DAM audit events that should be purged.
  4. Click Save to save your configuration.

Configure Replication Audit Log Purging

  1. Navigate to the System Console at http://serveraddress:serverport/system/console/configMgr
  2. Search for Replication audit Log Purge Scheduler and click the result.
  3. In the next window, configure your rule. The options are:
    • Rule name: the name of the audit policy rule;
    • Content path: the path of the content the rule will apply to;
    • Minimum age: the time in days the audit logs need to be kept;
    • Audit log Replication event types: the types of Replication audit events that should be purged.
  4. Click Save to save your configuration.