January 2, 2021

AEM Tar Compaction - Revision Cleanup (for AEM 6.4 and 6.5)

This is an updated Tar Compaction and Garbage Collection process based on AEM 6.4. This maintenance functionality is now called Revision Cleanup.

The AEM platform is based on Apache Jackrabbit Oak. Oak is an effort to implement a scalable and performant hierarchical content repository for use as the foundation of modern world-class web sites and other demanding content applications. It uses TarMK, a fast, small, and simple embedded hierarchical database engine serving as a persistence backend for the Java Content Repository (JCR). 

TarMK implements multi-version concurrency control (MVCC) and stores all data in tar files in an append-only way. Because data is only ever appended and never overwritten, scheduled maintenance is critical to keeping an AEM system performant and efficient. Skipping these maintenance tasks leads to instability, degraded performance, disk space exhaustion, and outages.

To reclaim disk space, improve performance, and avoid uncontrolled repository growth, old revisions need to be cleaned up. This maintenance functionality is called Revision Cleanup. It has been available as an offline routine, then called Tar Compaction, since AEM 6.0. An online version, called Online Revision Cleanup, was introduced with AEM 6.3 and is available in AEM 6.4 and above.

Compared to Offline Revision Cleanup where the AEM instance has to be shut down, Online Revision Cleanup can be run while the AEM instance is online. Online Revision Cleanup is turned on by default and it is the recommended way of performing revision cleanup.

The revision cleanup process consists of three phases:
  • Estimation - determines whether to run the next phase (compaction) or not based on how much garbage might be collected.
  • Compaction - segments and tar files are rewritten leaving out any unused content.
  • Clean up - removes the old segments including any garbage they may contain.
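
The three phases can be illustrated with a toy model. This is only a sketch of the idea, not Oak's actual TarMK implementation: the `ToyStore` class, its methods, and the `threshold` value are all hypothetical names and numbers chosen for illustration.

```python
from dataclasses import dataclass, field


@dataclass
class ToyStore:
    """Toy append-only store (hypothetical, not Oak's TarMK)."""
    segments: list = field(default_factory=list)  # every record ever written
    head: dict = field(default_factory=dict)      # path -> index of the live record

    def write(self, path, value):
        # Append-only: an update never overwrites, it adds a new record.
        self.segments.append((path, value))
        self.head[path] = len(self.segments) - 1

    def garbage_ratio(self):
        # Estimation: share of records the current head no longer references.
        if not self.segments:
            return 0.0
        return 1 - len(self.head) / len(self.segments)

    def revision_cleanup(self, threshold=0.25):
        # threshold is an assumed value for this sketch only.
        if self.garbage_ratio() < threshold:
            return 0  # estimation decided compaction is not worth running
        # Compaction: rewrite only the records that are still live.
        live = [self.segments[i] for i in sorted(self.head.values())]
        reclaimed = len(self.segments) - len(live)
        # Cleanup: drop the old segments and point the head at the rewritten ones.
        self.segments = live
        self.head = {path: i for i, (path, _) in enumerate(live)}
        return reclaimed


store = ToyStore()
store.write("/content/a", "v1")
store.write("/content/a", "v2")  # v1 becomes garbage
store.write("/content/b", "v1")
store.write("/content/a", "v3")  # v2 becomes garbage
print(store.revision_cleanup())  # 2
```

Two of the four records are old revisions of the same path, so cleanup reclaims both once the estimated garbage share passes the assumed threshold.
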

AEM 6.4 introduces two new modes for the compaction phase of the Online Revision Cleanup process:
  • The full compaction mode rewrites all the segments and tar files in the whole repository. The subsequent cleanup phase can thus remove the maximum amount of garbage across the repository. Since full compaction affects the whole repository, it requires a considerable amount of system resources and time to complete.
  • The tail compaction mode rewrites only the most recent segments and tar files in the repository. The most recent segments and tar files are those that have been added since the last time either full or tail compaction ran. The subsequent cleanup phase can thus only remove the garbage contained in the recent past of the repository. Since tail compaction only affects part of the repository, it requires considerably fewer system resources and less time to complete than full compaction.

Technical Steps:
Online Revision Cleanup is configured by default to run automatically once a day on both AEM Author and Publish instances. All that is required is to define the maintenance window in which the task runs. This is usually the period with the least user activity.

1. In AEM, go to Tools - Operations - Maintenance (/libs/granite/operations/content/maintenance.html)


2. Click on the Configure Icon of the Daily Maintenance Window



3. Validate the configuration values (recurrence, start and end time) and click Save. Adobe recommends a window between 2am and 5am, with Tail Compaction running Monday to Saturday and Full Compaction running on Sunday.

The default configuration runs tail compaction on weekdays and full compaction on Sundays.
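
As a quick way to reason about which mode to expect on a given night, the recommended schedule from step 3 can be written down as a small function. This is only an illustration of the schedule described above; `compaction_mode` is a hypothetical helper, not an AEM or Oak API.

```python
from datetime import date


def compaction_mode(day: date) -> str:
    """Return the compaction mode expected for a given day (hypothetical helper)."""
    # date.weekday() numbers Monday as 0 and Sunday as 6.
    return "full" if day.weekday() == 6 else "tail"


print(compaction_mode(date(2021, 1, 3)))  # full (a Sunday)
print(compaction_mode(date(2021, 1, 2)))  # tail (a Saturday)
```
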

Tail Compaction schedule configuration:


Full Compaction schedule configuration:


Monitoring:
  • Check the logs to confirm that Online Revision Cleanup completed successfully:
For example, "TarMK GC #{}: compaction completed in {} ({} ms), after {} cycles" means the compaction step completed successfully, unless it is preceded by the message "TarMK GC #{}: compaction gave up compacting concurrent commits after {} cycles", which means there was too much concurrent load. Correspondingly, the message "TarMK GC #{}: cleanup completed in {} ({} ms)" indicates successful completion of the cleanup step.
  • Monitor the Maintenance Task RevisionCleanupTask MBean through JMX.
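
The log check can be scripted. The sketch below turns the message templates quoted above into regular expressions and groups matching lines by GC run number. It assumes each message appears on a single log line. The function names are hypothetical, and the cleanup template is deliberately cut before its closing parenthesis so it matches whatever follows the duration.

```python
import re

# Message templates as quoted in the monitoring notes; "{}" marks a variable part.
TEMPLATES = {
    "compaction_completed": "TarMK GC #{}: compaction completed in {} ({} ms), after {} cycles",
    "compaction_gave_up": "TarMK GC #{}: compaction gave up compacting concurrent commits after {} cycles",
    "cleanup_completed": "TarMK GC #{}: cleanup completed in {} ({} ms",
}


def template_to_regex(template):
    # Escape the literal text and capture each "{}" placeholder lazily.
    return re.compile("(.+?)".join(re.escape(part) for part in template.split("{}")))


PATTERNS = {name: template_to_regex(t) for name, t in TEMPLATES.items()}


def collect_gc_events(lines):
    """Map each GC run number (first placeholder) to the set of events seen for it."""
    events = {}
    for line in lines:
        for name, pattern in PATTERNS.items():
            match = pattern.search(line)
            if match:
                events.setdefault(match.group(1), set()).add(name)
    return events


def run_succeeded(run_events):
    # Success: compaction and cleanup both completed, and compaction never gave up.
    return (
        "compaction_completed" in run_events
        and "cleanup_completed" in run_events
        and "compaction_gave_up" not in run_events
    )
```

A possible use is `collect_gc_events(open(path))` on a copy of the instance log, followed by `run_succeeded(events[n])` for each run number `n`. The log location and the surrounding line format vary by installation, so the patterns only look for the message text itself.
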


By aem4beginner
