April 8, 2020

JCR Compare



Purpose
Quickly compares the contents of AEM instances. JCR Compare computes checksums for specified node types (via an aggregated checksum of each node's descendants) across multiple AEM instances, then compares the checksum results to identify which nodes (and node sub-systems) are the same or different.

This is a useful tool for checking for content inconsistencies across AEM instances that would be otherwise difficult to find.
  • The checksum computation cost is a function of the amount of data being compared. Large amounts of data will be costly to collect and compare.
Usage
Setup
Ensure the following:
  1. ACS AEM Commons 2.1.0+ is installed on all AEM instances being compared.
  2. The browser running the JCR Checksum Compare WebUI has access to all the hosts being compared.
  3. The proper credentials are used for each host configuration (e.g. admin).
  4. The following servlet paths are not blocked (especially when accessing remote AEM instances via a host-name protected via Dispatcher). Note: Under NO circumstance should the following paths be publicly accessible.
    • /bin/acs-commons/jcr-compare.hashes.txt
    • /bin/acs-commons/jcr-compare.dump.json
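One way to confirm that these servlet paths are reachable from where the comparison runs, and not reachable through a public hostname, is to look at the HTTP status code each one returns. The following is a minimal sketch under stated assumptions, not part of ACS AEM Commons: servlet_status and check_servlets are hypothetical helper names, and the hosts, credentials, and content path in the example comments are placeholders.

```shell
#!/bin/sh
# Hypothetical helpers (not part of ACS AEM Commons).

# The two servlet paths used by JCR Compare.
JCR_COMPARE_SERVLETS="/bin/acs-commons/jcr-compare.hashes.txt
/bin/acs-commons/jcr-compare.dump.json"

# Print the HTTP status code curl receives for a URL.
# Usage: servlet_status <url> [user:password]
servlet_status() {
    if [ -n "${2:-}" ]; then
        curl -s -o /dev/null -w '%{http_code}' -u "$2" "$1"
    else
        curl -s -o /dev/null -w '%{http_code}' "$1"
    fi
}

# Print "<status> <servlet path>" for each JCR Compare servlet on one host.
# Usage: check_servlets <base-url> <content-path> [user:password]
check_servlets() {
    for servlet in $JCR_COMPARE_SERVLETS; do
        printf '%s %s\n' "$(servlet_status "$1$servlet?paths=$2" "${3:-}")" "$servlet"
    done
}

# Example (placeholder hosts; use a small content path to keep the request cheap):
# check_servlets http://localhost:4502 /content/my-site admin:admin
# check_servlets https://www.example.com /content/my-site
```

Which status codes count as "blocked" depends on your Dispatcher and security configuration, so treat the output as a starting point for review rather than a pass/fail check.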
Access the tool on the base AEM instance at /etc/acs-commons/jcr-compare.html, for example http://localhost:4502/etc/acs-commons/jcr-compare.html

Configuration Tab
The Configuration Tab defines which AEM instances are compared and what the comparison criteria are.

Hosts


The hosts that are eligible for comparison.
The convenience hostname of localhost can be used to reference the AEM Instance (host and port) that the browser is interacting with, using the logged in user’s credentials.

Paths

A set of paths can be defined to include in the comparison.
Note: the larger the content trees to compare, the longer and more taxing the comparison will be.
Default
  • /content
Query

A query can be provided to generate the results to include. Note: if both paths and a query are configured, a union of the results will be displayed.
Node types


Node types define which node types are eligible to represent a node sub-system that will have a checksum generated for it. Generally this is an aggregate node such as a jcr:content node (e.g. cq:PageContent or dam:AssetContent).
Defaults
  • cq:PageContent
  • dam:AssetContent
  • cq:Tag
Node types to exclude

When computing the checksum for a candidate Node type (see above configuration), these node types will be excluded.
Defaults
  • rep:ACL
  • cq:meta
Properties to exclude


When computing the checksum for a candidate Node type (see above configuration) these properties are excluded from the checksum generation process. These are usually properties that are known to be divergent between AEM instances.
Defaults
  • jcr:mixinTypes
  • jcr:created
  • jcr:createdBy
  • jcr:lastModified
  • jcr:lastModifiedBy
  • cq:lastModified
  • cq:lastModifiedBy
  • cq:lastReplicated
  • cq:lastReplicatedBy
  • cq:lastReplicationAction
  • jcr:versionHistory
  • jcr:predecessors
  • jcr:baseVersion
Sorted multi-value properties

Properties defined in this list will have their order taken into account when comparing the nodes.
For example, if cq:tags is included in this list and
  • AEM Author has /content/my-page/jcr:content@cq:tags=[animals:dog,animals:cat]
  • AEM Publish has /content/my-page/jcr:content@cq:tags=[animals:cat,animals:dog]
This discrepancy in the order would cause JCR Compare to identify these as a delta.
Defaults
  • cq:tags
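To illustrate the difference between an order-sensitive and an order-insensitive comparison of a multi-value property, the sketch below compares the two cq:tags values from the example above both ways. It only demonstrates the idea and is not how JCR Compare computes its checksums; normalize_values is a hypothetical helper.

```shell
#!/bin/sh
# The two cq:tags values from the example above.
author_tags='animals:dog,animals:cat'
publish_tags='animals:cat,animals:dog'

# Hypothetical helper: sort a comma-separated list so that order no longer matters.
normalize_values() {
    printf '%s\n' "$1" | tr ',' '\n' | sort | paste -s -d, -
}

# Order-sensitive comparison (what listing the property here implies).
if [ "$author_tags" = "$publish_tags" ]; then
    ordered_result=same
else
    ordered_result=different
fi

# Order-insensitive comparison (values are sorted before comparing).
if [ "$(normalize_values "$author_tags")" = "$(normalize_values "$publish_tags")" ]; then
    unordered_result=same
else
    unordered_result=different
fi

printf 'order-sensitive:   %s\n' "$ordered_result"
printf 'order-insensitive: %s\n' "$unordered_result"
```

With cq:tags in the list, the first result applies and the page is reported as a delta even though both instances carry the same set of tags.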
Content Comparison Tab


The Content Comparison Tab lets you compare two hosts (defined in the Configuration Tab) using the comparison configuration (also defined in the Configuration Tab).


Click on a non-matching result to open a node-system (JSON-based) content comparison and see exactly what the difference in content is.

Comparison data collection can be a long-running operation, depending on the content trees and queries defined. Please ensure the AEM instances can accommodate the configured comparison.

JSON dump

Basic functionality exists for downloading the raw JSON Dump for the comparison defined in the Configuration Tab.

This functionality currently does not support the authentication credentials defined with each Host.

JCR Compare via curl
To use the tool, do the following:
  • Issue an HTTP GET to /bin/acs-commons/jcr-compare.hashes.txt?paths=/content/my-site, replacing /content/my-site with the path you want to generate hashes for. For example: curl -u admin:admin "http://localhost:4502/bin/acs-commons/jcr-compare.hashes.txt?paths=/content/my-site" > hashes_pub1.txt
  • Save the output from each instance to a file, sort the results using the sort command on a Linux machine, then diff them. For example:
    • sort hashes_pub1.txt > hashes_pub1_sorted.txt
    • sort hashes_pub2.txt > hashes_pub2_sorted.txt
    • diff hashes_pub1_sorted.txt hashes_pub2_sorted.txt
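The steps above can be wrapped in two small shell functions. This is a sketch, not an official script: fetch_hashes and compare_hashes are hypothetical names, and the second host in the example comments is a placeholder for whichever instance you compare against.

```shell
#!/bin/sh
# Hypothetical helpers wrapping the fetch, sort, and diff steps.

# Fetch the hashes for one content path from one AEM instance.
# Usage: fetch_hashes <base-url> <user:password> <content-path> <output-file>
fetch_hashes() {
    curl -s -f -u "$2" "$1/bin/acs-commons/jcr-compare.hashes.txt?paths=$3" > "$4"
}

# Sort two hash files and print the lines that differ.
# Returns diff's exit status: 0 if the sorted files match, 1 if they differ.
# Usage: compare_hashes <hashes-file-1> <hashes-file-2>
compare_hashes() {
    sorted1=$(mktemp) || return 2
    sorted2=$(mktemp) || return 2
    sort "$1" > "$sorted1"
    sort "$2" > "$sorted2"
    diff "$sorted1" "$sorted2"
    rc=$?
    rm -f "$sorted1" "$sorted2"
    return "$rc"
}

# Example (placeholder hosts and credentials):
# fetch_hashes http://localhost:4502 admin:admin /content/my-site hashes_pub1.txt
# fetch_hashes http://localhost:4503 admin:admin /content/my-site hashes_pub2.txt
# compare_hashes hashes_pub1.txt hashes_pub2.txt
```

Sorting before diffing matters because the instances are not guaranteed to return paths in the same order; without it, identical content could show up as a difference.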
Once you know which paths are different, go to each server you diffed and issue an HTTP GET to the /bin/acs-commons/jcr-compare.dump.json servlet to get a dump of the content for a particular node.
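A request for a single node might look like the sketch below. It assumes that the dump servlet accepts the same paths parameter as the hashes servlet, which you should verify against your ACS AEM Commons version; dump_url and fetch_dump are hypothetical helper names, and the host, credentials, and node path in the example comment are placeholders.

```shell
#!/bin/sh
# Hypothetical helper: build the dump URL for one node on one host.
# Assumption: the dump servlet takes the same `paths` parameter as the hashes servlet.
# Usage: dump_url <base-url> <node-path>
dump_url() {
    printf '%s/bin/acs-commons/jcr-compare.dump.json?paths=%s\n' "$1" "$2"
}

# Download the JSON dump of one node into a file.
# Usage: fetch_dump <base-url> <user:password> <node-path> <output-file>
fetch_dump() {
    curl -s -f -u "$2" "$(dump_url "$1" "$3")" > "$4"
}

# Example (placeholder host, credentials, and node path):
# fetch_dump http://localhost:4502 admin:admin /content/my-site/jcr:content dump_pub1.json
```

Running the same request against each server gives you one JSON file per instance to compare.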

Then you can use a JSON diff tool to visualize the differences: http://tlrobinson.net/projects/javascript-fun/jsondiff



By aem4beginner
