April 1, 2020

How to Efficiently Copy Large Amounts of Content Between CQ Repositories

CQ offers several ways to move content from one instance to another. These include replication (author -> publish) and packaging (create a package, export it, and install it elsewhere).

The File Vault (vlt) tool also has a Remote Copy (rcp) command, which is especially useful if you are moving gigabytes or terabytes of digital assets (JCR node type dam:Asset) from DEV to STAGING to PRODUCTION. The idea is to run all rendition workflows only once, on a powerful DEV machine with many CPU cores and high-throughput local storage, and then copy the finished assets to the other environments.

You don’t need to open any ports on the firewall - all traffic is HTTP(S).

Once the renditions have been generated, turn off the rendition workflows on STAGING and PRODUCTION, then run vlt rcp to those environments. To turn the workflows off, open the Workflow Console at /libs/cq/workflow/content/console.html and, on the Launcher tab, edit each entry whose workflow model name starts with “DAM”: set the ‘Activate’ radio button to “Disable” and save.



Because vlt rcp streams data between online repositories, it does not use the Durbo packaging used by replication. Tests by Adobe Performance Architect Gardner Buchanan show that this is more storage efficient and avoids the storage bloat seen with replication (2:1).

Gardner also recommends running multiple instances of vlt rcp against separate source tree structures to parallelize the whole operation. To avoid unnecessary network traffic, run vlt rcp on one of the participant instances, not on a remote, third instance.

Tests also indicate that reducing the batch size (the -b option) from its default of 1000 to 100 gives better throughput.

Assuming that vlt is set up and configured, the following command copies a large content tree at /content/dam/JJK-Folder-1 from one CQ “author” instance to another “author” instance. In this case both run on the local machine, but either can be remote. The two instances also do not have to be in the same run mode: content can be remote copied from an “author” instance to a “publish” instance.


vlt rcp -b 100 -r -u -n http://admin:admin@localhost:4502/crx/-/jcr:root/content/dam/JJK-Folder-1 http://admin:admin@localhost:4503/crx/-/jcr:root/content/dam/JJK-Folder-1
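Gardner's suggestion of running several copies in parallel could be scripted roughly as follows. This is only a sketch under stated assumptions: the function names rcp_url and copy_parallel and the VLT, SRC and DST variables are made up for illustration, and the credentials and hosts are the local placeholders from the command above. Only the rcp options and the URL layout are taken from that command.

```shell
#!/usr/bin/env bash
# Sketch: one vlt rcp per subtree, all running at the same time.
# VLT, SRC and DST are illustrative variables, not vlt settings.
VLT="${VLT:-vlt}"
SRC="${SRC:-http://admin:admin@localhost:4502}"
DST="${DST:-http://admin:admin@localhost:4503}"

# Print the rcp endpoint URL for a repository path on a given host.
rcp_url() {
  printf '%s/crx/-/jcr:root%s\n' "$1" "$2"
}

# Start one copy per subtree in the background, then wait for all of them.
# Returns non-zero if any of the copies failed.
copy_parallel() {
  local pids=() path pid status=0
  for path in "$@"; do
    # -b 100: save after every 100 nodes   -r: descend recursively
    # -u: update nodes that already exist  -n: only when the source is newer
    "$VLT" rcp -b 100 -r -u -n \
      "$(rcp_url "$SRC" "$path")" "$(rcp_url "$DST" "$path")" &
    pids+=("$!")
  done
  for pid in "${pids[@]}"; do
    wait "$pid" || status=1
  done
  return "$status"
}
```

A call such as `copy_parallel /content/dam/JJK-Folder-1 /content/dam/JJK-Folder-2` (the second folder is hypothetical) would start two copies side by side. The subtrees passed in should not overlap, so that no node is copied twice.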

In a test with 1,000 JPG images (1 MB each, 1680 x 1050), vlt rcp copied 20,304 nodes (dam:Asset content, 2,690,706,251 bytes) in 573,117 milliseconds: a throughput of about 35 JCR nodes/second or 16 GB/hr.

In another test, I copied 44,802 (cq:Page) nodes (5,017,126 bytes) in 461,925 ms - that is a throughput of 97 JCR nodes/second or 37 MB/hr.
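The throughput figures follow directly from the node count, byte count and elapsed time. As a rough check, the small helper below redoes the arithmetic; the function name throughput is made up for this sketch, and it assumes binary units (1 MiB = 1,048,576 bytes), which is what the figures above appear to use.

```shell
#!/usr/bin/env bash
# Sketch: derive nodes/second and MiB/hour from a vlt rcp run.
# Arguments: <node count> <bytes copied> <elapsed milliseconds>
throughput() {
  LC_ALL=C awk -v n="$1" -v b="$2" -v ms="$3" 'BEGIN {
    s = ms / 1000                      # elapsed seconds
    printf "%.1f nodes/s, %.1f MiB/h\n", n / s, b / 1048576 * 3600 / s
  }'
}

# The two local tests described above.
throughput 20304 2690706251 573117   # dam:Asset test
throughput 44802 5017126 461925      # cq:Page test
```

For the asset test this works out to roughly 35 nodes/second and a little over 16,000 MiB/hour (about 15.7 GiB/hour); for the page test, about 97 nodes/second and about 37 MiB/hour.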

The process is differential, meaning only changed nodes are actually copied. However, every source JCR node still has to be compared with its counterpart on the destination, so even a run that copies nothing takes time.

Amazon EC2 Cloud

VLT rcp can be used to keep two or more CQ instances in sync (across Amazon AWS “regions” (data centers) in N. Virginia and N. California, for example). A shell script can be run every hour, for example.
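A minimal sketch of such an hourly script is shown here. Everything in it is an assumption made for illustration: the file name sync-dam.sh, the function name sync_once, the host names source-host and target-host, the admin:admin credentials and the lock directory. It guards against overlapping runs, because a differential pass between regions can take longer than an hour (see the timings below).

```shell
#!/usr/bin/env bash
# Hypothetical sync-dam.sh: one differential vlt rcp pass over /content/dam.
# Hosts, credentials and the lock path are placeholders.
VLT="${VLT:-vlt}"
SRC_URL="${SRC_URL:-http://admin:admin@source-host:4502/crx/-/jcr:root/content/dam}"
DST_URL="${DST_URL:-http://admin:admin@target-host:4502/crx/-/jcr:root/content/dam}"
LOCK_DIR="${LOCK_DIR:-/tmp/vlt-rcp-sync.lock}"

sync_once() {
  # mkdir is atomic, so the directory doubles as a simple lock.
  if ! mkdir "$LOCK_DIR" 2>/dev/null; then
    echo "previous sync still running, skipping this run" >&2
    return 0
  fi
  "$VLT" rcp -b 100 -r -u -n "$SRC_URL" "$DST_URL"
  local status=$?
  rmdir "$LOCK_DIR"
  return "$status"
}

# Only copy when invoked as "sync-dam.sh run".
if [ "${1:-}" = "run" ]; then
  sync_once
fi
```

A crontab entry along the lines of `0 * * * * /path/to/sync-dam.sh run` (path hypothetical) would then start one pass at the top of every hour and skip any hour in which the previous pass has not finished.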

N. Virginia -> N. California

2,000 (556 KB) PDFs and their associated renditions (29,000 JCR nodes total) under /content/dam were copied in 30.3 minutes at a throughput rate of 2.5 GB/hour (16 JCR nodes/second).

An additional 2,000 (1 MB) JPG images and their associated renditions were then added to /content/dam, and /content/dam was VLT rcp’d again. This time, 34,894 JCR nodes were reported copied in 36.6 minutes at a throughput rate of 6.2 GB/hour (16 JCR nodes/second). In other words, the node rate stayed the same while the byte throughput more than doubled: the transfer efficiency of VLT rcp improves as the assets get bigger.

The differential check (no content had to be copied) took about 35 minutes. /content/dam had the usual Geometrixx content + 2,000 PDFs, 2,000 JPGs and their related renditions for a total of 63,991 JCR nodes and 330,645 properties (5 GB). The throughput was 31 JCR nodes/second.

Also, for the cq:Page JCR node type, 44,804 nodes were copied in 20 minutes at a throughput rate of 14.5 MB/hour (38 JCR nodes/second). The differential check (no content had to be copied) took about 3 minutes (275 JCR nodes/second).

N. Virginia -> Singapore

/content/dam with the usual Geometrixx content + 2,000 PDFs, 2,000 JPGs and their related renditions for a total of 64,010 JCR nodes and 330,732 properties (5 GB) were VLT rcp’d in about 88 minutes. The throughput was 3.5 GB/hour (12 JCR nodes/second).

The differential check (no content had to be copied) took about 74 minutes.


By aem4beginner
