April 12, 2020

Reverse Replication woes

So, in my previous post, I said how wonderful FP37434 (the replication stabilization FP) is. Unfortunately, it did not solve our problem, and we now have a large backlog of content to reverse replicate (~50k nodes in /var/replication/outbox across all our publish servers).

We are currently facing two problems. First, when the reverse replication (RR) agent polls, the publish server with FP37434 exhibits a huge native memory leak (approximately 8GB of native memory is claimed), causing a great deal of paging on the system.

Second, when we batch the outbox down to only 10 items, the author takes 30 minutes to process those 10 nodes.

Enabling extra logging for com.day.cq.replication.content.durbo at DEBUG level shows that the author really is doing valid work for the full 30 minutes while processing just 10 nodes from the outbox.

It turns out that when a node is added under /content/usergenerated/path/to/something, CQ appears to add all of the pre-existing sibling nodes to the newly created entry under /var/replication/outbox. You can see this by inspecting the nodes inside the outbox. This is why 10 nodes take the author 30 minutes to process: it is actually unpacking 10,000 nodes.
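One way to check this is to count how many nodes are nested inside each outbox entry. The following is only a minimal sketch: it assumes you have exported the outbox as Sling-style JSON (child nodes as nested objects, properties as scalars or lists) and that the nested content is visible as nodes in that export. The entry and comment names in the sample are hypothetical, and count_descendants and outbox_summary are illustrative helpers, not part of CQ.

```python
import json


def count_descendants(node):
    """Count all nested child nodes below `node`.

    Assumes a Sling-style JSON dump, where child nodes appear as nested
    objects (dicts) and properties appear as scalars or lists.
    """
    total = 0
    for value in node.values():
        if isinstance(value, dict):
            # One for the child itself, plus everything below it.
            total += 1 + count_descendants(value)
    return total


def outbox_summary(outbox):
    """Map each top-level outbox entry to the number of nodes nested inside it."""
    return {
        name: count_descendants(child)
        for name, child in outbox.items()
        if isinstance(child, dict)
    }


# Hypothetical dump: one outbox entry that carries three sibling comments
# even though only one of them is new.
sample = json.loads("""
{
  "jcr:primaryType": "sling:Folder",
  "entry-1": {
    "jcr:primaryType": "nt:unstructured",
    "comments": {
      "comment-a": {"text": "old"},
      "comment-b": {"text": "old"},
      "comment-c": {"text": "new"}
    }
  }
}
""")

summary = outbox_summary(sample)
print(summary)
```

If a single entry reports hundreds or thousands of nested nodes for what should have been one new node, that matches the behaviour described above.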

This probably also explains why our CQ author is performing slowly.


By aem4beginner
