March 16, 2020

CQ Replication

What is Replication in CQ?
Replication is used for three main operations in CQ:
1. To move content from CQ Author to CQ Publish. This is called Forward Replication, or activating content.
2. To explicitly flush the Dispatcher cache.
3. To synchronize user input (e.g. blog comments) between Publish instances. This is done via Reverse Replication.

What are Replication Agents?
- Replication Agents are used to perform replication activities.
- Replication Agents are configured on Author and Publish.
- Generally, you configure the following Replication Agents on the Author:
* Forward Replication Agents
* Reverse Replication Agents
- You configure the following Replication Agent on the Publish:
* Dispatcher Flush Agent

What is the Default Agent?
By default, a forward replication agent (the Default Agent) is configured on the author instance. It points to a Publish instance running on port 4503. If your Publish instance runs on a different port, change the port in the Default Agent configuration.

How do you access Replication Agents on an author instance?
Go to http://localhost:4502/miscadmin#/etc/replication to see all the Replication Agents configured on this CQ instance (assuming Author is running on port 4502).
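
The agent configuration can also be read from the command line. The sketch below assumes that agents are stored as pages under /etc/replication/agents.author (an assumption about the repository layout, not something stated above), that the Default Agent's node name is "publish", and that the stock admin:admin credentials are in use. agent_json_url is a hypothetical helper name.

```shell
# Build the URL of the JSON view of an agent's configuration node.
# Assumption: agents live under /etc/replication/agents.author/<name>.
agent_json_url() {
  base="$1"        # e.g. http://localhost:4502
  agent_name="$2"  # e.g. publish (assumed node name of the Default Agent)
  printf '%s/etc/replication/agents.author/%s/jcr:content.json\n' "$base" "$agent_name"
}

# Usage (needs a running author instance):
# curl -s -u admin:admin "$(agent_json_url http://localhost:4502 publish)"
```

If the assumptions hold, the JSON returned should include the transport URI, which is the value to check when Publish runs on a port other than 4503.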

Tree Activation Overview:
With regular activation you can only activate one resource at a time, or a limited number of resources through multi-select. Sometimes, however, more than one file or a whole resource tree needs to be activated (for example, when a new section of the website is going live).
Using Tree Activation:
From site admin, click Tools -> Replication, then double-click Activate Tree. You can also go there directly at HOST:PORT/etc/replication/treeactivation.html

You can then select the Start Path. All resources under that path and its subpaths will be activated.
If possible, do not select a path with a lot of children, as that might cause performance issues.
You can also choose various options for activation; pages are activated according to the options selected.
Here is what the various options mean:

For Resource Option:
Only Modified: only activate pages that have been modified.
Only Activated: only activate pages that have (already) been activated. Acts as a form of reactivation.
Ignore Deactivated: ignore any pages which have been deactivated.

For Activation:
Select Dry Run if you want to check which pages would be activated. This is only a simulation; no pages will be activated.
Select Activate if you want to activate the pages.


Curl command to do Tree Activation:
curl -u admin:admin -F cmd=activate -F ignoredeactivated=true -F onlymodified=true -F path=/content/geometrixx http://HOST:PORT/etc/replication/treeactivation.html
Important Resource:
https://dev.day.com/docs/en/cq/current/wcm/page_publish/tree_activate.html
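
The curl command can be wrapped in a small function so that a Dry Run can be tried before the real activation. This is only a sketch: the value cmd=dryrun is an assumption (only cmd=activate appears in the command above), so confirm it against the tree activation form on your instance. tree_activate is a hypothetical name, and setting CURL=echo prints the arguments instead of sending a request.

```shell
# Send a tree activation request for a start path.
# $1: command (activate, or dryrun, which is an assumed value)
# $2: start path, $3: author base URL (defaults to localhost:4502)
tree_activate() {
  tree_cmd="$1"
  start_path="$2"
  base="${3:-http://localhost:4502}"
  ${CURL:-curl} -u admin:admin \
    -F "cmd=$tree_cmd" \
    -F "ignoredeactivated=true" \
    -F "onlymodified=true" \
    -F "path=$start_path" \
    "$base/etc/replication/treeactivation.html"
}

# Usage (needs a running author instance):
# tree_activate dryrun /content/geometrixx
# tree_activate activate /content/geometrixx
```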

Configure Multiple Agents
Overview:
Usually, a CQ setup has multiple publish instances. Out of the box (OOTB), CQ comes with only one replication agent set up.
Set up multiple publish agents:

First, make sure that you have multiple publish instances set up. You can start additional publish instances the same way as the first one, using different port numbers.
For example, suppose your second publish instance is running on port 4506.
Here are the steps to set up a second replication agent.

Set up Second Replication Agent on author
The default Replication Agent is already set up for the publish instance running on the local machine on port 4503. The second Publish instance requires a new Replication Agent:
1. Log in to the Site Administration of CQ5 on the author instance (port 4502)
2. Open Tools
3. Browse to Replication > Agents on author in the tree panel
4. Create a new page using the Replication Agent template
5. Set the Title to "Second Agent"
   Set the Name to "publish2"
   Then click Create
6. Shift + double-click the new agent "Second Agent" (this opens the configuration panel in a new window or tab)
7. Click "Edit"
8. In the Settings panel:
   Check "Enabled"
   Set Description to "Agent that replicates to the second publish instance" (optional)
   Set Retry Delay to "60000"
   In the Transport panel:
   Set URI to "http://localhost:4506/bin/receive"
9. Click OK to save the settings
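
The same agent could in principle be created with a single POST instead of the UI steps. Everything about the node structure below is an assumption rather than something taken from this tutorial: the agents.author location, the cq:template and sling:resourceType values, and the property names enabled, retryDelay and transportUri. create_agent is a hypothetical helper; review the resulting agent in the UI before relying on it.

```shell
# Create a replication agent page on the author instance.
# All property names and paths are assumptions; verify them against an
# existing agent on your instance first.
create_agent() {
  agent_name="$1"     # e.g. publish2
  agent_title="$2"    # e.g. "Second Agent"
  transport_uri="$3"  # e.g. http://localhost:4506/bin/receive
  base="${4:-http://localhost:4502}"
  ${CURL:-curl} -u admin:admin \
    -F "jcr:primaryType=cq:Page" \
    -F "jcr:content/jcr:title=$agent_title" \
    -F "jcr:content/sling:resourceType=cq/replication/components/agent" \
    -F "jcr:content/cq:template=/libs/cq/replication/templates/agent" \
    -F "jcr:content/enabled=true" \
    -F "jcr:content/retryDelay=60000" \
    -F "jcr:content/transportUri=$transport_uri" \
    "$base/etc/replication/agents.author/$agent_name"
}

# Usage (needs a running author instance):
# create_agent publish2 "Second Agent" http://localhost:4506/bin/receive
```

Setting CURL=echo prints the arguments without sending anything, which is a cheap way to check the request before running it for real.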
After the second replication agent is set up, you can test the setup as follows.
Test Multiple Replication Agents
1. Go back to the CQ5 Site Administration and open the Websites panel
2. Browse to Geometrixx Demo Site > English > Products in the tree panel
3. Create a new page and set the Title to "Sample Product"
   Click Create
4. Open the new page with a double-click
5. Add some text, an image, or any component you like to your page
6. In the floating Sidekick panel, select the Page tab
   Click Activate Page
   Confirm the activation
Your page should now be accessible on both publish instances. Check with the browser:
Open the page http://localhost:4503/geometrixx/en/products/sample_product.html
Open the page http://localhost:4506/geometrixx/en/products/sample_product.html
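
The browser check can also be scripted. The following sketch prints the HTTP status code that each publish instance returns for the page; publish_urls and check_publish are hypothetical helper names, and both instances are assumed to run on localhost.

```shell
# Print the page URL on each publish port, one per line.
publish_urls() {
  page_path="$1"; shift
  for port in "$@"; do
    printf 'http://localhost:%s%s\n' "$port" "$page_path"
  done
}

# Request the page on each publish port and print "<status> <url>".
check_publish() {
  for url in $(publish_urls "$@"); do
    status=$(curl -s -o /dev/null -w '%{http_code}' "$url")
    echo "$status $url"
  done
}

# Usage (needs both publish instances running):
# check_publish /geometrixx/en/products/sample_product.html 4503 4506
```

A status of 200 on both ports suggests that both replication agents delivered the page.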
You can also set up a replication chain, but that is beyond the scope of this tutorial.

Agent Configurations
Here is what each parameter means in the replication agent configuration:
1) Settings Tab

Name
A unique name for the replication agent.
Description
A description of the purpose this replication agent will serve.
Enabled
Indicates whether the replication agent is currently enabled.
Serialization Type
The type of serialization:
• Default: Set if the agent is to be automatically selected.
• Dispatcher Flush: Select this if the agent is to be used for flushing the dispatcher cache.
Retry Delay
The delay (waiting time in milliseconds) between two retries, should a problem be encountered.
Default: 60000
Agent User Id
The agent will use this user account to collect and package the content from the author environment.
Leave this field empty to use the system user account (the account defined in Sling as the "administrator user"; by default, this is "admin").
Log Level
Specifies the level of detail to be used for log messages.
• Error - only errors will be logged
• Info - errors, warnings and other informational messages will be logged
• Debug - a high level of detail will be used in the messages, primarily for debug purposes
Default: Info
Use for reverse replication
Indicates whether this agent will be used for reverse replication, which returns user input from the publish environment to the author environment.
2) Transport Tab
URI
This specifies the receiving servlet at the target location. In particular, you can specify the hostname (or alias) and context path to the target instance here.
For example:
• A Default Agent may replicate to http://localhost:4503/bin/receive
• A Dispatcher Flush agent may replicate to http://localhost:8000/dispatcher/invalidate.cache
The protocol specified here (HTTP or HTTPS) will determine the transport method.
User
User name of the account to be used for accessing the target.
Password
Password for the account to be used for accessing the target.
NTLM Domain
Domain for NTLM authentication.
NTLM Host
Host for NTLM authentication.
3) Proxy Tab
The following settings are only required if a proxy is used.
Proxy Host
Hostname of the proxy used for transport.
Proxy Port
Port of the proxy.
Proxy User
User name of the account to be used.
Proxy Password
Password of the account to be used.
Proxy NTLM Domain
The proxy NTLM domain.
Proxy NTLM Host
The proxy NTLM host.
4) Extended Tab
Interface
Socket interface to bind to.
HTTP Method
HTTP method to use.
HTTP Headers
These are used for Dispatcher Flush agents and specify elements that must be flushed.
{action} indicates a replication action; {path} indicates a path.
Connect Timeout
Timeout (in milliseconds) to be applied when trying to establish a connection.
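
To make the HTTP Headers setting more concrete, here is a sketch of a manual flush request of the kind a Dispatcher Flush agent sends, with {action} and {path} filled in by hand. The header names CQ-Action and CQ-Handle are assumptions based on the usual Dispatcher invalidation convention and are not listed in this tutorial. flush_dispatcher is a hypothetical helper, and the Dispatcher is assumed to listen on localhost:8000 as in the Transport example above.

```shell
# Ask the Dispatcher to invalidate its cache for one content path.
# Assumed headers: CQ-Action carries {action}, CQ-Handle carries {path}.
flush_dispatcher() {
  handle="$1"  # content path to invalidate
  base="${2:-http://localhost:8000}"
  ${CURL:-curl} -X POST \
    -H "CQ-Action: Activate" \
    -H "CQ-Handle: $handle" \
    -H "Content-Length: 0" \
    -H "Content-Type: application/octet-stream" \
    "$base/dispatcher/invalidate.cache"
}

# Usage (needs a running Dispatcher that accepts flush requests from this host):
# flush_dispatcher /content/geometrixx/en/products
```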
5) Trigger Tab
These settings are used to define triggers for automated replication:
Ignore default
If checked, the agent is excluded from default replication; this means it will not be used if a content author issues a replication action.
On Modification
Replication by this agent is triggered automatically when a page is modified. This is mainly used for Dispatcher Flush agents, but also for reverse replication.
On-/Offtime reached
This will trigger automatic replication (to activate or deactivate a page as appropriate) when the ontimes or offtimes defined for a page occur. This is primarily used for Dispatcher Flush agents.

Additional Details about these Tabs can be obtained from this document:
http://dev.day.com/docs/en/cq/current/deploying/replication.html

Reverse Replication Overview:
Reverse Replication is the process of replicating user data from a publish instance back to the author instance. This may include any user-related data, such as comments and form data.
Architecture:
In reverse replication, the reverse replication agent on the Publish instance puts user data in an "outbox" (a repository location where data is temporarily held). The Author instance has a matching agent which polls the Publish instance at regular intervals. If data is found in the outbox of the Publish instance, it is synced to the Author instance. This way, the synchronization of data from Publish to Author is controlled.

Configure Reverse Replication:
From site admin click on Tools
Click on Replication -> Agents on Author

Click on New Page and select reverse replication as the type (you can also copy and paste the OOTB reverse replication agent to create a new one).

Provide information in the Settings tab. Make sure that Enabled is checked. Retry Delay is the delay between two requests to the outbox of the publish instance.

Provide information in the Transport tab. The transport URI is the replication URL of the publish instance. Also provide the "admin" user ID and password of the publish instance.

Once the reverse replication agent is set up, go to the agent page and click Test Connection.
All publish instances come with the Outbox enabled. If there is any issue with the Outbox, you can activate the publish outbox from author to publish.

Once reverse replication is set up properly, you will see messages like the following in your error.log:
03.06.2013 15:51:37.635 *INFO* [Reverse Replication Processor] com.day.cq.replication.Agent.publish_reverse Sending message to localhost:4503
03.06.2013 15:51:37.635 *INFO* [Reverse Replication Processor] com.day.cq.replication.Agent.publish_reverse >> GET /bin/receive?sling:authRequestLogin=1 HTTP/1.0
03.06.2013 15:51:37.635 *INFO* [Reverse Replication Processor] com.day.cq.replication.Agent.publish_reverse >> Action: Internal Poll
03.06.2013 15:51:37.635 *INFO* [Reverse Replication Processor] com.day.cq.replication.Agent.publish_reverse >> Path:
03.06.2013 15:51:37.635 *INFO* [Reverse Replication Processor] com.day.cq.replication.Agent.publish_reverse >> Handle:
03.06.2013 15:51:37.635 *INFO* [Reverse Replication Processor] com.day.cq.replication.Agent.publish_reverse --
03.06.2013 15:51:37.635 *INFO* [Reverse Replication Processor] com.day.cq.replication.Agent.publish_reverse << HTTP/1.1 200 OK
03.06.2013 15:51:37.635 *INFO* [Reverse Replication Processor] com.day.cq.replication.Agent.publish_reverse << Connection: Keep-Alive
03.06.2013 15:51:37.635 *INFO* [Reverse Replication Processor] com.day.cq.replication.Agent.publish_reverse << Server: Day-Servlet-Engine/4.1.24
03.06.2013 15:51:37.635 *INFO* [Reverse Replication Processor] com.day.cq.replication.Agent.publish_reverse << Content-Type: application/octet-stream
03.06.2013 15:51:37.636 *INFO* [Reverse Replication Processor] com.day.cq.replication.Agent.publish_reverse << Content-Length: 32
03.06.2013 15:51:37.636 *INFO* [Reverse Replication Processor] com.day.cq.replication.Agent.publish_reverse << Date: Mon, 03 Jun 2013 22:51:37 GMT
03.06.2013 15:51:37.636 *INFO* [Reverse Replication Processor] com.day.cq.replication.Agent.publish_reverse Message sent.
03.06.2013 15:51:37.636 *INFO* [Reverse Replication Processor] com.day.cq.replication.Agent.publish_reverse ------------------------------------------------
03.06.2013 15:51:37.636 *INFO* [Reverse Replication Processor] com.day.cq.replication.Agent.publish_reverse Reverse replication successful.
03.06.2013 15:51:37.636 *INFO* [Reverse Replication Processor] com.day.cq.replication.Agent.publish_reverse Fetched 0 pages from http://localhost:4503/bin/receive?sling:authRequestLogin=1
Note that not all node types and events are subject to reverse replication. You might need some custom configuration to get your custom data reverse replicated (that is outside the scope of this tutorial).
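
To spot these messages without scrolling through the whole log, the lines for the reverse agent can be filtered with grep. This is a small sketch: the log location crx-quickstart/logs/error.log is an assumption about a default install, reverse_replication_status is a hypothetical helper name, and the logger name is taken from the sample output above.

```shell
# Show the most recent reverse replication results from the author log.
# $1: path to error.log (defaults to an assumed standard location)
reverse_replication_status() {
  log_file="${1:-crx-quickstart/logs/error.log}"
  grep -F 'com.day.cq.replication.Agent.publish_reverse' "$log_file" \
    | grep -E 'Reverse replication successful|Fetched [0-9]+ pages' \
    | tail -n 5
}

# Usage, run from the author installation directory:
# reverse_replication_status
```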

Forward Replication
Forward Replication Concept:

- Replicating content from an author to publish instances (or dispatcher) is known as Forward Replication.
- Replication agents for all publish instances to which content should be replicated must first be configured on the author instance(s).
- When content is activated (for example, a page is activated), the content is placed in the queue of every configured forward replication agent.
- The replication agents then package the content and forward it to the appropriate Publish instance using the protocol configured on the replication agent (default is HTTP).
- A servlet on the Publish instance receives the request and publishes the received content.
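
Activation can also be triggered from the command line rather than the Sidekick. The sketch below assumes the author instance exposes a replication servlet at /bin/replicate.json that accepts path and cmd parameters; this endpoint is not mentioned elsewhere in this tutorial, so treat it as an assumption. activate_page is a hypothetical helper name.

```shell
# Queue a page for forward replication on all configured agents.
# Assumption: /bin/replicate.json accepts "path" and "cmd" form fields.
activate_page() {
  page_path="$1"
  base="${2:-http://localhost:4502}"
  ${CURL:-curl} -u admin:admin -X POST \
    -F "path=$page_path" \
    -F "cmd=activate" \
    "$base/bin/replicate.json"
}

# Usage (needs a running author instance):
# activate_page /content/geometrixx/en/products
```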
 
Examine the Default Agent: 
- The Default Agent is configured out of the box and is already set up when CQ is installed and started for the first time.
- The Default Agent is the Forward Replication Agent from the author to a Publish server and is configured with port 4503 (the default port for Publish).
To examine the Default Agent:
- Navigate to http://localhost:4502/miscadmin#/etc/replication (assuming the author is running on port 4502) and select "Agents on author" in the hierarchy tree on the left.
- Double Click on Default Agent from the list to open it.

- You can see that the replication request is sent to "http://localhost:4503/bin/receive". This is the servlet on the publish server that receives the replication request and publishes the content.
- The agent page also shows the agent status (enabled or disabled) and the queue status. If the Publish instance on port 4503 is not running, the content in the queue cannot be activated until the publish server is running again, and the status reads "Queue is blocked".
- A few other elements to note here are:
  - the "View log" link, which gives insight into the replication activity
  - the "Test Connection" link, which checks whether the author-publish connection is intact
- For any Replication Agent, the Replication Queue shows the content that is pending activation to the Publish server. You can perform a few actions on the Replication Queue:
  - Refresh: see whether any new content has entered the queue since the last view
  - Clear: remove all content from the queue
  - Force Retry: retry activating content that is stuck in the queue
  - Pause: pause activating content to the Publish server

Examine the Settings:
- To examine the settings of a replication agent, click on the "Edit" link. This opens the Replication Agent's settings in a pop-up window with five tabs: Settings, Transport, Proxy, Extended, and Trigger. Each tab is described under Agent Configurations above.


By aem4beginner
