Replications in AEM

Replication agents are central to Adobe Experience Manager (AEM) as the mechanism used to:

Publish (activate) content from an author to a publish environment.
Explicitly flush content from the Dispatcher cache.
Return user input (for example, form input) from the publish environment to the author environment (under control of the author environment).

Requests are queued to the appropriate agent for processing.
User data (users, user groups, and user profiles) are not replicated between author and publish instances.
For multiple publish instances, user data is Sling distributed when User Synchronisation is enabled.

Replicating from Author to Publish
Replication, to a publish instance or Dispatcher, takes place in several steps:
The author requests that certain content be published (activated); this can be initiated manually or by preconfigured automatic triggers.

The request is passed to the appropriate default replication agent; an environment can have several default agents, which are always selected for such actions.

The replication agent “packages” the content and places it in the replication queue.
In the Websites tab, the colored status indicator is set for the individual pages.
The content is lifted from the queue and transported to the publish environment using the configured protocol; usually this is HTTP.

A servlet in the publish environment receives the request and publishes the received content; the default servlet is http://localhost:4503/bin/receive.
Multiple author and publish environments can be configured.
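
The flow above can be pictured with a small, self-contained model. This is only a toy sketch in plain Java and does not use the AEM replication API: the class and method names (ReplicationFlowSketch, Agent, activate, processQueue), the content path /content/site/en, and the choice of which agent ignores default activations are all assumptions made for illustration. The "transport" merely records what a real agent would send over HTTP.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.List;

/** Toy model of author-to-publish replication; not the AEM API. */
final class ReplicationFlowSketch {

    /** Hypothetical stand-in for a replication agent with its own queue. */
    static final class Agent {
        final String name;
        final String targetUri;       // receiving servlet on the target
        final boolean ignoreDefault;  // true = skipped for ordinary activations
        final Deque<String> queue = new ArrayDeque<>();
        final List<String> delivered = new ArrayList<>();

        Agent(String name, String targetUri, boolean ignoreDefault) {
            this.name = name;
            this.targetUri = targetUri;
            this.ignoreDefault = ignoreDefault;
        }

        /** Lifts every item from the queue and "transports" it. */
        void processQueue() {
            while (!queue.isEmpty()) {
                String path = queue.poll();
                // A real agent would send the packaged content to targetUri here.
                delivered.add(targetUri + " <- " + path);
            }
        }
    }

    /** Passes an activation request to every default agent, which queues it. */
    static int activate(String path, List<Agent> agents) {
        int queued = 0;
        for (Agent agent : agents) {
            if (!agent.ignoreDefault) {
                agent.queue.add(path);
                queued++;
            }
        }
        return queued;
    }

    public static void main(String[] args) {
        Agent publish = new Agent("publish", "http://localhost:4503/bin/receive", false);
        Agent flush = new Agent("flush", "http://localhost:8000/dispatcher/invalidate.cache", true);

        activate("/content/site/en", List.of(publish, flush));
        publish.processQueue();
        System.out.println(publish.delivered);
    }
}
```

Keeping the queue between the request and the transport is what lets an agent hold items while the target is unreachable and deliver them later.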

Replicating from Publish to Author
Some features allow users to enter data on a publish instance.
In some cases, a type of replication known as reverse replication is needed to return this data to the author environment, from where it is redistributed to other publish environments. For security reasons, any traffic from the publish environment to the author environment must be strictly controlled.
Reverse replication uses an agent in the publish environment which references the author environment. This agent places the data into an outbox. This outbox is matched with replication listeners in the author environment. The listeners poll the outboxes to collect any data entered and then distribute it as necessary. This ensures that the author environment controls all traffic.
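
The pull-based design described above can be sketched in a few lines. This is a toy model in plain Java, not the AEM API: Outbox, drain, and poll are hypothetical names, and the post paths are made up for the example. The point it illustrates is that publish only stores data, while the author side decides when to collect it.

```java
import java.util.ArrayList;
import java.util.List;

/** Toy model of reverse replication; not the AEM API. */
final class ReverseReplicationSketch {

    /** Hypothetical outbox on a publish instance: data waits here until collected. */
    static final class Outbox {
        private final List<String> items = new ArrayList<>();

        /** Called on publish when a visitor submits data. */
        void add(String path) {
            items.add(path);
        }

        /** Called from the author side: hands over everything and empties the outbox. */
        List<String> drain() {
            List<String> copy = new ArrayList<>(items);
            items.clear();
            return copy;
        }
    }

    /** Author-side listener: it polls each outbox; publish never pushes. */
    static List<String> poll(List<Outbox> outboxes) {
        List<String> collected = new ArrayList<>();
        for (Outbox outbox : outboxes) {
            collected.addAll(outbox.drain());
        }
        return collected;
    }

    public static void main(String[] args) {
        Outbox publish1 = new Outbox();
        Outbox publish2 = new Outbox();
        publish1.add("/content/usergenerated/comments/form1/post-a");
        publish2.add("/content/usergenerated/comments/form1/post-b");
        System.out.println(poll(List.of(publish1, publish2)));
    }
}
```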

In other cases, such as for Communities features (for example, forums, blogs, comments, and reviews), the amount of user generated content (UGC) being entered in the publish environment is difficult to efficiently synchronize across AEM instances using replication.
AEM Communities never uses replication for UGC. Instead, the deployment for Communities requires a common store for UGC.

Replication – Out of the Box
To follow this example and use the default replication agents, you need to install AEM with:
the author environment on port 4502
the publish environment on port 4503

Enabled by default :
Agents on author : Default Agent (publish)
Effectively disabled by default (as of AEM 6.1) :
Agents on author : Reverse Replication Agent (publish_reverse)
Agents on publish : Reverse Replication (outbox)

Replication Agents – Out of the Box
The following agents are available in a standard AEM installation:
Default Agent
Used for replicating from author to publish.
Dispatcher Flush
This is used for managing the Dispatcher cache.
Reverse Replication
Used for replicating from publish to author. Reverse replication is not used for Communities features, such as forums, blogs, and comments. It is effectively disabled as the outbox is not enabled. Use of reverse replication would require custom configuration.
Static Agent
This is an “Agent that stores a static representation of a node into the filesystem.”
For example, with the default settings, content pages and DAM assets are stored under /tmp, either as HTML or in the appropriate asset format.
This was requested so that the content can be seen when the page is requested directly from the application server. This is a specialized agent and will probably not be required for most instances.

Replication Agents – Configuration Parameters
When configuring a replication agent from the Tools console, five tabs are available within the dialog:

Settings
Name : A unique name for the replication agent.
Description : A description of the purpose this replication agent will serve.
Enabled : Indicates whether the replication agent is currently enabled.
When the agent is enabled the queue will be shown as:
Active when items are being processed.
Idle when the queue is empty.
Blocked when items are in the queue, but cannot be processed; for example, when the receiving queue is disabled.
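
The three queue states can be summarized as a small decision function. This is a sketch only, not the AEM API: QueueStatusSketch, statusOf, and the canProcess flag are hypothetical names used to restate the rules above.

```java
/** Toy model of the queue states shown for an enabled agent; not the AEM API. */
final class QueueStatusSketch {

    enum Status { ACTIVE, IDLE, BLOCKED }

    /**
     * @param pendingItems number of items waiting in the queue
     * @param canProcess   false when queued items cannot be delivered,
     *                     for example because the receiving side is disabled
     */
    static Status statusOf(int pendingItems, boolean canProcess) {
        if (pendingItems == 0) {
            return Status.IDLE;            // nothing to do
        }
        return canProcess ? Status.ACTIVE  // items are being processed
                          : Status.BLOCKED; // items are stuck in the queue
    }

    public static void main(String[] args) {
        System.out.println(statusOf(0, true));
        System.out.println(statusOf(3, true));
        System.out.println(statusOf(3, false));
    }
}
```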

Serialisation Type : The type of serialization:
Default: Set if the agent is to be automatically selected.
Dispatcher Flush: Select this if the agent is to be used for flushing the dispatcher cache.

Retry Delay : The delay (waiting time in milliseconds) between two retries, should a problem be encountered.
Default: 60000
Agent User Id : Depending on the environment, the agent will use this user account to:
collect and package the content from the author environment
create and write the content on the publish environment
Leave this field empty to use the system user account (the account defined in Sling as the administrator user; by default this is admin).

Caution:
For an agent on the author environment this account must have read access to all paths that you want to have replicated.

For an agent on the publish environment this account must have the create/write access required to replicate the content.

Note:
This can be used as a mechanism for selecting specific content for replication.

Log Level : Specifies the level of detail to be used for log messages.
Error: only errors will be logged
Info: errors, warnings and other informational messages will be logged
Debug: a high level of detail will be used in the messages, primarily for debug purposes
Default: Info

Use for reverse replication : Indicates whether this agent will be used for reverse replication, that is, for returning user input from the publish environment to the author environment.

Alias update : Selecting this option enables alias or vanity path invalidation requests to Dispatcher.
Transport
URI
This specifies the receiving servlet at the target location. In particular, you can specify the hostname (or alias) and context path to the target instance here.
For example:
A Default Agent may replicate to http://localhost:4503/bin/receive
A Dispatcher Flush agent may replicate to http://localhost:8000/dispatcher/invalidate.cache
The protocol specified here (HTTP or HTTPS) will determine the transport method.
For Dispatcher Flush agents, the URI property is used only if you use path-based virtualhost entries to differentiate between farms; in that case, use this field to target the farm to invalidate. For example, farm #1 has a virtual host of http://www.mysite.com/path1/* and farm #2 has a virtual host of http://www.mysite.com/path2/*. You can use a URL of /path1/invalidate.cache to target the first farm and /path2/invalidate.cache to target the second farm.

User
User name of the account to be used for accessing the target.

Password
Password for the account to be used for accessing the target.

NTLM Domain
Domain for NTLM authentication.

NTLM Host
Host for NTLM authentication.

Enable relaxed SSL
Enable if you want self-certified SSL certificates to be accepted.

Allow expired certs
Enable if you want expired SSL certificates to be accepted.

Proxy
The following settings are only required if a proxy is used:
Proxy Host
Hostname of the proxy used for transport.

Proxy Port
Port of the proxy.

Proxy User
User name of the account to be used.

Proxy Password
Password of the account to be used.

Proxy NTLM Domain
The proxy NTLM domain.

Proxy NTLM Host
The proxy NTLM host.

Extended
Interface
Here you can define the socket interface to bind to.
This sets the local address to be used when creating connections. If this is not set, the default address will be used. This is useful for specifying the interface to use on multi-homed or clustered systems.

HTTP Method
The HTTP method to be used.
For a Dispatcher Flush agent this is nearly always GET and should not be changed (POST would be another possible value).

HTTP Headers
These are used for Dispatcher Flush agents and specify elements that must be flushed.
For a Dispatcher Flush agent the three standard entries should not need changing:
CQ-Action:{action}
CQ-Handle:{path}
CQ-Path:{path}
These are used, as appropriate, to indicate the action to be used when flushing the handle or path. The sub-parameters are dynamic:
{action} indicates a replication action
{path} indicates a path
They are substituted by the path/action relevant to the request and therefore do not need to be “hardcoded”.

Note:
If you have installed AEM in a context other than the recommended default context, then you will need to register the context in the HTTP Headers. For example:
CQ-Handle:/<yourContext>{path}
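
To make the substitution concrete, here is a minimal sketch of how the {action} and {path} placeholders in the header templates could be expanded. It uses only the JDK's java.net.http classes and builds a request without sending it; FlushHeadersSketch, expand, and buildFlushRequest are hypothetical names, and the action value "Activate" and the path /content/site/en are example values, not taken from an AEM instance. AEM performs this substitution internally, so this is an illustration rather than its actual implementation.

```java
import java.net.URI;
import java.net.http.HttpRequest;
import java.util.List;

/** Sketch: expand the header templates of a Dispatcher Flush agent. */
final class FlushHeadersSketch {

    /** The three standard header templates listed above. */
    static final List<String> TEMPLATES =
            List.of("CQ-Action:{action}", "CQ-Handle:{path}", "CQ-Path:{path}");

    /** Replaces the dynamic {action} and {path} sub-parameters. */
    static String expand(String template, String action, String path) {
        return template.replace("{action}", action).replace("{path}", path);
    }

    /** Builds (but does not send) a GET request carrying the expanded headers. */
    static HttpRequest buildFlushRequest(String uri, String action, String path) {
        HttpRequest.Builder builder = HttpRequest.newBuilder(URI.create(uri)).GET();
        for (String template : TEMPLATES) {
            String header = expand(template, action, path);
            // Split at the first colon: header name on the left, value on the right.
            int colon = header.indexOf(':');
            builder.header(header.substring(0, colon), header.substring(colon + 1));
        }
        return builder.build();
    }

    public static void main(String[] args) {
        HttpRequest request = buildFlushRequest(
                "http://localhost:8000/dispatcher/invalidate.cache",
                "Activate", "/content/site/en");
        System.out.println(request.headers().map());
    }
}
```

With a non-default context, the same expand call applied to the template CQ-Handle:/<yourContext>{path} simply prefixes the context to the substituted path.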

Close Connection
Enable to close the connection after each request.

Connect Timeout
Timeout (in milliseconds) to be applied when trying to establish a connection.

Socket Timeout
Timeout (in milliseconds) to be applied when waiting for traffic after a connection has been established.
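
The difference between the two timeouts is easier to see in code. The sketch below uses the JDK's HttpURLConnection purely as an analogy; it is not how AEM implements its transport, and TransportTimeoutSketch, prepare, and the timeout values are assumptions chosen for the example. The connection is only configured, never opened.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.net.HttpURLConnection;
import java.net.URI;

/** Sketch: connect timeout vs. socket (read) timeout on a plain JDK connection. */
final class TransportTimeoutSketch {

    static HttpURLConnection prepare(String uri, int connectTimeoutMs, int socketTimeoutMs) {
        try {
            // openConnection() only creates the object; no network traffic happens yet.
            HttpURLConnection connection =
                    (HttpURLConnection) URI.create(uri).toURL().openConnection();
            // Maximum time to wait while establishing the connection.
            connection.setConnectTimeout(connectTimeoutMs);
            // Maximum time to wait for data once the connection is established.
            connection.setReadTimeout(socketTimeoutMs);
            return connection;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        HttpURLConnection connection =
                prepare("http://localhost:4503/bin/receive", 5000, 10000);
        System.out.println(connection.getConnectTimeout() + " / " + connection.getReadTimeout());
    }
}
```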

Protocol Version
Version of the protocol; for example 1.0 for HTTP/1.0.
Triggers
These settings are used to define triggers for automated replication:
Ignore default
If checked, the agent is excluded from default replication; this means it will not be used if a content author issues a replication action.

On Modification
Here a replication by this agent will be automatically triggered when a page is modified. This is mainly used for Dispatcher Flush agents, but also for reverse replication.

On Distribute
If checked, the agent will automatically replicate any content that is marked for distribution when it is modified.

On-/Offtime reached
This will trigger automatic replication (to activate or deactivate a page as appropriate) when the ontimes or offtimes defined for a page occur. This is primarily used for Dispatcher Flush agents.

On Receive
If checked, the agent will chain replicate whenever receiving replication events.

No Status Update
When checked the agent will not force a replication status update.

No Versioning
When checked the agent will not force versioning of activated pages.

Configuring your Replication Agents
Controlling Access to Replication Agents
Access to the pages used to configure the replication agents can be controlled by using user and/or group page permissions on the /etc/replication node.

Note:
Setting such permissions will not affect users replicating content (e.g. from the Websites console or sidekick option). The replication framework does not use the “user session” of the current user to access replication agents when replicating pages.

Caution:
Do not use the “Test Connection” link for the Reverse Replication Outbox on a publish instance.
If a replication test is performed for an Outbox queue, any items that are older than the test replication will be re-processed with every reverse replication.
If such items already exist in a queue, they can be found with the following XPath JCR query and should be removed.
/jcr:root/var/replication/outbox//*[@cq:repActionType='TEST']

How do I use reverse replication and what’s necessary to make sure that it works?
Out of the box, only cq:Page nodes are reverse replicated. For any other node, use one of the last two methods below as a project-specific implementation.
There are three possibilities:
Use the SlingPostServlet (that is, do not create any custom post servlets or POST.jsp to handle the incoming requests) so that it implicitly triggers a related PageEvent. Then set a property named “cq:distribute” to “true” on the nodes you want to reverse replicate.
To implement this solution, you do not need to write any code. You can use the Form component to set all the necessary hidden fields.
Use your own code that accesses the repository and modifies the properties “cq:lastModified”, “cq:lastModifiedBy”, and “cq:distribute”.
The posted data can be controlled, because your own code writes it.
To implement this solution, you need to write code for your project.

Use your own code that calls the replicate method of the Replicator service with options to use distribution mode.
Replication is controlled from your code.
To implement this solution, you need to write code specific to your project.

Use your own code to implement a reverse replication solution
Add the following code to fire the event related to the page you want to reverse replicate (the example below was extracted from sample PostDataServlet.java):

...

// set the page to hide in the navigation
Node pageContainer = newCommentPage.getContentResource().adaptTo(Node.class);
pageContainer.setProperty("cq:lastModified", Calendar.getInstance());
pageContainer.setProperty("cq:lastModifiedBy", session.getUserID());
pageContainer.setProperty("cq:distribute", true);
...
session.save();
...

Attached is an example using a component to render the form and display the previous post. For each post, it creates a subpage that contains a paragraph with text in it. By doing so, it ensures that each post can be managed separately (and avoids collision with posts that could be generated from other publish instances). The storage location is defined as a parameter in the component dialog (that is, /content/usergenerated/comments/form1, which you can create using a folder in the siteadmin).

On the author instance, you can define a workflow model that is launched when a page is created below your comments page. If you reactivate the content from author to publish, make sure that your workflow clears the cq:distribute value first; otherwise the content ends up in an endless replication loop.
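
The loop guard can be sketched as follows. This is a toy model that uses a plain Map in place of JCR node properties; it is not the AEM workflow API, and DistributeFlagSketch, isMarkedForDistribution, and clearDistributeFlag are hypothetical names. In a real workflow step you would remove the property from the node itself before activating it.

```java
import java.util.HashMap;
import java.util.Map;

/** Toy model of clearing cq:distribute before reactivation; not the AEM API. */
final class DistributeFlagSketch {

    static final String DISTRIBUTE = "cq:distribute";

    /** True when the flag that marks content for reverse replication is set. */
    static boolean isMarkedForDistribution(Map<String, Object> properties) {
        return Boolean.TRUE.equals(properties.get(DISTRIBUTE));
    }

    /** Returns a copy of the properties with the flag removed, ready to reactivate. */
    static Map<String, Object> clearDistributeFlag(Map<String, Object> properties) {
        Map<String, Object> cleaned = new HashMap<>(properties);
        cleaned.remove(DISTRIBUTE);
        return cleaned;
    }

    public static void main(String[] args) {
        Map<String, Object> received = new HashMap<>();
        received.put("jcr:title", "A comment");
        received.put(DISTRIBUTE, true);

        Map<String, Object> toActivate = clearDistributeFlag(received);
        System.out.println(isMarkedForDistribution(received));   // true
        System.out.println(isMarkedForDistribution(toActivate)); // false
    }
}
```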

On the publish instance, make sure that the user has sufficient rights to create content. If you test with the anonymous user, change the rights accordingly for the given JCR path using CRX Explorer.

Note on replication
For replication to work properly, store data according to the following rules:
(1) the replicated (root) node’s nodetype must extend nt:hierarchyNode
(2) all direct child nodes that are not nt:hierarchyNodes must be aggregated
(3) the subtrees of all nodes from (2), apart from nodetypes, must be aggregated
Adobe recommends using the cq:Page (/jcr:content) as the container for your data, as you can then easily manage it and use it with the user interface (siteadmin, and so on). You can use the PageManager API to create the page.

Note:
Certain terms related to publishing can be confused:
Publish / Unpublish
These are the primary terms for the actions that make your content publicly available on your publish environment (or not).
Activate / Deactivate
These terms are synonymous with publish/unpublish.
Replicate / Replication
These are the technical terms describing the movement of data (e.g. page content, files, code, user comments) from one environment to another such as when publishing or reverse-replicating user comments.

Note:
If you do not have the required privileges for publishing a specific page:

A workflow will be triggered to notify the appropriate person of your request to publish.
This workflow may have been customized by your development team.
A message will be displayed briefly to notify you that the workflow was triggered.

Depending on your location, you can publish:

From the page editor
From the sites console

From Page Editor
Depending on whether the page has references that need publishing:
The page will be published directly if there are no references to be published.
If the page has references that need publishing, these will be listed in the Publish wizard, where you can either:

Specify which of the assets/tags/etc. you want to publish together with the page, then use Publish to complete the process.
Use Cancel to abort the action.

Note:
Publishing from the editor is a shallow publish, i.e. only the selected pages are published; child pages are not.

From Sites Console
In the sites console there are two options for publishing:

Quick Publish
Manage Publication

Quick Publish
Quick Publish is for simple cases and publishes the selected page(s) immediately without any further interaction. Because of this, any non-published references will also be published automatically.
Note:
Quick Publish is a shallow publish, i.e. only the selected pages are published; child pages are not.

Manage Publication
Manage Publication offers more options than Quick Publish, allowing for the inclusion of child pages, customization of the references, and starting any applicable workflows as well as offering the option to publish at a later date.

AEM Tutorials for Beginners

May 10, 2020