May 10, 2020

Link Checker in AEM

AEM External link checker:
The AEM Link Checker is based on an event handler that is triggered whenever nodes under /content are created or updated. All content under the selected root path is parsed and its links are validated. Validation runs asynchronously in the background, and the HTML is updated according to the verification results.

Note: If you have a large repository (/content) whose links are updated frequently, using the link checker is not advised for performance reasons. It runs periodically and traverses the repository to validate links, which can slow down your author instance.

Now let's see how the AEM link checker works:
As soon as an author saves a link on a page, whether through the RTE or a custom component, the link checker event handler is triggered.

The link checker event handler traverses the /content node and looks for new or updated links. When it finds one, it stores the mapping in the /var/linkchecker cache folder.

Control then passes to the Day CQ Link Checker Service, which reads its scheduler.period configuration. When that period elapses, the scheduler validates the syntax and structure of each link against the configured settings, such as the special prefixes to ignore during validation and the pattern the link checker uses to verify the URL syntax.
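
The syntax step described above can be pictured with a small standalone sketch. This is not AEM's implementation: the class name, the prefix list, and the use of java.net.URI as the syntax check are all assumptions made for illustration. It only shows the idea of skipping links with a configured special prefix and syntax-checking the rest.

```java
import java.net.URI;
import java.net.URISyntaxException;
import java.util.List;

// Hypothetical sketch, not the AEM Link Checker Service itself.
final class LinkSyntaxSketch {
    enum Result { SKIPPED, SYNTAX_OK, SYNTAX_ERROR }

    // Assumed example prefixes; in AEM the real list comes from the service configuration.
    static final List<String> SPECIAL_PREFIXES = List.of("javascript:", "mailto:", "tel:", "#");

    static Result check(String href) {
        // Links that start with a special prefix are not validated at all.
        for (String prefix : SPECIAL_PREFIXES) {
            if (href.startsWith(prefix)) {
                return Result.SKIPPED;
            }
        }
        // Stand-in for the configured syntax pattern: let java.net.URI parse the link.
        try {
            new URI(href);
            return Result.SYNTAX_OK;
        } catch (URISyntaxException e) {
            return Result.SYNTAX_ERROR;
        }
    }

    public static void main(String[] args) {
        System.out.println(check("tel:123-123-1234"));
        System.out.println(check("https://example.com/page.html"));
        System.out.println(check("http://example.com/a b"));
    }
}
```

A link that passes this kind of check is only syntactically correct; whether it is reachable is decided later by the separate task.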

Once the syntax is validated, the results are pushed to /etc/linkchecker.html. The links remain in a pending state until the Day CQ Link Checker Task scheduler validates them by issuing a GET request to each one. This task runs periodically and re-checks both the valid and the invalid links stored under /etc/linkchecker.html.

An administrator can configure how often this scheduler runs by updating the Scheduler Period property, whose default value is 3600 seconds. Each time it runs, it removes all invalid or unreachable links from /etc/linkchecker.html (http://localhost:4502/etc/linkchecker.html).
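
To make the pending/validated lifecycle concrete, here is a minimal in-memory sketch of one task run. Everything in it is hypothetical (the class, the Status enum, the injected reachability check); the real task stores its data in the repository and performs actual HTTP requests. The sketch assumes, as described above, that unreachable links are dropped from the list.

```java
import java.util.Iterator;
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.function.Predicate;

// Hypothetical model of one Link Checker Task run; not AEM code.
final class LinkCheckTaskSketch {
    enum Status { PENDING, VALID }

    private final Map<String, Status> links = new LinkedHashMap<>();

    // A newly discovered link starts out as PENDING.
    void addLink(String url) {
        links.putIfAbsent(url, Status.PENDING);
    }

    // One scheduler run: reachable links become VALID, unreachable ones are removed.
    // The reachability check is injected so the sketch needs no network access.
    void runOnce(Predicate<String> isReachable) {
        Iterator<Map.Entry<String, Status>> it = links.entrySet().iterator();
        while (it.hasNext()) {
            Map.Entry<String, Status> entry = it.next();
            if (isReachable.test(entry.getKey())) {
                entry.setValue(Status.VALID);
            } else {
                it.remove();
            }
        }
    }

    // Returns null when the link is not (or no longer) in the list.
    Status statusOf(String url) {
        return links.get(url);
    }

    public static void main(String[] args) {
        LinkCheckTaskSketch task = new LinkCheckTaskSketch();
        task.addLink("https://example.com/good");
        task.addLink("https://example.com/bad");
        task.runOnce(url -> url.endsWith("/good")); // fake reachability check
        System.out.println(task.statusOf("https://example.com/good"));
        System.out.println(task.statusOf("https://example.com/bad"));
    }
}
```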

The AEM Link Checker is configured through the following four services:
  • Day CQ Link Checker Info Storage Service – configures the link cache size; the default is 500.
  • Day CQ Link Checker Service – configures the frequency of the background syntax check; the default interval is 5 seconds.
  • Day CQ Link Checker Task – configures the frequency of the background task that validates the stored links.
  • Day CQ Link Checker Transformer – configures the elements that the link checker transforms and rewrites.

AEM internal link checker:
Internal links are validated as soon as a content author adds one (a repository link, ex: /content/we-retail/ca) to a page, whether through the RTE or a custom component. If the URL is no longer valid, the link is removed on the publisher and shown as a broken link on the author.
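
The author/publish difference can be sketched as follows. This is an illustration under assumptions, not AEM's transformer: the set of existing paths stands in for a repository lookup, the broken-link markup on author is made up, and "removed on the publisher" is modelled here as rendering only the link text.

```java
import java.util.Set;

// Hypothetical sketch of how an internal link might be rendered; not AEM code.
final class InternalLinkSketch {
    enum RunMode { AUTHOR, PUBLISH }

    static String render(String href, String text, Set<String> existingPaths, RunMode mode) {
        // Valid internal link: the target exists in the (simulated) repository.
        if (existingPaths.contains(href)) {
            return "<a href=\"" + href + "\">" + text + "</a>";
        }
        // Broken link on author: keep it visible and flag it (markup is an assumption).
        if (mode == RunMode.AUTHOR) {
            return "<span class=\"broken-link\">" + text + "</span>";
        }
        // Broken link on publish: drop the anchor and keep only the text (an assumption).
        return text;
    }

    public static void main(String[] args) {
        Set<String> paths = Set.of("/content/we-retail/ca");
        System.out.println(render("/content/we-retail/ca", "Canada", paths, RunMode.PUBLISH));
        System.out.println(render("/content/we-retail/xx", "Missing", paths, RunMode.AUTHOR));
        System.out.println(render("/content/we-retail/xx", "Missing", paths, RunMode.PUBLISH));
    }
}
```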

Fixing broken links that the link checker cannot validate:
Sometimes a link that is actually valid does not appear on publish. This happens because the AEM link checker checks links automatically and does not publish a link it considers broken. Usually that is helpful, since it acts as a self-monitoring system that keeps broken links from being published. It becomes a problem when you know the link is correct but AEM still treats it as broken and does not publish it.

There are two types of links for which the link checker needs extra configuration:
  • Links that have a special prefix (ex: href="tel:123-123-1234" or href="*|something|*").
  • Links that carry query parameters after post-processing and that you want to mark as always valid or exclude from validation.
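
For the second type of link, the idea of "always valid" patterns can be sketched with plain regular expressions. The class name and both patterns below are invented for this example; in AEM the actual patterns would be entered in the link checker configuration, not hard-coded.

```java
import java.util.List;
import java.util.regex.Pattern;

// Hypothetical sketch of override patterns; not the AEM configuration API.
final class OverridePatternSketch {
    // Assumed example patterns for links that should bypass validation.
    static final List<Pattern> OVERRIDE_PATTERNS = List.of(
            Pattern.compile("[?&]utm_source="),
            Pattern.compile("^https://example\\.com/redirect\\?"));

    // True when any override pattern matches somewhere in the link.
    static boolean skipsValidation(String href) {
        for (Pattern pattern : OVERRIDE_PATTERNS) {
            if (pattern.matcher(href).find()) {
                return true;
            }
        }
        return false;
    }

    public static void main(String[] args) {
        System.out.println(skipsValidation("https://example.com/page.html?utm_source=mail"));
        System.out.println(skipsValidation("https://example.com/page.html"));
    }
}
```

Keeping such patterns as narrow as possible is a sensible default, because every link they match is no longer checked at all.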


By aem4beginner
