Showing posts with label link checker. Show all posts

January 4, 2021

How to disable link checker for selective links

Projects often require the out-of-the-box (OOTB) link checking in AEM to be disabled for specific links.

There are 2 options to achieve this:
x-cq-linkchecker="valid" : This will make the linkchecker mark the link as valid.
x-cq-linkchecker="skip" : This will make the linkchecker skip the link.

These attributes must be added directly on the <a> tag, for example:
<a href="xxx" x-cq-linkchecker="skip">Link</a>


By aem4beginner

October 1, 2020

Disable link checker in AEM

Example: <a x-cq-linkchecker="skip">Link</a>
Note: The Link Checker should not be enabled on publish instances.


By aem4beginner

May 23, 2020

Rewrite Adobe CQ Image src attribute

In AEM, the paths of content such as pages and images start with the '/content/' prefix. We are able to rewrite these URLs via the Link Checker Transformer configuration and the resourceResolver.map() method. With that setup, URLs are rewritten for the HTML elements <a> and <form>.

But I want it to work for <img> elements as well.

I tried including the <img> element in the Link Checker Transformer configuration by adding img:src to the 'Rewrite Elements' list.

I also checked the answers to "What am I missing for this CQ5/AEM URL rewriting scenario?", but neither attempt worked for this issue.

Is there any way to do this?
Best How To:
The rewriter and the Link Checker Transformer configuration didn't work, so I wrote a custom link rewriter using the Transformer and TransformerFactory interfaces. I based my code on the sample from Adobe and worked out something like this:

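import java.io.IOException;

import org.apache.felix.scr.annotations.Component;
import org.apache.felix.scr.annotations.Property;
import org.apache.felix.scr.annotations.Service;
import org.apache.sling.api.SlingHttpServletRequest;
import org.apache.sling.rewriter.ProcessingComponentConfiguration;
import org.apache.sling.rewriter.ProcessingContext;
import org.apache.sling.rewriter.Transformer;
import org.apache.sling.rewriter.TransformerFactory;
import org.xml.sax.Attributes;
import org.xml.sax.ContentHandler;
import org.xml.sax.SAXException;
import org.xml.sax.helpers.AttributesImpl;

@Component(
     metatype = true,
     label = "Image Link Rewriter",
     description = "Maps the <img> elements src attributes"
)
@Service(value = TransformerFactory.class)
// "global" mode applies this transformer to every rewriter pipeline
@Property(name = "pipeline.mode", value = "global", propertyPrivate = true)
public class ImageLinkRewriter implements Transformer, TransformerFactory {

    // Next handler in the pipeline and the current request
    private ContentHandler contentHandler;
    private SlingHttpServletRequest request;

    public ImageLinkRewriter() {  }

    @Override
    public void init(ProcessingContext context,
                     ProcessingComponentConfiguration config) throws IOException {
        // Keep the request so its resource resolver can map links later
        request = context.getRequest();
    }

    @Override
    public void setContentHandler(ContentHandler handler) {
        contentHandler = handler;
    }

    @Override
    public final Transformer createTransformer() {
        return new ImageLinkRewriter();
    }

    @Override
    public void startElement(String uri, String localName,
                             String qName, Attributes atts) throws SAXException {
        // Rewrite <img> attributes; pass every other element through unchanged
        Attributes out = "img".equalsIgnoreCase(localName) ? rewriteImageLink(atts) : atts;
        contentHandler.startElement(uri, localName, qName, out);
    }

    private Attributes rewriteImageLink(Attributes attrs) {
        String attrName = "src";
        AttributesImpl result = new AttributesImpl(attrs);

        String link = attrs.getValue(attrName);
        if (link != null) {
            // Apply the resource resolver mappings to the image path
            String mappedLink = request.getResourceResolver().map(request, link);
            result.setValue(result.getIndex(attrName), mappedLink);
        }
        return result;
    }

    // dispose() and the remaining ContentHandler methods (endElement, characters, ...)
    // must also be implemented and should delegate to contentHandler; omitted here.
}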

I hope this helps others.


By aem4beginner

May 15, 2020

How to Fix AEM Link Checker Issues with Broken Image Links

Users expect an engaging and quality experience on your site. If little things aren’t working correctly, they are likely to get frustrated and leave.

Link Checker is a useful tool within AEM that validates all external and internal links on content pages. It shows all invalid, expired, and pre-dated links as broken in the authoring environment.

In this post, I’m going to share an issue with AEM Link Checker that I ran into recently, and walk you through how I resolved it.
The Problem

I was working on a component that takes a page path from the authoring dialog and displays an image with a link to the page. Whenever I selected an expired or pre-dated page path, I was no longer able to edit the component.
<a href="/any/expired/or/pre-dated/link">
<img src="/valid/image/path">
</a>


The link checker marked the link as expired and prepended the broken-link image with the opening brace. However, the closing-brace image that should follow the link was missing.


It also added some extra anchor elements with the expired link at the end of the component. Hence, the component was broken and not editable.
<div>
<img src="/libs/cq/linkchecker/resources/linkcheck_o.gif" alt="expired link: /expired/link" title="expired link: /expired/link" border="0">
<a href="/expired/link">
<img src="/valid/image/path">
</a>
</div>
<a href="/expired/link"></a>
<a href="/expired/link"></a>
<a href="/expired/link"></a>


The Solution
I found a simple trick that helped me resolve this issue without affecting any other link validation. Simply disable the link checker for the image inside the expired link.
<a href="/expired/link">
<img x-cq-linkchecker="skip" src="/valid/image/path">
</a>

Note: You can disable validation for any specific link by using either the x-cq-linkchecker="skip" or the x-cq-linkchecker="valid" attribute.

This added the closing brace for the broken image link and also removed the extra anchor tags at the end, as shown below:

<div>
<img src="/libs/cq/linkchecker/resources/linkcheck_o.gif" alt="expired link: /expired/link" title="expired link: /expired/link" border="0">
<a href="/expired/link">
<img src="/valid/image/path">
</a>
<img src="/libs/cq/linkchecker/resources/linkcheck_c.gif" border="0">
</div>


An alternative solution is to disable link checking for the expired link itself. While this fixes the issue, it will no longer show the expired link as broken.
<a x-cq-linkchecker="skip" href="/expired/link">
<img src="/valid/image/path">
</a>


I hope you’ve found this post and solution helpful. Comment below and share your AEM solutions!


By aem4beginner

Attack of the AEM Link Checker

Nearly every user of Adobe Experience Manager underestimates the AEM Link Checker. Most people think of the AEM Link Checker as that annoying feature that incorrectly strips links in AEM. But, it can do far more.

Not only will the AEM Link Checker remove links and incorrectly flag links as broken, but it can also bring an AEM instance to its knees.

This isn’t to say that the idea of having a tool to check links is a bad idea. A good crawler, like Screaming Frog, is a vital tool in every digital marketer’s toolbox, but why is it run on every request?

AEM Link Checker in the Wild
Recently, we had this happen with an AEM instance. The instance had externalized links in the navigation so that the navigation could be used on multiple sites. As additional pages were brought into AEM, the load the AEM Link Checker inflicted upon the instance increased geometrically. This eventually led to severe performance problems.

Initially, we assumed that the increasing performance problems were due to an errant query or Java Filter. However, a particular thread dump told a very different story.
java.lang.Thread.State: RUNNABLE
at java.util.AbstractCollection.containsAll(AbstractCollection.java:317)
at java.util.AbstractSet.equals(AbstractSet.java:95)
at com.day.cq.rewriter.linkchecker.LinkInfo.isSame(LinkInfo.java:228)
at com.day.cq.rewriter.linkchecker.impl.LinkInfoStorageImpl.putLinkInfo(LinkInfoStorageImpl.java:375)
at com.day.cq.rewriter.linkchecker.impl.LinkCheckerImpl.getLink(LinkCheckerImpl.java:275)
at com.day.cq.rewriter.linkchecker.impl.LinkCheckerTransformer.startElement(LinkCheckerTransformer.java:289)
at org.apache.cocoon.xml.sax.AbstractSAXPipe.startElement(AbstractSAXPipe.java:97)
at com.day.cq.mcm.core.newsletter.NewsletterTransformerFactory$NewsletterTransformer.startElement(NewsletterTransformerFactory.java:132)
at com.day.cq.rewriter.htmlparser.DocumentHandlerToSAXAdapter.onStartElement(DocumentHandlerToSAXAdapter.java:105)
at com.day.cq.rewriter.htmlparser.HtmlParser.processTag(HtmlParser.java:640)
at com.day.cq.rewriter.htmlparser.HtmlParser.update(HtmlParser.java:343)
at com.day.cq.rewriter.htmlparser.HtmlParser.write(HtmlParser.java:196)
at java.io.Writer.write(Writer.java:192)
- locked <_0x00000006aab74560> (a com.day.cq.rewriter.htmlparser.HtmlParser)
at java.io.PrintWriter.write(PrintWriter.java:456)
- locked <_0x00000006aab74560> (a com.day.cq.rewriter.htmlparser.HtmlParser)
at org.apache.sling.scripting.core.impl.helper.OnDemandWriter.write(OnDemandWriter.java:75)
- locked <_0x00000006aab9c3c0> (a org.apache.sling.scripting.core.impl.helper.OnDemandWriter)
at java.io.PrintWriter.write(PrintWriter.java:456)
- locked <_0x00000006aab9c3c0> (a org.apache.sling.scripting.core.impl.helper.OnDemandWriter)
at org.apache.sling.scripting.core.impl.helper.OnDemandWriter.write(OnDemandWriter.java:75)
- locked <_0x00000006aab9c428> (a org.apache.sling.scripting.core.impl.helper.OnDemandWriter)
at java.io.PrintWriter.write(PrintWriter.java:456)
- locked <_0x00000006aab9c428> (a org.apache.sling.scripting.core.impl.helper.OnDemandWriter)
at java.io.PrintWriter.write(PrintWriter.java:456)
- locked <_0x00000006aab9c478> (a java.io.PrintWriter)
at java.io.PrintWriter.write(PrintWriter.java:473)
at org.apache.sling.scripting.sightly.apps.example
To confirm what I was seeing, I grepped the error log:

grep -wc 'External links for host .* has reached the maximum number of' error.log

Shockingly, this returned over 1,300,000 instances of the log message over the last 24 hours. In order to determine what domains were causing the issues, I then ran another command to just find the unique messages:

grep 'External links for host .* has reached the maximum number of' error.log | sort --unique

From there, I ran the original grep command with specific domains to determine what domains were most responsible.
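
The per-domain counting can also be done in one pass. The following is a minimal Java sketch, not part of the original investigation: the class name LinkCheckerLogTally is hypothetical, and it assumes the host appears as a single whitespace-free token between "host" and "has reached", which is all the grep pattern above tells us about the log format.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Paths;
import java.util.Map;
import java.util.TreeMap;
import java.util.regex.Matcher;
import java.util.regex.Pattern;
import java.util.stream.Stream;

class LinkCheckerLogTally {
    // Same message as in the grep commands; the capture group for the host is an assumption.
    private static final Pattern MESSAGE = Pattern.compile(
            "External links for host (\\S+) has reached the maximum number of");

    // Counts matching log lines per host, sorted by host name.
    static Map<String, Integer> countByHost(Iterable<String> lines) {
        Map<String, Integer> counts = new TreeMap<>();
        for (String line : lines) {
            Matcher m = MESSAGE.matcher(line);
            if (m.find()) {
                counts.merge(m.group(1), 1, Integer::sum);
            }
        }
        return counts;
    }

    public static void main(String[] args) throws IOException {
        String file = args.length > 0 ? args[0] : "error.log";
        // Stream the file line by line so a large log is not loaded into memory at once.
        try (Stream<String> lines = Files.lines(Paths.get(file))) {
            countByHost(lines::iterator)
                    .forEach((host, n) -> System.out.println(n + "\t" + host));
        }
    }
}
```

Run it with the path to the error log as the only argument; the hosts with the highest counts are the first candidates for an override pattern.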

Saving AEM from the Link Checker
Ideally, the AEM Link Checker should not be enabled in production instances to ensure that it does not impact performance. If this is not an option due to the potential for other side effects, you can configure the “Link Check Override Patterns” in the “Day CQ Link Checker Service” as described in this Adobe HelpX Article. For instance, to disable checking of the domain www.example.com, you could use a regular expression like:

^https?:\/\/www\.example\.com
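
Before saving a pattern like this, it can help to check which URLs it actually covers. Here is a small Java sketch for that; the class name OverridePatternDemo and the sample URLs are made up, and it assumes the pattern is searched from the start of the URL rather than matched against the whole string.

```java
import java.util.regex.Pattern;

class OverridePatternDemo {
    // The override pattern from the text: http or https URLs on www.example.com.
    static final Pattern OVERRIDE = Pattern.compile("^https?:\\/\\/www\\.example\\.com");

    // Assumption: a URL is overridden when the pattern is found at its start.
    static boolean isOverridden(String url) {
        return OVERRIDE.matcher(url).find();
    }

    public static void main(String[] args) {
        String[] urls = {
            "http://www.example.com/page.html",
            "https://www.example.com/",
            "https://blog.example.com/post",
            "https://www.example.com.other.net/"
        };
        for (String url : urls) {
            System.out.println(url + " -> " + isOverridden(url));
        }
    }
}
```

Note that the last sample URL is also matched, because the pattern does not end with a slash. Appending \/ to the pattern restricts it to the exact host.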

After configuring the AEM Link Checker to ignore the indicated domains, the AEM instance immediately returned to a stable state.


By aem4beginner

May 13, 2020

Disabling Link Checker

1) Go to http://localhost:4502/system/console/configMgr

2) Find "Day CQ Link Checker Transformer"

3) Edit the item and check "Disable Link Checker"


By aem4beginner

May 10, 2020

Link Checker in AEM

AEM External link checker:
The AEM Link Checker is based on an event handler and is triggered by creates and updates on /content and its child nodes. All content under the selected root path is parsed and its links are validated. All link validation is done asynchronously in the background, and the HTML is updated based on the verification results.

Note: If you have a large repository (/content) whose links are updated frequently, using the link checker is not advised because of its performance impact. It is triggered periodically and traverses the whole repository to validate links, which can slow down your author instance.

Now let's see how the AEM link checker works:-
As soon as the author saves a link on the page, using either the RTE or a custom component, the link checker event handler is triggered.

The link checker event handler traverses the /content node and checks for new or updated links. When it finds one, it stores that mapping under the /var/linkchecker cache folder.

Control then passes to the Day CQ Link Checker Service, which checks the scheduler.period configuration. Once the scheduler period has elapsed, it triggers the scheduler to validate the syntax and structure of the link against the configured settings, such as the special prefixes to ignore during validation and the pattern that the link checker should use to verify the syntax of the URL.

Once the syntax is validated, the results are pushed to /etc/linkchecker.html. The links remain in a pending state until the Day CQ Link Checker Task scheduler validates them by making a GET request. The Day CQ Link Checker Task scheduler runs periodically to check the validity of the valid and invalid links that are stored under /etc/linkchecker.html.

An administrator can configure how often this scheduler runs by updating the Scheduler Period property; its default value is 3600 seconds. Once triggered, it removes all invalid or unreachable links from /etc/linkchecker.html (http://localhost:4502/etc/linkchecker.html).

The AEM link checker is configured using the four services below:-
  • Day CQ Link Checker Info Storage Service – configures the link cache size. The default is 500.
  • Day CQ Link Checker Service – configures the frequency of the background check. The default interval is 5 seconds.
  • Day CQ Link Checker Task – configures the frequency of the background check for validating links.
  • Day CQ Link Checker Transformer – configures all the elements that need to be transformed and rewritten by the link checker.

AEM internal link checker:- Internal links are validated as soon as a content author adds an internal link (a repository link, ex: /content/we-retail/ca) to the page, using either the RTE or a custom component. If a URL is no longer valid after validation, it is removed on the publisher or shown as a broken link on the author.

Fixing broken links that the link checker is not able to validate:-
Sometimes you might find that a link is not available on publish even though it is valid. This can happen because the AEM link checker checks links automatically and will not publish a link it considers broken. That is usually helpful, since it gives you a self-monitoring system that prevents you from publishing broken links. It becomes a problem when you know the link is correct but AEM does not publish it because it considers it broken.

There are two types of links for which the link checker needs configuration before it can validate them:- Links that have a special prefix (ex: href="tel:123-123-1234" or href="*|something|*"). Links that contain a query parameter after post-processing, which you want to mark as always valid or skip during validation.


By aem4beginner

May 2, 2020

AEM disable linkchecker for specific links

This is something that comes up a LOT. There is also another good post on it here at wemblog.

Essentially you need to add one of these two attributes to links:

x-cq-linkchecker="valid" This will make the linkchecker mark it as valid no matter what.

x-cq-linkchecker="skip" As implied, the linkchecker service will skip this link.

You should be able to add these attributes to other tags, such as script tags, etc, that you have configured the linkchecker to examine.


By aem4beginner

April 26, 2020

Disable link checker in AEM

Add the x-cq-linkchecker="valid" parameter to the <a> tag to make sure that links are always marked as valid by the Link Checker.
Optionally, use x-cq-linkchecker="skip" in the <a> tag. The Link Checker will then not even check the link for validity.

Example: <a x-cq-linkchecker="skip">Link</a>

Note: The Link Checker should not be enabled on publish instances.


By aem4beginner

April 22, 2020

Disabling AEM/CQ external link checker


Step:1
Go to the URL below and enter the username and password for the production publisher:
http://localhost:4502/system/console/configMgr

Step:2
Search for "Day CQ Link Checker Transformer" and select the "Disable checking" option. Finally, click Save.

Step:3
Similarly, the link checker needs to be disabled on the publishers as well.


By aem4beginner

April 13, 2020

disabling link checker

http://dev.day.com/content/kb/home/cq5/CQ5Troubleshooting/DisableLinkChecker.html

Of course the above isn't complete.

You also need the following property:
service.special_link_patterns = .*


By aem4beginner

April 1, 2020

How to make sure that Links are always valid on a page

Use Case: Sometimes the link checker marks certain links as invalid (because it is not able to verify them).

Solution: You can add the x-cq-linkchecker="valid" parameter to the <a> tag to make sure that links are always marked as valid by CQ. In this case, the link checker will check the link but will mark it valid.

You can optionally use x-cq-linkchecker="skip" in the <a> tag as well. In this case, the link checker will not even check the link for validity.

There are other options to make all external links valid:

Option 2: You can disable the external link checker entirely in the Felix console.


Option 3: Additionally, you can add an override pattern to disable link checking for a certain domain.


Example of an override pattern: for http://www.day.com you would write ^http://www\.day\.com/

Note:
The following external link checker error,
com.day.cq.rewriter.linkchecker.impl.LinkInfoStorageImpl No more external links allowed for host <Host Name> Maximum of 128 reached.
means that the external link checker only checks the first 128 links per host. You can find those entries under /var/linkchecker/<protocol>. Unfortunately, this limit is not configurable yet.


By aem4beginner

March 15, 2020

Comprehensive Guide on AEM Link Checker


The AEM link checker validates all internal and external links on a page. Its main purpose is to spare content authors from worrying about bad or broken links in the publish environment. It also lets authors view a list of all valid and invalid links on their website in a single place.

After completing this tutorial you will have a clear understanding of:
  • How the AEM external link checker works.
  • How the AEM internal link checker works.
  • How to fix broken links that the link checker is not able to validate.
  • The difference between the link checker and a link rewriter.
  • How to disable the link checker in AEM.
AEM External link checker:
The AEM Link Checker is based on an event handler and is triggered by creates and updates on /content and its child nodes. All content under the selected root path is parsed and its links are validated. All link validation is done asynchronously in the background, and the HTML is updated based on the verification results.

Note: If you have a large repository (/content) whose links are updated frequently, using the link checker is not advised because of its performance impact. It is triggered periodically and traverses the whole repository to validate links, which can slow down your author instance.

Now let's see how the AEM link checker works:
  • As soon as the author saves a link on the page, using either the RTE or a custom component, the link checker event handler is triggered.
  • The link checker event handler traverses the /content node and checks for new or updated links. When it finds one, it stores that mapping under the /var/linkchecker cache folder.
  • Control then passes to the Day CQ Link Checker Service, which checks the scheduler.period configuration. Once the scheduler period has elapsed, it triggers the scheduler to validate the syntax and structure of the link against the configured settings, such as the special prefixes to ignore during validation and the pattern that the link checker should use to verify the syntax of the URL.
  • Once the syntax is validated, the results are pushed to /etc/linkchecker.html. The links remain in a pending state until the Day CQ Link Checker Task scheduler validates them by making a GET request.
  • The Day CQ Link Checker Task scheduler runs periodically to check the validity of the valid and invalid links that are stored under /etc/linkchecker.html.
An administrator can configure how often this scheduler runs by updating the Scheduler Period property; its default value is 3600 seconds.

Once triggered, it removes all invalid or unreachable links from /etc/linkchecker.html.

The page at http://localhost:4502/etc/linkchecker.html shows how the values are fetched from /var/linkchecker and how the link checker list is updated. You can also request re-validation and refresh the status of the links there.


After validation Invalid External Links will be displayed as below:


The AEM link checker is configured using the four services below:
  • Day CQ Link Checker Info Storage Service – configures the link cache size. The default is 500.
  • Day CQ Link Checker Service – configures the frequency of the background check. The default interval is 5 seconds.
  • Day CQ Link Checker Task – configures the frequency of the background check for validating links.
  • Day CQ Link Checker Transformer – configures all the elements that need to be transformed and rewritten by the link checker.
AEM internal link checker:
Internal links are validated as soon as a content author adds an internal link (a repository link, ex: /content/we-retail/ca) to the page, using either the RTE or a custom component. If a URL is no longer valid after validation, it is removed on the publisher or shown as a broken link on the author.


Fixing broken links that the link checker is not able to validate:
Sometimes you might find that a link is not available on publish even though it is valid. This can happen because the AEM link checker checks links automatically and will not publish a link it considers broken. That is usually helpful, since it gives you a self-monitoring system that prevents you from publishing broken links. It becomes a problem when you know the link is correct but AEM does not publish it because it considers it broken.

There are two types of links for which the link checker needs configuration before it can validate them:
  • Links that have a special prefix (ex: href="tel:123-123-1234" or href="*|something|*").
  • Links that contain a query parameter after post-processing, which you want to mark as always valid or skip during validation.
Links having a special prefix:
  • Go to http://localhost:4502/system/console/configMgr.
  • Search for "Day CQ Link Checker Service" and update the Special Link prefix setting.
  • For example, when we add "tel:" as a prefix, the link checker will not check or rewrite such links during syntax and structure validation. By default, a few prefixes are already configured here: javascript:, data:, mailto:, #, <!--, ${
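
To illustrate what a special prefix means, here is a minimal Java sketch of a prefix check. It is not AEM's implementation: the class name SpecialPrefixDemo is hypothetical, the prefix list is the default list quoted above plus "tel:", and a case-sensitive starts-with comparison is an assumption.

```java
class SpecialPrefixDemo {
    // Default prefixes listed above, plus "tel:" as the example addition.
    static final String[] SPECIAL_PREFIXES = {
        "javascript:", "data:", "mailto:", "#", "<!--", "${", "tel:"
    };

    // Returns true when the href starts with one of the configured prefixes,
    // meaning it would be left alone instead of being checked or rewritten.
    static boolean hasSpecialPrefix(String href) {
        for (String prefix : SPECIAL_PREFIXES) {
            if (href.startsWith(prefix)) {
                return true;
            }
        }
        return false;
    }

    public static void main(String[] args) {
        System.out.println(hasSpecialPrefix("tel:123-123-1234"));
        System.out.println(hasSpecialPrefix("/content/we-retail/ca"));
    }
}
```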


Links containing a variable that is updated on post-processing:
These changes are made at the code level, where you add one more attribute, x-cq-linkchecker, to the <a> tag markup. This attribute tells AEM how to process the anchor tag. Let's look at this in more detail:
  • You can add the x-cq-linkchecker="valid" parameter to the <a> tag to make sure that links are always marked as valid by CQ. In this case, the link checker will check the link but will mark it valid. (For ex: <a x-cq-linkchecker="valid" …>)
  • You can optionally use x-cq-linkchecker="skip" in the <a> tag as well. In this case, the link checker will not even check the link for validity. (For ex: <a x-cq-linkchecker="skip" …>)
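
If a component builds its anchor markup in Java rather than in a template, the attribute can be added when the tag is rendered. The sketch below is only an illustration: the class LinkCheckerAnchor and its methods are hypothetical helpers, not an AEM API, and the escaping covers just the basic HTML special characters.

```java
class LinkCheckerAnchor {
    // Renders an <a> tag carrying the x-cq-linkchecker attribute.
    // mode must be "valid" or "skip", the two values described above.
    static String anchor(String href, String text, String mode) {
        if (!"valid".equals(mode) && !"skip".equals(mode)) {
            throw new IllegalArgumentException("mode must be \"valid\" or \"skip\"");
        }
        return "<a href=\"" + escape(href) + "\" x-cq-linkchecker=\"" + mode + "\">"
                + escape(text) + "</a>";
    }

    // Minimal HTML escaping for attribute values and text content.
    static String escape(String s) {
        return s.replace("&", "&amp;")
                .replace("<", "&lt;")
                .replace(">", "&gt;")
                .replace("\"", "&quot;");
    }

    public static void main(String[] args) {
        System.out.println(anchor("/content/we-retail/ca?ref=${campaign}", "Link", "skip"));
    }
}
```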
Difference between the AEM link checker and a link rewriter:
The link checker is built for checking the validity of URLs. The link checker scheduler runs periodically to validate the URLs found under /content in the repository and saves the results under the /var/linkchecker cache folder. All links that have been checked or are pending can be seen under /etc/linkchecker.html. URLs that are no longer valid after validation are removed on the publisher or shown as broken links on the author.

A link rewriter is built when you want to rewrite URLs while the HTML is being rendered. It parses the HTML and rewrites the URLs inside it. If you want to do custom rewriting of URLs, you can write your own link rewriter by implementing the org.apache.sling.rewriter.Transformer interface.
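
The rewriting idea can be tried outside AEM with the SAX classes that ship with the JDK. The following sketch is an analogy, not a Sling Transformer: it uses XMLFilterImpl instead of org.apache.sling.rewriter.Transformer and a plain string operation instead of ResourceResolver.map(). The class name PrefixStrippingFilter and the /content/mysite prefix are made up for the example.

```java
import java.io.StringReader;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;
import org.xml.sax.XMLReader;
import org.xml.sax.helpers.AttributesImpl;
import org.xml.sax.helpers.DefaultHandler;
import org.xml.sax.helpers.XMLFilterImpl;

class PrefixStrippingFilter extends XMLFilterImpl {
    private final String prefix; // hypothetical content root, e.g. "/content/mysite"

    PrefixStrippingFilter(XMLReader parent, String prefix) {
        super(parent);
        this.prefix = prefix;
    }

    @Override
    public void startElement(String uri, String localName, String qName, Attributes atts)
            throws SAXException {
        Attributes out = atts;
        int i = atts.getIndex("href");
        // Rewrite only <a href="..."> values that start with the prefix.
        if ("a".equalsIgnoreCase(qName) && i >= 0 && atts.getValue(i).startsWith(prefix)) {
            AttributesImpl copy = new AttributesImpl(atts);
            copy.setValue(i, atts.getValue(i).substring(prefix.length()));
            out = copy;
        }
        // Every element, rewritten or not, is passed on to the next handler.
        super.startElement(uri, localName, qName, out);
    }

    // Parses well-formed markup through the filter and returns the first href seen downstream.
    static String firstHref(String xml, String prefix) {
        try {
            XMLReader reader = SAXParserFactory.newInstance().newSAXParser().getXMLReader();
            PrefixStrippingFilter filter = new PrefixStrippingFilter(reader, prefix);
            final String[] seen = new String[1];
            filter.setContentHandler(new DefaultHandler() {
                @Override
                public void startElement(String u, String l, String q, Attributes a) {
                    if (seen[0] == null && a.getValue("href") != null) {
                        seen[0] = a.getValue("href");
                    }
                }
            });
            filter.parse(new InputSource(new StringReader(xml)));
            return seen[0];
        } catch (Exception e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        String html = "<p><a href=\"/content/mysite/en.html\">Home</a></p>";
        System.out.println(firstHref(html, "/content/mysite"));
    }
}
```

The important detail is the unconditional call to super.startElement: a filter in a rewriting pipeline has to forward every event, otherwise the elements it does not care about disappear from the output.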



Disable link checker in AEM:
There are two ways to disable the link checker in AEM: through the Felix console, or by overriding the regular expression of the Day CQ Link Checker Service. Follow the steps below to disable the AEM link checker:-

Disabling all link checking by Felix console configuration:


  • Find the "Day CQ Link Checker Transformer".
  • Check the "Disable Checking" box and save.
  • Go to /crx/explorer and log in as admin.
  • Open "Content Explorer".
  • Once all the changes are made, browse to /var/linkchecker.
  • Right-click the node and select "Delete Recursively".
  • Click "Save All".

Note: Using this configuration we have an option either to disable only link checking or both link checking and link rewriting.

Disabling link checking of URLs using regular expressions:
The AEM link checker can be configured to ignore either all links or only the links that match a regular expression.

The following configuration is specific to the publish instance. To configure the author instance, change the configuration path from ../config.publish/.. to ../config.author/.. . If you wish to configure both author and publish, change the configuration path to ../config/..


  • Log in to crx/de as admin.
  • Create a configuration node (with node type sling:OsgiConfig) in the project (/apps/<project-name>/config.publish/{OSGi service PID}).
  • Alternatively, you can copy the one from /libs/cq/linkchecker/config/com.day.cq.rewriter.linkchecker.impl.LinkCheckerImpl into the config folder of your choice (for example, /apps/myapp/config.publish).
  • Change the property service.check_override_patterns from "^system/" to "^."
^system/ :- This expression means: ignore checking and rewriting of all external links that start with system/.

^. :- This expression means: ignore checking and rewriting of all external links.

^http://www\.google\.com/ :- This expression means: ignore checking and rewriting of links to http://www.google.com.
  • Delete all nodes under /var/linkchecker to stop the link checker from periodically rechecking URLs.
  • If the configuration was done on the author, create a package and install it on your publish instances as well.
Note: Using "^." disables all link checking and link rewriting.



By aem4beginner

March 12, 2020

Disable link checker in Adobe Experience Manager


Example:
 <a x-cq-linkchecker="skip">Link</a>

Note:  The Link Checker should not be enabled on publish instances.



By aem4beginner