April 1, 2020

How to Remove the .html Extension from URLs in CQ, or How to Implement a New Rewriter Pipeline in CQ / AEM

Note: CQ needs an extension to resolve and serve an incoming request. The publish instance itself cannot be made extensionless (vanity URLs are the exception). You can, however, make your public site extensionless by adjusting the Apache and CQ configuration.

For CQ < 5.5

You can use the Sling Rewriter to remove .html from the links in rendered pages.
Use this package (with some pom.xml changes).

You need the following configuration changes:

Create a node in the repository at /apps/myapp/config/rewriter/html-remover (the names 'myapp' and 'html-remover' can be whatever you want). This node must have the following properties:

* enabled (Boolean) - set to true
* serializerType (String) - set to htmlwriter
* generatorType (String) - set to htmlparser
* order (Long) - set to a number greater than 0
* contentTypes (String[]) - set to text/html
* transformerTypes (String[]) - set to linkchecker, html-remover

It might also be simpler to copy the node /libs/cq/config/rewriter/default to /apps/myapp/config/rewriter/html-remover and then add html-remover to the multi-valued transformerTypes property.
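
The package is not reproduced here. As a rough illustration of the per-link logic that such an html-remover transformer might apply, here is a minimal Python sketch. Python is used only so the logic can be tried offline; a real Sling transformer is written in Java. The function name strip_html_extension and the rules for which links are skipped (external links and links with selectors) are assumptions, not taken from the package.

```python
from urllib.parse import urlsplit, urlunsplit


def strip_html_extension(href: str) -> str:
    """Drop a trailing .html from a site-internal link (hypothetical helper)."""
    parts = urlsplit(href)
    # Assumption: links with a scheme or host are external and stay untouched.
    if parts.scheme or parts.netloc:
        return href
    path = parts.path
    last_segment = path.rsplit("/", 1)[-1]
    # Assumption: only plain "page.html" links are rewritten. Links that carry
    # selectors (more than one dot in the last segment) are left as they are.
    eligible = (
        path.startswith("/")
        and last_segment.endswith(".html")
        and last_segment.count(".") == 1
    )
    if not eligible:
        return href
    stripped = path[: -len(".html")]
    # Keep any query string and fragment exactly as they were.
    return urlunsplit(("", "", stripped, parts.query, parts.fragment))


if __name__ == "__main__":
    for link in (
        "/content/site/en/products.html",
        "/en/products.html?ref=nav#specs",
        "/en/products.print.html",
        "https://example.com/page.html",
    ):
        print(link, "->", strip_html_extension(link))
```

Leaving selector links alone matches the Apache rules later in this post, which also keep the extension when a URL has more than one dot segment.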

Some additional questions and answers

Question: What are transformerTypes? I set it to img-src-cdn-prefixer only, without linkrewriter, mobile, etc., and it seems to work. Does transformerTypes define the entire pipeline? If I set it to {"linkchecker", "img-src-cdn-prefixer", "mobile", "mobiledebug"}, will the pipeline be: htmlparser -> linkchecker -> img-src-cdn-prefixer -> mobile -> mobiledebug -> htmlwriter?

Answer: That is correct
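
To make the ordering concrete, the following sketch assembles the pipeline order from the node properties listed earlier. It is an illustration only: pipeline_order is a hypothetical helper, not part of Sling, and it ignores transformers registered in global mode.

```python
def pipeline_order(config: dict) -> list:
    """Return component names in processing order: generator, transformers, serializer."""
    return [
        config["generatorType"],
        *config["transformerTypes"],
        config["serializerType"],
    ]


# Properties of the html-remover rewriter node described above.
html_remover_config = {
    "enabled": True,
    "generatorType": "htmlparser",
    "transformerTypes": ["linkchecker", "html-remover"],
    "serializerType": "htmlwriter",
    "contentTypes": ["text/html"],
    "order": 1,  # any number greater than 0
}

print(" -> ".join(pipeline_order(html_remover_config)))
# prints: htmlparser -> linkchecker -> html-remover -> htmlwriter
```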

Question: What are htmlwriter (serializer) and htmlparser (generator)? I assume these are the default generator and serializer for the default .html pipeline?

Answer: The generator marks the start of the pipeline and the serializer marks its end. So for /libs/cq/config/rewriter/default it would be htmlparser->linkchecker->htmlwriter

and for /libs/cq/config/rewriter/pdf it would be html-generator->htmlparser->xslt->fop

Question: Do you recommend defining my own pipeline component (by setting pipeline.type = "img-src-cdn-prefixer")? Or should I use pipeline.mode = "global"? If I used pipeline.mode = "global", would I be able to disable my component, maybe by setting service.ranking to a negative value?

Answer: The first option defines the transformer as a pipeline component. The second option applies once you have a component and want to extend the pipeline with it: a new component is placed in your pipeline based on its ranking.

This is described at http://sling.apache.org/site/output-rewriting-pipelines-orgapacheslingrewriter.html

Question: If I wanted to beautify the HTML output, would I implement TransformerFactory, use HTML Tidy or similar, and put it right before the serializer?

Answer: In theory yes.

Note: This feature is expected to be available out of the box (OOTB) in the next release of CQ. For CQ 5.4, you can also ask for a paid feature pack.

For CQ >= 5.5

Note that your application does not work with extensionless URLs on its own: CQ needs the extension in order to find the appropriate resource. The setting above only removes the .html extension from links; to serve those pages, the request must still reach CQ with an internal ".html" extension. You can use rewrite rules like the ones under Apache Changes to achieve this.

Apache Changes
The configuration above removes the .html extension from embedded links. But your goal is for incoming URLs to be extensionless as well. You need to consider the following cases:

1) Someone coming with the .html extension ----> remove the extension
2) Someone coming without an extension -----> map to something CQ understands, i.e. add .html
3) Someone coming with selectors before the extension ----> remove only what you want to remove
4) Someone coming with a URL for which removing the extension is not desirable ----> exclude it from extension removal
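
The four cases above can be sketched as a small decision function. This is a simplified model for reasoning about the rules, not a replacement for them: classify_request and its return labels are made up for illustration, and the excluded prefixes are the ones used in the rewrite rules that follow.

```python
# Prefixes that should never be touched (taken from the rewrite rules below).
EXCLUDED_PREFIXES = ("/etc/designs", "/bin/wemblog")


def classify_request(path: str) -> str:
    """Map an incoming URL path to one of the four cases (hypothetical labels)."""
    # Case 4: paths for which extension handling is not desirable.
    if path.lower().startswith(EXCLUDED_PREFIXES):
        return "exclude"
    last_segment = path.rsplit("/", 1)[-1]
    dots = last_segment.count(".")
    # Case 2: no extension at all, so .html is appended internally.
    if dots == 0:
        return "append-html"
    # Case 1: a plain page.html request is redirected to the extensionless URL.
    if dots == 1 and last_segment.endswith(".html"):
        return "redirect-extensionless"
    # Case 3: selectors plus .html are passed to CQ unchanged.
    if last_segment.endswith(".html"):
        return "keep-selectors"
    # Anything else (for example .json or .css) is left alone.
    return "exclude"


if __name__ == "__main__":
    for p in ("/en/products.html", "/en/products", "/en/products.print.html"):
        print(p, "->", classify_request(p))
```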

# Set up an env variable for extensionless URLs
# A URL qualifies as extensionless if it does not start with /etc/designs or /bin/wemblog
# (this is where your custom servlet lives) and does not have any extension
RewriteCond %{REQUEST_URI} !^/bin/wemblog(.*) [NC]
RewriteCond %{REQUEST_URI} !^/etc/designs(.*) [NC]
RewriteCond %{REQUEST_URI} !(.*)\.[a-zA-Z0-9-]+$
RewriteRule .* - [E=EXTENSION_LESS_URL:1]

# If the URL carries selectors, don't remove the html extension
# Here we check for two or more consecutive dot segments (for example .print.html)
RewriteCond %{REQUEST_URI} .*(\.[a-zA-Z0-9-]*){2,}
RewriteRule .* - [E=MULTIPLE_EXTENSION_URL:1]

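# To keep things configurable, list the paths that should keep their extension
# Despite its name, EXCLUDE_FROM_EXTENSIONLESS is set for URLs that are NOT excluded,
# i.e. URLs whose .html extension may be removed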
RewriteCond %{REQUEST_URI} !^/<some path you don't want to remove extension>(.*) [NC]
RewriteCond %{REQUEST_URI} !^/etc/designs(.*) [NC]
RewriteCond %{ENV:MULTIPLE_EXTENSION_URL} ^$
RewriteRule .* - [E=EXCLUDE_FROM_EXTENSIONLESS:1]

# If someone comes with a trailing /, remove it
RewriteCond %{REQUEST_URI} !^/$ [NC]
RewriteRule ^(.*)/$ $1 [R=301,L]

# Now remove extension if someone comes with an extension

# If someone comes with .html as the extension, redirect them to the extensionless URL
RewriteCond %{ENV:EXCLUDE_FROM_EXTENSIONLESS} !^$
RewriteRule ^(.*)\.html$ https://%{HTTP_HOST}$1 [R=301,L]

# If someone comes with .htm as the extension, redirect them to the extensionless URL
RewriteCond %{ENV:EXCLUDE_FROM_EXTENSIONLESS} !^$
RewriteRule ^(.*)\.htm$ https://%{HTTP_HOST}$1 [R=301,L]

# Now pass extensionless requests on with an extension so that publish understands them

# If the URL has multiple extensions (selectors) and ends with html, pass it through unchanged
# (it may still start with /content)
RewriteCond %{REQUEST_URI} !^/etc/designs(.*) [NC]
RewriteCond %{REQUEST_URI} !^<some more path you want to exclude>(.*) [NC]
RewriteCond %{ENV:MULTIPLE_EXTENSION_URL} !^$ [NC]
RewriteCond %{REQUEST_URI} \.html$
RewriteRule ^(/.*)$ $1 [L,PT]

# If the request does not have an html extension, add one
# This assumes that you have already removed /content from the URL
RewriteCond %{REQUEST_URI} !^/content/dam(.*) [NC]
RewriteCond %{REQUEST_URI} !^/content(.*) [NC]
RewriteCond %{ENV:EXTENSION_LESS_URL} !^$
RewriteRule ^(/.*)$ $1.html [L,PT]

# Catch anything missed above (for example extensionless URLs under /content)
RewriteCond %{ENV:EXTENSION_LESS_URL} !^$
RewriteRule ^(/.*)$ $1.html [L,PT]
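
Because the rules depend on three environment flags, it can help to check the regular expressions offline before deploying. The sketch below mirrors the flag-setting conditions in Python, assuming that Python's re.search behaves like an unanchored RewriteCond pattern for these simple expressions. env_flags and keep_extension_prefixes are hypothetical names; the latter stands in for the <some path you don't want to remove extension> placeholder.

```python
import re

# Patterns copied from the RewriteCond lines; [NC] becomes re.IGNORECASE.
BIN_WEMBLOG = re.compile(r"^/bin/wemblog(.*)", re.IGNORECASE)
ETC_DESIGNS = re.compile(r"^/etc/designs(.*)", re.IGNORECASE)
HAS_EXTENSION = re.compile(r"(.*)\.[a-zA-Z0-9-]+$")
MULTIPLE_EXTENSIONS = re.compile(r".*(\.[a-zA-Z0-9-]*){2,}")


def env_flags(uri: str, keep_extension_prefixes=()) -> set:
    """Return the names of the env flags the rules would set for this URI."""
    flags = set()
    # EXTENSION_LESS_URL: not a servlet or design path, and no extension.
    if (
        not BIN_WEMBLOG.search(uri)
        and not ETC_DESIGNS.search(uri)
        and not HAS_EXTENSION.search(uri)
    ):
        flags.add("EXTENSION_LESS_URL")
    # MULTIPLE_EXTENSION_URL: two or more consecutive dot segments.
    if MULTIPLE_EXTENSIONS.search(uri):
        flags.add("MULTIPLE_EXTENSION_URL")
    # EXCLUDE_FROM_EXTENSIONLESS: set when the URI is NOT in a kept path,
    # is not under /etc/designs, and has no selectors.
    kept = any(uri.lower().startswith(p.lower()) for p in keep_extension_prefixes)
    if (
        not kept
        and not ETC_DESIGNS.search(uri)
        and "MULTIPLE_EXTENSION_URL" not in flags
    ):
        flags.add("EXCLUDE_FROM_EXTENSIONLESS")
    return flags


if __name__ == "__main__":
    for uri in ("/en/products", "/en/products.html", "/en/products.print.html"):
        print(uri, sorted(env_flags(uri)))
```

Running it over a list of real site URLs is a cheap way to spot paths that would be redirected or rewritten unexpectedly.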

Custom generator example for XML: https://github.com/Adobe-Consulting-Services/acs-aem-commons/pull/48


By aem4beginner
