May 11, 2020

AEM Dispatcher. Part 3: Enabling Cache



Caching is the key to performance, but it is also a source of many issues when configured incorrectly. In this part I will give an overview and practical details on how to correctly set up caching for a local development environment and where to look for cached files.

By default, a resource is cached only if all of the following conditions are satisfied:
HTTP request method is GET;
Request URL has extension (for example, .html or .xml);
Request URL has no query string (there are no parameters after extension);
Request has no “Authorization” header (unless /allowAuthorized is set to "1" in the /cache section).
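
To make these four conditions concrete, here is a minimal Python sketch that models them. It is only an illustration of the rules listed above, not the dispatcher's actual implementation; the function name is_cacheable_by_default and its parameters are hypothetical.

```python
import posixpath
from urllib.parse import urlsplit


def is_cacheable_by_default(method, url, headers, allow_authorized=False):
    """Model of the four default conditions (hypothetical helper, not dispatcher code)."""
    parts = urlsplit(url)

    # 1. Only GET requests qualify.
    if method.upper() != "GET":
        return False

    # 2. The last path segment must have an extension such as .html or .xml.
    last_segment = posixpath.basename(parts.path)
    if not posixpath.splitext(last_segment)[1]:
        return False

    # 3. The URL must not carry a query string.
    if parts.query:
        return False

    # 4. No "Authorization" header, unless authorized caching is switched on.
    has_auth = any(name.lower() == "authorization" for name in headers)
    if has_auth and not allow_authorized:
        return False

    return True
```

For example, is_cacheable_by_default("GET", "/content/geometrixx/en.html", {}) passes all four checks, while the same URL with ?a=b appended fails the third one.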

Settings are configured in the dispatcher configuration file. In our demonstration this file is conf/dispatcher.any. The configuration file contains a series of single-valued or multi-valued properties that control the behavior of the dispatcher:
property names are prefixed with a forward slash (“/”);
multi-valued properties enclose child items in braces (“{}”);
comments begin with the ‘#’ symbol.

Renders
Renders are the AEM instances from which the dispatcher receives content that may be cached. Renders are the first thing we will define in our configuration file. If you define more than one render, the dispatcher automatically balances the load among these AEM instances. In our case we will set only one render: the publish AEM instance.
    /renders
      {
      /rend01
        {
        /hostname "localhost"
        /port "4503"
        }
      }

You may now restart httpd and check that the dispatcher is able to request resources from the publish AEM instance. For example, if http://localhost:4503/content/geometrixx/en.html works, then the dispatcher version of this page should be available at http://localhost/content/geometrixx/en.html. Note that we don’t set the port in the request URL because the dispatcher (more precisely, httpd) listens on port 80, which is the default HTTP port for all browsers.
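
If you prefer to script this check instead of using a browser, something like the following Python sketch could work. It assumes the dispatcher runs on localhost at port 80, as in this demonstration; the helper names dispatcher_url and http_status are made up for illustration.

```python
import urllib.error
import urllib.request
from urllib.parse import urlsplit, urlunsplit


def dispatcher_url(publish_url, dispatcher_host="localhost"):
    """Point a publish URL at the dispatcher by dropping the explicit port."""
    parts = urlsplit(publish_url)
    # Leaving the port out means the default HTTP port (80) is used.
    return urlunsplit(
        (parts.scheme, dispatcher_host, parts.path, parts.query, parts.fragment)
    )


def http_status(url, timeout=5):
    """Return the HTTP status code of a GET request to url.

    Connection failures (for example, httpd not running) are not caught
    here and surface as urllib.error.URLError.
    """
    try:
        with urllib.request.urlopen(url, timeout=timeout) as response:
            return response.status
    except urllib.error.HTTPError as err:
        # 4xx and 5xx responses arrive as exceptions; report their code.
        return err.code
```

With both servers running, http_status(dispatcher_url("http://localhost:4503/content/geometrixx/en.html")) should report the same status as the publish URL itself.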

Filters
The /filter section specifies the HTTP requests that the dispatcher accepts. All other requests are sent back to the web server with a 404 error code (page not found). Let’s allow access to all resources for our demonstration case.

    /filter
      {
      /0001 { /type "allow" /glob "*" }
      }

Filter types: “allow” or “deny”.

Globs will be compared against the entire request line, e.g.:
/0001 { /type "allow" /glob "* /index.html *" }

This glob matches request “GET /index.html HTTP/1.1” but not “GET /index.html?a=b HTTP/1.1”.
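
You can try out this kind of matching with Python's standard fnmatch module. Treat it as an approximation: fnmatch wildcards behave like the glob in this example, but the dispatcher's own matcher may differ in details. The helper name glob_matches is hypothetical.

```python
from fnmatch import fnmatchcase


def glob_matches(request_line, glob):
    """Check a whole request line against a glob (approximation using fnmatch)."""
    # fnmatchcase is case-sensitive and matches the entire string, so the
    # pattern has to account for the method and the protocol as well.
    return fnmatchcase(request_line, glob)


glob = "* /index.html *"
plain = glob_matches("GET /index.html HTTP/1.1", glob)
with_query = glob_matches("GET /index.html?a=b HTTP/1.1", glob)
```

Here plain is True and with_query is False, because the second request line does not contain “/index.html” followed directly by a space.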

Instead of a “glob” you may match individual parts of the request with separate “url”, “method” and “protocol” properties. In addition to “url” you may use “path”, “selectors”, “extension” and “suffix”.

When a request matches multiple filter patterns, only the last matching pattern is applied.
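
The “last match wins” behavior is easy to get wrong, so here is a small Python sketch of it. It is a model of the rule stated above, not dispatcher code: the function filter_decision and the example rules are hypothetical, and glob matching is approximated with fnmatch.

```python
from fnmatch import fnmatchcase


def filter_decision(request_line, rules):
    """Return "allow" or "deny" for a request line.

    rules is a list of (type, glob) pairs in configuration order.
    """
    # A request that matches no rule is not accepted.
    decision = "deny"
    for rule_type, glob in rules:
        if fnmatchcase(request_line, glob):
            # Do not stop at the first hit: a later match overrides it.
            decision = rule_type
    return decision


# Hypothetical rule set: deny everything, allow /content, deny one subtree.
example_rules = [
    ("deny", "*"),
    ("allow", "* /content/*"),
    ("deny", "* /content/private/*"),
]
```

With these rules, a request for /content/geometrixx/en.html is allowed by the second rule, while a request under /content/private/ matches all three rules and is denied by the last one. This is why broad rules usually go first and specific exceptions after them.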

After defining your filters you may restart httpd and check that the dispatcher has access to all resources of the publish instance. Of course, in a production environment you should deny access to some resources for security reasons.

Cache
The /cache section determines which resources the dispatcher caches. It contains a number of rules that are quite similar to the filter rules, plus a few additional settings. For example, /docroot sets the directory where cached files are stored. The value must be exactly the same path as the document root of the web server so that the dispatcher and the web server handle the same files.

For our demonstration let’s set docroot and allow caching of all resources which are received from our render (publish instance):

    /cache
      {
      /docroot "/Apache22/htdocs"
      /rules
        {
        /0000
          {
          /glob "*"
          /type "allow"
          }
        }
      }

After these changes you may restart httpd, open a new private browser window so that the request is sent without an “Authorization” header (Chrome: Ctrl+Shift+N, Firefox: Ctrl+Shift+P) and go to http://localhost/content/geometrixx/en/products.html. Cached resources should appear inside the htdocs directory. They follow a URL-like hierarchy: directories mirror the URL path, and static HTML files contain the rendered content.
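
To know where to look for a particular cached page, you can derive the file location from the docroot and the URL path. A minimal Python sketch, assuming the docroot from this demonstration and a hypothetical helper name cached_file_path:

```python
import posixpath


def cached_file_path(docroot, url_path):
    """Map a request path to the place where its cached copy should appear."""
    # The cache mirrors the URL: each path segment becomes a directory and
    # the last segment becomes the file name.
    return posixpath.join(docroot, url_path.lstrip("/"))


location = cached_file_path(
    "/Apache22/htdocs", "/content/geometrixx/en/products.html"
)
```

For the page above, location is /Apache22/htdocs/content/geometrixx/en/products.html.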

Headers
You saw that the cached HTML files contain only HTML content. But what should we do if we also want to cache the response headers received from the render? For example, if the response carries a “Content-Type” header that specifies the character encoding, the cached HTML may not be displayed correctly without that header. That is what the /headers block inside the /cache section is designed for. Let’s cache some common and useful headers:

    /cache
      {
      /headers
        {
        "Cache-Control"
        "Content-Disposition"
        "Content-Type"
        "Expires"
        "Last-Modified"
        "X-Content-Type-Options"
        }
      }

After changing the headers in your dispatcher configuration file, delete all cached files from the htdocs directory, restart httpd, and then open http://localhost/content/geometrixx/en/products.html. You will now find not only the cached file products.html in the htdocs/content/geometrixx/en directory but also products.html.h. This *.h file contains the headers for the cached HTML file.

Summary
You can quickly enable caching by defining the following initial settings in the dispatcher configuration file:
set renders;
set filters;
set docroot and rules in the cache section;
set headers for storing HTTP headers.

For detailed and useful documentation go to: https://docs.adobe.com/docs/en/dispatcher/disp-config.html


By aem4beginner
