April 25, 2020

Debugging Slowness in Your AEM Instance: Part 1

So you have designed and developed your AEM website, all features are working fine, and it is now deployed to production. Soon you notice that page performance is not up to the mark: pages load slowly and your caching is poorly optimized.

There could be multiple reasons for slowness in your AEM application, and you need to figure out where the problem areas are. AEM comes with some built-in tools to monitor and analyze the performance of your pages, which you can use to your benefit. But when we say slow, what does slow mean? Are there any guidelines? Adobe's documentation gives recommended response times for this; note that they apply to rendering on the publish server (i.e. non-cached pages).


Identifying Slow Requests: Here are some tools that you can use to analyze the performance of your AEM implementation.
Request log: For each request and each response, one entry is written to the request.log file. You can analyze this file to find the response time of each request. The entries in request.log look like this:
23/Jun/2016:00:00:30 +0530 [758] -> GET /libs/granite/csrf/token.json HTTP/1.1
23/Jun/2016:00:00:30 +0530 [758] <- 200 application/json 24ms
23/Jun/2016:00:00:50 +0530 [759] -> GET /bin/company/repo HTTP/1.1
23/Jun/2016:00:00:50 +0530 [759] <- 200 - 19ms
23/Jun/2016:00:02:02 +0530 [760] -> PUT /libs/sling/topology/connector.f282067f-8fa0-4cc9-878a-413d40452f4b.json HTTP/1.1
23/Jun/2016:00:02:02 +0530 [760] <- 200 - 3ms


Each entry contains the following information:
Date and time at which the request was received or the response was sent

The ID of the request (758 in the above example); the response to a request carries the same ID, so you can easily match the two

An arrow indicating request or response (-> means request and <- means response)

The request line, which contains

METHOD (for example GET, POST, PUT or HEAD)
REQUESTED_RESOURCE
PROTOCOL

The response line, which contains

Status Code
MIME Type
Response Time (in milliseconds)
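To make the entry layout concrete, here is a minimal Python sketch that parses request.log lines and matches each response to its request by ID. The regular expression and the function names parse_entry and pair_entries are assumptions derived from the sample entries above, not part of AEM, so treat it as a starting point rather than a complete parser.

```python
import re
from datetime import datetime

# Layout inferred from the sample entries: "<date> <zone> [<id>] <arrow> <rest>".
ENTRY_RE = re.compile(r"^(\S+ \S+) \[(\d+)\] (->|<-) (.+)$")


def parse_entry(line):
    """Parse one request.log line into a dict; return None if it does not match."""
    match = ENTRY_RE.match(line.strip())
    if match is None:
        return None
    stamp, req_id, arrow, rest = match.groups()
    entry = {
        "time": datetime.strptime(stamp, "%d/%b/%Y:%H:%M:%S %z"),
        "id": int(req_id),
    }
    # Split off the first and last token; whatever remains is the middle field.
    first, _, tail = rest.partition(" ")
    middle, _, last = tail.rpartition(" ")
    if arrow == "->":
        # Request line: METHOD REQUESTED_RESOURCE PROTOCOL
        entry.update(kind="request", method=first, resource=middle, protocol=last)
    else:
        # Response line: STATUS MIME_TYPE <n>ms
        if not (first.isdigit() and last.endswith("ms") and last[:-2].isdigit()):
            return None
        entry.update(kind="response", status=int(first), mime=middle,
                     millis=int(last[:-2]))
    return entry


def pair_entries(lines):
    """Yield (request, response) tuples for entries that share the same ID."""
    pending = {}
    for line in lines:
        entry = parse_entry(line)
        if entry is None:
            continue
        if entry["kind"] == "request":
            pending[entry["id"]] = entry
        elif entry["id"] in pending:
            yield pending.pop(entry["id"]), entry
```

Calling pair_entries(open("request.log")) would give you one tuple per completed request, which you can then sort or filter by response["millis"].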

AEM comes with a tool for analyzing the request log file, called rlog.jar. It can quickly sort request.log so that requests are displayed by duration, from longest to shortest.
rlog.jar: This jar file is included with an AEM installation in the /crx-quickstart/opt/helpers folder.

To run the tool, use the command below (here run from the crx-quickstart/logs folder, where request.log is written):
$ java -jar ../opt/helpers/rlog.jar request.log

This command lists all requests sorted by response time, with the slowest request at the top. Example output:

*Info * Parsed 89 requests.
*Info * Time for parsing: 20ms
*Info * Time for sorting: 1ms
*Info * Total Memory: 121mb
*Info * Free Memory: 119mb
*Info * Used Memory: 1mb

------------------------------------------------------
6379ms 23/Jun/2016:00:08:13 +0530 200 POST /system/sling/tooling/install -
936ms 23/Jun/2016:00:09:07 +0530 200 POST /system/sling/tooling/install -
738ms 23/Jun/2016:00:08:22 +0530 200 POST /system/sling/tooling/install -
491ms 23/Jun/2016:00:08:57 +0530 200 POST /system/sling/tooling/install -
469ms 23/Jun/2016:00:08:43 +0530 200 POST /system/sling/tooling/install -
452ms 23/Jun/2016:00:09:35 +0530 200 POST /system/sling/tooling/install -
……

If you want to analyze multiple request log files, you have to combine them into a single file first.
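One way to combine the files is plain concatenation, sketched below in Python. The function name combine_logs and the file names in the usage note are hypothetical, and the sketch assumes that joining the files end to end in the order given is all that is needed.

```python
def combine_logs(log_paths, combined_path):
    """Concatenate several request.log files into one file, in the order given."""
    with open(combined_path, "wb") as out:
        for path in log_paths:
            with open(path, "rb") as src:
                for line in src:
                    # Make sure the last line of each file ends with a newline,
                    # so entries from two files never run together.
                    out.write(line if line.endswith(b"\n") else line + b"\n")
```

For example, combine_logs(["request.log.1", "request.log"], "combined.log") writes a combined.log that can then be passed to rlog.jar in place of request.log.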

Once you have the list of slow requests from the tool, analyze them one by one, starting with the slowest. Try to find out:
Are the slow requests happening for one particular page only? If yes, does this page call an external system to get its data? If so, was there an outage or any other issue with that external system at that time?

Check the timestamp of the slowest request and compare it with other requests made around the same time. Were the responses to those requests also slow? If yes, you need to find the reason for the slowness on your publish server. If no, check whether the slowest response came from a page that gets its data from an external system.

Check the cache ratio of the system. You should not see frequent repeated requests on publish for a page that is supposed to be cached (for example, a request that ends with .html, has a status code of 200 and has no query parameters). Count how many times each cacheable page is requested on publish. If the number is too high, either your cache is not configured correctly or the page is being activated very frequently. For example, if the stat file is not set up properly, your entire site cache is invalidated every time a page is activated.
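The last check can be sketched in Python as follows. The code counts, per page, how many GET requests reached publish for a resource ending in .html, without a query string, and answered with status 200. The function name count_cacheable_hits and both regular expressions are assumptions based on the request.log sample shown earlier, and what counts as "too high" is left for you to decide.

```python
import re
from collections import Counter

# Patterns inferred from the request.log sample; adjust them if your log differs.
REQUEST_RE = re.compile(r"\[(\d+)\] -> (\S+) (.+) (\S+)$")
RESPONSE_RE = re.compile(r"\[(\d+)\] <- (\d{3}) ")


def count_cacheable_hits(lines):
    """Count 200 responses on publish for GET *.html requests without a query string."""
    pending = {}      # request ID -> resource, for requests that look cacheable
    hits = Counter()  # resource -> number of times it was rendered on publish
    for line in lines:
        line = line.rstrip("\n")
        request = REQUEST_RE.search(line)
        if request:
            req_id, method, resource, _protocol = request.groups()
            if method == "GET" and resource.endswith(".html") and "?" not in resource:
                pending[req_id] = resource
            else:
                # Forget any stale entry that happens to reuse this ID.
                pending.pop(req_id, None)
            continue
        response = RESPONSE_RE.search(line)
        if response and response.group(1) in pending:
            resource = pending.pop(response.group(1))
            if response.group(2) == "200":
                hits[resource] += 1
    return hits
```

With the result in hand, hits.most_common(10) lists the ten cacheable pages that were rendered on publish most often, which are the first candidates to check against your dispatcher cache rules and activation frequency.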

Calculating the Cache Ratio
Adobe suggests a cache ratio of 90-95% for the best performance of your website, which means that 90-95% of your requests should be served from the cache rather than by the publish server. This may not be achievable in all cases, especially for transactional websites, but you should try to get as close to this figure as possible.

A simple formula to calculate the cache ratio is

Cache Ratio = (Total Number of Requests - Number of Requests on Publisher) / Total Number of Requests

Total number of requests = sum of all requests from the Apache access log

Number of requests on publisher = count from the request log, using rlog.jar

So, for example, if you are getting 1000 requests in total and 100 of them are hitting publish, the cache ratio is

Cache ratio = (1000 - 100) / 1000 = 90%

Note that if you don't have a 1:1 dispatcher-to-publish setup, you need to add up the requests from all dispatchers and from all publish instances to calculate the cache ratio.
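The formula, including the case with several dispatchers and publish instances, can be written as a small Python sketch. The function names cache_ratio and combined_cache_ratio are hypothetical; the request counts are assumed to come from your Apache access logs and from rlog.jar as described above.

```python
def cache_ratio(total_requests, publish_requests):
    """Return the share of requests served from cache, as a value between 0 and 1."""
    if total_requests <= 0:
        raise ValueError("total_requests must be positive")
    if not 0 <= publish_requests <= total_requests:
        raise ValueError("publish_requests must be between 0 and total_requests")
    return (total_requests - publish_requests) / total_requests


def combined_cache_ratio(dispatcher_totals, publish_totals):
    """Cache ratio when the setup is not 1:1: add up all dispatchers and all publishers."""
    return cache_ratio(sum(dispatcher_totals), sum(publish_totals))
```

For the example above, cache_ratio(1000, 100) gives 0.9, i.e. 90%. With two dispatchers that received 600 and 400 requests in front of two publish instances that received 70 and 30, combined_cache_ratio([600, 400], [70, 30]) gives the same 90%.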


By aem4beginner
