December 31, 2020

Need for Query and Index in AEM

We write query-based logic in AEM for functionality such as full-text search, for retrieving content based on a property or on conditions tied to a property, and to avoid iterating over a huge volume of content under the root.

Languages supported:
  • XPATH
  • JCR-SQL2
XPath:
  • XPath queries are generated by the AEM QueryBuilder API (com.day.cq.search.*).
  • From a development point of view, we need to know the standard OOB predicates, because the XPath query is built from them.
JCR-SQL2:
  • JCR-SQL2 queries are created using QueryManager - javax.jcr.query.QueryManager
  • QueryManager is acquired through JCR Session - session.getWorkspace().getQueryManager()
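
As a minimal sketch of the JCR-SQL2 route, the class below only assembles a statement string for "pages under a path with a given jcr:title". The class and method names are hypothetical and chosen for illustration. The JCR calls are shown as comments because they need a live repository session.

```java
import java.util.Objects;

/** Builds a JCR-SQL2 statement string; the names here are illustrative, not an AEM API. */
final class PageTitleQuery {

    private PageTitleQuery() {}

    /** Wraps a value in single quotes, doubling any embedded single quote. */
    static String quote(String value) {
        return "'" + Objects.requireNonNull(value).replace("'", "''") + "'";
    }

    /** Selects cq:Page nodes below rootPath whose jcr:content/jcr:title equals title. */
    static String build(String rootPath, String title) {
        return "SELECT * FROM [cq:Page] AS page"
                + " WHERE ISDESCENDANTNODE(page, " + quote(rootPath) + ")"
                + " AND page.[jcr:content/jcr:title] = " + quote(title);
    }

    public static void main(String[] args) {
        String statement = build("/content/we-retail", "English");
        System.out.println(statement);
        // Inside an AEM bundle the statement would then be handed to JCR, for example:
        //   QueryManager qm = session.getWorkspace().getQueryManager();
        //   Query query = qm.createQuery(statement, Query.JCR_SQL2);
        //   QueryResult result = query.execute();
    }
}
```

Keeping the statement builder separate from the session code makes it easy to log or unit-test the exact query text before it reaches the repository.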
Query Processing:
  • Before AEM 6.0, which ran on Jackrabbit 2, all content in AEM was indexed by default.
  • With Jackrabbit Oak, we create custom indexes as needed. OOB indexes are available under the /oak:index node in the repository.
  • Oak Query Engine
    • Processes queries in the form of JCR-SQL2.
      • This means that a query written with the QueryBuilder API and its predicates first results in an XPath query.
      • QueryEngineImpl (org.apache.jackrabbit.oak.query.QueryEngineImpl) then parses the XPath query and converts it to SQL2.
  • Uses a cost-based query optimizer, which asks all the available indexes for the cost of processing the query.
  • Every available index under oak:index estimates its cost.
  • The cost of traversal is also calculated.
  • The cost value of an index can be "Infinity". This means the index cannot handle the query's condition, so it cannot be used to query the data.
  • The query engine then picks the index with the lowest estimated cost.
  • Note:
    • The cost value is an estimated worst-case value and hence need not be accurate.
    • The process described above runs every time a query is executed.
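
The selection step can be pictured with a toy model. This is a simplified sketch and not Oak's actual implementation: the class name, the method, and all cost numbers are made up for illustration. It only shows the rule described above, where the lowest estimated cost wins, an "Infinity" cost can never win, and traversal is one of the candidates.

```java
import java.util.LinkedHashMap;
import java.util.Map;

/** Toy model of cost-based index selection; not Oak's actual implementation. */
final class IndexSelectionSketch {

    private IndexSelectionSketch() {}

    /**
     * Returns the name of the cheapest index, or "traverse" when no index
     * is cheaper than traversal. A cost of Double.POSITIVE_INFINITY means
     * the index cannot answer the query, so it is never selected.
     */
    static String pick(Map<String, Double> indexCosts, double traversalCost) {
        String best = "traverse";
        double bestCost = traversalCost;
        for (Map.Entry<String, Double> entry : indexCosts.entrySet()) {
            if (entry.getValue() < bestCost) {
                best = entry.getKey();
                bestCost = entry.getValue();
            }
        }
        return best;
    }

    public static void main(String[] args) {
        // Hypothetical cost estimates, invented for this example.
        Map<String, Double> costs = new LinkedHashMap<>();
        costs.put("cqPageLucene", 120.0);
        costs.put("nodetype", 5000.0);
        costs.put("someUnrelatedIndex", Double.POSITIVE_INFINITY);
        System.out.println(pick(costs, 1_000_000.0)); // prints cqPageLucene
    }
}
```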
Example:
  • If a query is written to get all pages with a specific "jcr:title" value, the query engine need not traverse the entire repository looking for jcr:title. Instead, it uses the selected index (chosen by cost) and fetches/filters results from the indexed content.
  • The OOB index definition for a page's jcr:title is available at /oak:index/cqPageLucene/indexRules/cq:Page/properties/jcrTitle
Need for creating the custom index:
  • When the query engine has to traverse the entire repository, or read more nodes than allowed (100000 per the configuration: Apache Jackrabbit Query Engine Settings, In-memory read limit), the query is slow and eventually throws UnsupportedOperationException to stop further processing. In the logs, we would notice a WARN-level message suggesting that an index be created, as shown below.
*DEBUG* [0:0:0:0:0:0:0:1 [1588022611960] GET /libs/cq/search/content/querydebug.html HTTP/1.1] org.apache.jackrabbit.oak.query.QueryImpl no proper index was found for filter Filter(query=select [jcr:path], [jcr:score], * from [nt:unstructured] as a where isdescendantnode(a, '/') /* xpath: /jcr:root//element(*, nt:unstructured) */, path=//*)
*WARN* [0:0:0:0:0:0:0:1 [1588022611960] GET /libs/cq/search/content/querydebug.html HTTP/1.1] org.apache.jackrabbit.oak.query.QueryImpl Traversal query (query without index): select [jcr:path], [jcr:score], * from [nt:unstructured] as a where isdescendantnode(a, '/') /* xpath: /jcr:root//element(*, nt:unstructured) */; consider creating an index
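
To make the effect of the read limit concrete, here is a small stand-alone sketch. It is an illustration only and not Oak's code: the class, the method, and the node names are hypothetical. It counts every node it reads and aborts with UnsupportedOperationException once the limit is exceeded, which is the behaviour described for a traversal query.

```java
import java.util.Iterator;
import java.util.stream.IntStream;

/** Toy model of a read limit on a traversal query; not Oak's actual code. */
final class ReadLimitSketch {

    private ReadLimitSketch() {}

    /** Counts matching nodes, aborting once more than readLimit nodes have been read. */
    static int countMatches(Iterator<String> nodes, String wanted, long readLimit) {
        long read = 0;
        int matches = 0;
        while (nodes.hasNext()) {
            String node = nodes.next();
            if (++read > readLimit) {
                throw new UnsupportedOperationException(
                        "Read more than " + readLimit + " nodes; processing stopped");
            }
            if (node.equals(wanted)) {
                matches++;
            }
        }
        return matches;
    }

    public static void main(String[] args) {
        // 100_000 mirrors the read limit mentioned above; the node names are made up.
        Iterator<String> nodes =
                IntStream.range(0, 150_000).mapToObj(i -> "node" + i).iterator();
        try {
            countMatches(nodes, "node149999", 100_000);
        } catch (UnsupportedOperationException e) {
            System.out.println("Query aborted: " + e.getMessage());
        }
    }
}
```

With an index, the engine would read only the candidate nodes the index returns, so the counter would stay far below the limit.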

[Screenshot: OSGi config for reference]


Play around in your local instance (to visualize the flow mentioned above)
Create a new logger entry in the Sling log for the below-highlighted APIs.
Navigate to http://localhost:4502/libs/cq/search/content/querydebug.html, frame some sample query predicates, and execute them. (You can also intentionally induce a traversal query locally and observe the logs.)

path=/content/we-retail
type=cq:Page
1_property=@jcr:content/jcr:title
1_property.value=English
p.limit=-1
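
The same predicates can be expressed in code as a plain map, which is the usual input for the QueryBuilder API. The sketch below only builds the map. The class and method names are hypothetical, and the QueryBuilder calls appear as comments because they need com.day.cq.search and a JCR session on the classpath.

```java
import java.util.LinkedHashMap;
import java.util.Map;

/** Builds the predicate map shown above; the names here are illustrative only. */
final class PredicateMapSketch {

    private PredicateMapSketch() {}

    /** Predicates for cq:Page nodes under path whose jcr:content/jcr:title equals title. */
    static Map<String, String> pageTitlePredicates(String path, String title) {
        Map<String, String> predicates = new LinkedHashMap<>();
        predicates.put("path", path);
        predicates.put("type", "cq:Page");
        predicates.put("1_property", "@jcr:content/jcr:title");
        predicates.put("1_property.value", title);
        predicates.put("p.limit", "-1"); // -1 asks for all results
        return predicates;
    }

    public static void main(String[] args) {
        Map<String, String> predicates = pageTitlePredicates("/content/we-retail", "English");
        predicates.forEach((key, value) -> System.out.println(key + "=" + value));
        // Inside an AEM bundle the map would be passed to the QueryBuilder, for example:
        //   Query query = queryBuilder.createQuery(PredicateGroup.create(predicates), session);
        //   SearchResult result = query.getResult();
    }
}
```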


Observe the logs either in the log file or directly in /system/console/slinglog. (The points mentioned in the flow above are highlighted.)



By aem4beginner
