January 1, 2021

Indexing in AEM - Indexing modes and Index types

For queries to perform well, Oak indexes the content stored in the repository according to each index's definition, type, and indexing mode.
  • Indexing works by comparing node states (the difference between the base state and the modified state), where a NodeState (org.apache.jackrabbit.oak.spi.state.NodeState) represents a specific immutable state of a node.
  • The indexing modes listed below differ in how this comparison is made and when the index content gets updated.
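The idea of comparing a base state with a modified state can be sketched in plain Java. This is only an illustration: it uses ordinary immutable maps as a stand-in for node states, and the class and method names (NodeStateDiffSketch, diff) are hypothetical, not part of Oak's API.

```java
import java.util.Map;
import java.util.TreeMap;

// Hypothetical stand-in for comparing two immutable node states.
// Oak's real NodeState API is richer; this only illustrates the diff idea.
final class NodeStateDiffSketch {

    /** Maps each property name that differs between the two states to "added", "changed" or "deleted". */
    static Map<String, String> diff(Map<String, String> base, Map<String, String> modified) {
        Map<String, String> changes = new TreeMap<>();
        for (Map.Entry<String, String> e : modified.entrySet()) {
            String before = base.get(e.getKey());
            if (before == null) {
                changes.put(e.getKey(), "added");      // present only in the modified state
            } else if (!before.equals(e.getValue())) {
                changes.put(e.getKey(), "changed");    // present in both, value differs
            }
        }
        for (String name : base.keySet()) {
            if (!modified.containsKey(name)) {
                changes.put(name, "deleted");          // present only in the base state
            }
        }
        return changes;
    }

    public static void main(String[] args) {
        // Sample property values are made up for the demonstration.
        Map<String, String> base = Map.of("jcr:title", "Home", "cq:template", "/conf/t1");
        Map<String, String> modified = Map.of("jcr:title", "Homepage", "cq:tags", "we-retail:season");
        System.out.println(diff(base, modified));
    }
}
```

An indexer only needs to process the entries reported by such a diff, which is why the moment at which the diff is computed (with the commit or later) defines the indexing mode.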
Indexing modes:
  • Synchronous Indexing
    • This mode updates the index content as part of the commit that changes the actual content. In other words, the content update and the corresponding index update happen together, as the name "synchronous" suggests.
    • Supported Index Type: Property Index
  • Asynchronous Indexing
    • The index is updated by a scheduled job (AsyncIndexUpdate) that runs at a fixed interval (5 seconds OOB).
    • Because indexing in this mode runs independently of the content updates, the index may lag slightly behind the latest repository state; it is eventually consistent.
    • Supported Index Type: Lucene Index and Solr Index
    • Example: /oak:index/cqTagLucene

  • Near Real-time (NRT) Indexing
    • Indexing happens in two modes/at two places
      • Persisted Index: Updated by the scheduled async indexing job described above.
      • Local Index: In addition to the persisted index, an index is kept locally with the help of copy-on-read support (configured in org.apache.jackrabbit.oak.plugins.index.lucene.LuceneIndexProviderService). It holds only the data indexed between two async index job runs.
        • Note: A local index path can be provided as part of the config. If none is given, the index is stored in the "index" directory of the instance's repository home.
    • In other words, content that is updated after the last async job run and before the next run starts shows up in query results quickly, with both the persisted and the local index in place.
    • Supported Index Type: Lucene Index
    • Usage:
      • NRT indexing mode has two variations: nrt and sync.
      • NRT Indexing mode - nrt
        • The async property on the index definition is set to ['async', 'nrt']
        • The local index is updated asynchronously.
      • NRT Indexing mode - sync
        • The async property on the index definition is set to ['async', 'sync']
        • The local index is updated synchronously. Indexing in this mode is slower than in nrt mode.
    • Example: A few OOB Lucene indexes use "NRT indexing mode - nrt". One of them is /oak:index/cqPageLucene
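The relationship between the async property values and the indexing modes above can be summarised in a small Java sketch. The class name, method name and returned labels are hypothetical; the sketch also assumes that an index definition without an async property is updated synchronously.

```java
import java.util.List;

// Illustrative only: maps the value of an index definition's "async" property
// to the indexing modes described above. Names and labels are made up.
final class IndexingModeSketch {

    static String modeFor(List<String> async) {
        if (async == null || async.isEmpty()) {
            return "synchronous";   // assumption: no async property means the index is updated with the commit
        }
        if (async.contains("nrt")) {
            return "nrt";           // ['async', 'nrt']: local index updated asynchronously
        }
        if (async.contains("sync")) {
            return "nrt-sync";      // ['async', 'sync']: local index updated synchronously
        }
        return "asynchronous";      // e.g. ['async']: updated only by the scheduled job
    }

    public static void main(String[] args) {
        System.out.println(modeFor(List.of("async", "nrt")));
        System.out.println(modeFor(List.of("async", "sync")));
        System.out.println(modeFor(List.of("async")));
        System.out.println(modeFor(List.of()));
    }
}
```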
Types of Oak Index:
  • Property Index
    • Useful for queries that have property constraints that are not full text.
    • Identified by the following properties:
      • type -> property
      • propertyNames -> property name for which index is to be created [Name array]
  • Lucene Index
    • Lucene Fulltext Index
      • Useful for queries involving full-text conditions.
      • Identified by the following properties:
        • type -> lucene
        • async -> async
    • Lucene Property Index
      • Serves the same purpose as the property index mentioned above. Because it is a Lucene index, it is updated in async mode.
      • Identified by the following properties:
        • type -> lucene
        • async -> async
        • fullTextEnabled -> false [Boolean]
        • includePropertyNames -> property names for which index is to be created [String array]
  • Solr Index
    • Used when Apache Solr is used for search functionality.
    • Identified by the following properties:
      • type -> solr
      • async -> async
  • Ordered Index (deprecated)
Note:
  • The index that gets picked depends on the query predicates used (the difference is illustrated with an example query below).
  • Apart from the key properties highlighted above, an index definition has other supporting properties and nodes. These will be covered separately for better clarity.
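As a rough aid for reading index definitions in CRXDE, the identifying properties listed above can be turned into a simple classifier. This is a sketch under the assumption that only the properties named in this post are inspected; the class and method names are hypothetical, and real definitions carry many more properties.

```java
import java.util.Map;

// Illustrative classifier based only on the identifying properties listed above.
final class IndexTypeSketch {

    static String classify(Map<String, ?> definition) {
        Object type = definition.get("type");
        if ("property".equals(type)) {
            return "Property Index";
        }
        if ("solr".equals(type)) {
            return "Solr Index";
        }
        if ("lucene".equals(type)) {
            // fullTextEnabled=false marks a Lucene property index in the list above
            return Boolean.FALSE.equals(definition.get("fullTextEnabled"))
                    ? "Lucene Property Index"
                    : "Lucene Fulltext Index";
        }
        return "Other";   // anything not covered by the list above
    }

    public static void main(String[] args) {
        System.out.println(classify(Map.of("type", "property")));
        System.out.println(classify(Map.of("type", "lucene", "async", "async")));
        System.out.println(classify(Map.of("type", "lucene", "async", "async", "fullTextEnabled", false)));
    }
}
```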
Index Manager, Admin UI:
AEM provides an OOB Admin UI that lists the indexes available in our instance.
  • Navigate to Tools -> Operations -> Diagnosis -> Index Manager
  • Filter options are available for index name, type, and path.
  • On selecting a specific index, we have two options - Index Info, Consistency check.


For node- and property-level details of oak:index definitions by type, use query predicates like the ones below in querydebug.html. (From the result set, we can navigate to the respective node in CRXDE and inspect its nodes and properties.)
  • Get all Property Indexes:
path=/oak:index
type=oak:QueryIndexDefinition
1_property=type
1_property.value=property
p.limit=-1

  • Get all Lucene Indexes:
path=/oak:index
type=oak:QueryIndexDefinition
1_property=type
1_property.value=lucene
p.limit=-1

  • Get all indexes of mode NRT, with scheme nrt:
path=/oak:index
type=oak:QueryIndexDefinition
1_property=type
1_property.value=lucene
2_property=async
2_property.value=%nrt
2_property.operation=like
p.limit=-1
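The same predicates can also be assembled programmatically. The sketch below only builds the predicate map for the NRT query above and renders it as a URL query string (for example, for appending to /bin/querybuilder.json); the class and method names are hypothetical. Inside AEM, such a map would typically be handed to the QueryBuilder API via PredicateGroup.create(map), which is not shown here so that the example runs without AEM dependencies.

```java
import java.net.URLEncoder;
import java.nio.charset.StandardCharsets;
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.stream.Collectors;

// Illustrative helper: builds the "NRT indexes with scheme nrt" predicates
// shown above and renders them as a URL-encoded query string.
final class IndexPredicateSketch {

    static Map<String, String> nrtLuceneIndexPredicates() {
        Map<String, String> p = new LinkedHashMap<>();   // keeps the predicates in insertion order
        p.put("path", "/oak:index");
        p.put("type", "oak:QueryIndexDefinition");
        p.put("1_property", "type");
        p.put("1_property.value", "lucene");
        p.put("2_property", "async");
        p.put("2_property.value", "%nrt");
        p.put("2_property.operation", "like");
        p.put("p.limit", "-1");
        return p;
    }

    static String toQueryString(Map<String, String> predicates) {
        return predicates.entrySet().stream()
                .map(e -> URLEncoder.encode(e.getKey(), StandardCharsets.UTF_8)
                        + "=" + URLEncoder.encode(e.getValue(), StandardCharsets.UTF_8))
                .collect(Collectors.joining("&"));
    }

    public static void main(String[] args) {
        System.out.println(toQueryString(nrtLuceneIndexPredicates()));
    }
}
```

Note that the "%" wildcard of the like operation must be URL-encoded (as %25) when the predicates are sent as request parameters.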


Try it out locally (to understand the index modes and types used for your queries)
  • Take query predicates used in your project (or create new ones) and run them in the "Explain Query" console.
  • Observe the indexes used.
  • Sample full text query to search for the text "we-retail" in pages:
fulltext=we-retail
path=/content/we-retail
type=cq:Page
p.limit=-1

  • Enter the query in the Explain Query console.

  • Clicking Explain displays the index used for the query.

  • The same query without a "type" predicate results in a different index being used.


Observation:
  • As the example above shows, when the type predicate is used, the Lucene index whose definition covers that specific type is picked. (In this case, the type is cq:Page and the index is cqPageLucene.)
  • When the type predicate is removed, the full-text Lucene index is used: /oak:index/lucene (async mode -> fulltext-async)
Note
Each of the bullet points explained above is a vast topic in itself and has more details to it. I will try to cover them in upcoming posts.


By aem4beginner
