April 6, 2020

Cheat Sheet of AEM Index Definition Structure

SUMMARY:
Oak does not index as much content by default as Jackrabbit 2 does. You need to create custom indexes when necessary, much like in a traditional RDBMS. This article is a cheat sheet of the index definition structure.

NOTES

  1. For up to date information and more details refer
  2. Get Latest Oak Hotfix
  3. Must watch Recording Video
  4. Useful Tools
  5. Special thanks to Tommaso Teofili, Chetan Mehrotra, Alex Parvulescu, Andrew Khoury, Thomas Mueller, Davide Giannella, Eren Aydin, Varun Mehrotra, Goran Brodnik and Vikas Saurabh for their willingness to help me with some challenging indexing tasks.

| Node/Property Name | Type | Default | Description |
| --- | --- | --- | --- |
| oak:index | nt:unstructured | | |
| indexName | oak:QueryIndexDefinition | | Asynchronous Lucene index for full-text, property, sorting, etc. |
| compatVersion | long | | By default, Oak uses an older Lucene index implementation that does not support property restrictions, index-time aggregation, etc. To use these features, set it to 2. |
| type | String | | Set to lucene. A Lucene index can be used to evaluate property constraints, full-text constraints, path restrictions, and sorting. |
| async | String | | Set to async. Sends the index update process to a background thread, so the index might lag behind the current repository state when a query runs. |
| name | String | | Name of the index, used in logging. |
| blobSize | long | | Size in bytes used for splitting the index files when storing them in the NodeStore. |
| evaluatePathRestrictions | boolean | | If enabled, the index can evaluate path restrictions. |
| includedPaths | String[] | / | Since Oak 1.0.14, 1.2.3. List of paths that should be included in indexing. |
| excludedPaths | String[] | empty | Since Oak 1.0.14, 1.2.3. List of paths that should be excluded from indexing. |
| maxFieldLength | long | 10000 | Number of terms indexed per field. |
| codec | String | | By default, if the index involves full-text indexing, Oak Lucene uses OakCodec, which disables compression. Because of this, the index may grow large. To enable compression, set the codec to Lucene46. Refer to OAK-2853 for details. |
| indexPath | String | | Path of the index definition in the repository. To speed up indexing with CopyOnWrite, set indexPath in the index definition to the path of the index in the repository. For example, if your index is defined at /oak:index/lucene, set indexPath to /oak:index/lucene. This lets the indexer perform reads locally during indexing and avoid costly remote reads. For more details, refer to OAK-2247. This feature can be enabled via the Lucene Index provider service configuration. |
| functionName | String | | Name used to enable index usage with native query support. |
| queryPaths | | | See https://issues.apache.org/jira/browse/OAK-2599 |
| reindex | boolean | | |
| persistence | String | | To store the Lucene index in the file system, set the property persistence to file in the Lucene index definition node, and set the property path to the directory where the index should be stored. |
| path | String | | To store the Lucene index in the file system, set the property persistence to file in the Lucene index definition node, and set the property path to the directory where the index should be stored. |
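
The interplay of includedPaths and excludedPaths can be sketched in a few lines of Python. This is only an illustration of the idea, not Oak's implementation: the function name is_path_indexed is hypothetical, and the sketch assumes that an excluded path takes precedence over an included one.

```python
def is_path_indexed(path, included_paths=("/",), excluded_paths=()):
    """Return True if `path` is under an included path and under no excluded path."""

    def is_under(parent, child):
        # A path is "under" a parent if it is the parent itself or a descendant.
        if parent == "/":
            return child.startswith("/")
        return child == parent or child.startswith(parent + "/")

    # Assumption: exclusion wins when a path matches both lists.
    if any(is_under(p, path) for p in excluded_paths):
        return False
    return any(is_under(p, path) for p in included_paths)


print(is_path_indexed("/content/site/en", ["/content"], ["/content/archive"]))       # True
print(is_path_indexed("/content/archive/2019", ["/content"], ["/content/archive"]))  # False
print(is_path_indexed("/etc/tags", ["/content"]))                                    # False
```

With the defaults from the table (includedPaths = /, excludedPaths empty), every path is indexed.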

| Node/Property Name | Type | Default | Description |
| --- | --- | --- | --- |
| indexRules | nt:unstructured | | |
| ruleName | nt:unstructured | | An index configuration can define one or more indexing rules for different node types. The ruleName node is named after the node type, for example nt:base. |
| inherited | boolean | true | Determines whether the rule applies only on an exact match or also when the match is based on node type inheritance. |
| indexNodeName | boolean | false | Since Oak 1.0.20, 1.2.5. If set to true, the node name is indexed as well. This enables faster evaluation of queries with constraints on the node name. For example: select [jcr:path] from [nt:base] where NAME() = 'kite' |
| includePropertyTypes | String[] | | Applicable when the index is enabled for full-text indexing. String array of property types that should be indexed. For a full-text index, defaults to all types. |
| costPerExecution | Double | | For each query, the overhead is one operation. For each entry in the index, the cost is one. |
| costPerEntry | Double | | For each query, the overhead is one operation. For each entry in the index, the cost is one. |
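
As a rough illustration of how costPerExecution and costPerEntry feed into index selection, the sketch below assumes a simple linear cost model (a fixed overhead per query plus a cost per index entry). The function name estimated_cost and the entry counts are hypothetical, and the default values of 1.0 are read off the description above rather than taken from Oak's code.

```python
def estimated_cost(entry_count, cost_per_execution=1.0, cost_per_entry=1.0):
    """Assumed linear model: one fixed overhead per query plus a cost per entry."""
    return cost_per_execution + cost_per_entry * entry_count


# Two hypothetical indexes that could answer the same query.
candidates = {
    "broadIndex": estimated_cost(entry_count=5000),
    "narrowIndex": estimated_cost(entry_count=200),
}

# The query engine prefers the cheaper plan.
cheapest = min(candidates, key=candidates.get)
print(cheapest, candidates[cheapest])
```

Lowering costPerEntry on one index makes it more attractive relative to the others, which is why these two properties are usually tuned only when the wrong index keeps getting picked.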

| Node/Property Name | Type | Default | Description |
| --- | --- | --- | --- |
| properties | nt:unstructured | | Each index rule consists of one or more property definitions defined under properties. |
| propertyName | nt:unstructured | | Can be any name; generally the property name. |
| name | String | | Property name. If not defined, the property name is set to the node name. If isRegexp is true, it defines the regular expression, applied to immediate properties only. Can also be set to a relative property. |
| propertyIndex | boolean | | Whether the index for this property is used for equality conditions, ordering, and is not null conditions. |
| isRegexp | boolean | | If set to true, the property name is interpreted as a regular expression and the definition applies to all matching property names. The expression should be structured so that it does not match '/'. Examples: .* applies to all properties of the given node; jcr:content/metadata/.* applies to all properties of the child node jcr:content/metadata. |
| nodeScopeIndex | boolean | | Controls whether the value of a property should be part of the full-text index. That is, you can run jcr:contains(., 'foo') and it returns nodes that have a string property containing the word foo. Example: //element(*, app:Asset)[jcr:contains(., 'image')] |
| boost | double | | Since Oak 1.2.5. If the property is included in nodeScopeIndex, defines the boost applied to the indexed value for the given property name. |
| index | boolean | | Determines whether this property should be indexed. Mostly useful for a full-text index where some properties need to be excluded from indexing. |
| useInExcerpt | boolean | | Controls whether the value of a property should be used to create an excerpt. When set to false, the value is still full-text indexed but never shows up in an excerpt for its parent node. When set to true, the property value is stored separately within the index, which increases the index size, so set it to true only if you use the excerpt feature. |
| analyzed | boolean | | Set to true if the property is used as part of contains. Example: //element(*, app:Asset)[jcr:contains(type, 'image')] |
| ordered | boolean | | Set to true if the property is used in an order by clause for sorting. Set it to true only in that case, as it increases the index size. Example: //element(*, app:Asset)[jcr:contains(type, 'image')] order by @size |
| type | String | | JCR property type. Can be one of Date, Boolean, Double or Long. Mostly inferred from the indexed value. However, where the same property type is not used consistently across nodes, it is recommended to specify the type explicitly. |
| nullCheckEnabled | boolean | | Since 1.0.12. Set to true if the property is checked for is null. Enable it only for node types that are not generic, as it leads to an index entry for every node of that type where the property is not set. Example: //element(*, app:Asset)[not(jcr:content/@excludeFromSearch)]. It is better to use a query that checks for property existence or for the property being set to specific values, as such queries can use the index without any extra storage cost. |
| useInSuggest | boolean | | Since Oak 1.1.17, 1.0.15. Controls which properties supply the terms used for suggestions. |
| useInSpellcheck | boolean | | Since Oak 1.1.17, 1.0.13. Controls which properties supply the terms used for spellcheck corrections. |
| facets | boolean | | Since Oak 1.3.14. Used for retrieving facets; to do so, the property facets must be set to true on the property definition. |
| facets | nt:unstructured | | Since Oak 1.3.14. By default, ACL checks are always performed on facets by the Lucene property index. This can be avoided by setting the property secure to false in the facets configuration node. |
| secure | boolean | | |
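
The two isRegexp examples can be made concrete with a small Python sketch. It is an approximation for illustration only: the function name definition_applies is hypothetical, and the sketch assumes that everything before the last '/' is a literal relative node path while only the last segment is a regular expression, which is one way to honour the rule that the expression must not match '/'.

```python
import re


def definition_applies(name_pattern, relative_property_path):
    """Check whether a regexp property definition covers a relative property path."""
    # Assumption: the part before the last '/' is a literal node path,
    # and only the final segment is treated as a regular expression.
    pattern_parent, _, pattern_name = name_pattern.rpartition("/")
    prop_parent, _, prop_name = relative_property_path.rpartition("/")
    if pattern_parent != prop_parent:
        return False
    return re.fullmatch(pattern_name, prop_name) is not None


# ".*" covers properties of the node itself, but not of its children.
print(definition_applies(".*", "jcr:title"))                       # True
print(definition_applies(".*", "jcr:content/metadata/dc:title"))   # False

# "jcr:content/metadata/.*" covers properties of that child node.
print(definition_applies("jcr:content/metadata/.*", "jcr:content/metadata/dc:title"))  # True
```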

| Node/Property Name | Type | Default | Description |
| --- | --- | --- | --- |
| aggregates | nt:unstructured | | Includes the contents of descendant nodes in a single node, to make it easier to search content that is scattered across multiple nodes. |
| ruleName | nt:unstructured | | An index configuration can define one or more aggregates for different node types. The ruleName node is named after the node type, for example nt:base. |
| reaggregateLimit | long | 5 | See JCR-2989 for details. |
| aggregateNodeInclude | nt:unstructured | | |
| path | String | | Path pattern to include. Examples: jcr:content (name explicitly specified); * (any child node at depth 1); */* (any child node at depth 2). |
| primaryType | String | | Restricts the included nodes to a certain type. The restriction is applied to the last node in the given path. |
| relativeNode | boolean | | Indicates that a query can be performed against the specific node. |
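
The aggregate path patterns listed above (jcr:content, *, */*) can be read as segment-by-segment matches against a relative path. The Python sketch below illustrates that reading; the function name matches_include is hypothetical, and the sketch assumes that * matches exactly one path segment.

```python
def matches_include(pattern, relative_path):
    """Match a relative node path against an aggregate include pattern."""
    pattern_parts = pattern.split("/")
    path_parts = relative_path.split("/")
    # The depth must be identical: "*" covers one level, "*/*" two levels.
    if len(pattern_parts) != len(path_parts):
        return False
    # Assumption: "*" matches any single segment; other segments match by name.
    return all(p == "*" or p == s for p, s in zip(pattern_parts, path_parts))


print(matches_include("jcr:content", "jcr:content"))    # True
print(matches_include("*", "renditions"))               # True
print(matches_include("*", "jcr:content/metadata"))     # False
print(matches_include("*/*", "jcr:content/metadata"))   # True
```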

| Node/Property Name | Type | Default | Description |
| --- | --- | --- | --- |
| analyzers | nt:unstructured | | Since Oak 1.2.0. |
| default | nt:unstructured | | |
| class | String | | Example: org.apache.lucene.analysis.standard.StandardAnalyzer |
| luceneMatchVersion | String | | To conform to a specific version, specify it via luceneMatchVersion; otherwise Oak uses a default version that depends on the Lucene version it ships with. Example: LUCENE_47 |
| stopwords | nt:file | | |
| charFilters | nt:unstructured | | The filters need to be ordered. |
| HTMLStrip | | | |
| Mapping | | | |
| tokenizer | nt:unstructured | | The filters need to be ordered. |
| name | | | |
| filters | nt:unstructured | | The filters need to be ordered. |
| LowerCase | | | |
| Stop | nt:unstructured | | |
| words | | | |
| stopx.txt | nt:file | | One or more file nodes. x can be 1 to n. |
| PorterStem | nt:unstructured | | |
| Synonym | nt:unstructured | | |
| synonyms | | | |
| synonym.txt | nt:file | | One or more file nodes. x can be 1 to n. |
| pathText | nt:unstructured | | |

| Node/Property Name | Type | Default | Description |
| --- | --- | --- | --- |
| tika | nt:unstructured | | |
| maxExtractLength | long | | |
| config.xml | nt:file | | https://github.com/apache/jackrabbit-oak/blob/trunk/oak-lucene/src/main/resources/org/apache/jackrabbit/oak/plugins/index/lucene/tika-config.xml |
| suggestion | nt:unstructured | | |
| suggestUpdateFrequencyMinutes | long | | |
| suggestAnalyzed | boolean | | Analyzed suggestions can be enabled by setting the suggestAnalyzed property to true. |

| Node/Property Name | Type | Default | Description |
| --- | --- | --- | --- |
| indexName | oak:QueryIndexDefinition | | Synchronous property index. |
| type | String | | Set to property. Useful whenever there is a query with a property constraint that is not full-text. |
| propertyNames | Name[] | | Index one property per index. (If multiple properties are indexed within one index, the index contains all nodes that have any one of the properties, which can make the query less efficient and can make the query pick the wrong index.) |
| unique | boolean | | Adds a uniqueness constraint on this property. Make sure you set declaringNodeTypes; otherwise all nodes of the repository are affected (which is most likely not what you want), and you are not able to version the node. |
| declaringNodeTypes | Name[] | | The index only applies to the specified node type. |
| reindex | boolean | | |
| includedPaths | String[] | / | The index is only used if the query has a path restriction that is not excluded and is part of the included paths. |
| excludedPaths | String[] | none | The index is only used if the query has a path restriction that is not excluded and is part of the included paths. |
| entryCount | Long | | The estimated number of path entries in the index, used to override the cost estimation (a high entry count means a high cost). |
| keyCount | Long | | The estimated number of keys in the index, used to override the cost estimation (a high key count means a lower cost, and a low key count means a high cost when searching for specific keys; it has no effect when searching for "is not null"). |
| reindex-async | boolean | | Pushes the property index updates to a background job; when indexing is done, the property definition switches back to synchronous update mode. The dedicated background job must be started via a JMX call to the PropertyIndexAsyncReindex#startPropertyIndexAsyncReindex MBean. (future) |

| Node/Property Name | Type | Default | Description |
| --- | --- | --- | --- |
| indexName | oak:QueryIndexDefinition | | Deprecated. The ordered index is an extension of the property index. It keeps the order of the indexed property persistent in the repository. |
| type | String | | Set to ordered. Useful for speeding up queries with ORDER BY, equality, and range clauses. |
| propertyNames | Name | | It has to be a simple value list of type Name. |
| async | String | | The index can be defined as asynchronous by setting the async property to async. |
| direction | String | ascending | The direction of the sorting can be configured by adding the direction property. It can have a value of ascending or descending. |
| reindex | boolean | | If set to true, triggers a full content re-index. |

| Node/Property Name | Type | Default | Description |
| --- | --- | --- | --- |
| indexName | oak:QueryIndexDefinition | | The Solr index is mainly intended for full-text search, but it can also be used for searches by path, property restrictions, and primary type restrictions. This means the Solr index in Oak can be used for any type of JCR query. |
| type | String | | Set to solr. |
| async | String | | Set to async. |
| reindex | boolean | | If set to true, triggers a full content re-index. |



Source: 
http://www.aemstuff.com/blogs/feb/aemindexcheatsheat.html


By aem4beginner
