SUMMARY:
Oak does not index as much content by default as Jackrabbit 2 does. You need to create custom indexes when necessary, much like in a traditional RDBMS. This article is intended as a cheat sheet for the index definition structure.
NOTES
- For up-to-date information and more details, refer to:
- Oak Documentation https://jackrabbit.apache.org/oak/docs/query/query.html
- Oak Queries and Indexing https://docs.adobe.com/docs/en/aem/6-1/deploy/platform/queries-and-indexing.html
- Best Practices for Queries and Indexing https://docs.adobe.com/docs/en/aem/6-1/deploy/best-practices/best-practices-for-queries-and-indexing.html
- Get Latest Oak Hotfix
- Must watch Recording Video
- Useful Tools
- Special thanks to Tommaso Teofili, Chetan Mehrotra, Alex Parvulescu, Andrew Khoury, Thomas Mueller, Davide Giannella, Eren Aydin, Varun Mehrotra, Goran Brodnik and Vikas Saurabh for their willingness to help me through some challenging indexing tasks.
| Node/Property Name | Type | Default | Description |
|---|---|---|---|
|  | nt:unstructured |  |  |
|  | oak:QueryIndexDefinition |  | Lucene async index for full text, property constraints, sorting, etc. |
| compatVersion | long |  | By default Oak uses the older Lucene index implementation, which does not support property restrictions, index-time aggregation, etc. Set it to 2 to use these features. |
| type | String |  | Set to lucene. A Lucene index can be used to evaluate property constraints, full-text constraints, path restrictions and sorting. |
| async | String |  | Set to async. This sends the index update process to a background thread, so the index might lag behind the current repository state when a query runs. |
| name | String |  | Name of the index, used in logging. |
| blobSize | long |  | Size in bytes used for splitting the index files when storing them in the NodeStore. |
| evaluatePathRestrictions | boolean |  | If enabled, the index can evaluate path restrictions. |
| includedPaths | String[] | / | Since Oak 1.0.14, 1.2.3. List of paths which should be included in indexing. |
| excludedPaths | String[] | empty | Since Oak 1.0.14, 1.2.3. List of paths which should be excluded from indexing. |
| maxFieldLength | long | 10000 | Number of terms indexed per field. |
| codec | String |  | By default, if the index involves full-text indexing, Oak Lucene uses OakCodec, which disables compression. This can make the index grow large. To enable compression, set the codec to Lucene46. Refer to OAK-2853 for details. |
| indexPath | String |  | Path of the index definition in the repository. To speed up indexing with CopyOnWrite, set indexPath to the path of the index in the repository. For example, if your index is defined at /oak:index/lucene, set indexPath to /oak:index/lucene. This lets the indexer perform reads locally during indexing and avoid costly remote reads. For more details refer to OAK-2247. This feature can be enabled via the Lucene Index provider service configuration. |
| functionName | String |  | Name to be used to enable index usage with native query support. |
| queryPaths |  |  | See https://issues.apache.org/jira/browse/OAK-2599 |
| reindex | boolean |  |  |
| persistence | String |  | To store the Lucene index in the file system, set the property persistence to file in the Lucene index definition node, and set the property path to the directory where the index should be stored. |
| path | String |  | Directory where the index should be stored when persistence is set to file (see persistence). |
|  | nt:unstructured |  |  |
|  | nt:unstructured |  | An index configuration can define one or more indexingRules for different nodeTypes. The rule name is the node type name, for example nt:base. |
| inherited | boolean | true | Determines whether the rule applies only on an exact node type match or also when the match is based on node type inheritance. |
| indexNodeName | boolean | false | Since Oak 1.0.20, 1.2.5. If set to true, the node name is also indexed. This enables faster evaluation of queries involving constraints on the node name. For example: select [jcr:path] from [nt:base] where NAME() = 'kite' |
| includePropertyTypes | String[] |  | String array of property types which should be indexed. Applicable when the index is enabled for full-text indexing. For a full-text index, defaults to including all types. |
| costPerExecution | Double |  | Cost overhead per query: for each query, the overhead is one operation. |
| costPerEntry | Double |  | Cost per index entry: for each entry in the index, the cost is one. |
|  | nt:unstructured |  | Each index rule consists of one or more property definitions defined under properties. |
|  | nt:unstructured |  | Can be any name; generally use the property name. |
| name | String |  | Property name. If not defined, the property name is set to the node name. If isRegexp is true, it defines the regular expression, which applies only to immediate properties. Can also be set to a relative property. |
| propertyIndex | boolean |  | Whether the index for this property is used for equality conditions, ordering, and "is not null" conditions. |
| isRegexp | boolean |  | If set to true, the property name is interpreted as a regular expression and the definition applies to all matching property names. Note that the expression should be structured so that it does not match '/'. Examples: .* applies to all properties of the given node; jcr:content/metadata/.* applies to all properties of the child node jcr:content/metadata. |
| nodeScopeIndex | boolean |  | Controls whether the value of a property should be part of the full-text index. That is, you can run jcr:contains(., 'foo') and it returns nodes that have a string property containing the word foo. Example: //element(*, app:Asset)[jcr:contains(., 'image')] |
| boost | double |  | Since Oak 1.2.5. If the property is included in nodeScopeIndex, this defines the boost applied to the indexed value for the given property name. |
| index | boolean |  | Determines whether this property should be indexed. Mostly useful for full-text indexes where some properties need to be excluded from indexing. |
| useInExcerpt | boolean |  | Controls whether the value of a property should be used to create an excerpt. When set to false, the value is still full-text indexed, but it never shows up in an excerpt for its parent node. When set to true, the property value is stored separately within the index, which increases the index size, so set it to true only if you use the excerpt feature. |
| analyzed | boolean |  | Set this to true if the property is used as part of contains. Example: //element(*, app:Asset)[jcr:contains(type, 'image')] |
| ordered | boolean |  | Set to true if the property is used in an order by clause to perform sorting. Enable it only for properties actually used for sorting, as it increases the index size. Example: //element(*, app:Asset)[jcr:contains(type, 'image')] order by @size |
| type | String |  | JCR property type. Can be one of Date, Boolean, Double or Long. Mostly inferred from the indexed value. However, where the same property is not used with a consistent type across nodes, it is recommended to specify the type explicitly. |
| nullCheckEnabled | boolean |  | Since 1.0.12. Set to true if the property is checked for is null. This should only be enabled for nodeTypes which are not generic, as it leads to an index entry for every node of that type where this property is not set. Example: //element(*, app:Asset)[not(jcr:content/@excludeFromSearch)] It is better to use a query which checks for property existence or for the property being set to specific values, as such queries can use the index without any extra storage cost. |
| useInSuggest | boolean |  | Since Oak 1.1.17, 1.0.15. Controls which properties supply the terms used for suggestions. |
| useInSpellcheck | boolean |  | Since Oak 1.1.17, 1.0.13. Controls which properties supply the terms used for spellcheck corrections. |
| facets | boolean |  | Since Oak 1.3.14. Used for retrieving facets; to do so, the property facets must be set to true on the property definition. |
|  | nt:unstructured |  | Since Oak 1.3.14. By default ACL checks are always performed on facets by the Lucene property index; this can be avoided by setting the property secure to false in the facets configuration node. |
| secure | boolean |  |  |
|  | nt:unstructured |  | Includes the contents of descendant nodes into a single node, to make it easier to search content that is scattered across multiple nodes. |
|  | nt:unstructured |  | An index configuration can define one or more aggregates for different nodeTypes. The rule name is the node type name, for example nt:base. |
| reaggregateLimit | long | 5 | See JCR-2989 for details. |
|  | nt:unstructured |  |  |
| path | String |  | Path pattern to include. Examples: jcr:content (name explicitly specified); * (any child node at depth 1); */* (any child node at depth 2). |
| primaryType | String |  | Restricts the included nodes to a certain type. The restriction is applied to the last node in the given path. |
| relativeNode | boolean |  | Indicates that a query can be performed against a specific node. |
|  | nt:unstructured |  | Since Oak 1.2.0 |
|  | nt:unstructured |  |  |
| class | String |  | Example: org.apache.lucene.analysis.standard.StandardAnalyzer |
| luceneMatchVersion | String |  | To conform to a specific version, specify it via luceneMatchVersion; otherwise Oak uses a default version depending on the version of Lucene it ships with. Example: LUCENE_47 |
| stopwords | nt:file |  |  |
|  | nt:unstructured |  | The filters need to be ordered. |
| HTMLStrip |  |  |  |
| Mapping |  |  |  |
|  | nt:unstructured |  | The filters need to be ordered. |
| name |  |  |  |
|  | nt:unstructured |  | The filters need to be ordered. |
| LowerCase |  |  |  |
|  | nt:unstructured |  |  |
| words |  |  |  |
| stopx.txt | nt:file |  | One or more file nodes; x can be 1 to n. |
| PorterStem | nt:unstructured |  |  |
|  | nt:unstructured |  |  |
| synonyms |  |  |  |
| synonym.txt | nt:file |  | One or more file nodes; x can be 1 to n. |
| pathText | nt:unstructured |  |  |
|  | nt:unstructured |  |  |
| maxExtractLength | long |  |  |
| config.xml | nt:file |  | https://github.com/apache/jackrabbit-oak/blob/trunk/oak-lucene/src/main/resources/org/apache/jackrabbit/oak/plugins/index/lucene/tika-config.xml |
|  | nt:unstructured |  |  |
| suggestUpdateFrequencyMinutes | long |  |  |
| suggestAnalyzed | boolean |  | Analyzed suggestions can be enabled by setting the suggestAnalyzed property to true. |
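
To see how the Lucene rows fit together, here is a minimal sketch that assembles a definition as a nested Python dictionary and prints it as JSON. It is only a way to visualise the node tree, not a way to install an index. Several things here are assumptions: the child node names indexRules and properties follow the Oak Lucene documentation rather than the table rows, the helper lucene_index_definition is hypothetical, and the property paths and /content/dam are made-up examples.

```python
import json


def lucene_index_definition(node_type, properties, included_paths=("/",)):
    """Build a Lucene index definition as a nested dict (hypothetical helper)."""
    prop_nodes = {}
    for name, flags in properties.items():
        # Each property definition is its own nt:unstructured node.
        node = {"jcr:primaryType": "nt:unstructured", "name": name}
        node.update(flags)
        # Derive a node name from the property name (an illustrative choice;
        # the table says any name is allowed).
        prop_nodes[name.replace("/", "_").replace(":", "_")] = node

    return {
        "jcr:primaryType": "oak:QueryIndexDefinition",
        "type": "lucene",                  # index type
        "async": "async",                  # updated by a background thread
        "compatVersion": 2,                # enables property restrictions
        "evaluatePathRestrictions": True,
        "includedPaths": list(included_paths),
        # "indexRules" and "properties" are assumed node names (Oak docs).
        "indexRules": {
            "jcr:primaryType": "nt:unstructured",
            node_type: {
                "jcr:primaryType": "nt:unstructured",
                "properties": {
                    "jcr:primaryType": "nt:unstructured",
                    **prop_nodes,
                },
            },
        },
    }


definition = lucene_index_definition(
    "app:Asset",
    {
        # Hypothetical properties, chosen only for illustration.
        "jcr:content/metadata/title": {"propertyIndex": True, "analyzed": True},
        "size": {"propertyIndex": True, "ordered": True, "type": "Long"},
    },
    included_paths=["/content/dam"],
)
print(json.dumps(definition, indent=2))
```

The printed JSON mirrors the hierarchy in the table: index-level properties at the top, one rule node per node type, and one node per indexed property underneath it.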

| Node/Property Name | Type | Default | Description |
|---|---|---|---|
|  | oak:QueryIndexDefinition |  | Synchronous property index. |
| type | String |  | Set to property. Useful whenever a query has a property constraint that is not full-text. |
| propertyNames | Name[] |  | Index one property per index. (If multiple properties are indexed within one index, the index contains all nodes that have any one of the properties, which can make the query less efficient and can make the query pick the wrong index.) |
| unique | boolean |  | Adds a uniqueness constraint on this property. Make sure you set declaringNodeTypes; otherwise all nodes of the repository are affected (which is most likely not what you want), and you are not able to version the node. |
| declaringNodeTypes | Name[] |  | The index only applies to the specified node types. |
| reindex | boolean |  |  |
| includedPaths | String[] | / | The index is only used if the query has a path restriction that is not excluded and is part of the included paths. |
| excludedPaths | String[] | none | The index is only used if the query has a path restriction that is not excluded and is part of the included paths. |
| entryCount | Long |  | The estimated number of path entries in the index, used to override the cost estimation (a high entry count means a high cost). |
| keyCount | Long |  | The estimated number of keys in the index, used to override the cost estimation (a high key count means a lower cost and a low key count means a high cost when searching for specific keys; has no effect when searching for "is not null"). |
| reindex-async | boolean |  | Pushes the property index updates to a background job; when the indexing process is done, the property definition is switched back to synchronous update mode. You need to start the dedicated background job via a JMX call to the PropertyIndexAsyncReindex#startPropertyIndexAsyncReindex MBean. (future) |
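
The includedPaths / excludedPaths rule above is easier to follow with a small model. The sketch below is a simplified reading of that one sentence, not Oak's actual implementation: the function names is_under and index_applies are hypothetical, and real Oak behaviour may handle more cases (for example queries without any path restriction).

```python
def is_under(path, root):
    """Return True if path equals root or is a descendant of root."""
    if root == "/":
        return path.startswith("/")
    root = root.rstrip("/")
    return path == root or path.startswith(root + "/")


def index_applies(query_path, included_paths=("/",), excluded_paths=()):
    """Toy model: the query's path restriction must not fall under an
    excluded path and must fall under at least one included path."""
    if any(is_under(query_path, p) for p in excluded_paths):
        return False
    return any(is_under(query_path, p) for p in included_paths)


# Example paths are made up for illustration.
print(index_applies("/content/site/en", ["/content"], ["/content/dam"]))
print(index_applies("/content/dam/photos", ["/content"], ["/content/dam"]))
```

Comparing on whole path segments (note the added "/") keeps a sibling such as /contentX from being treated as a child of /content.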

| Node/Property Name | Type | Default | Description |
|---|---|---|---|
|  | oak:QueryIndexDefinition |  | Deprecated. The Ordered index is an extension of the Property index. It keeps the order of the indexed property persistent in the repository. |
| type | String |  | Set to ordered. Useful to speed up queries with "ORDER BY", equality and range clauses. |
| propertyNames | Name |  | It has to be a simple value list of type Name. |
| async | String |  | The index can be defined as asynchronous by setting the async property to async. |
| direction | String | ascending | The direction of the sorting can be configured by adding the direction property. It can have a value of ascending or descending. |
| reindex | boolean |  | The reindex flag which, if set to true, triggers a full content re-index. |

| Node/Property Name | Type | Default | Description |
|---|---|---|---|
|  | oak:QueryIndexDefinition |  | The purpose of the Solr index is mainly full-text search, but it can also be used to index search by path, property restrictions and primary type restrictions. This means the Solr index in Oak can be used for any type of JCR query. |
| type | String |  | Set to solr. |
| async | String |  | Set to async. |
| reindex | boolean |  | The reindex flag which, if set to true, triggers a full content re-index. |