April 25, 2020

Migrating Jackrabbit 2.X to Jackrabbit Oak



The latest Service Pack announcement for AEM included the following preannouncement:

Preannouncement: Removal of CRX2 in next AEM release

AEM 6.0 ships with a compatibility option to run the server with the repository technology that is used in AEM 5.6.1. The documentation also refers to it as CRX2 or Apache Jackrabbit 2.x.

AEM 6.0 introduces the next generation Java Content Repository that the documentation refers to as Apache Jackrabbit OAK or CRX3. It comes with many significant improvements, mainly with regards to scalability, performance, and optimizations for AEM capabilities. For more info, please see Introduction to OAK.

Adobe is planning to remove the compatibility option in the next release of Adobe Experience Manager. The update process of the next release will contain the option to switch to the OAK based repository during the update that can be used for customers that want to update from AEM 5.x or AEM 6.0 using the compatibility option.

Learn more about the available persistence options of OAK. Current info for AEM 6.0 is available on Recommended Deployment.

Therefore, anyone still using the CRX2 repository with AEM 6 should migrate the repository in the coming months.

In this post, we will explain three basic steps that need to be done when starting the repository migration process.

These three steps are:
  • Selecting my MK
  • The migration itself
  • Indexing
1. Selecting my MK
AEM 6 introduced MicroKernels (MKs), which provide an abstraction layer for the actual storage of the content. Two implementations are available at the moment: TarMK and MongoMK.

So the first decision when migrating is which MK fits your needs: Tar or Mongo?

For the Author:
  • The default option is TarMK.
  • If the concern is reliability, use TarMK plus failover instances.
  • If the concern is scalability, use a MongoMK cluster.
For the Publisher:
  • If publishing is the primary case, use TarMK as a “farm”.
  • If user-generated content is the primary case, use MongoMK in a publish cluster.
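
The recommendations above can be written down as a small lookup table. This is only an illustrative sketch: the function name `recommend_microkernel` and the priority labels are made up for this example and are not part of AEM.

```python
# Hypothetical helper that encodes the recommendations listed above.
RECOMMENDATIONS = {
    ("author", "default"): "TarMK",
    ("author", "reliability"): "TarMK + failover instances",
    ("author", "scalability"): "MongoMK cluster",
    ("publish", "publishing"): "TarMK farm",
    ("publish", "user-generated content"): "MongoMK publish cluster",
}


def recommend_microkernel(tier, priority="default"):
    """Return the suggested MK setup for a tier ("author" or "publish")."""
    key = (tier.lower(), priority.lower())
    if key not in RECOMMENDATIONS:
        raise ValueError(f"no recommendation for {key}")
    return RECOMMENDATIONS[key]


print(recommend_microkernel("author"))
print(recommend_microkernel("publish", "user-generated content"))
```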

For more information on selection of deployment types, go to http://docs.adobe.com/docs/es/aem/6-0/deploy/recommended-deploys.html

2. The migration itself
When you have decided which MK fits your needs, it’s time to do the migration itself.

It’s a simple procedure: you use the crx2oak-1.0.0.zip tool, which Adobe provides to ease the process, and copy folders to different locations.

We do not describe the process here, as it is pretty straightforward; you can follow the step-by-step guide at http://docs.adobe.com/docs/es/aem/6-0/deploy/upgrade.html

3. Indexing
OAK doesn’t automatically index content as Jackrabbit 2 did. Custom indexes need to be created when necessary, much like with traditional relational databases. If there is no index for a specific query, the whole repository is traversed: your queries will still work, but they will probably be very slow.
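
To see why a missing index hurts, here is a toy model in Python. It is not Oak code: the repository is a plain dictionary with made-up paths and properties, and the functions are hypothetical. It only contrasts visiting every node with looking a value up in a prebuilt index.

```python
from collections import defaultdict

# Toy repository: path -> properties (hypothetical content).
nodes = {
    "/content/a": {"color": "red"},
    "/content/b": {"color": "blue"},
    "/content/c": {"color": "red"},
    "/content/d": {"title": "no color here"},
}


def query_by_traversal(nodes, name, value):
    """No index: visit every node and test the property."""
    visited = 0
    hits = []
    for path, props in nodes.items():
        visited += 1
        if props.get(name) == value:
            hits.append(path)
    return sorted(hits), visited


def build_property_index(nodes, name):
    """Build value -> paths once, so later lookups skip the full scan."""
    index = defaultdict(list)
    for path, props in nodes.items():
        if name in props:
            index[props[name]].append(path)
    return index


def query_by_index(index, value):
    """With an index: only the matching entries are touched."""
    hits = index.get(value, [])
    return sorted(hits), len(hits)


color_index = build_property_index(nodes, "color")
print(query_by_traversal(nodes, "color", "red"))
print(query_by_index(color_index, "red"))
```

Both queries return the same paths, but the traversal touches every node in the repository, while the indexed lookup touches only the matches.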

Indexer types and cost calculation
The new Apache Oak based backend allows different indexers to be plugged into the repository. The standard indexer is the Property Index, for which the index definition is stored in the repository itself.

If multiple indexers are available for a query, each available indexer estimates the cost of executing the query. OAK then chooses the indexer with the lowest estimated cost.
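
The selection rule can be sketched in a few lines of Python. This is a simplified illustration, not Oak's actual implementation: the indexer names and cost numbers are invented, and `None` is used here to mean "this indexer cannot answer the query".

```python
def choose_index(estimates):
    """Pick the indexer with the lowest estimated cost.

    estimates maps an indexer name to its cost estimate, or to None when
    that indexer cannot serve the query. If nothing can serve it, the
    query falls back to traversing the repository.
    """
    usable = {name: cost for name, cost in estimates.items() if cost is not None}
    if not usable:
        return "traversal"
    return min(usable, key=usable.get)


# Hypothetical estimates for a single query.
estimates = {"property": 12.0, "lucene": 35.0, "solr": None}
print(choose_index(estimates))
```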

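Put simply, the index selection procedure is this: every indexer that can serve the query reports an estimated cost, and the one with the lowest cost is used. If no index can serve the query, the repository is traversed.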

As we mentioned before, indexes must be created and specified manually.

Indexes are configured as nodes in the repository under the oak:index node. The type of the node must be oak:QueryIndexDefinition, and the available configuration options depend on the type of index you want to create.
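
As a rough illustration, the following Python snippet builds the structure of a property index definition and prints it as JSON. The node name `myPropertyIndex` and the property name `myProperty` are placeholders chosen for this example; check the linked documentation for the exact options your AEM version supports before creating the node.

```python
import json


def property_index_definition(property_names):
    """Return the properties of a property index node as a dictionary."""
    return {
        "jcr:primaryType": "oak:QueryIndexDefinition",
        "type": "property",                      # index type
        "propertyNames": list(property_names),   # properties to index
        "reindex": True,                         # request a (re)index run
    }


# Placeholder names: "myPropertyIndex" and "myProperty" are hypothetical.
definition = {
    "oak:index": {
        "myPropertyIndex": property_index_definition(["myProperty"]),
    }
}

print(json.dumps(definition, indent=2))
```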

The types that can be created are:
  • Property Index: useful for queries that have property constraints but are not full-text.
  • Ordered Index: an extension of the Property Index. It keeps the order of the indexed property persistent in the repository.
  • Lucene Full-text Index.
  • Lucene Property Index: used to create indexes that involve property constraints that are not full-text.
  • SOLR Index: mainly intended for full-text search, but it can also be used to index searches by path, property restrictions, and primary type restrictions.

For instructions on creating each type of index, go to http://docs.adobe.com/docs/en/aem/6-0/deploy/upgrade/queries-and-indexing.html

Completing these three steps will bring you closer to a fully working OAK repository. Of course, there are still many changes in the underlying APIs to deal with, you may have to import your content again, and things that worked before might not work properly now, but you will have taken a huge step towards a migrated OAK repository.


By aem4beginner
