- The current website is built on a technology stack that is now obsolete
- Redesigning/Revamping the existing website, either to address the weaknesses in the current system or to add significant features
- Switching to a new technology platform, such as a new Content Management System (say AEM)
Content migration is the process of moving an organization's existing digital media to a new system. This is certainly not a simple process.
A change in technology platforms makes the migration challenging, as does a major restructuring or redesign of the site.
Content migration can be achieved in either of the following two ways, or sometimes a combination of both:
- Manual: Ctrl+C and Ctrl+V are the favorite keyboard shortcuts for every developer. The manual way is always the easiest yet the most painful one. If it is about a few pages, you might want to copy the content from the old site and paste it into the new publishing tool. But, if the old system contains thousands of pages, would you want to follow that route? Maybe you can hire a team of content authors who will do the job for you. But a manual process is error-prone.
- Automated: Automating the entire migration process is clearly an appealing option. You use tools or a methodology that let you define the rules for the migration, so the process requires little or no manual effort. Talend Open Studio (an ETL tool) is one such tool that can be used to automate content migration.
- The input, i.e. an export of the existing content. It can be in any form, e.g. a delimited text file or an XML file, depending on the existing system.
- The output format, i.e. what should be the end result of the migration process? Which data from the existing system should map to the new system (AEM in our case)? You should be clear about all the mapping and transformation rules specific to the new system. As we are migrating to AEM, we need to define the mappings between the existing content and AEM components. For instance, if the input extract is an XML file, you would have to define the mappings between XML tags and the properties of an AEM component.
- The loading mechanism, which defines how the content gets loaded into the target system. This is a very important part, as the whole migration process will be designed around the method of the load. We’ve chosen the approach of creating a valid CQ package that can be installed from the CRX Package Manager. One of the major advantages of this approach is that we can easily roll back by uninstalling the package.
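To make the mapping idea concrete, here is a minimal sketch of turning one source XML record into the `.content.xml` of an AEM page. In the approach described here Talend does this work; Python is used only for illustration. The source tags (`title`, `summary`), the `TAG_TO_PROPERTY` table, the function name, and the resource type are all hypothetical and would be replaced by the rules of your own project.

```python
import xml.etree.ElementTree as ET
from xml.sax.saxutils import quoteattr

# Hypothetical mapping rules: source XML tag -> property on the page's jcr:content node.
TAG_TO_PROPERTY = {
    "title": "jcr:title",
    "summary": "jcr:description",
}


def to_content_xml(source_xml: str, resource_type: str) -> str:
    """Render one source record as the .content.xml of a cq:Page (simplified)."""
    record = ET.fromstring(source_xml)
    props = {
        "jcr:primaryType": "cq:PageContent",
        "sling:resourceType": resource_type,
    }
    for tag, prop in TAG_TO_PROPERTY.items():
        value = record.findtext(tag)
        if value is not None:
            props[prop] = value.strip()

    # quoteattr() adds the surrounding quotes and escapes XML special characters.
    # Values starting with "{" or "[" have a special meaning in this format and
    # are not handled by this sketch.
    attrs = "\n".join(f"        {name}={quoteattr(value)}" for name, value in props.items())
    return (
        '<?xml version="1.0" encoding="UTF-8"?>\n'
        '<jcr:root xmlns:jcr="http://www.jcp.org/jcr/1.0"\n'
        '    xmlns:cq="http://www.day.com/jcr/cq/1.0"\n'
        '    xmlns:sling="http://sling.apache.org/jcr/sling/1.0"\n'
        '    jcr:primaryType="cq:Page">\n'
        '    <jcr:content\n'
        f'{attrs}/>\n'
        '</jcr:root>\n'
    )
```

Keeping the rules in a plain table like `TAG_TO_PROPERTY` means that adding a new field to the migration is a one-line change rather than a code change.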
Each block in the above picture is a component (tRunJob in this case) that calls another sub-job. The connectors between two such blocks define the transition, i.e. how and when the next block should be executed. In this case, these transitions are called triggers.
This main job consists of four sub-jobs. The purpose of each sub-job is explained below:
- Pre-migration Cleanup: This job reads the input content (say XML) and breaks it into smaller, manageable chunks (multiple XML files) which can be worked upon individually. The job can be modified to handle scenarios like internal URL mapping, resolving character encoding issues, defining tag mapping rules, etc.
- Extraction & Transformation: This job reads the XML files created in the previous step one by one, transforms each of them to the AEM-specific .content.xml schema, and stores the result under the required jcr_root hierarchy on the file system.
- Post Migration Cleanup: This job is required if there are any post-migration cleanups that need to be done.
- Packaging: This is the final step of migration, which creates the archive of the pages migrated in the above steps. Keep in mind that the package needs to be AEM compatible, i.e. it should contain the jcr_root and META-INF folders and the associated metadata properties as per the AEM packaging standard.
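The packaging step can be sketched roughly as follows. This is a simplified illustration, not the Talend job itself: it zips an existing `jcr_root` folder together with a minimal `META-INF/vault/filter.xml` and `META-INF/vault/properties.xml`. The function name, parameters, and example values are hypothetical, and a real package may need more metadata than shown here.

```python
import zipfile
from pathlib import Path
from xml.sax.saxutils import escape, quoteattr


def build_package(jcr_root: Path, zip_path: Path, filter_root: str,
                  name: str, group: str, version: str) -> None:
    """Zip a jcr_root folder plus minimal package metadata (simplified sketch)."""
    # filter.xml tells the package manager which repository path the package covers.
    filter_xml = (
        '<?xml version="1.0" encoding="UTF-8"?>\n'
        '<workspaceFilter version="1.0">\n'
        f'    <filter root={quoteattr(filter_root)}/>\n'
        '</workspaceFilter>\n'
    )
    # properties.xml carries the package name, group, and version.
    properties_xml = (
        '<?xml version="1.0" encoding="UTF-8" standalone="no"?>\n'
        '<!DOCTYPE properties SYSTEM "http://java.sun.com/dtd/properties.dtd">\n'
        '<properties>\n'
        f'<entry key="name">{escape(name)}</entry>\n'
        f'<entry key="group">{escape(group)}</entry>\n'
        f'<entry key="version">{escape(version)}</entry>\n'
        '</properties>\n'
    )
    # zip_path is assumed to live outside jcr_root so the archive does not include itself.
    with zipfile.ZipFile(zip_path, "w", zipfile.ZIP_DEFLATED) as pkg:
        pkg.writestr("META-INF/vault/filter.xml", filter_xml)
        pkg.writestr("META-INF/vault/properties.xml", properties_xml)
        for path in sorted(jcr_root.rglob("*")):
            if path.is_file():
                # Archive entries always use forward slashes, whatever the OS.
                pkg.write(path, "jcr_root/" + path.relative_to(jcr_root).as_posix())
```

A call such as `build_package(Path("out/jcr_root"), Path("out/migration.zip"), "/content/mysite", "migration", "demo", "1.0")` would produce a zip to try in a test instance first; the paths and names in this call are examples only.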