April 19, 2020

Rebuilding the Sling Launchpad in AEM

Sometimes, while running an AEM system, you’ll run into one of a whole host of error conditions that don’t seem to have any decent explanation. A few of them are:

  • On restarting AEM, some bundles come up in the “INSTALLED” state, but won’t start automatically.
  • After a deployment, upgrade, service pack release or cumulative fix pack release, AEM fails to start properly on one server, yet it works on another seemingly-identical one.
  • Changing an OSGI configuration in the Felix console doesn’t take effect – you make the change in, say, /system/console/slinglog, but still don’t see the log you just configured.
  • Other random class-loading issues, where all of the bundles start on most servers (deployed the same way), but on one server a bundle refuses to start at all.

If you’ve tried the usual suspects and are still left scratching your head, one go-to troubleshooting step I’ve taken (quite often, in fact) is to rebuild the Launchpad.

Table of Contents
  • How the Launchpad Works with your AEM OSGI Configurations
  • Fortunately, Rebuilding the Launchpad is Super Easy
  • Please make a backup first for goodness sakes
  • Caveat: This Won’t Work on AEM 6.3+ Requiring Shared Encryption Keys
How the Launchpad Works with your AEM OSGI Configurations
Adobe Experience Manager is built on a number of open-source projects, and one of its key components is Apache Sling. When you start AEM for the first time, the AEM Quickstart jar file unpacks itself and begins its self-configuration. One of those steps is to unpack and create the Sling Launchpad.

When your AEM system starts for the first time, you’ll see log lines that look like this:

12.03.2019 13:19:14.191 *INFO* [FelixStartLevel] org.apache.sling.settings.impl.SlingSettingsServiceImpl Read Sling ID null from file /home/tad/aem/crx-quickstart/launchpad/felix/bundle7/data/sling.id.file
12.03.2019 13:19:14.354 *INFO* [FelixStartLevel] org.apache.sling.launchpad.installer Service [Apache Sling Launchpad Startup Listener,28, [org.apache.sling.installer.api.event.InstallationListener]] ServiceEvent REGISTERED
12.03.2019 13:19:14.357 *INFO* [FelixStartLevel] org.apache.sling.launchpad.installer.impl.LaunchpadConfigInstaller Activating launchpad config installer, configuration path=resources/config, install path=resources/install

As you can see, AEM doesn’t find a Sling ID in the Launchpad (it reads null), sees that it has to create one, and then starts populating configurations into the launchpad.

12.03.2019 13:19:18.425 *INFO* [OsgiInstallerImpl] org.apache.sling.audit.osgi.installer Installed bundle com.adobe.cq.cq-deserialization-firewall [22] from resource TaskResource(url=launchpad:resources/install/1/cq-deserialization-firewall-1.0.16.jar, entity=bundle:com.adobe.cq.cq-deserialization-firewall, state=INSTALL, attributes=[org.apache.sling.installer.api.tasks.ResourceTransformer=:26:, Bundle-SymbolicName=com.adobe.cq.cq-deserialization-firewall, Bundle-Version=1.0.16], digest=1552421920000)

Once your first startup is completed, you’ll see a launchpad that looks like this:

tad@tad-ubuntubook:~/aem/crx-quickstart/launchpad$ ls -lathr
total 828K
-rw-r--r-- 1 tad tad 769K Mar 12 13:19 org.apache.sling.launchpad.base.jar.1552421952083
-rw-r--r-- 1 tad tad 9.4K Mar 12 13:19 sling_bootstrap.txt
drwxr-xr-x 3 tad tad 4.0K Mar 12 13:19 startup
drwxr-xr-x 2 tad tad 4.0K Mar 12 13:19 conf
drwxr-xr-x 7 tad tad 4.0K Mar 12 13:19 .
drwxr-xr-x 2 tad tad 4.0K Mar 12 13:25 installer
drwxr-xr-x 521 tad tad 20K Mar 12 13:25 felix
drwxr-xr-x 11 tad tad 4.0K Mar 12 13:25 ..
drwxr-xr-x 4 tad tad 4.0K Mar 12 17:07 config

All of the individual jar files for the bundles are in the launchpad/felix directory, and all of your individual OSGI configurations are in the launchpad/config directory.

Now, let’s say you modify an OSGI configuration (via the OSGI console, a REST call, or a package). What happens then? For example, if you take your newly minted AEM server, go to /system/console/slinglog, and make a small modification to your root logger (say, change it from INFO to ERROR), you’ll see that configuration persisted in a few places.

The source of truth for that configuration is persisted into the JCR, and you’ll see any of these modified OSGI configs in CRXDE under /apps/system/config/.

These runtime OSGI configurations are ALSO persisted on disk in the Launchpad, being saved into crx-quickstart/launchpad/config.

tad@tad-ubuntubook:~/aem/crx-quickstart/launchpad/config/org/apache/sling/commons/log/LogManager/factory/config$ cat 7951193d-99c1-492e-938a-1c8ddd1d0ef3.config
org.apache.sling.commons.log.file="logs/error.log"
org.apache.sling.commons.log.level="error"
org.apache.sling.commons.log.names=[ \
"org.apache.sling.scripting.sightly.js.impl.jsapi.ProxyAsyncScriptableFactory", \
]
org.apache.sling.commons.log.pattern="{0,date,dd.MM.yyyy\ HH:mm:ss.SSS}\ *{4}*\ [{2}]\ {3}\ {5}"
service.factoryPid="org.apache.sling.commons.log.LogManager.factory.config"
service.pid="org.apache.sling.commons.log.LogManager.factory.config.7951193d-99c1-492e-938a-1c8ddd1d0ef3"

AEM will then first READ this configuration information out of the launchpad on system startup, to allow AEM to start up faster without having to re-create all of these configs out of the JCR.
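
When you suspect the on-disk copy of a configuration is stale, it can help to find which files in the launchpad mention a given PID and then read them with cat as above. The following is a minimal sketch, assuming the crx-quickstart/launchpad/config layout shown earlier; find_launchpad_config is a hypothetical helper name, not something AEM ships.

```shell
#!/usr/bin/env bash
# Hypothetical helper: list the .config files under launchpad/config whose
# contents mention the given PID (or factory PID).
# $1 = AEM home (the directory that contains crx-quickstart)
# $2 = PID string to look for
find_launchpad_config() {
  local aem_home="$1" pid="$2"
  local config_dir="$aem_home/crx-quickstart/launchpad/config"
  [ -d "$config_dir" ] || { echo "no such directory: $config_dir" >&2; return 1; }
  # -r recurse, -l print only the names of matching files,
  # -F treat the PID as a literal string rather than a regex
  grep -rlF --include='*.config' -- "$pid" "$config_dir"
}
```

For the example above, something like find_launchpad_config ~/aem org.apache.sling.commons.log.LogManager.factory.config should list the logger config files, which you can then compare with what you see in the JCR.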

This all generally works fine, except sometimes these configs get out of sync, and then you start running into some of the problems listed above. I’ve also seen these problems come about when doing repository & content syncs from one system to another where the running state of one repo doesn’t precisely match the other.

Fortunately, Rebuilding the Launchpad is Super Easy
All you have to do to rebuild the launchpad is this:
  • Stop AEM
  • Move the current crx-quickstart/launchpad directory to a temp directory (mv ./crx-quickstart/launchpad /tmp/launchpad.backup)
  • Start AEM
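
If you do this often, the move step can be wrapped in a small function. This is only a sketch: move_launchpad_aside is a hypothetical name, the timestamped backup directory is my own choice, and the commented sequence assumes the stock crx-quickstart/bin/stop and crx-quickstart/bin/start scripts are how you stop and start AEM (adjust if you run it as a service).

```shell
#!/usr/bin/env bash
# Move the launchpad aside rather than deleting it, so it can be restored.
# $1 = AEM home (the directory that contains crx-quickstart)
# $2 = directory to hold the backup (defaults to /tmp, as in the step above)
# Prints the backup location on success.
move_launchpad_aside() {
  local aem_home="$1" backup_root="${2:-/tmp}"
  local launchpad="$aem_home/crx-quickstart/launchpad"
  local target
  target="$backup_root/launchpad.backup.$(date +%Y%m%d-%H%M%S)"
  [ -d "$launchpad" ] || { echo "no launchpad at $launchpad" >&2; return 1; }
  [ ! -e "$target" ] || { echo "backup already exists: $target" >&2; return 1; }
  mv "$launchpad" "$target" && echo "$target"
}

# Typical sequence (assumed scripts; make sure the AEM Java process has
# fully exited before moving anything):
#   "$AEM_HOME/crx-quickstart/bin/stop"
#   move_launchpad_aside "$AEM_HOME"
#   "$AEM_HOME/crx-quickstart/bin/start"
```

The function refuses to run if the launchpad is missing or the backup target already exists, so a second accidental invocation does not clobber the first backup.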

When AEM starts back up, you’ll see it re-create its sling ID, and start re-creating the launchpad, like the example here:

12.03.2019 17:24:12.538 *INFO* [FelixStartLevel] org.apache.sling.settings.impl.SlingSettingsServiceImpl Read Sling ID null from file /home/tad/aem/crx-quickstart/launchpad/felix/bundle7/data/sling.id.file
12.03.2019 17:24:12.544 *INFO* [FelixStartLevel] org.apache.sling.settings.impl.SlingSettingsServiceImpl Created new Sling ID 6012b6f2-1102-4d2c-8db1-8d04e3b6bbcb
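
To confirm from a script that the rebuild actually happened, you can look for the “Created new Sling ID” line. A minimal sketch follows; latest_created_sling_id is a hypothetical helper, and which log file holds these startup lines depends on your logging setup, so the path is left as a parameter.

```shell
#!/usr/bin/env bash
# Print the most recently created Sling ID found in a log file, if any.
# $1 = path to the log file to scan
latest_created_sling_id() {
  local log_file="$1"
  [ -r "$log_file" ] || { echo "cannot read $log_file" >&2; return 1; }
  # Keep only the 36-character ID that follows the marker text,
  # then take the last match in the file.
  sed -n 's/.*Created new Sling ID \([0-9a-fA-F-]\{36\}\).*/\1/p' "$log_file" | tail -n 1
}
```

An empty result means no such line was found in that file, which suggests AEM started with an existing launchpad.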


So, before restoring your system from a backup, or doing other costly debugging or open-heart surgery, try this trick. It solves more AEM problems than it should.

Please make a backup first for goodness sakes
Disclaimer: First off, please back up the launchpad to a directory as opposed to deleting it, just in case. I’ve never reverted to a launchpad I moved, but I also don’t want people coming to me telling me “Tad said get rid of it and now AEM is hosed”.

Secondly, I’ve tried this successfully on AEM 6.1, 6.2, 6.3 and 6.4. Other versions of AEM may have different behavior.

Caveat: This Won’t Work on AEM 6.3+ Requiring Shared Encryption Keys
A scenario I was recently reminded of is noted in the AEM Security Checklist for AEM 6.3+:

Whereas in older versions the replication keys were stored in the repository, beginning with AEM 6.3 they are stored on the filesystem. Therefore, in order to replicate your keys across instances, you need to copy them from the source instance to the target instances’ location on the filesystem.
Source: https://helpx.adobe.com/experience-manager/6-5/sites/administering/using/security-checklist.html

In this case, the “location on the filesystem” is in the launchpad. So, if you ever need to rebuild your launchpad, you would need to make sure you have automation in place to deploy out your encryption keys to the rebuilt instance so that you don’t break your authentication.
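
One way to automate this is to save the key files before moving the launchpad aside. The sketch below rests on assumptions you should verify on your own instance: that the keys belong to a bundle whose bundle.info mentions com.adobe.granite.crypto.file, and that they are files named hmac and master in that bundle’s data directory (this is my reading of Adobe’s documentation, not something shown in this post). find_bundle_dir and save_crypto_keys are hypothetical helper names.

```shell
#!/usr/bin/env bash
# Locate the felix bundle directory whose bundle.info mentions a given name.
# $1 = launchpad directory
# $2 = name to look for (the default is an assumption, see the text above)
find_bundle_dir() {
  local launchpad="$1" name="${2:-com.adobe.granite.crypto.file}"
  local info
  for info in "$launchpad"/felix/bundle*/bundle.info; do
    [ -f "$info" ] || continue
    if grep -qF -- "$name" "$info"; then
      dirname "$info"
      return 0
    fi
  done
  return 1
}

# Copy the key files (assumed to be named hmac and master) out of that
# bundle's data directory.
# $1 = launchpad directory, $2 = destination directory
save_crypto_keys() {
  local launchpad="$1" dest="$2" bundle_dir
  bundle_dir="$(find_bundle_dir "$launchpad")" || { echo "crypto bundle not found" >&2; return 1; }
  mkdir -p "$dest" && cp -p "$bundle_dir/data/hmac" "$bundle_dir/data/master" "$dest/"
}
```

After the rebuild, the bundle number may differ, so run find_bundle_dir against the new launchpad to locate the data directory, copy the saved files back with AEM stopped, and start AEM again.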



By aem4beginner
