January 2, 2021

Managing Ghost Users & Repository Growth in AEM

Why do we need to remove users from the Java Content Repository (JCR)?
The answer is simple: new users keep being added to the JCR. Over time the repository grows, and so does the number of child nodes under the "/home/users" path. Queries and traversals over the user path slow down as that node count rises. To keep them fast, we can remove unwanted user nodes. These unwanted nodes are ghost user nodes.

What creates ghost user nodes?
  • A user registers on the site and then does not return for an extended period. The user node stays in the repository as a ghost.
  • A corporate portal hosted in AEM stores employee information as user nodes. A common scenario is an employee leaving the company: the user node created in AEM remains as a ghost user node.
  • A user who registered on the website may forget their credentials and create a new account. The old account then remains in the repository as a ghost user node.
Of course, depending on the application, there may be different scenarios unique to that application that cause the creation of ghost user nodes.

How to solve the repository growth caused by ghost user nodes
“Exorcism” on the server? Not exactly. There is a simple solution to remove the unused user nodes.

Factors to consider:
  • Avoid deleting system users and OOTB users.
  • Look closely at users who have not logged in for more than six months.
  • AEM is often not the single source of user data.
  • If AEM is the single source of user data, check your corporate policy (in some cases, user data cannot be deleted).
  • Remove authors who have left the company or no longer work in AEM.
  • Identify individual users who are important to the system and the company, and do not remove them.
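The factors above can be read as a filter over candidate users. The Java sketch below is a minimal illustration, not AEM code: `UserInfo` and `GhostUserFilter` are hypothetical names, the user data is assumed to have already been read from the repository or an external system, and the last-login timestamp is assumed to come from whatever source tracks logins in your setup. "Six months" is approximated as a number of days supplied by the caller.

```java
import java.time.Instant;
import java.time.temporal.ChronoUnit;
import java.util.List;
import java.util.Set;
import java.util.stream.Collectors;

// Hypothetical, simplified view of a user; lastLogin may be null if unknown.
record UserInfo(String authorizableId, String path, Instant lastLogin) {}

// Hypothetical filter applying the factors listed in this post.
final class GhostUserFilter {
    // Out-of-the-box IDs, the same ones excluded in the sample query.
    static final Set<String> OOTB_IDS = Set.of("admin", "anonymous", "replication-agent");

    private final Set<String> protectedIds; // important users to keep (assumed input)
    private final Instant cutoff;           // last logins before this are "inactive"

    GhostUserFilter(Set<String> protectedIds, Instant now, long inactiveDays) {
        this.protectedIds = protectedIds;
        this.cutoff = now.minus(inactiveDays, ChronoUnit.DAYS);
    }

    boolean isCandidate(UserInfo u) {
        if (u.path().startsWith("/home/users/system/")) return false; // system users
        if (OOTB_IDS.contains(u.authorizableId())) return false;     // OOTB users
        if (protectedIds.contains(u.authorizableId())) return false; // important users
        // No known last login: keep the user and leave it for manual review.
        return u.lastLogin() != null && u.lastLogin().isBefore(cutoff);
    }

    // Returns the IDs of users that pass every check, in input order.
    List<String> candidates(List<UserInfo> users) {
        return users.stream()
                .filter(this::isCandidate)
                .map(UserInfo::authorizableId)
                .collect(Collectors.toList());
    }
}
```

The sketch errs toward keeping data: a user with no recorded login is never selected. Corporate retention policy still has to be checked separately before anything is deleted.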
What are the options to delete?
  • If the user source is an external system, obtain the list of users who have not logged into AEM in the last six months. Next, run a script to remove those users from AEM publish servers.
  • If users are synchronized from LDAP, use the purgeOrphanedUsers() method of the "org.apache.jackrabbit.oak: External Identity Synchronization Management (UserManagement)" JMX service to remove ghost users, that is, synced users whose LDAP entries no longer exist.
  • Create a JCR query to fetch all the users who have not logged into AEM for a long period of time. Consider the above factors as query parameters.
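Sample query (it uses the account creation date, jcr:created, as a stand-in for inactivity and matches only users synchronized from an external source, i.e. those with rep:externalId; adjust the date and the excluded IDs to your environment):
SELECT * FROM [rep:User] AS s
  WHERE ISDESCENDANTNODE([/home/users])
  AND NOT ISDESCENDANTNODE([/home/users/system])
  AND NOT s.[rep:authorizableId] IN ('admin', 'anonymous', 'replication-agent')
  AND s.[jcr:created] < CAST('2017-02-20T18:26:49.328-06:00' AS DATE)
  AND s.[rep:externalId] IS NOT NULL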
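To avoid hard-coding the cutoff date, the statement can be generated from parameters. The Java sketch below is an assumption about how you might do this, not part of AEM: `GhostUserQuery.build` is a hypothetical helper that reproduces the sample query's shape, writing property names in brackets, and takes the cutoff timestamp and the excluded IDs as arguments.

```java
import java.time.ZonedDateTime;
import java.time.format.DateTimeFormatter;
import java.util.List;
import java.util.stream.Collectors;

// Hypothetical helper: builds a JCR-SQL2 statement shaped like the sample query.
final class GhostUserQuery {
    private GhostUserQuery() {}

    static String build(ZonedDateTime cutoff, List<String> excludedIds) {
        if (excludedIds.isEmpty()) {
            // An empty "IN ()" list would not be a valid statement.
            throw new IllegalArgumentException("excludedIds must not be empty");
        }
        // Quote each ID as a string literal, doubling any embedded single quote.
        String ids = excludedIds.stream()
                .map(id -> "'" + id.replace("'", "''") + "'")
                .collect(Collectors.joining(", "));
        // ISO-8601 timestamp with offset, e.g. 2017-02-20T18:26:49.328-06:00
        String date = cutoff.format(DateTimeFormatter.ISO_OFFSET_DATE_TIME);
        return "SELECT * FROM [rep:User] AS s"
                + " WHERE ISDESCENDANTNODE([/home/users])"
                + " AND NOT ISDESCENDANTNODE([/home/users/system])"
                + " AND NOT s.[rep:authorizableId] IN (" + ids + ")"
                + " AND s.[jcr:created] < CAST('" + date + "' AS DATE)"
                + " AND s.[rep:externalId] IS NOT NULL";
    }
}
```

A cutoff such as `ZonedDateTime.now().minusMonths(6)` matches the six-month guideline. The resulting string could then be passed to the JCR API, for example `session.getWorkspace().getQueryManager().createQuery(statement, Query.JCR_SQL2)`, and the results reviewed in a dry run before any user is removed.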

Note: Before running this solution in the production environment, have a comprehensive plan including a detailed backup strategy (should you need it).


By aem4beginner
