Too many workflows in Inbox crash AEM due to pulse.data.json calls

Issue



The AEM author instance is slow and crashes because failed workflows pile up and the workflow inbox badge repeatedly requests pulse.data.json. The workflow inbox badge is the bell icon in the upper-right of the Touch UI.
The access.log shows many calls to pulse.data.json coming from the same users.

In addition, the AEM server shows high CPU utilization, and thread dumps captured from it contain stacks like the one shown after the log entry below.
Logs:
10.43.34.55 - user 06/Jan/2017:18:17:11 +0900 "GET /mnt/overlay/granite/ui/content/shell/header/actions/pulse.data.json?_=1483664547926 HTTP/1.1" 500 1234
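
To confirm which users generate the pulse.data.json traffic, the access.log can be tallied per user. The following is a minimal sketch, not an Adobe tool: it assumes the log layout of the entry above (IP, dash, user, timestamp, request, status, size), and the function name countPulseCalls and all sample lines except the first are made up for illustration.

```javascript
// Count pulse.data.json requests per user in access.log text.
// Assumed layout: <ip> - <user> <timestamp> <tz> "<request>" <status> <bytes>
function countPulseCalls(logText) {
  const counts = new Map();
  for (const line of logText.split(/\r?\n/)) {
    if (!line.includes('pulse.data.json')) continue;
    // Fields are whitespace-separated; the user is the third field.
    const fields = line.trim().split(/\s+/);
    if (fields.length < 3) continue;
    const user = fields[2];
    counts.set(user, (counts.get(user) || 0) + 1);
  }
  // Return [user, count] pairs, busiest user first.
  return [...counts.entries()].sort((a, b) => b[1] - a[1]);
}

// First line is the entry from this article; the others are hypothetical.
const sample = [
  '10.43.34.55 - user 06/Jan/2017:18:17:11 +0900 "GET /mnt/overlay/granite/ui/content/shell/header/actions/pulse.data.json?_=1483664547926 HTTP/1.1" 500 1234',
  '10.43.34.55 - user 06/Jan/2017:18:17:13 +0900 "GET /mnt/overlay/granite/ui/content/shell/header/actions/pulse.data.json?_=1483664549926 HTTP/1.1" 500 1234',
  '10.43.34.56 - editor1 06/Jan/2017:18:17:14 +0900 "GET /mnt/overlay/granite/ui/content/shell/header/actions/pulse.data.json?_=1483664550926 HTTP/1.1" 500 1234',
  '10.43.34.55 - user 06/Jan/2017:18:17:15 +0900 "GET /sites.html HTTP/1.1" 200 4321'
].join('\n');

console.log(countPulseCalls(sample));
```

On a real instance, read the file contents (for example with Node's fs.readFileSync) and pass them to the function instead of the sample string.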

As per thread dumps:

org.apache.jackrabbit.oak.security.authorization.composite.CompositeAuthorizationConfiguration.getPermissionProvider(CompositeAuthorizationConfiguration.java:134)
    at org.apache.jackrabbit.oak.core.MutableRoot$1.createValue(MutableRoot.java:126)
    at org.apache.jackrabbit.oak.core.MutableRoot$1.createValue(MutableRoot.java:123)
    at org.apache.jackrabbit.oak.core.LazyValue.get(LazyValue.java:53)
    - locked <0x0000000726732148> (a org.apache.jackrabbit.oak.core.MutableRoot$1)


Environment

AEM 6.2 SP1

Resolution

Install the latest Cumulative Fix Pack to fix the bug.


I. Apply the Workaround

If a fix pack cannot be installed on the production AEM environment in a short timeframe, apply the following temporary workaround:




  1. Go to CRXDE Lite at http://aem-host:port/crx/de/index.jsp and log in as admin.
  2. Create this folder structure out of sling:Folder nodes: /apps/granite/ui/components/shell/clientlibs/shell/js
  3. Click "Save All".
  4. Overlay /libs/granite/ui/components/shell/clientlibs/shell/js/badge.js by copying it into the new /apps folder, then modify the copy as shown below:


    Before:
    setInterval(function() {
        updateBadge(el, src, true);
    }, 2000); // 2,000 ms = 2 seconds


    After (set to update every 5 minutes):
    setInterval(function() {
        updateBadge(el, src, true);
    }, 300000); // 300,000 ms = 5 minutes
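
To see why the longer interval relieves the server, it helps to compare the polling rate before and after. This is a rough sketch of the arithmetic only: the function name pollsPerHour and the figure of 100 open sessions are assumptions chosen for illustration, not measurements from an AEM instance.

```javascript
// Number of badge refresh requests one open browser session sends per hour
// for a given setInterval delay in milliseconds.
function pollsPerHour(intervalMs) {
  const msPerHour = 60 * 60 * 1000;
  return msPerHour / intervalMs;
}

const openSessions = 100; // hypothetical number of authors with the UI open

console.log(pollsPerHour(2000));                  // requests per session per hour, before
console.log(pollsPerHour(300000));                // requests per session per hour, after
console.log(pollsPerHour(2000) * openSessions);   // total per hour, before
console.log(pollsPerHour(300000) * openSessions); // total per hour, after
```

With the original 2000 ms delay each session sends 1800 requests per hour; with 300000 ms it sends 12.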

II. Purge Old Running Workflows and Tasks

Lengthening the interval of the workflow notification badge only reduces the load. The underlying cause is that too many tasks are pending in users' inboxes. To address this, delete workflow inbox items and tasks that are no longer needed:




  1. Go to the Workflow Maintenance JMX object:
    http://host:port/system/console/jmx/com.adobe.granite.workflow%3Atype%3DMaintenance


  2. If you don't need the actively running workflows, purge them by invoking purgeActive with dryRun = false.


  3. Go to http://host:port/crx/explorer/index.jsp and log in as admin.





  4. Open Content Explorer.


  5. Browse to /etc/taskmanagement/tasks.


  6. Delete tasks by right-clicking the folder node and selecting Delete Recursively.


  7. Disable the Preliminary Scan option and run the deletion.


  8. More tasks are stored under projects in /content/projects. Use the Projects console (/projects.html) to remove old projects that are no longer needed.




  9. Use CRXDE Lite to browse the /content/projects subnodes and delete any tasks that are no longer required. For example: /content/projects/geometrixx/outdoors/jcr:content/dashboard/gadgets/tasks
