Troubleshooting Slow Queries in AEM


Statement: Slow Query Classifications


Solution :



Slow Query Classifications



There are 3 main classifications of slow queries in AEM, listed by severity:
  1. Index-less queries
    • Queries that do not resolve to an index and traverse the JCR's contents to collect results
  2. Poorly restricted (or scoped) queries
    • Queries that resolve to an index, but must traverse all index entries to collect results
  3. Large result set queries
    • Queries that return very large numbers of results.

Note: 
  • The first 2 classifications of queries (index-less and poorly restricted) are slow, because they force the Oak query engine to inspect each potential result (content node or index entry) to identify which belong in the actual result set. 
  • In AEM 6.3, by default, when a query traverses 100,000 nodes, it fails and throws an exception.
  • This limit does not exist by default in AEM versions prior to AEM 6.3, but can be set via the Apache Jackrabbit Query Engine Settings OSGi configuration and QueryEngineSettings JMX bean (property LimitReads).

1. Detecting Index-less Queries



During Development



Explain all queries and ensure that their query plans do not contain the /* traverse explanation. Example of a traversing query plan:
  • PLAN: [nt:unstructured] as [a] /* traverse "/content//*" where ([a].[unindexedProperty] = 'some value') and (isdescendantnode([a], [/content])) */
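
The check described above can be automated during development. The following is a minimal sketch, assuming the query plans have already been collected as plain strings (for example, copied from the Explain Query tool). The class and method names are hypothetical and are not part of AEM or Oak.

```java
import java.util.List;

// Hypothetical helper, not an AEM or Oak API: flags query plans that fall back to traversal.
final class QueryPlanCheck {

    // A plan is treated as a traversal when it contains the "/* traverse" marker
    // shown in the example plan above.
    static boolean isTraversal(String plan) {
        return plan != null && plan.contains("/* traverse");
    }

    public static void main(String[] args) {
        List<String> plans = List.of(
            "[nt:unstructured] as [a] /* traverse \"/content//*\" where ([a].[unindexedProperty] = 'some value') */",
            "[cq:Page] as [a] /* lucene:cqPageLucene(/oak:index/cqPageLucene) *:* */");
        for (String plan : plans) {
            String label = isTraversal(plan) ? "TRAVERSAL: " : "indexed:   ";
            System.out.println(label + plan);
        }
    }
}
```

A check like this only detects index-less plans. A plan that resolves to an index can still be poorly restricted.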


Post-Deployment



  • Monitor the error.log for index-less traversal queries:
    • *INFO* org.apache.jackrabbit.oak.query.QueryImpl Traversal query (query without index) ... ; consider creating an index
    • This message is only logged if no index is available and the query potentially traverses many nodes. It is not logged when an index is available, or when the amount of content to traverse is small and the query is therefore fast.
  • Visit the AEM Query Performance operations console and Explain slow queries looking for traversal or no index query explanations.
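
Log monitoring for this message can be scripted. The following is a minimal sketch, assuming the log is available as a local text file whose path is passed as the first argument. The class name is hypothetical, and the marker string is taken from the log message quoted above.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.List;
import java.util.stream.Collectors;

// Hypothetical helper, not an AEM API: collects index-less traversal messages from log lines.
final class TraversalLogScan {

    // Marker text copied from the INFO message quoted in this article.
    static final String MARKER = "Traversal query (query without index)";

    static List<String> indexlessWarnings(List<String> logLines) {
        return logLines.stream()
                .filter(line -> line.contains(MARKER))
                .collect(Collectors.toList());
    }

    public static void main(String[] args) throws IOException {
        if (args.length != 1) {
            System.err.println("usage: java TraversalLogScan <path-to-error.log>");
            return;
        }
        // The location of error.log depends on the deployment, so it is passed in.
        indexlessWarnings(Files.readAllLines(Path.of(args[0])))
                .forEach(System.out::println);
    }
}
```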

Query Performance



The Query Performance page allows the analysis of the slowest queries performed by the system. This information is provided by the repository in a JMX MBean.
In Jackrabbit, the com.adobe.granite.QueryStat JMX MBean provides this information, while in the Oak repository, it is offered by org.apache.jackrabbit.oak.QueryStats.
The page displays:
  • The time when the query was made
  • The language of the query
  • The number of times the query was issued
  • The statement of the query
  • The duration in milliseconds




Explain Query



For any given query, Oak attempts to figure out the best way to execute based on the Oak indexes defined in the repository under the oak:index node.
Depending on the query, different indexes may be chosen by Oak. Understanding how Oak is executing a query is the first step to optimizing the query.
Explain Query is a tool that explains how Oak is executing a query. It can be accessed by going to Tools - Operations - Diagnosis from the AEM Welcome Screen, then clicking on Query Performance and switching over to the Explain Query tab.
Features
  • Supports the XPath, JCR-SQL and JCR-SQL2 query languages
  • Reports the actual execution time of the provided query
  • Detects slow queries and warns about queries that could be potentially slow
  • Reports the Oak index used to execute the query
  • Displays the actual Oak Query engine explanation
  • Provides click-to-load list of Slow and Popular queries
Once you are in the Explain Query UI, enter the query and press the Explain button.




The first entry in the Query Explanation section is the actual explanation. The explanation will show the type of index that was used to execute the query.
The second entry is the execution plan.
Ticking the Include execution time box before running the query also shows how long the query took to execute, which gives you additional information for optimizing the indexes for your application or deployment.


2. Detecting Poorly Restricted Queries

During Development



Explain all queries and ensure they resolve to an index tuned to match the query's property restrictions.
  • Ideal query plan coverage has indexRules for all property restrictions, and at a minimum for the tightest property restrictions in the query.
  • Queries that sort results should resolve to a Lucene Property Index whose index rules for the sort properties set orderable=true.

For example, the default cqPageLucene index does not have an index rule for jcr:content/cq:tags.


Before adding the cq:tags index rule
  • cq:tags Index Rule
    • Does not exist out of the box
  • Query Builder query
    • type=cq:Page
      property=jcr:content/cq:tags
      property.value=my:tag
  • Query plan
    • [cq:Page] as [a] /* lucene:cqPageLucene(/oak:index/cqPageLucene) *:* where [a].[jcr:content/cq:tags] = 'my:tag' */
This query resolves to the cqPageLucene index, but because no property index rule exists for jcr:content/cq:tags, every record in the cqPageLucene index is checked for a match when this restriction is evaluated. This means that if the index contains 1 million cq:Page nodes, then 1 million records are checked to determine the result set.
After adding the cq:tags index rule
  • cq:tags Index Rule
    • /oak:index/cqPageLucene/indexRules/cq:Page/properties/cqTags
      @name=jcr:content/cq:tags
      @propertyIndex=true
  • Query Builder query
    • type=cq:Page
      property=jcr:content/cq:tags
      property.value=my:tag
  • Query plan
    • [cq:Page] as [a] /* lucene:cqPageLucene(/oak:index/cqPageLucene) jcr:content/cq:tags:my:tag where [a].[jcr:content/cq:tags] = 'my:tag' */
The addition of the indexRule for jcr:content/cq:tags in the cqPageLucene index allows cq:tags data to be stored in an optimized way.
When a query with the jcr:content/cq:tags restriction is performed, the index can look up results by value. That means that if 100 of the 1 million cq:Page nodes have my:tag as a value, only those 100 results are returned, and the other 999,900 are excluded from the restriction checks, reducing the number of records checked by a factor of 10,000.
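
The difference between checking every index record and looking up results by value can be illustrated with a toy model. The following is a minimal sketch, assuming an in-memory list stands in for the index records and a HashMap stands in for the property index. It is not how Oak or Lucene store data, and all names are hypothetical.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Toy model only: contrasts a full scan with a lookup by value.
final class TagLookupSketch {

    // Without an index rule: every record is inspected.
    // Matching positions are added to 'matches'; the return value is the number of records checked.
    static int scanAll(List<String> tagPerPage, String tag, List<Integer> matches) {
        int checked = 0;
        for (int i = 0; i < tagPerPage.size(); i++) {
            checked++;
            if (tag.equals(tagPerPage.get(i))) {
                matches.add(i);
            }
        }
        return checked;
    }

    // With a property index: a value -> pages map is built once, at indexing time.
    static Map<String, List<Integer>> buildValueIndex(List<String> tagPerPage) {
        Map<String, List<Integer>> index = new HashMap<>();
        for (int i = 0; i < tagPerPage.size(); i++) {
            index.computeIfAbsent(tagPerPage.get(i), k -> new ArrayList<>()).add(i);
        }
        return index;
    }

    public static void main(String[] args) {
        // 1 million pages, 100 of which carry the tag used in the example query.
        List<String> pages = new ArrayList<>();
        for (int i = 0; i < 1_000_000; i++) {
            pages.add(i < 100 ? "my:tag" : "other:tag");
        }
        List<Integer> scanned = new ArrayList<>();
        int checked = scanAll(pages, "my:tag", scanned);
        List<Integer> lookedUp = buildValueIndex(pages).getOrDefault("my:tag", List.of());

        System.out.println("full scan checked " + checked + " records, found " + scanned.size());
        System.out.println("lookup by value touched " + lookedUp.size() + " records");
    }
}
```

In this model the full scan checks 1,000,000 records while the lookup touches only the 100 matches, which is the factor of 10,000 described above.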

Post-Deployment



  • Monitor the error.log for traversal queries:
    • *WARN* org.apache.jackrabbit.oak.spi.query.Cursors$TraversingCursor Traversed ### nodes ... consider creating an index or changing the query
  • Visit the AEM Query Performance operations console and Explain slow queries looking for query plans that do not resolve query property restrictions to index property rules.


3. Detecting Large Result Set Queries


During Development


Set low thresholds for oak.queryLimitInMemory (e.g. 10000) and oak.queryLimitReads (e.g. 5000), and optimize the expensive query when it hits an UnsupportedOperationException saying "The query read more than x nodes...".

Post-Deployment


  • Monitor the logs for queries triggering large node traversal:
    • *WARN* ... java.lang.UnsupportedOperationException: The query read or traversed more than 100000 nodes. To avoid affecting other tasks, processing was stopped.
    • Optimize the query to reduce the number of traversed nodes
  • Monitor the logs for queries triggering large heap memory consumption:
    • *WARN* ... java.lang.UnsupportedOperationException: The query read more than 500000 nodes in memory. To avoid running out of memory, processing was stopped
    • Optimize the query to reduce the heap memory consumption
For AEM versions 6.0 - 6.2, you can tune the thresholds for node traversal and in-memory reads via JVM parameters in the AEM start script to prevent large queries from overloading the environment. The recommended values are:
  • -Doak.queryLimitInMemory=500000
  • -Doak.queryLimitReads=100000
In AEM 6.3, the above 2 parameters are preconfigured by default, and can be modified via the OSGi QueryEngineSettings.
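
To make the effect of such a threshold concrete, the following is a minimal sketch of a read limit that is taken from a JVM system property and enforced while reading. It is a toy model under stated assumptions, not Oak's implementation: the class and method names are hypothetical, and only the property name, the default value, and the exception type come from this article.

```java
// Toy model only: shows how a -D system property can cap the number of nodes a query reads.
final class ReadLimitSketch {

    // Long.getLong reads a JVM system property such as -Doak.queryLimitReads=100000.
    // The second argument is the fallback used when the property is not set.
    static long configuredReadLimit() {
        return Long.getLong("oak.queryLimitReads", 100_000L);
    }

    // Simulates reading nodes one by one and stops as soon as the limit is exceeded.
    static long readNodes(long nodesToRead, long limit) {
        long read = 0;
        while (read < nodesToRead) {
            read++;
            if (read > limit) {
                throw new UnsupportedOperationException(
                        "The query read or traversed more than " + limit + " nodes.");
            }
        }
        return read;
    }

    public static void main(String[] args) {
        long limit = configuredReadLimit();
        System.out.println("limit = " + limit);
        System.out.println("read " + readNodes(10, limit) + " nodes without hitting the limit");
    }
}
```

Running the sketch with a low value, for example -Doak.queryLimitReads=5, makes the simulated read of 10 nodes fail, which mirrors how a low development threshold surfaces expensive queries early.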

Query Development Tools

Adobe Supported

Community Supported


  • Oak Index Definition Generator
    • Generates an optimal Lucene Property Index from XPath or JCR-SQL2 query statements.
  • AEM Chrome Plug-in
    • Google Chrome web browser extension that exposes per-request log data, including executed queries and their query plans, in the browser's dev tools console.
    • Requires Sling Log Tracer 1.0.2+ to be installed and enabled on AEM.

AEM Code Pitfalls cases

Statement - Code pitfalls in AEM

Solution :

  • Avoid Sling Bindings in Java code

        Sling Bindings are an inappropriate way to get access to a service in 90% of cases. Instead, you should use @Reference or @Inject annotations.

  • Avoid Thread.interrupt in Java code

        Thread.interrupt is dangerous because it can close files, including Lucene files and persistent cache files, when called at the wrong time.

  • Avoid mixing Java synchronization with ReadWriteLocks

        This can lead to a race condition in which the code will eventually deadlock.
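
As an illustration of the last point, the following is a minimal sketch of a class that guards its shared state with a single ReadWriteLock and does not use synchronized anywhere. The class name and the cached data are hypothetical.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.concurrent.locks.ReadWriteLock;
import java.util.concurrent.locks.ReentrantReadWriteLock;

// Hypothetical cache: all access to 'titles' goes through one ReadWriteLock.
// No method is synchronized, so there is only one locking mechanism to reason about.
final class TagTitleCache {

    private final ReadWriteLock lock = new ReentrantReadWriteLock();
    private final Map<String, String> titles = new HashMap<>();

    // Readers share the read lock and do not block each other.
    String get(String tagId) {
        lock.readLock().lock();
        try {
            return titles.get(tagId);
        } finally {
            lock.readLock().unlock();
        }
    }

    // Writers take the exclusive write lock; unlock always runs in finally.
    void put(String tagId, String title) {
        lock.writeLock().lock();
        try {
            titles.put(tagId, title);
        } finally {
            lock.writeLock().unlock();
        }
    }
}
```

Keeping to one mechanism per piece of shared state avoids the situation where one thread holds a monitor and waits for the lock while another holds the lock and waits for the monitor.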

Coding Tips in AEM

  • Use taglibs or HTL as much as possible

Including scriptlets in JSPs makes it difficult to debug issues in the code.  Additionally, by including scriptlets in JSPs, it is difficult to separate business logic from the view layer, which is a violation of the Single Responsibility Principle and the MVC design pattern.

  • Write readable code

Code is written once, but read many times.  Spending some time up front to clean the code that we write will pay out dividends down the road as we and other developers need to read it later.

  • Choose intention-revealing names

In the AEM code base, the following conventions are used:
  • A single implementation of an interface is named <Interface>Impl, e.g. ReaderImpl.
  • Multiple implementations of an interface are named <Variant><Interface>, e.g. JcrReader and FileSystemReader.
  • Abstract base classes are named Abstract<Interface> or Abstract<Variant><Interface>.
  • Packages are named com.adobe.product.module.  Each Maven artifact or OSGi bundle must have its own package.
  • Java implementations are placed in an impl package below their API.
Ideally, names should reveal their intention.  A common code test for when names are not as clear as they should be is the presence of comments explaining what the variable or method is for:

Unclear                          | Clear
int d; //elapsed time in days    | int elapsedTimeInDays;
//get tagged images              | public List getTaggedImages() {}
public List getItems() {}        |

  • Don't repeat yourself

DRY states that the same piece of code should never be duplicated.  This also applies to things like string literals.
Code duplication opens the door to defects whenever something has to change, so it should be sought out and eliminated.

  • Avoid naked CSS rules

CSS rules should be specific to your target element in the context of your application.
For example, a CSS rule applied to .content .center would be overly broad and could potentially end up impacting lots of content across your system, requiring others to override this style in the future.
.myapp-centertext would be a more specific rule as it is specifying centered text in the context of your application.

  • Eliminate usage of deprecated APIs

When an API is deprecated, it is always better to find the new recommended approach instead of relying on the deprecated API.  This will ensure smoother upgrades in the future.

  • Write localizable code

Any strings that are not being supplied by an author should be wrapped in a call to AEM’s i18n dictionary through I18n.get() in JSP/Java and CQ.I18n.get() in JavaScript.  
This call returns the string that was passed to it if no translation is found, which offers the flexibility of implementing localization after implementing the features in the primary language.
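
The fallback behavior described above can be sketched with a plain map. This is a minimal illustration, assuming a hypothetical DictionarySketch class that stands in for the real dictionary. It is not AEM's I18n class and does not reproduce its API.

```java
import java.util.Map;

// Hypothetical stand-in for an i18n dictionary; not AEM's I18n class.
final class DictionarySketch {

    private final Map<String, String> translations;

    DictionarySketch(Map<String, String> translations) {
        this.translations = Map.copyOf(translations);
    }

    // Returns the translation if one exists, otherwise the text that was passed in.
    String get(String text) {
        return translations.getOrDefault(text, text);
    }

    public static void main(String[] args) {
        DictionarySketch german = new DictionarySketch(Map.of("Save", "Speichern"));
        System.out.println(german.get("Save"));   // translated entry
        System.out.println(german.get("Cancel")); // no entry yet, falls back to the input
    }
}
```

Because untranslated strings fall back to the original text, code can wrap every string from the start and the dictionary can be filled in later.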

  • Escape resource paths for safety

While paths in the JCR should not contain spaces, the presence of them should not cause code to break. 
Jackrabbit provides a Text utility class with escape() and escapePath() methods.  For JSPs, Granite UI exposes a granite:encodeURIPath() EL function.

  • Use the XSS API and/or HTL to protect against cross-site scripting attacks

AEM provides an XSS API to easily clean parameters and ensure safety from cross-site scripting attacks.  Additionally, HTL has these protections built directly into the templating language.  An API cheat sheet is available for download at Development - Guidelines and Best Practices.

  • Implement appropriate logging

For Java code, AEM supports SLF4J as the standard API for logging messages. It should be used in conjunction with the configurations made available through the OSGi console for the sake of consistency in administration.
SLF4J exposes five different logging levels. We recommend using the following guidelines when choosing which level to log a message at:
  • ERROR: When something has broken in the code and processing cannot continue.  This will often occur as a result of an unexpected exception.  It is usually helpful to include stack traces in these scenarios.
  • WARN: When something has not worked properly, but processing can continue.  This will often be the result of an exception that we expected, such as a PathNotFoundException.
  • INFO: Information that would be useful when monitoring a system.  Keep in mind that this is the default and that most customers will leave this in place on their environments. Therefore, do not use it excessively.
  • DEBUG: Lower level information about processing. Useful when debugging an issue with support.
  • TRACE: The lowest level information, things like entering/exiting methods. This will usually only be used by developers.
In the case of JavaScript, console.log should only be used during development and all log statements should be removed before release.

  • Avoid cargo cult programming

Avoid copying code without understanding what it does.  When in doubt, it is always best to ask someone who has more experience with the module or API that you are not clear on.