AEM TarMK Corruption Issues


Symptoms of TarMK Corruption
  • Instance is inoperable after offline compaction.
  • Instance stuck in Startup in progress state.
  • Log files or compaction command output report SegmentNotFoundException.
What causes corruption issues
  • The segment is removed by manual intervention (e.g. rm -rf).
  • The segment is removed by revision garbage collection.
  • The segment cannot be found due to some bug in the code.
  • Maintenance tasks are not performed on time, leading to repository growth and low disk space.
  • AEM is stopped forcefully by killing the Java process.
Diagnosing repository corruption issues:
  • Review the error.log file and check whether it contains SegmentNotFoundException or IllegalArgumentException.
  • To determine whether a segment was removed by revision garbage collection, enable debug logging for the org.apache.jackrabbit.oak.plugins.segment.file.TarReader-GC logger and check its output. That logger records the ids of all segments removed by the cleanup phase. Revision garbage collection is the cause of the exception only if the offending segment id appears in that output.
  • If you suspect corruption in an external datastore, search the log file for all occurrences of the error "Error occurred while obtaining InputStream for blobId". This error means that files are missing from your AEM datastore directory.
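
The log checks above can be partly automated. The following is a minimal Python sketch, not an official Adobe or Oak tool. It assumes that each log line includes the logger name (so that lines from the TarReader-GC logger can be recognised by the substring "TarReader-GC") and that segment ids are printed as standard UUIDs; the function name scan_log is hypothetical. Adjust the matching if your log pattern differs.

```python
import re
import sys
from collections import Counter

# Messages named in the diagnosis steps; matched as plain substrings.
SIGNATURES = (
    "SegmentNotFoundException",
    "IllegalArgumentException",
    "Error occurred while obtaining InputStream for blobId",
)

# Assumption: segment ids appear in the log as standard 36-character UUIDs.
UUID_RE = re.compile(r"[0-9a-fA-F]{8}(?:-[0-9a-fA-F]{4}){3}-[0-9a-fA-F]{12}")


def scan_log(lines):
    """Return (signature counts, missing segment ids, ids logged by TarReader-GC)."""
    counts = Counter()
    missing = set()        # ids seen on SegmentNotFoundException lines
    removed_by_gc = set()  # ids seen on TarReader-GC logger lines
    for line in lines:
        for signature in SIGNATURES:
            if signature in line:
                counts[signature] += 1
        ids = {match.lower() for match in UUID_RE.findall(line)}
        if "SegmentNotFoundException" in line:
            missing |= ids
        # Assumption: the logger name is part of each log line.
        if "TarReader-GC" in line:
            removed_by_gc |= ids
    return counts, missing, removed_by_gc


if __name__ == "__main__" and len(sys.argv) > 1:
    # Usage: python scan_log.py /path/to/error.log
    with open(sys.argv[1], encoding="utf-8", errors="replace") as handle:
        counts, missing, removed_by_gc = scan_log(handle)
    for signature in SIGNATURES:
        print(f"{counts[signature]:6d}  {signature}")
    for segment_id in sorted(missing):
        if segment_id in removed_by_gc:
            verdict = "also logged by TarReader-GC"
        else:
            verdict = "not seen in TarReader-GC output"
        print(f"missing segment {segment_id}: {verdict}")
```

A missing segment id that never shows up in the TarReader-GC output only rules out revision garbage collection if debug logging for that logger was already enabled when the segment disappeared.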
Solution to repair corruption issues:
  • Determine the last known good revision of the segment store by using the check run-mode of oak-run, then manually revert the corrupt segment store to that revision. This operation reverts the Oak repository to a previous state in time, so back up the repository completely before performing it.
    • To perform check and restore, follow steps mentioned in this article.
    • If the check fails with "ConsistencyChecker - No good revisions found", then implement the steps in part B of this article.
  • If you are already using a datastore and you encounter the error "Error occurred while obtaining InputStream for blobId", then there are likely files missing from the datastore. Follow this article to resolve the issue.
  • If you are not using a datastore, store binaries in an external file, S3, or Azure datastore instead of in the default segment store.
    • Using a datastore provides better performance.
    • Migrate the instance to one with a datastore using crx2oak.
  • Apply the latest Service Pack, Cumulative Fix Pack, and Oak Cumulative Fix Pack.
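
The backup and check steps from the first solution item could be scripted along the following lines. This is a hedged Python sketch under stated assumptions: the function names, the jar file name, and the directory paths are hypothetical placeholders, AEM is assumed to be stopped, and no oak-run options are hard-coded because the options accepted by the check run-mode differ between oak-run versions. Consult the documentation for your oak-run version before passing any extra arguments.

```python
import shutil
import subprocess
from pathlib import Path


def backup_repository(repo_dir, backup_dir):
    """Copy the whole repository directory; refuse to overwrite an existing backup."""
    repo_dir, backup_dir = Path(repo_dir), Path(backup_dir)
    if backup_dir.exists():
        raise FileExistsError(f"backup target already exists: {backup_dir}")
    shutil.copytree(repo_dir, backup_dir)
    return backup_dir


def build_check_command(oak_run_jar, segmentstore_dir, extra_args=()):
    """Build the oak-run 'check' invocation as an argument list (no shell involved)."""
    # extra_args is left empty by default: check options are version-specific.
    return ["java", "-jar", str(oak_run_jar), "check", *extra_args, str(segmentstore_dir)]


def run_check(oak_run_jar, segmentstore_dir, extra_args=()):
    """Run the check and return its exit code; output goes to the console."""
    command = build_check_command(oak_run_jar, segmentstore_dir, extra_args)
    return subprocess.run(command, check=False).returncode


if __name__ == "__main__":
    # Hypothetical paths; replace them with the ones from your installation.
    repository = Path("crx-quickstart/repository")
    if repository.is_dir():
        backup_repository(repository, Path("repository-backup"))
        print(" ".join(build_check_command("oak-run.jar", repository / "segmentstore")))
```

The sketch only prints the check command after taking the backup; call run_check once the jar path and any version-specific options are confirmed. Reverting the segment store itself is left to the linked article.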
