Tracing
 



Summary
This performance Insight article discusses instrumentation for the purpose of tracing and profiling, and why today most such instrumentation efforts have a poor cost-benefit analysis.


Introduction
Visiting customers and reading the various Java related blogs on the web one soon realizes that there are a large number of Java development teams instrumenting their code with tracing and profiling concerns in many different ways. Some of the more common ways include:

- explicit direct and/or indirect calls to java.lang.System.out
- explicit direct and/or indirect calls to java.util.Logger
- explicit direct calls to a custom built Tracer/Profiler class
- injected calls to a tracing aspect via AOP (AspectJ, Spring, JBoss AOP) or Java bytecode manipulation tools (ASM, BCEL)
- container-component request interception (Hibernate, JBoss AS, EJB3, JDO, CORBA, Spring, Servlet Filters)

In general developers understand the importance of instrumenting code and are more than capable of inserting code or injecting byte code that records execution flow via method calls to a tracing concern. A typical tracing concern creates an event from the arguments passed and enhances the event with additional contextual data such as the name of the current executing thread and the timestamp. The trace event is then transferred to one of more trace event stores where the event is written to storage for post processing. A trace store is normally a log file but it could also be a Java message queue, socket connection, or database table.

Some tracing solutions strangely strip away all the important contextual data collected and turn the trace event into a JMX attribute counter. In this case the tracing concern becomes a primitive metric counter with no context data with the user forced to use a simplistic JMX console for performance analysis.


The amount of tracing data collected can be overwhelming for large scale enterprise applications sometimes creating its own performance problems and resulting in additional unplanned work for development teams in creating and maintaining various ad hoc scripts to process and analyses the large volume of generated trace data files. For most projects the costs benefit analysis would not be favourably for the instrumentation effort and performance overhead. The main reasons for this are (1) the limited amount of contextual data collected, (2) inaccurate and incomplete resource consumption usage data between trace event intervals, (3) poor trace event storage management, (4) no support for clusters of managed JVM processes, and (5) inadequate information visualizations to help developers, tester, support and operations to understand the data and derive important insight and value. In short, instrumentation is only the first part of an overall application test, management and support solution. To fully realize the benefits of instrumentation IT management needs to invest in data management, mining and visualization tooling. But tool development is not (and in general should not be) the core business of any IT organization so it would be best to leverage integration possibilities with other tools.


Clock Logging Example




Listed below is the console output which includes both the clock time System.out calls as well as the output from a console log handler. Though this is an extremely simple application and the code has already being presented above it is still not easy to visually process the textual output and relate start and stop trace events as well as calculating the duration times for the different trace intervals one of which is nested. With this kind of output a development team is forced into investing more time and resources creating text processing scripts to extract, filter, aggregate, transform, and present the trace flow information.




Clock Tracer Example
Here is the Clock class changed to call out to JXInsight's Tracer. The method calls Logger.entering() and Logger.exiting() have been changed to Tracer.start() and Tracer.stop() respectively. To be fault tolerant and record thrown exceptions with the Tracer the start and stop calls should ideally be used with a try..catch..finally block.



Upon the first method call to the Tracer class a JXInsight server will be initialized inside the JVM process. Once the server has been initialized the JVM becomes accessible to the JXInsight application analysis and management console and terminal tools. With this tool integration development, testing, and operations staff have a powerful solution that delivers enterprise software quality runtime analysis, inspection and management capabilities across a potential cluster of instrumented JVM processes. So what kind of new capabilities and benefits has the above changes delivered?

- Process and Cluster Management: User defined cluster management. Remote JVM runtime administration.
- Dynamic Monitoring: Change levels of monitoring depending on circumstances. Memory profiling for short periods during problem resolution.
- Runtime State Inspection: On-demand inspection runtime state of components, containers and systems for configuration management.
- Metric Analysis: High level monitoring of JVM and component execution flows for routine service level management.
- Profile Analysis: Aggregated statistics for identified component execution flow patterns. Component runtime dependency analysis. Service and response time distributions.
- Timeline Analysis: Advanced trace and transaction history analysis. Detection of resource concurrency problems including thread safety, deadlocks, and excessive lock contenton.
- Snapshot Management: Reduced problem resolution times via easy snapshot sharing. off-line analysis, and collaborative features such as annotations and notes.
- Terminal Scripting: Automation tooling for performance test data collection as well as routine runtime administration and service level analysis.

Process and Cluster Management
With the easy creation of user defined clusters of managed processes afforded by the JXInsight management console, IT management staff can perform a number of operations at the cluster or process level such as invoking a remote garbage collection cycle, dumping thread stack traces, injecting traces, pinging a process or cluster, resetting analysis snapshots, accessing information about the internal runtime and configuration state of each JVM via JXInsight's JVMInsight repository providers. The management console allows staff to mount and cluster servers based on arbitrary groupings such as application, application type, network subnet, etc. Learn More



Dynamic Monitoring
The level of JVM event monitoring can be dynamically configured from within the management console helping operations staff to capture more detailed information during peak periods, or when alerts have been triggered because of longer response times being reported by users and/or external automated probes. Learn More




Runtime State Inspection with JVMInsight
JVMInsight is an extensible and configurable repository inspection solution that enables the JXInsight management and monitoring console to display additional state information acquired from a server's (local and remote) repositories. Having the ability to access such detailed runtime configuration and state information from a production JVM is crucial in resolving service performance and outage problems. The screen shot below shows call stack inspection of the thread created within the Clock application. Learn More




Metric Analysis
Switching to the Metric analysis mode provides a wealth of high level runtime information for the Clock application including trace statistics, thread wait times and counts, JVM memory heap and on-heap size, number of threads running, GC event cycle times and counts etc... The following screen shot shows some of the 100+ metrics that the JXInsight agent and server will start collecting and recording continously during the lifetime of the JVM process.




Profile Analysis

The Profile analysis mode aggregates clock times and resource consumption times and sizes over the course of the profile window. This information can be accessed in real-time from any number of application management consoles as well as being exported to a snapshot file for off-line analysis. The Traces perspective within the Profile analysis mode shows clearly the trace calling sequence with nesting of child traces within the tree table view. Statistics for clock times, CPU times, GC times, thread wait times, object allocation sizes are recorded for each trace interval. Note: The JXInsight agent delivers a high resolution clock counter for all supported platforms.



Trace injection can be performed from within the management console inserting a new trace identifier onto the trace stack of all future traces within a remote server's JVM. The injection spans thread groups and threads. Injecting traces can be useful when looking to partition aggregated profiling information over different time periods. For operations staff this ability can be used to partition response times while a configuration change is being made to one or more related configuration items. An injected trace can be removed at anytime with all future trace event statistics being aggregated into normal groupings. Learn More



Call stacks are collected for all trace points within both the profile and timeline models. Each model builds a near replicate of the internal JVM class meta model including implementation and interface inheritance trees, class loaders, Java 5 annotations, code sources and technology classifications. The meta model can be browsed from within the Classes perspective of the model (off-line and on-line).




Timeline Analysis

With JXInsight's advanced timeline analysis that can span clusters or JVM processes application management staff have powerful visualizations that can relate trace events in time. The timeline visualization provides distributed concurrency analysis that can help detect resource bottlenecks caused by extreme peaks in workload. Learn More



Snapshot Management

The JXInsight management console provides the ability to export timeline, metrics, and profile snapshots to a mounted file system. Exports can be performed at the cluster or process level. At the cluster level the console will extract each server's snapshot and merge them in memory before writing them to the file system. Snapshots already exported to the file system can later be merged into a consolidated model. Learn More



Offline snapshots can be annotated with notes by development, test and operation team members supporting a collaborative support to performance analysis and management. Note: The notes and contained in a separate file allowing for easy transfer of the snapshot to other teams without sensitive information contained in annotations.




Terminal Scripting

Most of the server command features available in the management console can also be invoked from within the Terminal (command line interface) as well as terminal scripts providing a level of automation to the performance testing. Learn More




Summary
I hope that by highlighting some of the powerful capabilities of JXInsight APTM solution that development staff in future will consider incorporating JXInsight's Tracer API into their internal instrumentation efforts irrespective of the underlying instrumentation technique used.

Today, the HP OpenView ServiceDesk 5.0 development team is enabling the powerful management capabilities of the JXInsight solution by providing a pluggable trace framework that dispatches trace events (begin and end), generated by a custom AOP trace instrumentation technique, to multiple trace listeners one of which calls out to the JXInsight Tracer class.


William Louth, JXInsight's Product Architect