This performance Insight article discusses instrumentation for the purpose of tracing and profiling, and why today most such instrumentation efforts have a poor cost-benefit analysis.
Introduction
Visiting customers and reading the various Java related blogs on the web one soon realizes that there are a large number of Java development teams instrumenting their code with tracing and profiling concerns in many different ways. Some of the more common ways include:
- explicit direct and/or indirect calls to
java.lang.System.out
- explicit direct and/or indirect calls to
java.util.Logger
-
explicit direct calls to a custom built
Tracer/
Profiler class
- injected calls to a tracing aspect via AOP (
AspectJ,
Spring,
JBoss AOP) or Java bytecode manipulation tools (
ASM,
BCEL)
- container-component request interception (
Hibernate,
JBoss AS, EJB3, JDO, CORBA,
Spring, Servlet Filters)
In general developers understand the importance of instrumenting code and are more than capable of inserting code or injecting byte code that records execution flow via method calls to a tracing concern. A typical tracing concern creates an event from the arguments passed and enhances the event with additional contextual data such as the name of the current executing thread and the timestamp. The trace event is then transferred to one of more trace event stores where the event is written to storage for post processing. A trace store is normally a log file but it could also be a Java message queue, socket connection, or database table.
Some tracing solutions strangely strip away all the important contextual data collected and turn the trace event into a JMX attribute counter. In this case the tracing concern becomes a primitive metric counter with no context data with the user forced to use a simplistic JMX console for performance analysis.
The amount of tracing data collected can be overwhelming for large scale enterprise applications sometimes creating its own performance problems and resulting in additional unplanned work for development teams in creating and maintaining various ad hoc scripts to process and analyses the large volume of generated trace data files. For most projects the costs benefit analysis would not be favourably for the instrumentation effort and performance overhead. The main reasons for this are (1) the limited amount of contextual data collected, (2) inaccurate and incomplete resource consumption usage data between trace event intervals, (3) poor trace event storage management, (4) no support for clusters of managed JVM processes, and (5) inadequate information visualizations to help developers, tester, support and operations to understand the data and derive important insight and value. In short, instrumentation is only the first part of an overall application test, management and support solution. To fully realize the benefits of instrumentation IT management needs to invest in data management, mining and visualization tooling.
But tool development is not (and in general should not be) the core business of any IT organization so it would be best to leverage integration possibilities with other tools.
Clock Logging Example
Listed below is the console output which includes both the clock time
System.out calls as well as the output from a console log handler. Though this is an extremely simple application and the code has already being presented above it is still not easy to visually process the textual output and relate start and stop trace events as well as calculating the duration times for the different trace intervals one of which is nested. With this kind of output a development team is forced into investing more time and resources creating text processing scripts to extract, filter, aggregate, transform, and present the trace flow information.
Clock Tracer Example
Here is the
Clock class changed to call out to JXInsight's
Tracer. The method calls
Logger.entering() and
Logger.exiting() have been changed to
Tracer.start() and
Tracer.stop() respectively.
To be fault tolerant and record thrown exceptions with the Tracer the start and stop calls should ideally be used with a try..catch..finally block.
Upon the first method call to the
Tracer class a JXInsight server will be initialized inside the JVM process. Once the server has been initialized the JVM becomes accessible to the JXInsight application analysis and management console and terminal tools. With this tool integration development, testing, and operations staff have a powerful solution that delivers enterprise software quality runtime analysis, inspection and management capabilities across a potential cluster of instrumented JVM processes. So what kind of new capabilities and benefits has the above changes delivered?
-
Process and Cluster Management: User defined cluster management. Remote JVM runtime administration.
-
Dynamic Monitoring: Change levels of monitoring depending on circumstances. Memory profiling for short periods during problem resolution.
-
Runtime State Inspection: On-demand inspection runtime state of components, containers and systems for configuration management.
-
Metric Analysis: High level monitoring of JVM and component execution flows for routine service level management.
-
Profile Analysis: Aggregated statistics for identified component execution flow patterns. Component runtime dependency analysis. Service and response time distributions.
-
Timeline Analysis: Advanced trace and transaction history analysis. Detection of resource concurrency problems including thread safety, deadlocks, and excessive lock contenton.
-
Snapshot Management: Reduced problem resolution times via easy snapshot sharing. off-line analysis, and collaborative features such as annotations and notes.
-
Terminal Scripting: Automation tooling for performance test data collection as well as routine runtime administration and service level analysis.
Process and Cluster Management
With the easy creation of user defined clusters of managed processes afforded by the JXInsight management console, IT management staff can perform a number of operations at the cluster or process level such as invoking a remote garbage collection cycle, dumping thread stack traces, injecting traces, pinging a process or cluster, resetting analysis snapshots, accessing information about the internal runtime and configuration state of each JVM via JXInsight's JVMInsight repository providers. The management console allows staff to mount and cluster servers based on arbitrary groupings such as application, application type, network subnet, etc.
Learn More
Dynamic Monitoring
The level of JVM event monitoring can be dynamically configured from within the management console helping operations staff to capture more detailed information during peak periods, or when alerts have been triggered because of longer response times being reported by users and/or external automated probes.
Learn More

Runtime State Inspection with JVMInsight
JVMInsight is an extensible and configurable repository inspection solution that enables the JXInsight management and monitoring console to display additional state information acquired from a server's (local and remote) repositories. Having the ability to access such detailed runtime configuration and state information from a production JVM is crucial in resolving service performance and outage problems. The screen shot below shows call stack inspection of the thread created within the
Clock application.
Learn More
Metric Analysis
Switching to the Metric analysis mode provides a wealth of high level runtime information for the
Clock application including trace statistics, thread wait times and counts, JVM memory heap and on-heap size, number of threads running, GC event cycle times and counts etc... The following screen shot shows some of the 100+ metrics that the JXInsight agent and server will start collecting and recording continously during the lifetime of the JVM process.