
Instrumenting Every Line of Code Without Slowing Down Every Request?

“Very few businesses realize how much valuable information is locked in their application code. By instrumenting every line of code in your business, you have a window into what’s going on in your business in real time. The possibilities for how you can use that data are endless.” – CTO, APM Vendor

I’ve pulled the above statement from an interview, published just a few days ago, with the CTO of a highly marketed APM company. What surprised me most was not that it came from a company that today relies mainly on call stack sampling to collect measurement data, but that after more than 10 years in the industry he would consider this actually feasible.

Though our technology offers the lowest measurement overhead on the market (and we have benchmarks to back that claim up), instrumenting every line of code is a recipe for disaster in production unless this “every line” set is adaptively shrunk based on an assessment of actual costs, value, risk (cover) and context. Even instrumenting code without performing any measurement will kill the throughput and response times of an application. If you are not convinced, add -XX:+ExtendedDTraceProbes to the command line of your Java application or service. This is effectively like instrumenting every line of your code. In our own benchmarking of this option we have observed a more than 5x drop in throughput for Apache Cassandra 2.0.x, and that was without any D script active and collecting data. The only time this would not create an observable performance impact is when a huge database latency cost dwarfs the cost of any code execution within the request processing pipeline.

Instrumenting every line of Java code would only ever be feasible if you could adaptively shrink the instrumentation set at runtime based on a cost-benefit analysis or similar evaluation strategy. This is exactly what we have done with intelligent activity metering (IAM), which adaptively eliminates instrumentation, measurement and data collection costs based on one or more behavioral aspects of a probe or runtime. But dynamically reducing instrumentation at runtime does not address all the scaling problems that come with high code coverage, as we still need to weigh up the cost of measurement, collection, transmission and storage. If you are using any other performance monitoring technology you don’t even have this luxury, because whatever is instrumented is measured, and whatever is measured is collected, transmitted and stored. That is, if your application performance monitoring solution instruments at all rather than just collecting call stack samples. It can at times appear to be an “all or nothing” proposition. And as you attempt to gain greater control by increasing measurement, you move further away from it, because the overhead negatively impacts the code execution, which in turn lessens the value of what is collected.


Over the years I have devised many novel approaches and techniques in order to slay this dragon and have my cake and eat it. Some were extremely effective, others less spectacular, with still more brewing in the labs, including Signals, which I am most excited about. The most successful released approach up to now, and one that plays to the very nature of the JVM, is based on adaptive measurement, which if carefully crafted (coded) can be transformed into adaptive instrumentation as the dynamic hotspot compiler works its magic. What is wonderful about this approach is that after a period of measurement, under actual (or realistic) production workloads, all that remains measured is only what should be measured. This benefits not only the application but also the person who needs to view the results. Less is more!
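To make the idea of adaptive measurement a little more concrete, here is a minimal sketch in Java. It is not our product's implementation; the class name AdaptiveProbe, the labels, and both thresholds are hypothetical and chosen purely for illustration. The probe measures a method until it has seen enough invocations, then either keeps measuring (hotspot) or switches itself off (disabled).

```java
/**
 * Hypothetical sketch of an adaptive probe. It observes a fixed number of
 * invocations and then decides whether the method is worth measuring.
 * Names and thresholds are illustrative assumptions, not a real API.
 */
final class AdaptiveProbe {
    enum Label { UNKNOWN, HOTSPOT, DISABLED }

    private final long minSamples;         // invocations to observe before judging
    private final long hotspotThresholdNs; // mean cost at or above which we keep measuring
    private long count;
    private long totalNs;
    private volatile Label label = Label.UNKNOWN;

    AdaptiveProbe(long minSamples, long hotspotThresholdNs) {
        this.minSamples = minSamples;
        this.hotspotThresholdNs = hotspotThresholdNs;
    }

    /** Returns a start timestamp, or -1 once the probe has been disabled. */
    long begin() {
        return label == Label.DISABLED ? -1L : System.nanoTime();
    }

    /** Completes a measurement started with begin(); a no-op when disabled. */
    void end(long startNs) {
        if (startNs < 0) {
            return;
        }
        record(System.nanoTime() - startNs);
    }

    /** Records one duration and labels the probe once enough samples exist. */
    synchronized void record(long elapsedNs) {
        if (label == Label.DISABLED) {
            return;
        }
        count++;
        totalNs += elapsedNs;
        if (label == Label.UNKNOWN && count >= minSamples) {
            long meanNs = totalNs / count;
            label = meanNs >= hotspotThresholdNs ? Label.HOTSPOT : Label.DISABLED;
        }
    }

    Label label() {
        return label;
    }

    synchronized long measuredCount() {
        return count;
    }
}
```

A caller would wrap the instrumented method body with begin() and end(). Once a probe is disabled, begin() reduces to a single field check, which is the kind of branch a JIT compiler may be able to optimize heavily, though this sketch makes no claim about how much.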



Adaptive measurement is so important to everything else we do, including adaptive control, QoS, recording and simulation, that there is a dedicated row section in our monitoring console that allows you to see this process in action and determine how much signal noise might be present.

The following cropped screenshot was taken during the initial start of an Apache Cassandra 2.0.2 benchmark run with a single thread performing 500K inserts. Halfway through the stress execution we can see that 3,314 or more methods had been instrumented, 605 of which were labeled as disabled. Of the probes, 35 (about 1%) were labeled as hotspots, and of those 19 were deemed super hot (!managed) and not requiring any further hotspot evaluation.

Note: The number of methods instrumented can be greater than the number of probes as we don’t by default treat overloaded methods within the same class separately.


At the end of the 500K insert execution the number of probes labeled as disabled had risen from 605 to 713. The hotspot count had increased from 35 to 39, with !managed growing from 19 to 21.


After the second batch of 500K inserts the disabled count is now at 724 with a jump of 2 in the hotspot count.


With the execution of a third batch the disabled count has jumped from 724 to 759. Whilst all of this is going on, the throughput is gradually getting higher and higher. This adaptive process will continue throughout the run.


You might ask what of the 2,500 or so probes that are still enabled? Why is the adaptive measurement not so effective in closing the gap between what has been instrumented and what has been disabled (with instrumentation possibly removed)? The disablement process needs to observe and assess many invocations of a particular probe or method. Many of these 2,500 or so probes will probably have only ever executed once (during class initialization). And whilst the instrumentation does remain intact within such probes (methods) it does not matter so much because they don’t actually get executed.

Far more important to know is how many of the measured and collected method invocations came from probes not labeled hotspot. Ideally you want to be measuring only hotspots or, better still, non-managed (!managed) hotspots. This frequency breakdown analysis is performed within our monitoring console in 1 second intervals. For each label classification we list its frequency, per second, in [] brackets. Below you can see that a total of 2,158,484 method invocations were metered in one particular second interval for a single Apache Cassandra 2.0.2 server node benchmark run. The frequency attributed to hotspot labeled probes was 2,158,138. For non-managed (a subset of hotspot) probes the frequency was 2,157,796.
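The arithmetic behind that one-second interval is worth spelling out. The sketch below is a small Java helper, not part of our console; the class IntervalBreakdown and its method names are hypothetical. It takes the three frequencies quoted above and derives how many metered invocations fell outside the hotspot set and how many hotspot invocations were still managed.

```java
/**
 * Hypothetical helper that breaks down one interval's metered invocations.
 * Assumes non-managed probes are a subset of hotspot probes, which are in
 * turn a subset of all metered probes, as described in the text.
 */
final class IntervalBreakdown {
    private final long total;     // all metered invocations in the interval
    private final long hotspot;   // invocations from hotspot labeled probes
    private final long unmanaged; // invocations from !managed hotspot probes

    IntervalBreakdown(long total, long hotspot, long unmanaged) {
        if (unmanaged < 0 || unmanaged > hotspot || hotspot > total) {
            throw new IllegalArgumentException("expected 0 <= unmanaged <= hotspot <= total");
        }
        this.total = total;
        this.hotspot = hotspot;
        this.unmanaged = unmanaged;
    }

    /** Invocations metered on probes that are not (yet) labeled hotspot. */
    long nonHotspot() {
        return total - hotspot;
    }

    /** Hotspot invocations still subject to further hotspot evaluation. */
    long managedHotspot() {
        return hotspot - unmanaged;
    }

    /** Fraction of metered invocations that came from non-hotspot probes. */
    double nonHotspotShare() {
        return total == 0 ? 0.0 : (double) nonHotspot() / total;
    }

    public static void main(String[] args) {
        // Figures quoted in the text for one second of the Cassandra run.
        IntervalBreakdown b = new IntervalBreakdown(2_158_484L, 2_158_138L, 2_157_796L);
        System.out.println("non-hotspot invocations:     " + b.nonHotspot());
        System.out.println("managed hotspot invocations: " + b.managedHotspot());
        System.out.printf("non-hotspot share:           %.4f%%%n", b.nonHotspotShare() * 100);
    }
}
```

With the quoted figures, only 346 of the 2,158,484 metered invocations (roughly 0.016%) came from probes outside the hotspot set, and 342 hotspot invocations were on probes still being managed.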


Later in the stress test run the number of metered method invocations per second has risen from 2.1 million to 2.7 million, and the difference between the hotspot and !managed labeled probes is minuscule: just 229.


Because this adaptive mechanism eliminates unnecessary measurements, it also reduces the data collection costs that would otherwise be incurred in updating statistical data structures such as quantization tables. To reduce the memory footprint of such data structures we can piggyback on the labeling performed by the adaptive mechanism by conditionally guarding the execution of optional data collection until a probe has obtained a certain label.
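As a rough illustration of label-guarded collection, here is a Java sketch. It is an assumption-laden stand-in rather than our actual configuration or code: the class QuantizingProbe is hypothetical, and the power-of-two bucketing is simply one plausible way to build a quantization table. The point is that the table is neither allocated nor updated until the probe has been labeled a hotspot.

```java
/**
 * Hypothetical probe whose quantization table is guarded by a label.
 * The power-of-two bucket layout is an assumption made for this sketch.
 * Not thread-safe; intended only to show the guard and lazy allocation.
 */
final class QuantizingProbe {
    private static final int BUCKETS = 64;

    private boolean hotspot; // set by the adaptive mechanism once labeled
    private long[] buckets;  // stays null until the first guarded update

    /** Called when the adaptive mechanism labels this probe a hotspot. */
    void markHotspot() {
        hotspot = true;
    }

    /** Quantizes a measurement, but only for hotspot labeled probes. */
    void collect(long elapsedNs) {
        if (!hotspot) {
            return; // guard: no allocation and no update for unlabeled probes
        }
        if (buckets == null) {
            buckets = new long[BUCKETS];
        }
        buckets[bucketIndex(elapsedNs)]++;
    }

    /** Maps a value to a bucket: 0 -> 0, 1 -> 1, 2..3 -> 2, 4..7 -> 3, ... */
    static int bucketIndex(long ns) {
        return ns <= 0 ? 0 : 64 - Long.numberOfLeadingZeros(ns);
    }

    boolean isAllocated() {
        return buckets != null;
    }

    long count(int bucket) {
        return buckets == null ? 0L : buckets[bucket];
    }
}
```

Since only about 1% of probes were labeled hotspot in the run described earlier, guarding in this way means the vast majority of probes never pay for a table at all.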


The following config line items only perform quantization of metering measurements when a probe has been labeled as a hotspot.


Whilst I’ve long searched for ways to instrument more and more execution behavior, I am well aware of the cost overhead in doing so and of the law of diminishing returns in relation to the value obtained. I don’t know of any sensible person who thinks every line of code in an application has real business value. The problem is that we don’t know which lines do, so we must instrument to some degree but then learn and adaptively change course, as well as the code itself. The most disappointing part in all of this is that today we still don’t have a standard runtime library that enables the development of similar adaptive mechanisms.
