Skip to content

Archive for February, 2012

JXInsight/OpenCore 6.3.2 Released

The second update to JXInsight/OpenCore 6.3 has been published on our developer site. This release includes a new intelligent software activity metering extension, outline, that dynamically enables metering of probes which represent root or leaf call(er|ee) tree nodes. By default all probes enhanced by this extension are disabled in terms of metering unless fired as a root caller or have previously being an unmetered call leaf.

Note: When the metering runtime has other intelligent activity metering extensions enabled the call tree outline can over time move inwards as leaf nodes are disabled by such extensions and their callers become leafs themselves (in terms of metering).

Note: It is also possible, via a system property, to disable the metering of root callers.

Event: Low Latency Performance Measurement – London Java Community 13th March ’12

William Louth, the architect of JXInsight and our new revolutionary Signals technology, will be giving a FREE talk at The Skills Matter eXchange in London on the 13th March 2012. In his talk he will discuss the challenges in the performance measurement of low latency high throughput applications and how best to derive the greatest amount of relevant measurement data whilst limiting instrumentation and measurement overhead to a minimum or within predefined budgets. Various approaches to software performance engineering (SPE) and application performance management (APM) will be covered including call sampling, call/event tracing, transactions monitoring, activity/resource metering as well as signaling as a much more efficient means of eventing and logging that also includes aspects of complex event processing (CEP).

http://www.meetup.com/Londonjavacommunity/events/52286902/

Case Study: Scala Compiler – Part 10

This is the tenth part in a series of articles on the “how to of software performance analysis” using JXInsight/OpenCore. Please read the firstsecondthirdfourthfifth sixthseventheight and ninth parts as well as the preamble before proceeding if you have not already done so.

In this final part in the series we look at how to achieve some degree of continuous performance management (CPM) across releases by comparing particular elements of the metering performance model collected during performance testing cycles.

Continuous performance management can be somewhat difficult to achieve in practice. One of the main stumbling blocks is being able to get repeatable results (model) across multiple executions of the same performance test and release code. If there is significant variance across such runs then how will you be able to distinguish from a test anomaly and an actual underlying behavioral and consumption change. Fortunately while there will always be some differences in the model there will generally be a number of execution points that contribute significantly to the overall execution cost and which have relatively little variation. These execution points once identified can then be tracked in various ways across releases (or changes).

To determine the degree of variation in the Scala compiler compilation test I executed the following script 4 times using the agent bundle generated from the very last IMRR loop of the hotspot performance execution in part 6.

../../bin/scalac -J-agentpath:/OpenCore/bin/osx-64/libjxinsight.jnilib=prod -J-javaagent:./opencore-ext-aj-javaagent.jar -J-noverify -J-Xmx1024M -nobootcp -d classes @classes.list

I then copied then 4 *-probes.ocs metering snapshot files, exported on test completion, to a compare directory and executed the following script.

java -jar /OpenCore/console/opencore-console.jar -compare --probe --total ./compare/

Here is the resulting metrics snapshot generated from the metering snapshot charting the metering clock.time total for each probe across the 4 snapshots. The first two numeric columns show the first and last metric value. The last column shows the difference between the lowest and highest value across all snapshots for a particular metric (${probe}.${meter}.total). The greatest degree of variation (10-15%) is in the IO related classes (to be expected) with a relatively small variation (2-4%) in other more cost (latency) consuming execution points.

Note: The embedded sparkline graphic uses a minimum value baseline per metric instead of a zero baseline accentuating variation.

If a performance test varies in terms of throughput you can instead track the count metric (--count) or if performance (as in timing) is still of concern use the average metric (--avg).

java -jar /OpenCore/console/opencore-console.jar -compare --probe --avg ./compare/

It it even possible to track performance variation across snapshots in terms of inherent (self) averages times.

java -jar /OpenCore/console/opencore-console.jar -compare --probe --iavg ./compare/

If you are handy with writing text processing scripts you can extract the metering model from the snapshot using the -print-probes-metering console command.

java -jar /OpenCore/console/opencore-console.jar -print-probes-metering --probe --avg ./compare/probes-1.ocs

There is nothing to stop you using the Probes Open API to create a metering ProbesPlugin which monitors the performance test live or during a playback of a recording as introduced in part 9.

You can also use the Snapshot interface in the Probes Open API to inspect offline the metering model contained within a snapshot with the added advantage that you can fine tune any thresholds used in assert expressions without having to re-run whole performance tests every time there is a change accepted in a performance baseline or objective.

If you wish you can develop more elaborate or tailored offline delta analysis reporting tools on top of the Probes Open API.

Software performance engineering (SPE) is not about the occasional firing up of a profiler when a severe problem is detected during test or worse in production and hoping for the best that the underlying issue is of the low hanging fruit type and reproducible. It is very much the process of knowledge discovery and acquisition of a software’s execution behavior and resource consumption patterns which can better enable us to manage applications as they move from development through test and finally into production. We have only touched on a small number of the activities performed in practicing SPE as well as the vast capabilities offered by our incredibly powerful, versatile and extensible metering engine. In our next case study we will explore these and much more as we shift focus to the operational aspects of performance monitoring and management (control) in production.

Visions of Cloud Computing – PaaS Everywhere

Just came back from the CloudConnect conference in Santa Clara and after visiting a very large number of vendor booths on the expo floor I was somewhat disappointed at the lack of progress in the solutions being offered. There was pretty much no differentiation. Every vendor had their “me too” management console focused on VM provision and monitoring with some degree of AWS integration. There was some sparks of innovation in the storage pace but nothing radical that would really take us all to the next stage in cloud computing – getting rid of the computer (at least vm image). In fact some of the vendors present had pretty much nothing to offer specific to cloud computing except for maybe they use the cloud themselves to generate load to performance/stress test other sites (not necessarily in the cloud). There was a lot of talk but its translation to the cloud and its resulting transformation in the engineering community was largely absent. I don’t even recall seeing any PaaS signage on the expo floor.

Maybe CloudConnect is not the right conference for this or that PaaS is viewed to be a market that will eventually only be dominated by a few very big vendors not present except for Microsoft (who  offered the nicest t-shirt). Its a great conference to connect with others pushing ahead (in their minds and hallway chats) but the conference itself really has an looming identity crisis beyond this point (at least for me) which is probably a result of the cloud encompassing so many technologies, processes, service delivery models even ideologies (Ops => DevOps => NoOps).

On the long haul home I decided it was time to blog on some of the visions I have of cloud computing that I would like to eventually see become subject matter at such a conference with this first one looking briefly at PaaS but not as most see it today which is largely focused on taking existing enterprise apps deployed to a fixed number of application server targets in some data center and pushing them out to the cloud with a potentially unlimited number of application server targets and in many cases switching in cloud service based version of services offering access to a resource pool of storage and computing.

After spending considerable time engineering CORBA and grid based solutions the scalability problems the above approach presents (on many levels) is all too familiar. But what if in designing such interfaces engineers offered a means to see and touch more of the data (form and function), the data hidden behind the remote interface because of matters related to this mode/style of interaction. What if there was a remote interface as well as a set of local interfaces which could be accessed by code pushed to (or pulled from) the remote service. The code would execute within some managed (and metered) shell (or container), collocated with the data and functions, and controlled by the context within its own payload. What if the code could transmit responses or signals back to its source whilst it migrated the flow of execution (or control) from one service end point to another. Every cloud service provider would then become a PaaS  solution provider but in a narrow domain specific way. Instead of the application (or web app) being the deployment unit it would be an activity containing data, code as well context which would expose metadata for the purpose of security, metering, quality of service (QoS), billing, routing and service delegation (a subject of the next vision blog).

This is not as far fetched as one might initially think. We already have client browsers on end user devices performing some of aspects of this today as well as SaaS solutions offering ways to execute custom business logic under very controlled (governed) conditions. The future of cloud computing it not about provisioning vm instances…its about being able to move our (transient) data, code and workflow (process) from one computing/storage device to another…its about abstracting the consumption of computing power and in doing so expanding the potential capacity beyond that of a single public provider even one as big as Amazon AWS. Its about creating a large service economy in the cloud and applying service supply chain management in the design and deployment of applications and services.

If this could be realized we might actually come back to the situation in which software vendors don’t have to ship proprietary (black box) processors and storage devices with their software which is kind of a like what SaaS is today without the ability to hug such devices. In theory a software vendor would make available their software at some point on the grid with resource consumption (or “provisioning” ) done via one of more service’s (delegates) exposed in the activity context and owned (or billed) to the calling client (or app or user). Do we really want every service provider needing to provision and be billed for computing capacity when the cause of such consumption rests elsewhere in the caller chain?

In forthcoming blogs I would like to explore in greater detail how this could be achieved and the challenges the industry faces without some standardization of the service context meta data needed for safe, efficient and (cost) controlled execution and flow migration.

Metering In The Cloud – Visualizations Part 1

Case Study: Scala Compiler – Part 9

This is the ninth part in a series of articles on the “how to of software performance analysis” using JXInsight/OpenCore. Please read the firstsecondthirdfourthfifth sixthseventh and eight parts as well as the preamble before proceeding if you have not already done so.

In this penultimate part we are going to look at an incredibly powerful and innovative extension to the metering engine that allows you to record the calls into JXInsight/OpenCore’s Probes Open API inserted into an application’s code base by dynamic or manual instrumentation and then play them back in a separate process execution. What is truly unique about our approach is that for all intents and purposes there is no distinction between the metering of the original production environment and that of the playback. In playback, threads are recreated, probes are fired and meters read as per the values recorded. The metering engine does not even know whether its a simulation or not so metering extensions performing dynamic filtering or data collection can be enabled including reporting plugins and management integrations. In this sandbox environment the playback has no application class path dependencies other than the metering engine library.

This recording capability is extremely useful when you have limited time in a performance test environment to try out the various metering extensions, in particular data collectors, across different test runs. It is also a safe environment to experiment with custom metering extensions especially ones you are developing to query for particular consumption rates or to investigate certain behavioral anomalies. It is the ultimate in retrospection of software execution behavior because within the playback execution sandbox you have unlimited access to the metering model as the execution proceeds. You can slow things down without impacting the resulting model because the meter readings are already pre-recorded for every probe that will fire.

To create a recording simply enable the recorder metering extension in the actual performance test execution environment.

[jxinsight.override.config]
jxinsight.server.probes.recorder.enabled=true

Then to playback the recording in a separate process and with a different metering configuration execute the following:

java -agentpath:/OpenCore/bin/osx-32/libjxinsight.jnilib=prod -cp /OpenCore/lib/opencore.jar com.jinspired.jxinsight.server.probes.RecorderPlayback xxxx-xxxx-probes.log

During the playback you can connect the management console to the process if the console metering extension has been enabled in this environment.

[jxinsight.override.config]
jxinsight.server.probes.console.enabled=true

Here is a metering model exported on completion of a recordings playback.

Lets playback the recording again this time adding a new meter, type.time, to our performance model based on an existing meter in the recording. With this custom derived meter we can very easily determine which packages call into the scala.tools.nsc.typechecker package.

[jxinsight.override.config]
jxinsight.server.probes.meter.probes=type.time
jxinsight.server.probes.meter.probe.type.time.meter=clock.time
jxinsight.server.probes.meter.probe.type.time.group=scala.tools.nsc.typechecker
jxinsight.server.probes.meter.probe.type.time.name=type.time
jxinsight.server.probes.meter.probe.type.time.statistic=total
jxinsight.server.probes.strategy.enabled=false
jxinsight.server.probes.console.enabled=true

Here is the resulting metering model with the new type.time meter which we did not have included in the original performance test execution environment (though we could have included it). Notice how the metering totals and counts for clock.time have not changed.

We can have as many custom meters as we wish. Lets add another one that meters the call count along with our call time.

[jxinsight.override.config]
jxinsight.server.probes.meter.probes=type.time,type.count
jxinsight.server.probes.meter.probe.type.time.meter=clock.time
jxinsight.server.probes.meter.probe.type.time.group=scala.tools.nsc.typechecker
jxinsight.server.probes.meter.probe.type.time.name=type.time
jxinsight.server.probes.meter.probe.type.time.statistic=total
jxinsight.server.probes.meter.probe.type.count.meter=clock.time
jxinsight.server.probes.meter.probe.type.count.group=scala.tools.nsc.typechecker
jxinsight.server.probes.meter.probe.type.count.name=type.count
jxinsight.server.probes.meter.probe.type.count.statistic=count
jxinsight.server.probes.strategy.enabled=false
jxinsight.server.probes.console.enabled=true

With our new playback generated metering model we can now see the number of calls executed within scala.tools.nsc.typechecker originating from a particular caller package.

If you only want to count the boundary calls into the scala.tools.nsc.typechecker package you can enable the queue metering extension and then map a custom derived meter to the queue count which only records entry point calls into a group (or probe).

[jxinsight.override.config]
jxinsight.server.probes.queue.enabled=true
jxinsight.server.probes.meter.queues=type.count
jxinsight.server.probes.meter.queue.type.count.group=scala.tools.nsc.typechecker
jxinsight.server.probes.strategy.enabled=false
jxinsight.server.probes.console.enabled=true

The metering total for type.count has dropped significantly now that we are only metering the boundary calls into the scala.tools.nsc.typechecker package and in addition the inherent total is now assigned to other packages – the direct callers.

Even the metering inherent total for a particular package can be mapped as a meter.

[jxinsight.override.config]
jxinsight.server.probes.meter.probes=ast.time
jxinsight.server.probes.meter.probe.ast.time.meter=clock.time
jxinsight.server.probes.meter.probe.ast.time.group=scala.tools.nsc.ast
jxinsight.server.probes.meter.probe.ast.time.name=ast.time
jxinsight.server.probes.meter.probe.ast.time.statistic=itotal
jxinsight.server.probes.strategy.enabled=false
jxinsight.server.probes.console.enabled=true

From the metering model below we can see that scala.tools.nsc.transform spends 13 seconds exclusively within the scala.tools.nsc.ast package in the course of its own execution.

In our sandbox we can install metering interceptors that again need not be aware of the actual environment and can augment the data collection during the recordings playback. Here is a basic implementation that tags the current of execution when a particular probe is started and untags following its completion. We can use this to see which probes are fired during a particular phase in the compilers execution.

Here is the configuration to enable both the interceptor and tag metering extensions and to install the above interceptor.

[jxinsight.override.config]
jxinsight.server.probes.tag.enabled=true
jxinsight.server.probes.interceptor.enabled=true
jxinsight.server.probes.interceptors=pt
jxinsight.server.probes.interceptor.pt.factory.class=phase.tagging.InterceptorFactory
jxinsight.server.probes.interceptor.pt.imports=name,tag
jxinsight.server.probes.interceptor.pt.import.name=scala.tools.nsc.ast.parser.SyntaxAnalyzer$ParserPhase.apply
jxinsight.server.probes.interceptor.pt.import.name.type=name
jxinsight.server.probes.interceptor.pt.import.tag=syntax
jxinsight.server.probes.strategy.enabled=false
jxinsight.server.probes.console.enabled=true

The metering model now lists the named tag, @syntax, under all metering groups that were metered during the period this tag was set. From this we can see that calls made into scala.tools.nsc.ast.parser only ever occur when the SyntaxAnalyzer$ParserPhase.apply is executing (hint: the count and totals are the same for the group and its extended tagged group).

It is very easy to create simple reports (or character based visualizations) within the sandbox. Here is a custom metering plugin that prints out the stack as it changes showing what has not changed (_), what has changed (@) as well as the maximum depth (.) within (a minimum of) a second window. Notice how time within the sandbox is determined via the probe meter readings because time via the java.lang.System class can go faster or slower in the sandbox in relation to the underlying recording.

[jxinsight.override.config]
jxinsight.server.probes.plugin.enabled=true
jxinsight.server.probes.plugins=stack.printer.PluginFactory
jxinsight.server.probes.event.enabled=true
jxinsight.server.probes.strategy.enabled=false
jxinsight.server.probes.console.enabled=true

Here is the terminal output of the plugin in the playing back of the Scala compiler recorded performance test execution.

Case Study: Scala Compiler – Part 8

This is the eight part in a series of articles on the “how to of software performance analysis” using JXInsight/OpenCore. Please read the firstsecondthirdfourthfifth sixth and seventh parts as well as the preamble before proceeding if you have not already done so.

Up to this point in the series we have focused on how to effectively measure the performance of Scala’s compiler, a very complex single threaded piece of software, in terms of latency (meter) and execution (activity) count, looking at ways to eliminate noise, reduce measurement error and increase the accuracy and relevance of the resulting performance model. We started with instrumenting 14,000 methods in the scala.* codebase and metering (measuring) 16.5 billion executions. We eventually whittled this down to an impressive ~400 methods and less than 1 million executions by letting the metering engine learn and refine the instrumentation set itself without much intervention from ourselves other than to set thresholds reflecting cost tolerance. In this part we will initially revisit our metering of the entire codebase and execution of the compiler but this time focused on measuring of of the most common reasons for latency slowdowns other than the workload itself – garbage collection (GC).

Remember our famous quote “you don’t manage cost(s), you manage its cause(s)” in part 6. Well this is very much the case for GC because we have very limited control over it (at least not explicitly or directly) though the execution of our code influences its occurrence and behavior (duration) as well as to some extent the JVM arguments we use to set capacity limits and collection options. Because this performance test is largely single threaded we can use a built-in JXInsight/OpenCore meter, jvm.alloc.bytyes, to estimate the heap allocation rates. This meter is implemented similar to the custom meter, jitter.count and jitter.count, introduced in the article, Metering JVM Jitter without a Hiccup, in that the heap information exposed via the Runtime interface is continuously polled to detect increases in the heap size with automatic adjustment for decreases in the used memory assignment following garbage collection or stack execution unrolling. Its relatively cheap and reasonably accurate without having to resort to instrumenting every single class in the JVM and sizing every allocation or worse taking heap dumps repeatedly during the execution of the performance test (heap dumps are for retention, heap metering is for allocation).

[jxinsight.override.config]
jxinsight.server.probes.strategy.enabled=false
jxinsight.server.probes.console.enabled=true
jxinsight.server.probes.meter.clock.time.include=false
jxinsight.server.probes.meter.jvm.alloc.bytes.include=true
jxinsight.server.probes.meter.jvm.alloc.bytes.sync.interval=1

Here is the resulting metering model following the compilation of the compilers own source code base.  The overall metering total is close to 11 GB with scala.tools.* and scala.collection.* being the primary allocators (consumers).

Here are a few of the top “allocation” probes ordered by inherent total. Looking at the last name part in some of the probes we can already gain some confidence in the estimation of this meter – newBuilder, <init> (new), toString, map, make, append and $plus$eq (+=).

Below is a metering model based on the instrumentation set derived from the final IMRR hotspot metering model in part 6 created using the following command.

java -jar /OpenCore/console/opencore-console.jar -generate-aj-bundle hotspot-t25-w100-toolsonly-probes.jar /OpenCore/bundle/java/default/non-strategy/opencore-ext-aj-javaagent.jar

The slight difference in total is due to metering not taking place until Global$Phase.applyPhase is called (in a loop). Note that the inherent metering total for scala.collection.* has now moved into the scala.tools.nsc.* package because of the instrumentation and hotspot metering strategy which largely effects assignment of cost but not the allocation of cost itself (at least not intentionally).

Here are some of the top “allocation” probes with this new and more relevant activity metering model. Again the method names (last name part of probes) hint at the accuracy of this particular async heap polling meter.

Whilst the jvm.alloc.bytes can be used in metering the performance of single threaded software executions or single thread metered executions using the tracepermit metering extension there is a candidate meter which is much more appropriate for multi-threaded usage even in production – gc.time.

[jxinsight.override.config]
jxinsight.server.probes.console.enabled=true
jxinsight.server.probes.meter.gc.time.include=true

Comparing the gc.time inherent totals in the metering model below with the jvm.alloc.bytes inherent totals in the above metering model we can see that the top two biggest package contributors are the same within each model – typechecker and backend.

Here are some of the top “gc impacted” probes ordered by inherent total. There is certainly a strong similarity with the top “allocation” probes listing. We like to say “where there is smoke there is fire” or “if you are always found at the crime scene you are either the cop or the criminal”. Overtime if a probe consistently suffers some degree of GC, inherently that is, then it is an allocation site or a driver of allocation via direct or indirect calls it makes.

Note: We actually applied an instrumentation set derived from a hotspot metering model based on a latency meter, clock.time, and then included the gc.time meter in a later execution.

Case Study: Scala Compiler – Part 7

This is the seventh part in a series of articles on the “how to of software performance analysis” using JXInsight/OpenCore. Please read the firstsecondthirdfourthfifth and sixth parts as well as the preamble before proceeding if you have not already done so. If that sounds like too much work then at least read part 4 , part 5 and part 6.

Up to this point in the series we have obsessed over how to capture in our metering model the most important execution behavior of a performance test and at the lowest measurement overhead and in doing so eliminating noise and measurement error as much as possible. We have gone from metering 16.5 billion calls in the Scala compiler codebase to metering just under 1 million calls using intelligent activity metering extensions. In this part we move from instrumentation and measurement to collection augmenting the metering model with additional statistical and behavioral data structures.

The quantization metering extension can help us capture and understand the range(s) and distribution of metering values recorded for fully qualified metered probes. Enabling this collection extension is as easy as dynamic metering extensions.

jxinsight.server.probes.quantization.enabled=true

By default this enhancement will be applied to all metered probes (methods in our case study). This is generally overkill so lets limit the enhancement to our main entry point in the execution.

jxinsight.server.probes.quantization.filter.enabled=true
jxinsight.server.probes.quantization.filter.include.name.groups=scala.tools.nsc.Global$GlobalPhase.applyPhase

Here is the metering distribution (in powers of two) for the applyPhase method rendered in the management console. There are some outliers (> 100,000 microseconds) which contribute significantly to the overall metering total for the performance test.

We can use the transaction metering extension to determine what actually transpired (changed) during the execution of those methods which exceeded the 100,000 clock.time meter mark.

jxinsight.server.probes.transaction.enabled=true
jxinsight.server.probes.transactions=phase
jxinsight.server.probes.transaction.phase.name.groups=scala.tools.nsc.Global$GlobalPhase.applyPhase
jxinsight.server.probes.transaction.phase.meter=clock.time
jxinsight.server.probes.transaction.phase.threshold=100000

The above will result in metering change sets being collected within our metering model for each metering of the applyPhase which exceeded 100,000  in wall clock. A metering change set is effectively a metering model within a metering model representing the metering changes that occurred at the thread level within the metering model during the execution of the applyPhase method. Its not a trace or a path. Its a true transaction in the sense that it models (ex)change(s) in state - in this case the state is the metering model itself (associated with the executing thread).

Below are four transactions within the metering model that have been partially expanded revealing strong similarities in the metering change sets in terms of the (proportional) inherent total across the typechecker, ast and transform packages.

This view can also help reveal behavioral differences across different executions of the applyPhase method. From the three transactions shown below we can see different classes appearing under packages common to transactions as well as packages that only appear in one transaction such as matching in the second transaction and backend in the third transaction. From this information new transaction types can be defined starting at a lower and more specific entry point in the metered execution and/or using behavioral trait metering extension to more cleanly demarcate the metered work.

Metering transactions can be filtered on the occurrence of a particular probe name within the transactions underlying metering change set. This is incredibly useful when we have identified a problem probe (or group) in one transaction and what to compare its resource (meter) consumption profile across other similar transactions.

In the next part we will look at one of the major reasons for latency slow downs, garbage collection (GC), and its primary driver – object allocations.

Case Study: Scala Compiler – Part 6

This is the sixth part in a series of articles on the “how to of software performance analysis” using JXInsight/OpenCore. Please read the firstsecondthirdfourth and fifth parts as well as the preamble before proceeding if you have not already done so. If that sounds like too much work then at least read part 4 and part 5.

In the last part we looked at eliminating noise and measurement error in the metering model by way of dynamic disablement of metering as the execution of the software proceeded down the call stack looking for cyclic and recursive call patterns between caller and callee, direct and indirect, as well as the stack depth of the execution itself. In this part the focus is still on the caller-to-callee patterns but not in terms of the identity of the callers but in terms of the metering consumption and measurement cost via JXInsight/OpenCore’s innovative inline and budget metering extensions.

But first lets talk about measurement cost and change control. A quote we often use when discussing cost and its control is “You don’t manage cost, you manage its causes”. The cause of measurement cost is instrumentation. If code (or an activity) is not instrumented then it can’t have any attributable measurement cost. But what should be instrumented? If the code resides outside of a software performance engineers change control then they only way to control the costs associated with the code is by controlling the causes to how the cost is incurred – calls from the code under change control to methods within the code not under change control.

Applying the above to our Scala compiler performance test we should only instrument code within the scala.tools.nsc.* package which removes the second biggest measurement sink scala.collection.*. Likewise Scala generates a huge number of anonymous Java class methods which are best managed from their enclosing method which if metered should be sufficient for our performance measurement purposes as well as finding a suitable starting point to look further down the execution stack for possible performance optimizations. This can be achieved using the following instrumentation filter patterns.

[jxinsight.aspectj.filters.config]
scala.tools.nsc.
!/$$anon

Below is a revised version of the hotspot metering model presented in part 4 following a complete execution of IMRR script. The overall metering count has come down from 13.5 million to 8.6 million and the metering count for scala.tools.nsc has come down from 12.5 million to 8.6 million. Reasons for this drop off in counts are the initial significant reduction in instrumentation via class level filtering and the followup instrumentation reduction in the event of lower average metering times for some methods which resulted in the hotspot metering strategy disabling such probes and the IMRR process then eliminating them across executions. This is the new baseline model which will be used in comparing models generated by the inline and budget metering extensions presented later in the article.

For comparison purposes and for those interested in the improving the Scala compiler (written in Scala but compiled to Java bytecode) here are the top 25 probes ordered by inherent total. A lot of the same usual suspects from previous models are present.

The inline metering extension, recently developed when we started looking at the challenges of performance measuring bytecode generated by the Scala compiler, dynamically disables the metering of probes with an inherent minimum that is below a specific threshold and which is called directly within the metering scope of another probe that also is below the same threshold. Here it is configured alongside the hotspot metering strategy.

jxinsight.server.probes.inline.enabled=true
jxinsight.server.probes.inline.threshold=1
jxinsight.server.probes.inline.meter=clock.time
jxinsight.server.probes.strategy.hotspot.meter=clock.time
jxinsight.server.probes.strategy.hotspot.threshold=25
jxinsight.server.probes.strategy.hotspot.warmup=100
jxinsight.server.probes.strategy.hotspot.statistic=avg
jxinsight.server.probes.console.enabled=true
jxinsight.server.probes.console.dump.file.name=probes.ocs

The above configuration will inline, in terms of measurement, any probe (instrumented method) that has an inherent minimum less than 1 unit of the inline meter which in this case is clock.time (microseconds). In the graphic below the second callee and the fifth callee are not metered, though of course called in the course of execution, because both had an inherent minimum less than 1 as well as their immediate parent caller. With inline we can never have a metered caller-callee chain with two consecutive calls having a minimum inherent less than the threshold. This is a dynamic metering extension so it changes over the course of execution. It is best to think of it very much like the underlying hotspot dynamic compiler technology with the JVM which does code inlining but in this case we are not folding code but measurement.

Below is the metering model exported on completion of the very last iteration of the IMRR script. The overall metering count has dropped down from 8.6 million to 3.7 million though the overall metering total has not reduced significantly. The biggest change in this metering model from the model at the very start of this article is the drop in both the count and the metering total for the scala.tools.nsc.ast package though the inherent total remains relatively unchanged. The typechecker package has also halved its metering count though both it’s total and inherent totals have changed very little.

Here are the top 25 probes ordered by inherent total. The biggest metering count reduction is in the tranformStats probe on the second row down from 0.5 million to 0.069 million.

Changing the threshold for the inline metering extension from 1 to 5 caused further measurement inlining and more reduction in the metering count with a more noticeable reduction in the overall metering total. The scala.tools.nsc.ast package has dropped even further down the table for both total and inherent total.

Here is a revised top 25 probe listing with the new threshold of 5. Inherent totals have been pushed up into Transform$Phase.apply and Typers$Type.typeStats.

JXInsight/OpenCore’s budget metering extension makes it much more explicit how measurement cost is controlled by dynamically managing the metering overhead by way of a set fixed cost unit for each metered probe and an overhead running allowance calculated based on a percent of a firing probe’s metering average which is limited by the allowances calculated for callers and the remaining balance not already used up in other metering of callee probes.  Here it is configured alongside the hotspot metering strategy as well as the nonrecursive metering extension to reduce possibly understatements of the average.

jxinsight.server.probes.budget.enabled=true
jxinsight.server.probes.budget.cost=1
jxinsight.server.probes.budget.meter=clock.time
jxinsight.server.probes.budget.percent=5
jxinsight.server.probes.budget.warmup=100
jxinsight.server.probes.nonrecursive.enabled=true
jxinsight.server.probes.strategy.hotspot.meter=clock.time
jxinsight.server.probes.strategy.hotspot.threshold=25
jxinsight.server.probes.strategy.hotspot.warmup=100
jxinsight.server.probes.strategy.hotspot.statistic=avg
jxinsight.server.probes.console.enabled=true
jxinsight.server.probes.console.dump.file.name=probes.ocs

In the following graphic the call execution and its actual metering per the above configuration are depicted. Whilst the top caller has an allowance of 5 cost units only 4 probes are actually metered because a budget can be redefined (decreased only) as it moves down the call execution tree. When it comes to the probe with an average of 50 in terms of clock time only 2 units are left (actually 3 in terms of top caller budget) – one for the probe and one for one of its callee probes. Note also that if there is a loop in the call tree then each iteration depletes the budget allowance for all callers on the stack until they themselves are popped off the stack. The budget metering extension offers the ultimate in measurement overhead control for production environments tying the metering of probes to their metering profile and their current calling context.

Here is the final IMRR metering model generated with the budget metering extension enabled. A huge drop in the overall metering count down from 8.6 million to 0.7 million with the biggest drop being in the typechecker along with ast and transform packages. The matching package does not exist in the model anymore. The inherent total for the ast appears to have grown at the expense of the symtab, matching and possibly transform packages.

Below is a listing of the top 25 probes ordered by inherent total. The clustering of the inherent total in the top 3 probes points to the cut off point in the execution stack at which point the budget was depleted for the current execution context.

With the budget metering extension managing cost is as simple as changing the percent used in calculating the measurement overhead allowance. Note you can change both the cost and percent to have fractions. Here is a metering model with the percent set to 10. The overall metering count has increased with this additional increase in budget allowance.

And the top 25 metering probes order by inherent total are….

For the last few parts in this series we have obsessed over what is measured and when, controlling costs and directing the model to capture particular behaviors of interest. In the next parts we will look at collecting additional data such as metering transactions and distributions seeing individual threads of execution move up and down the execution stack – mini metering models with a much bigger model. Hopefully we have eliminated noise and error and now its time to strengthen the signal that is most important.

Case Study: Scala Compiler – Part 5

This is the fifth part in a series of articles on the “how to of software performance analysis” using JXInsight/OpenCore. Please read the firstsecondthird and fourth parts as well as the preamble before proceeding if you have not already done so.

In this part of our series investigating the performance of the Scala compiler in compiling its own source files we switch attention to eliminating as much noise as possible in the metering model and in the course reducing measurement error which is the difference between a measured value of quantity and its true value.

How do you know when you have a significant amount of measurement error? In some cases it is pretty easy. For example running the performance test referenced throughout this series takes on overage 110 seconds. Running the same performance test with Oracle’s HPROF, with cpu=times, took 1,041 minutes (17 hours 21 minutes). The lowest performance test run with JXInsight/OpenCore and its hotspot strategy took 121 seconds and a large part of that was the actual startup and shutdown costs incurred during the dynamic instrumentation of the class bytecode and the exporting of the metering model as a snapshot on exit. Overhead wise we have Oracle’s HPROF with 1,039 minutes compared with JXInsight/OpenCore’s <11 seconds. Its for this reason why many are forced to use cpu=samples even though in such cases the resulting data is pretty much useless and riddled with false leads. You get the data faster but the wrong data which admittedly beats waiting a day to get the wrong but different data.

Pretty much every performance profiling solution we have tested has the same problem because fundamentally they are dumb measurement machines not taking into account cost, relevance and value in instrumenting, measuring and collecting – no self awareness or adaption(s). Without some sort of intelligence the measurement solution very quickly becomes part of the performance problem itself and worse exacerbating the problem so significantly that the only finding at the end of the test is the performance measurement solution has a performance problem itself.

Part of the problem here is that developers mistakenly believe that a performance measurement solution should direct them to the actual line of code that is the problem when generally the best starting point for both observation and tuning is just above the problem in the caller-callee chain providing an appropriate enclosing context to the execution. Below is a graphic we use with new customers showing them the target points we are aiming for in our metering of their applications in development, test and production.

Sampling solutions tend to focus on call points in the ambiguous and ineffective observation space, primitive event/trace instrumentation based solutions aim for the high frequency bottleneck point but get deflected elsewhere completely distorting all the data, whilst strategy based measurement solutions like our intelligent activity metering technology focus on finding safe and optimal observation and tuning points respectively.

JXInsight/OpenCore makes it incredibly easy to reduce noise and eliminate (more like manage) as much as possible measurement error via extensions that dynamically decide at the point of firing to meter or not the (source) probe, and over the iterations of the IMRR loop eliminating the instrumentation of the probe altogether. In the rest of this article we will look at the metering model derived from using three such extensions that have been selected with the previously observed deep call stacks in mind: tracedepth, nonrecursive and tracecycle. We are not going to use them in isolation as the hotspot strategy will remain active (which is the the default setting). These extensions will simply further refine the metering model during the course of the test execution.

Here is the configuration required to install the tracedepth with the maximum depth of the probe metering stack from caller to callee set to 50.

[jxinsight.override.config]
jxinsight.server.probes.tracedepth.enabled=true
jxinsight.server.probes.tracedepth.upper=50
jxinsight.server.probes.strategy.hotspot.threshold=25
jxinsight.server.probes.strategy.hotspot.warmup=100
jxinsight.server.probes.strategy.hotspot.statistic=avg
jxinsight.server.probes.console.enabled=true
jxinsight.server.probes.console.dump.file.name=probes.ocs

Below is the metering model exported on completion of the very last iteration of the IMRR script. Compared with the metering model in part 4 there is an overall metering count reduction of more than 5 million though the metering totals are relatively similar except for scala.tools.nsc.symtab which has lost more than 50% of its previous total indicating code in this package are executed deeper in the stacked execution. The overall metering total has dropped by 8 seconds.

The top probe listing ordered by inherent total is also very similar to what was reported in part 4. We got roughly the same results for a cheaper price (in terms of measurement cost) – always a good thing if the balance is kept right between cost, relevance and value. That does not mean we should disregard the differences that do exist between the models as they reveal the stack execution nature of the probes in question.

Metering extensions such as tracedepth are extremely useful because they allow the software performance engineer practically near complete control over the measurement cost and with the minimal of surface configuration. An engineer can simply change some setting to suit the situation and environment in terms of coverage, collection and acceptable cost.

Here is a metering model collected with the above configuration but with the tracedepth upper set to 100. The overall metering count and totals have risen as expected. The scala.tools.nsc.symtab package has returned close to its previous metering totals but not altogether hinting at deeper call depths in the normal execution of the software. The scala.tools.nsc.matching package has now reappeared after disappearing in the upper=50 performance test run. Whats also worth noting is that the scala.tools.typechecker package has slightly increased its metering total but its metering count has grown by over 55% whilst its inherent metering total has remained pretty much the same. This clearly points out the recursive or re-entrant call nature within this package.

Here again are the top probes ordered by inherent total with the maximum metered call depth of 100. If you look at the bottom of the table you will see that even after 20 iterations of the IMRR loop some probes were still being disabled in this case scala.tools.nsc.transform.Erasure$Eraser.typed1.

Here is a metrics model generated by comparing the two tracedepth metering model filtered by clock.time.inherent. Can you notice the swings in the assignment of inherent totals between caller and callee?

Lets try another strategy this time eliminating the metering of same named caller-to-callee calls using the nonrecursive metering extension (there is also a nonreentrant version).

[jxinsight.override.config]
jxinsight.server.probes.nonrecursive.enabled=true
jxinsight.server.probes.strategy.hotspot.threshold=25
jxinsight.server.probes.strategy.hotspot.warmup=100
jxinsight.server.probes.strategy.hotspot.statistic=avg
jxinsight.server.probes.console.enabled=true
jxinsight.server.probes.console.dump.file.name=probes.ocs

In the graphic below the call to x.y.z from x.y.z is still executed at runtime but not measured by the metering engine.

Here is the metering model collected with the above configuration. Whilst the metering total looks very much like the metering models in part 4 it is the count and inherent total values that have changed somewhat with the biggest change occurring in the scala.tools.nsc.ast and scala.tools.nsc.transform packages. The increase and decrease in inherent totals is more than likely caused by the change in the value of the average statistic used by the hotspot strategy in deciding whether to continue to meter a particular firing probe. By reducing the count of the nested recursive calls the average increases possibly beyond the threshold used by hotspot.

Below is a listing of the top probes ordered by inherent total. Comparing this to the original hotspot metering model the transformStats probe has increased its inherent total by more than 5 seconds and the typed1 probe has decreased by 4 seconds whilst the metering totals have remained the approximately the same.

Here is a metrics model generated by comparing the nonrecursive metering model with the original hotspot metering model filtered by clock.time.inherent. The probes listed on rows 3-5 did not appear in the original metering model more than likely because of the overstatement of the count due to recursion (or method overloading as signatures are not used here) causing the average to be understated which resulted in the probe being disabled during the IMRR process execution.

Generally when a repeated call sequence is detected on the call stack we don’t want to meter any further down in the call execution until the sequence is popped off the stack. This is what the tracecycle metering extension provides.

The following configuration enables the tracecycle metering extension specifying the minimum cyclic call sequence length to be 3.

[jxinsight.override.config]
jxinsight.server.probes.tracecycle.enabled=true
jxinsight.server.probes.tracecycle.length=3
jxinsight.server.probes.strategy.hotspot.threshold=25
jxinsight.server.probes.strategy.hotspot.warmup=100
jxinsight.server.probes.strategy.hotspot.statistic=avg
jxinsight.server.probes.console.enabled=true
jxinsight.server.probes.console.dump.file.name=probes.ocs

With this configuration we have eliminated more than 10 million calls from the metering model presented in part 4 and we have recorded the best benchmark time up to now. What’s truly amazing is that we have gone from metering 16.5 billion possible calls to just under 1.5 million and we never had to decide (or click) what should and should not be measured. And to top it off we have a metering model that at the package level looks relatively similar to our previous models in terms of totals and inherent totals though the counts have changed significantly with a reduction in the number of metered probes in the scala.tools.nsc.typechecker package from just over 7 million to a mere 55 thousand. The metering count for the scala.tools.nsc.ast package has also dropped significantly down from 2.5 million to 205 thousand.

From the following top probes listing ordered by inherent total its pretty easy to identify the probes that represent the last call point in a repeating call sequence. Of course what the inherent total means here and what it means in other models collected up to now is different. But that does not mean it is wrong. We like to say “There are good performance models, poor (bad) performance models and then there is call stack sampling”. The differences in the models are just as revealing as the similarities and largely reflect movement of inherent cost from one probe to another in a caller-callee chain. The art in software performance measurement is how to alter this assignment of cost in such a way to better observe the core behavioral patterns (performance drivers) without perturbing the system and to identify the appropriate and most effective starting point for further more specific performance investigation and tuning. But unlike other straw man approaches such as call stack sampling the differences in the models for each performance test can be reasoned on and explained.

Again as a software performance engineer we can control the degree of measurement by simply changing one or more properties without having to get into the details of the code base. Here is the metering model collected with the minimum cycle call sequence length set to 5 for the tracecycle metering extension.

Here is a revised listing ordered by inherent total.

In this part we focused on filtering out (or compressing) particular call execution behaviors. Deep call stacks. Recursive calls. Cyclic stacked call sequences. In the next part we are still going to look at patterns on the execution call stack but not in terms of the probes themselves but in terms of their consumption of cost which in the case of performance profiling is latency (wall clock time).

Case Study: Scala Compiler – Part 4

This is the fourth part in a series of articles on the “how to of software performance analysis” using JXInsight/OpenCore. Please read the firstsecond and third parts as well as the preamble before proceeding if you have not already done so.

Performance measurement is about deriving meaningful, useful and (relatively) accurate models that reflect (or abstract) the execution behavior and resource consumptions patterns of a software and its internal components, systems and functions. What you quickly learn in the field is that there is no single model. Models are in some way designed (or constrained) by the measurement tools and techniques used and how they are applied by the software performance engineering to the problem. Models vary in the quality of information measured in terms of  coverage, comprehension, integrity and validity (precision and accuracy). The best model is the one that offers the best cost-benefit analysis but finding that is no easy task especially in relatively short running tests.

Whilst there are large differences between various application performance management products on the market there are three basic functions all provide in some form or another: instrumentation, measurement and collection. Generally these are tightly coupled, what gets instrumented gets measured gets collection, except in the case of JXInsight/OpenCore which cleanly separates each of these allowing for both offline and online strategies to be  applied to each individual concern – what gets instrumented might not get measured and what gets measured might not get collected.

This separation has allowed us to innovate in the design of software performance engineering (SPE) processes around our tooling and technology. One such process is IMRR which stands for instrument, measure, refine and repeat. Basically the same performance test is repeatedly executed with each iteration refining the instrumentation set (classes and methods) based on a hotspot analysis of the metering model derived from the measurement of the previous instrumentation set. This refinement does not just happen across cycles but as well as at runtime during measurement in the case of either the hotspot and/or autotune metering extensions being enabled.

Here is an example script putting this into practice with our dynamic instrumentation agent (which includes the metering engine) and the console’s command line tooling. Two executions of the performance test are performed in each iteration before a revised instrumentation is generated using the -generate-aj-bundle command.

cp /OpenCore/bundle/java/default/opencore-ext-aj-javaagent.jar .
opts="-J-agentpath:/OpenCore/bin/osx-64/libjxinsight.jnilib=prod -J-javaagent:./opencore-ext-aj-javaagent.jar"
count=0
while [ $count -lt 20 ]
do
count=`expr $count + 1`
rm -r classes
mkdir classes
time ../../bin/scalac $opts -J-noverify -J-Xmx1024M -nobootcp -d classes @classes.list
rm -r classes
mkdir classes
time ../../bin/scalac $opts -J-noverify -J-Xmx1024M -nobootcp -d classes @classes.list
java -jar /OpenCore/console/opencore-console.jar -generate-aj-bundle probes.ocs /OpenCore/bundle/java/default/opencore-ext-aj-javaagent.jar
rm jxinsight.aspectj.cache

done

To refine both the dynamic instrumentation and the dynamic measurement (which is done at runtime) the strategy metering extension is by default configured to use the hotspot strategy labeling probes hotspot, !hotspot and disabled (dynamically disabling metering of such firing probes). Here is the configuration used in our initial performance profiling of Scala’s compiler.

[jxinsight.override.config]
jxinsight.server.probes.strategy.hotspot.warmup=100
jxinsight.server.probes.strategy.hotspot.threshold=25
jxinsight.server.probes.strategy.hotspot.statistic=avg
jxinsight.server.probes.strategy.hotspot.meter=clock.time
jxinsight.server.probes.console.enabled=true
jxinsight.server.probes.console.dump.file.name=probes.ocs

This configuration will disable any probe (named method) which after each 100 metered firings has an average metering clock.time value of less than 25 microseconds.

Here is a chart of the timing for the instrumented test over the 20 loops (in total 40 executions) compared with a non-instrumented baseline time. The delta of 10-12% can be attributed to the instrumentation of code at start up (4-5%), the measurement of the instrumented execution and the export of the metering snapshot on shutdown (3-4%).

To put the above into perspective here is the timing for a test run with the DTrace probes enabled (-XX:+ExtendedDTraceProbes) within the runtime but with no actual measurement performed (the best case possible and most useless).

Note: We actually ran the same test with another commercial profiling solution with full instrumentation running it once overnight and then killing the process after coming in the next morning and finding it was still running. The test should normally only take 110 seconds.

Below is the metering model exported on completion of the very last iteration of our IMRR script. The overall metering count has dropped down from 16.5 billion calls to 13.3 million! No guess work. No human action. All driven by the data itself. The measurement system is actually learning and within (few) parameters defined by the software performance engineer.

We can see a significant amount of time is spent within the scala.tools.nsc.typechecker.* package with a clock.time metering total of 58 seconds 47 of which was inherent to the package. Remember measurement is dynamic with the hotspot strategy metering extension enabled so the inherent total will change as child probes belonging possibly to other packages are disabled and the cost is then assigned back up the caller-callee chain which means there is never a single true performance model at least not one that is complete.

Here is a listing of the top probes ordered by inherent total. Note the big difference between the total and the inherent total values for the scala.collection.Iterator$class.foreach probe. This is basically a”pass thru” method.

To eliminate the metering of “pass thru” methods we can re-configure the hotspot strategy as follows:

[jxinsight.override.config]
jxinsight.server.probes.strategy.hotspot.warmup=100
jxinsight.server.probes.strategy.hotspot.threshold=5
jxinsight.server.probes.strategy.hotspot.statistic=iavg
jxinsight.server.probes.strategy.hotspot.meter=clock.time
jxinsight.server.probes.console.enabled=true
jxinsight.server.probes.console.dump.file.name=probes.ocs

This new configuration will disable any probe (named method) which after each 100 metered firings has an inherent average metering clock.time value of less than 5 microseconds.

Here is our test times across iterations of our IMRR loop.

With the new strategy configuration the number of metered probe firings has dropped further from 13.5 million to 8 million which relatively is impressive though we did start with 16.5 billion calls. Notice how the metering total for scala.collection.* has now changed using the inherent average as a means to manage overhead and increase accuracy of the parts of the model that are much more relevant.

Here is a listing of the top probes ordered by inherent total. There are some differences with the last model but for the most part a greater number of similarities.

An inexperienced software performance engineer might at this stage ask which is the “right” model. Both models are right to some degree but like any type of model have slight measurement errors (under or over statements of metering total and inherent total) and answer slightly different questions.

Note: A performance hotspot acts very much like a black hole (reality distortion field) in that as we getting closer to observing it the measurement tool gets sucked in and becomes part of the problem itself.

In the next part we will look at other dynamic metering extensions (tracecycle, nonrecursive, tracedepth) which can be combined with hotspot along with techniques which can help us eliminate noise and error from our software performance models and reduce overhead even further without changing the laws of physics.