Posts from the ‘Article’ Category

JXInsight/Simz – It’s the Application not the Process that’s important

During a recent talk on JVM Performance – Past, Present and Predicted, I gave a brief demonstration of JXInsight/Simz in the final moments while discussing how to manage applications. I made the statement that it was time we made the application and its activities the focus of our monitoring and management initiatives, and not the OS process or the network for the most part, though naturally consideration needs to be given to both the logical and physical design. Whilst the JXInsight/Simz service does allow one to determine the process of a simulated thread within its near real-time application replay runtime, it is the threads and their activities (named probes => methods) that are given center stage, along with the metered resource usage recorded. This is especially relevant in discussing cloud computing elasticity, with nodes being provisioned and decommissioned at fine granularities and JVM runtimes starting and stopping even more frequently as applications scale up and down in response to dynamic workload demands. It is also relevant when looking at Platform-as-a-Service (PaaS), in which the concept of a distinct identifiable process is all but lost.

One line of thought (and analogy) I have used to help with this argument is to discuss how companies report their size outside of financials. One typical dimension is the global head count, reflecting some implied capacity to respond to customer product and service needs. Rarely does a company list the number of office blocks that house employees, though an international/local presence might be emphasized in adverts. Employees (and systems) get work done (directly or indirectly) whereas buildings offer an efficient means to share/consume/coordinate space, infrastructure and supporting services.

When companies think of expansion it is generally the workforce that comes before offices & facilities. Only when new (or proposed) additions to the workforce cannot be accommodated does management start looking at increasing office capacity. The same also applies to how a company organizes and manages itself. And this is how we should view applications (and their activities) alongside operating system processes. Managing applications by way of process monitoring is rather like managing a workforce by counting the office lights turned on at each site. Processes are a means to consume local resources and coordinate such local interactions, reflecting more of the organizational aspects of the software. It is the threads and executing activity (code) that service requests and respond to events that represent the application behavior which should be monitored and managed first and foremost.

With this in mind it is easy to see why, when it comes to application performance monitoring and capacity management, the process is for the most part a largely redundant if not irrelevant element in an application management model (especially in the cloud). When it comes to software observation across hundreds or thousands of transient machines and processes, why not have a single pane of glass on a single process that replays / simulates the work of thousands of threads irrespective of location? A kind of matrix in which the streams of a software process are played out in another software process – all together and in near real-time.

JXInsight/Simz – Where Big Data meets Big Activity in the JVM

Big data is nothing new in the application performance monitoring space. Applications have been generating huge amounts of data using various logging and metric utilities. The data collected by such tools is then either pulled or pushed to some central management server wherein it is filtered, transformed, aggregated and stored/updated for offline viewing by operators connected to the server via management consoles and reporting dashboards. Here live application behavior becomes decontextualized departed data which is then transformed into measured metrics and finally persisted points. Behavior becomes data then deprecates and dies.

JXInsight/Simz is very different in its approach to the data collected pertaining to a call that is metered and its consumption of resources during execution. The data is pushed in near real-time to one or more Simz services where it is used to drive the execution of playback threads with one for each metered application thread. Each playback thread makes the same underlying metering calls, begin() and end(), made by the instrumented methods in the application and with the exact same meter readings. Behavior becomes data and is then resurrected becoming behavior all over again but in a slightly different form.
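The replay idea can be illustrated with a small, self-contained sketch. This is not the Simz implementation: the class name ReplaySketch, the dispatch signature and the use of one single-threaded executor per source thread are all assumptions made purely for illustration. It shows only how begin and end events, carrying their original readings, could be replayed in order on a dedicated playback thread per metered application thread.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;

// Hypothetical sketch: one playback thread per metered application thread.
class ReplaySketch {

    // One single-threaded lane per source thread keeps that thread's events in order.
    private final Map<String, ExecutorService> lanes = new ConcurrentHashMap<>();

    // Stand-in for what a plugin inside the simulated runtime would observe.
    final List<String> observed = Collections.synchronizedList(new ArrayList<String>());

    // Replays one begin or end call, with its original meter reading,
    // on the playback thread that mirrors the source thread.
    void dispatch(String sourceThread, String probe, boolean begin, long reading) {
        ExecutorService lane = lanes.computeIfAbsent(sourceThread,
                name -> Executors.newSingleThreadExecutor(r -> new Thread(r, "playback-" + name)));
        lane.execute(() -> observed.add(Thread.currentThread().getName()
                + (begin ? " begin " : " end ") + probe + " @" + reading));
    }

    // Drains pending events and stops all playback threads.
    void close() {
        for (ExecutorService lane : lanes.values()) {
            lane.shutdown();
            try {
                lane.awaitTermination(5, TimeUnit.SECONDS);
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
        }
    }

    public static void main(String[] args) {
        ReplaySketch replay = new ReplaySketch();
        // Events from two application threads in two different JVMs (made-up names).
        replay.dispatch("jvm1-main", "acme.App.run", true, 100L);
        replay.dispatch("jvm2-main", "acme.App.run", true, 5L);
        replay.dispatch("jvm1-main", "acme.App.run", false, 350L);
        replay.dispatch("jvm2-main", "acme.App.run", false, 90L);
        replay.close();
        for (String line : replay.observed) {
            System.out.println(line);
        }
    }
}
```

Ordering is preserved per source thread only; the interleaving across playback threads is left to the scheduler, much as it would be in the original runtimes.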

We don’t need to have the class bytecode of the callers/callees and we don’t need to have application state (it would never fit anyway in scaling to hundreds and thousands of JVMs). Our underlying metering model of named probes (activities) and meters (resources) is more than sufficient to capture the execution essence of an application, and its many instances within a cluster, all within a single simulated runtime.

And whilst today both the application and Simz service are based on the same runtime there is nothing really stopping the usage of our universal metering model and instant replay[back] technology to simulate other application language runtimes with the added benefit that plugins written in one language can be reused across multiple client languages without change.

JXInsight/Simz – Write Once, Load Once, Plugin Everywhere

One of the primary design goals for JXInsight/Simz was that all custom plugins using the Probes Open API would be able to work in a single JVM runtime as well as a simulated multi-JVM runtime without any code changes whatsoever. The only difference is that a custom plugin loaded by the JXInsight/Simz service would be able to observe the metering of threads across multiple JVM runtimes from within a single runtime as calls to the Probes API are replayed in near real-time. Write Once. Run in one JVM. Plug into every JVM.

To demonstrate how seamless this truly is, let’s create a custom metering plugin that prints out the name of a thread the very first time it is registered with the metering runtime, which typically happens just before the execution of the first instrumented method by a thread.

Here are the system properties that need to be added to a jxinsight.override.config file to load the custom metering plugin within a metered runtime.

Let’s now create a sample application which will be instrumented and metered by JXInsight/OpenCore.

A single line is placed in a jxinsight.aspectj.filters.config instructing the agent to only instrument classes in the acme.* package(s).

Finally the script to run up 3 Java runtimes each given a different thread name argument.

Here is the output from running the script with the above custom metering plugin printing out 2 lines for both the main and argument named thread per JVM runtime execution. It is important to note that the custom metering plugin is collocated with the application.

Let’s now disable the loading of the custom metering plugin and instead enable the simz client metering extension, which will re-route calls into the Probes Open API to a remote JXInsight/Simz service. Unless specified otherwise, the extension assumes the service is running on the same host.

The JXInsight/Simz service operates identically to a metered runtime and so the custom metering plugin properties can simply be copied over to a jxinsight.override.config file located in the service’s jxinsight.home directory.

Here is the script needed to launch the JXInsight/Simz service with the custom metering plugin available on the classpath. Note there are no requirements to have the actual application classes available as this JVM runtime only simulates the behavior on threads in terms of metering (probes, meters, readings,…).

The output from the client script now only lists those System.out.println() calls made by the application and not the custom metering plugin.

In the JXInsight/Simz JVM runtime the custom metering plugin observes the metering of threads across each of the 3 simz client connected JVM runtimes. The global remote observation is achieved without any code changes.

Changing the custom metering plugin to output the name of the Probes.Context will allow us to see the UUID associated with each simz client connected JVM runtime.

Here is a sample of the JXInsight/Simz output with this change applied to the custom metering plugin.

Alternatively the Probes.Environment interface associated with the Probes.Context interface can be used to look up the server UUID in both real and simulated environments.

Click here to see a screen recording walking through the above steps within a Java IDE.

How (not) to design a Metrics API – Part 4: Naming and Names

This is part four in a series of articles on how best to design an application metrics monitoring library, in particular its API, providing versatility both in terms of application of the library across domains & environments and in its implementation by one or more vendors. Please read part 1, part 2 and part 3 if you have not already done so.

Naming

Following on from domain modeling, the hardest task in designing an API is the naming of classes and methods. From our own experience this activity consumes an inordinate amount of time, though it is probably an area that needs far more attention (especially in engineering books) than it is given today considering its importance in the formation of a user’s conceptual/mental model of an API and in the communication of usage and intent as well as the lessening of ambiguity. Time spent here is beneficial not only for users of an API but also for the implementors of such an API. One good rule of thumb is consistency.

Whilst there are many slightly different ways to measure and manage application performance, we have found many common and similar concepts across such different techniques and approaches. In doing so we have reused interface names (and in some cases cloned implementations). Here is a table listing some (but not all) of the interfaces common to both the Metrics Open and Probes API in JXInsight/OpenCore.

The commonality in naming of classes continues to extend to newer technologies we are developing to address sub-microsecond software performance execution analysis.

We are not simply reusing names but also the associated structural and behavioral patterns, though in a slightly different observation and measurement context, giving the underlying implementation the ultimate flexibility whilst still affording some limited form of internal reuse even if just at the conceptual level.

If a user is familiar with one of our Open APIs then they will feel very much at home with the other APIs, though type wise they are different as all interfaces are defined as inner classes within each of our main entry point classes: Probes, Metrics and Signals.

Names

Anyone that has worked with the javax.management API knows all too well that if you don’t get the means to identify (lookup) metrics right everything else becomes laborious, unusable, and error prone. Whilst we understand the thinking that went into the javax.management.ObjectName class we believe it is one of the API’s more fundamental flaws, and its implementation has never been able to adequately compensate for the design decision to use an unordered map of name-value string pairs as a means of identification. The amount of engineering that has gone into maintaining such an otherwise simple class (if not for the parsing and mapping) is extraordinary, especially in light of the huge execution cost in constructing names and the resulting memory footprint, which today is still flawed in that string representations of the name-value pair map are interned, with maps created even when a string is passed into the constructor. JMX is a design disaster in a local client context, though admittedly its original design was driven by the needs of legacy system management vendors for the purpose of remote management component lookup. One could very easily argue that it is not exactly a Metrics API, though it does have a CounterMonitor and GaugeMonitor interface, resulting in many libraries perceiving this to be the case and using it as the basis for their own designs and implementations (see alternatives below). It is because of this that today Java still does not ship with a well designed and scalable metrics measurement runtime.

In the design of the Name interface in our Metrics API (as well as Probes and Signals) we have opted for a hierarchical namespace approach with Name instances representing an ordered sequence of components similar to the javax.naming.Name interface (though immutable) with ‘.’ acting as the delimiter in a string form.

Here are the few methods in the Metrics.Name interface.

Metrics.Name instances are constructed using one of the following static utility methods in the Metrics class or by calling name(String) on an existing instance returning an extended Metrics.Name instance.

With Metrics.Name being an interface and all construction of instances being under the control of the underlying Metrics SPI implementation, instances of Metrics.Name can be compared using reference equality. We have found this allows for performance optimizations in the caller code space as well as in the underlying implementation, which can use this for efficient metric lookup and memory management (see benchmarks below).
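To make the reference equality point concrete, here is a minimal sketch of an interned hierarchical name. It is not the Metrics.Name implementation; the class InternedName, its pool and its of(String) factory are hypothetical and exist only to show why controlling construction lets callers compare names with == and why an ordered, ‘.’ delimited sequence extends naturally with a name(String) call.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

// Hypothetical sketch of an immutable, interned, hierarchical name.
final class InternedName {

    // Every distinct dotted path maps to exactly one canonical instance.
    private static final Map<String, InternedName> POOL = new ConcurrentHashMap<>();

    private final InternedName parent; // null for a root component
    private final String value;        // this component only, for example "App"
    private final String path;         // full dotted form, for example "com.acme.App"

    private InternedName(InternedName parent, String value, String path) {
        this.parent = parent;
        this.value = value;
        this.path = path;
    }

    // Looks up or creates the canonical instance for a dotted path.
    static InternedName of(String path) {
        InternedName existing = POOL.get(path);
        if (existing != null) {
            return existing;
        }
        int dot = path.lastIndexOf('.');
        InternedName parent = dot < 0 ? null : of(path.substring(0, dot));
        InternedName created = new InternedName(parent, path.substring(dot + 1), path);
        InternedName raced = POOL.putIfAbsent(path, created);
        return raced != null ? raced : created;
    }

    // Returns the canonical name extended by one more component.
    InternedName name(String component) {
        return of(path + "." + component);
    }

    InternedName parent() { return parent; }

    String value() { return value; }

    @Override
    public String toString() { return path; }

    public static void main(String[] args) {
        InternedName a = InternedName.of("com.acme.App");
        InternedName b = InternedName.of("com.acme").name("App");
        // Both routes yield the same instance, so reference comparison is enough.
        System.out.println(a == b);
        System.out.println(a.parent());
    }
}
```

Because equality reduces to a pointer comparison, a lookup structure keyed on such names never has to parse or compare strings once the name has been obtained, which is the kind of caller side optimization described above.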

We have also found this to be extremely useful in passing meta information, labels, from metric extensions hidden below the API surface up to client callers especially when such extensions automatically create metrics themselves from base registered metrics in the event of some condition or operation (tagging).

Note: Name labels are implemented as bit masks, hence the reason to look up by string name first. This is much more efficient than arbitrary tag sets.

You will notice that Name instances do not contain environment specific values. Such information is best located in, and accessible from, an Environment and Context interface, both of which are available in our Probes API as well as our Signals API.

Whilst the Metrics class offers utility methods to create Name instances from Class references, they are not in any way tied to a particular source, such as a Class, nor do they have a fixed number of composite parts.

Alternatives

Jammer

This metrics library expects that all metrics can be identified by 4 string parameters (group, type, name, scope) passed into a MetricName constructor, which are then concatenated to form an identifier referred to as “mBeanName“. This is the same design flaw highlighted above in the JMX API (which is hinted at in the naming) and inherited possibly due to an underlying dependency on JMX to expose data to remote clients though the author does state in his documentation that the JMX RPC API is “bonkers and fragile”.

The mBeanName field which is accessible from the MetricName class via the getBeanName() method is used to implement (via call delegation) the hashCode and equals methods for the MetricName class. It is possible to pass in a fifth constructor parameter, the value for the mBeanName field, which could have an entirely different value for the group, type, name and scope attributes within its string representation (a decision that invites trouble). The limits imposed and the dependencies introduced are completely unnecessary.

If the fifth parameter were not to be passed in then the mBeanName field would take the value “group:type=type,scope=scope,name=name“. This is the basic syntax of a JMX ObjectName.
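The syntax quoted above can be checked directly against the standard javax.management.ObjectName class. The short example below uses only JDK calls; the parse helper and the class name ObjectNameSyntax are ours and simply wrap the checked exception. It shows the string being split into a domain and a set of key properties, and that the order of those properties plays no part in identity.

```java
import javax.management.MalformedObjectNameException;
import javax.management.ObjectName;

class ObjectNameSyntax {

    // Helper (ours): wraps the checked exception thrown for malformed names.
    static ObjectName parse(String text) {
        try {
            return new ObjectName(text);
        } catch (MalformedObjectNameException e) {
            throw new IllegalArgumentException(text, e);
        }
    }

    public static void main(String[] args) {
        ObjectName name = parse("group:type=type,scope=scope,name=name");

        // The part before ':' is the domain, the rest is a key=value property list.
        System.out.println(name.getDomain());
        System.out.println(name.getKeyProperty("scope"));

        // The same properties in a different order identify the same name.
        ObjectName reordered = parse("group:name=name,scope=scope,type=type");
        System.out.println(name.equals(reordered));

        // The canonical form lists the keys in lexical order.
        System.out.println(name.getCanonicalName());
    }
}
```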

NetFlicks

To create a metric name in this library you must use MonitorContext.Builder. MonitorContext is actually the metric name itself, holding two fields: name and tags. Both fields are used to uniquely identify a metric, though a metric in this library is referred to as a Monitor, similar to JMX. The similarities with JMX don’t end there, as the tags field is actually the JMX ObjectName attribute map revisited; as such its construction is expensive both in terms of execution & allocation cost and in terms of footprint. There is actually a Metric class but that is in fact a metric sample.

Here is another version with class and counter tags added.

Benchmarks

Creation and lookup of measures and metrics need to be implemented very efficiently in a metrics API because of the possible high frequency of such activity. Here are results from a quick benchmark test comparing the JXInsight/OpenCore metrics runtime with the Jammer metrics library.

Note: We did try to run the NetFlicks metrics library with our tests but it kept crashing with OutOfMemory errors, and efforts to circumvent this only resulted in other exceptions being thrown related to duplicate counter registration, which could not be avoided entirely because the API does not offer the means to test whether a metric has already been registered.

Here is the Jammer code snippet benchmarked.

Here is the JXInsight/OpenCore code snippet benchmarked.

With static imports on the Metrics class the above code can be simplified.

Considering there is not much to actually creating and looking up a counter in a local context the performance difference is astonishing in executing 1,000,000 loops – OpenCore is approximately 13 times faster.

Note: For a lookup of a Counter without creating the Name but instead using an existing reference held in a static field, the call cost for JXInsight/OpenCore drops down to low single digit nanosecond timings (naturally with the reference being already in the CPU memory cache).

Commentary

One of the reasons behind this series of articles on Metrics API design is the belief that the art of good API design is being lost due to a number of growing trends in the industry: (1) a preference for open source over standards even when standards allow both commercial and open source implementation (there are positives and negatives of both approaches), (2) growing willingness of developers to accept a single implementation dependency, (3) a shift in focus from local (client) API design to remote (service) API design, (4) the need to get things done (and updated) today in an ever changing environment reflecting largely current usage, needs and demands (inline with growing uncertainty and unpredictability), and (5) the sidelining of any sort of technical due diligence in favor of “if it is good enough for X (high profile cloud/web service) and it’s free then it is probably good enough for us” based on the assumption that those successful and scalable companies able to keep services from falling over must also be good at engineering the underlying software when in fact it is more likely the heroics of a few persons in operations in daily firefights.

Developing an ad hoc solution that meets current usage and scale within a particular context is not the same as designing and developing an API (and reference implementation) that meets not just current but future needs and scales, and in ways that are not completely known or fully understood at the time of creation for a much more diverse user/consumer group. API designers have to always leave sufficient flexibility, versatility, capacity and adaptability in their design to allow innovation to foster above the API as well as below at the implementation level and in different contexts and environments. Good designers know when to procrastinate in the present so as not to constrain productivity at a later time when more information is available and other enhancements in technology have occurred.

Part 5: Alternatives to Reset

Costing, Counting and Controlling Code Blocks with Hybrid Instrumentation

This article is a follow-up to the question, What are some Java profiling tools that will provide time measurement of code blocks?, posted on Quora with the additional comment “I am looking for a time measurement breakdown by code block and not just high level method names”.

In general I would question the need for such low level data, especially as there is a cost (overhead) element in trying to do this which if not carefully managed could render all measurements useless (in terms of information quality). But unfortunately I have been around long enough to see single methods containing hundreds if not thousands of lines of code, greatly diminishing the value of method level performance measurement, though if this is indeed the case do you really need to look deeper (you’ve got much more serious problems than just performance)? Unless some serious refactoring of the method code (the preferred option) is done, or a brittle and convoluted approach to dynamic instrumentation of the method is implemented, a hybrid approach is probably the best option initially.

JXInsight/OpenCore is pretty unique in an increasingly crowded application performance monitoring space in seamlessly supporting both dynamic agent based instrumentation and manual instrumentation via its Probes Open API, which offers the ability both to generate measurements and to access such measurements. The metering engine below the surface of the Probes Open API is completely oblivious to the instrumentation source of the measurement calls. There is for the most part no distinction made. Incidentally the dynamic instrumentation agent uses the very same Open API that is used for ad hoc manual instrumentation in the examples below.

Let’s start with a very simple example of a method, inner, with two labeled code blocks, b1 and b2.

Here is a metering model of the above with dynamic (class load time) agent based instrumentation at the method level.

Below is a revised version of the class this time with instrumentation calling into the Probes Open API. Note the executing of the begin() and end() calls does not guarantee measurement will take place as our metering engine is adaptive. Also note that the instrumentation does not dictate how a code block is measured. The decision to meter such activity with one or more meters is left to operations and done via external configuration.

Here is a revised metering model with the code block measurements alongside the method level measurements. Note the inherent clock.time total has now changed for the CodeBlock.inner method probe with the inherent cost assignment now going to the individual code blocks themselves.
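The shift in inherent total described above follows from simple bookkeeping, which the sketch below reproduces. It is not the Probes Open API: the class InherentTimeSketch, its begin and end signatures, the explicit clock readings and the child probe names ending in b1 and b2 are assumptions for illustration. Each probe’s inherent time is its elapsed time minus the elapsed time of the probes nested directly within it.

```java
import java.util.ArrayDeque;
import java.util.Deque;
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical sketch of total versus inherent (self) time accounting.
class InherentTimeSketch {

    private static final class Frame {
        final String name;
        final long start;
        long childTime; // elapsed time of directly nested probes

        Frame(String name, long start) {
            this.name = name;
            this.start = start;
        }
    }

    private final Deque<Frame> stack = new ArrayDeque<>();
    final Map<String, Long> total = new LinkedHashMap<>();
    final Map<String, Long> inherent = new LinkedHashMap<>();

    // Opens a probe at the given clock reading.
    void begin(String name, long now) {
        stack.push(new Frame(name, now));
    }

    // Closes the innermost open probe and charges its elapsed time to the parent.
    void end(long now) {
        Frame frame = stack.pop();
        long elapsed = now - frame.start;
        total.merge(frame.name, elapsed, Long::sum);
        inherent.merge(frame.name, elapsed - frame.childTime, Long::sum);
        Frame parent = stack.peek();
        if (parent != null) {
            parent.childTime += elapsed;
        }
    }

    public static void main(String[] args) {
        InherentTimeSketch meter = new InherentTimeSketch();
        meter.begin("CodeBlock.inner", 0);
        meter.begin("CodeBlock.inner.b1", 10);
        meter.end(40);
        meter.begin("CodeBlock.inner.b2", 50);
        meter.end(90);
        meter.end(100);
        System.out.println("total    " + meter.total);
        System.out.println("inherent " + meter.inherent);
    }
}
```

With the readings used in main, the method keeps its total of 100 units but only 30 remain inherent to it, with 30 and 40 now assigned to the two code blocks.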

Often the need for code block level analysis is because there is a particular code line whose execution frequency drives the performance cost of the enclosing method like in the following contrived example mimicking the iteration over a database query result set with varying number of returned rows per query execution.

Here is a sample metering quantization of the above CodeBlock.inner method. There is a significant spread from 256 microseconds upwards to 131,072 microseconds.
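The band boundaries quoted (256 and 131,072 microseconds) are powers of two, which suggests readings are binned by the highest power of two not exceeding them. That is our inference from the numbers rather than a documented rule, and the class Quantization below is a hypothetical sketch of such binning, not the product’s code.

```java
import java.util.Map;
import java.util.TreeMap;

// Hypothetical sketch of power-of-two quantization of meter readings.
class Quantization {

    // Lower bound of the power-of-two band containing the reading (0 if not positive).
    static long bandFloor(long micros) {
        return micros <= 0 ? 0L : Long.highestOneBit(micros);
    }

    // Counts readings per band, keyed by the band's lower bound.
    static Map<Long, Integer> quantize(long[] readings) {
        Map<Long, Integer> bands = new TreeMap<>();
        for (long reading : readings) {
            bands.merge(bandFloor(reading), 1, Integer::sum);
        }
        return bands;
    }

    public static void main(String[] args) {
        long[] micros = {256, 300, 1500, 3000, 131072}; // made-up readings
        for (Map.Entry<Long, Integer> band : quantize(micros).entrySet()) {
            long low = band.getKey();
            System.out.println(low + " to " + (low * 2) + ": " + band.getValue());
        }
    }
}
```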

The metering model can be extended to include not just clock.time meter readings but a work.count meter that represents the cost driver for each method. Below is a revised version of the class with the required calls into Probes Open API to create and update this counter. The context() call returns the thread’s metering context which provides a means to update a local counter, which can be mapped to a meter by operations at deployment time.

To map the counter, work.count, to a meter of the same name the following is added to the jxinsight.override.config file.

j.s.p.meter.counters=work.count

Here are the resulting work.count metering quantizations for the CodeBlock.inner probe.

Having all methods update the work.count counter based on some internal performance driver will only work in terms of inherent (self) total if the counter is only ever updated when the method is metered. If the method is not metered because of a metering strategy then the inherent total of a calling method will be overstated. To remedy this the event metering extension can be used in a very unusual way: registering a listener for the scope of the method invocation and, on receiving a callback upon completion (exit) of the method, updating the counter. This is a pretty cool example in that the listening code is dependent on instrumentation that is not in the source but which will be woven into the class and its method at load time.

Turning attention to control, the ability to create a probe name based on some request processing context enables quality of service (QoS) classifications that go beyond the code naming/structure itself. For example a probe could be fired that reflects some category (gold, silver, bronze) of a customer making a service request. The example below makes a distinction in the naming of a contextual probe based on the evenness/oddness of an integer parameter.
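The naming decision itself is trivial, as the following sketch shows. Only the resulting names, com.acme.e3.CodeBlock.call@even and com.acme.e3.CodeBlock.call@odd, come from the configuration in this article; the class ContextualName and its probeName helper are hypothetical, and the step of firing a probe under the derived name through the Probes Open API is deliberately left out.

```java
// Hypothetical sketch: deriving a contextual probe name from request data.
class ContextualName {

    private static final String BASE = "com.acme.e3.CodeBlock.call";

    // Appends a classification suffix based on the parity of the parameter.
    static String probeName(int value) {
        return BASE + "@" + (value % 2 == 0 ? "even" : "odd");
    }

    public static void main(String[] args) {
        for (int i = 0; i < 4; i++) {
            System.out.println(i + " -> " + probeName(i));
        }
    }
}
```

The same shape would apply to a customer tier: swap the parity test for a lookup that returns gold, silver or bronze.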

The following QoS for Apps configuration applies a different rate limiting to each of the above contextual probes. With a default timeout of 1 millisecond the odd numbered executions should be slower by 1 millisecond (1,000 microseconds) compared to the even numbered probes.

j.s.p.qos.enabled=true
j.s.p.qos.services=even,odd

j.s.p.qos.service.even.name.groups=com.acme.e3.CodeBlock.call@even
j.s.p.qos.service.even.resources=l
j.s.p.qos.service.even.resource.l.shared=false
j.s.p.qos.service.even.resource.l.rate.limit=1000
j.s.p.qos.service.even.resource.l.rate.interval=1

j.s.p.qos.service.odd.name.groups=com.acme.e3.CodeBlock.call@odd
j.s.p.qos.service.odd.resources=l
j.s.p.qos.service.odd.resource.l.shared=false
j.s.p.qos.service.odd.resource.l.rate.limit=1
j.s.p.qos.service.odd.resource.l.rate.interval=10

Here is the resulting metering model showing the difference in distribution with the even number executions falling mainly in the 1,024 to 2,048 band and the odd falling largely in the 2,048 to 4,096 range.

The metering of the contextual control probes need not be done though the control itself is performed. The following property stops the QoS probe enhancement forwarding execution on to the underlying bare metal measurement probe.

j.s.p.qos.forwarding.enabled=false

Only the call probe for the enclosing method of the contextual probes is now reported in the metering model.

From Anomaly Detection to Root Cause Analysis via Self Observation

“Anomaly detection, also referred to as outlier detection refers to detecting patterns in a given data set that do not conform to an established normal behavior. The patterns thus detected are called anomalies and often translate to critical and actionable information in several application domains. Anomalies are also referred to as outliers, change, deviation, surprise, aberrant, peculiarity, intrusion, etc.” – Wikipedia

There are two general techniques used in performing anomaly detection in software systems. The first technique is based on time series analysis of sampled measures (metrics) which is generally done offline (or online but sufficiently in the past). The second technique is event based comparing one or more event specific measurements (clock, cpu,…) with predefined or dynamic thresholds, which is generally performed at the point of its occurrence (in time and space).

In the context of event based analysis a number of approaches have been used that allow moving on from detection through to root cause analysis. One approach, used by solutions that are largely call stack sample based in their measuring of code performance, is to have each thread at the beginning of a request register itself for observation with a supervisory thread. This supervisory thread then every so often (in milliseconds) checks on the progress of the registered threads. When the supervisory thread detects that a thread has passed the time threshold (which may be pre-defined or dynamic) for a particular request/operation it begins sampling the call stack of the request thread at regular fixed intervals (in milliseconds) until the thread eventually completes and unregisters itself from further observation until the next request.

The primary advantage of this approach is that no measurement is performed until the threshold has been exceeded, and when it does measure it is only call stack sample based. The primary disadvantage of this approach stems from the fact that no measurement is performed until the threshold has been exceeded. Yes, you are reading it correctly. Why is it that what is good is at the same time also bad? Because it’s simply a trade-off between possibly lower measurement overhead and many important information quality metrics such as accuracy, precision, resolution, coverage, composition, comprehension and completeness.
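For readers unfamiliar with the technique, the supervisory approach can be sketched with nothing more than JDK classes. The code below is our own minimal illustration, not taken from any of the products mentioned in this article; the class name SupervisorSketch and the threshold and interval values are assumptions. Note how nothing at all is captured for a request until it has already run past the threshold.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.TimeUnit;

// Hypothetical sketch of threshold-triggered call stack sampling.
class SupervisorSketch {

    private final Map<Thread, Long> startedAt = new HashMap<>();
    private final Map<Thread, List<StackTraceElement[]>> samples = new HashMap<>();
    private final long thresholdNanos;
    private final ScheduledExecutorService supervisor;

    SupervisorSketch(long thresholdMillis, long intervalMillis) {
        this.thresholdNanos = TimeUnit.MILLISECONDS.toNanos(thresholdMillis);
        this.supervisor = Executors.newSingleThreadScheduledExecutor(r -> {
            Thread t = new Thread(r, "supervisor");
            t.setDaemon(true);
            return t;
        });
        this.supervisor.scheduleAtFixedRate(this::check, intervalMillis, intervalMillis,
                TimeUnit.MILLISECONDS);
    }

    // Called by a request thread when it starts processing a request.
    synchronized void register() {
        startedAt.put(Thread.currentThread(), System.nanoTime());
    }

    // Called on completion; returns whatever stacks were sampled for this request.
    synchronized List<StackTraceElement[]> unregister() {
        Thread self = Thread.currentThread();
        startedAt.remove(self);
        List<StackTraceElement[]> taken = samples.remove(self);
        return taken != null ? taken : Collections.<StackTraceElement[]>emptyList();
    }

    // Runs on the supervisory thread: samples only threads already past the threshold.
    private synchronized void check() {
        long now = System.nanoTime();
        for (Map.Entry<Thread, Long> entry : startedAt.entrySet()) {
            if (now - entry.getValue() > thresholdNanos) {
                samples.computeIfAbsent(entry.getKey(), t -> new ArrayList<>())
                        .add(entry.getKey().getStackTrace());
            }
        }
    }

    void shutdown() {
        supervisor.shutdownNow();
    }

    // Helper: sleep without a checked exception.
    static void pause(long millis) {
        try {
            Thread.sleep(millis);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
    }

    public static void main(String[] args) {
        SupervisorSketch watcher = new SupervisorSketch(50, 10);
        watcher.register();
        pause(300); // a slow request
        System.out.println("stack samples taken: " + watcher.unregister().size());
        watcher.shutdown();
    }
}
```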

With call sampling measurements following on from the point of the time threshold being exceeded information quality is lost in terms of

  • Accuracy: It is not possible to accurately determine the time spent by methods that are sampled. Even if measurement were event based it would still not be possible for those methods started before the threshold which completed sometime after the threshold.
  • Precision: By its very nature call stack sampling is not precise especially in the JVM without unique identifiable stack frames.
  • Resolution: The default time interval for most call stack samplers is 10 ms, which by today’s standards is pretty coarse. There are a number of reasons for this including the cost of stack collection for a high number of threads, the ever increasing depth of such stacks and the resulting impact on garbage collection. Many of our customers execute transactions (trades) in under 1 ms so this approach has no value whatsoever.
  • Coverage, Composition, Comprehension and Completeness: These are all severely impacted because there is no detailed behavioral evidence (code frequency & timing) gathered before the threshold point, which is more than likely where the actual problem is. Data collected following on from the threshold could very well only report on the normal execution timing of completion and cleanup code. This is further compounded by the fact that most (humans as well as machine algorithms) will set thresholds significantly high so as not to have too many alarms. On top of this there is no understanding of what constitutes normal behavior within the processing itself so there is effectively nothing to compare with other than all other requests that also exceeded the threshold. If requests don’t exceed the threshold by a huge amount then you are out of luck…if they do then you are out of excuses.

Note: This approach first appeared on the Java performance scene in the now defunct Glassbox project but has more recently been used by AppDynamics and NewRelic. It’s typically used by vendors with very little in the way of intelligent adaptive measurement and by those with high overhead due to inefficient measurement and collection code.

Self Observation – The Better Approach

A much better approach is to have the thread themselves (continuously) measure and observe their own execution behavior (code) and performance (clock, cpu,…) and then only store such a collection of aggregated measurements in the event of a threshold being exceeded at completion of the request processing. Combining this with some degree of intelligence in measurement gives the best of both worlds, low overhead and high information value, whether its monitoring requests that take seconds, milliseconds, or microseconds.

Before a thread begins processing a request it creates a SavePoint (checkpoint) referring a particular point in time in terms of the metering (frequency & timing) state for the threads Context. Then on completion of the request it generates a ChangeSet holding the Changes in the metering state since the SavePoint by way of comparison with the threads metering Context. Note the compare function need not be called by the thread unless the threshold is exceeded.
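The SavePoint and ChangeSet mechanics can be approximated in a few lines. What follows is a conceptual sketch only: the class SelfObservationSketch, its record, savePoint and changesSince methods and the use of plain maps are our assumptions and do not reflect the Probes Open API or its costs (a real implementation would not copy the whole state for each save point). The names com.acme.App.call and com.acme.App.leaf are the ones used elsewhere in this article.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical sketch of per-thread metering state with save points and change sets.
class SelfObservationSketch {

    // Probe name -> {call count, accumulated microseconds}.
    private final Map<String, long[]> state = new HashMap<>();

    // Continuous, always-on aggregation performed by the thread itself.
    void record(String probe, long micros) {
        long[] totals = state.computeIfAbsent(probe, k -> new long[2]);
        totals[0]++;
        totals[1] += micros;
    }

    // Snapshot of the metering state at this point in time.
    Map<String, long[]> savePoint() {
        Map<String, long[]> copy = new HashMap<>();
        for (Map.Entry<String, long[]> entry : state.entrySet()) {
            copy.put(entry.getKey(), entry.getValue().clone());
        }
        return copy;
    }

    // Differences in count and time per probe since the given save point.
    Map<String, long[]> changesSince(Map<String, long[]> savePoint) {
        Map<String, long[]> changes = new HashMap<>();
        for (Map.Entry<String, long[]> entry : state.entrySet()) {
            long[] before = savePoint.get(entry.getKey());
            long count = entry.getValue()[0] - (before == null ? 0 : before[0]);
            long micros = entry.getValue()[1] - (before == null ? 0 : before[1]);
            if (count != 0) {
                changes.put(entry.getKey(), new long[] {count, micros});
            }
        }
        return changes;
    }

    public static void main(String[] args) {
        SelfObservationSketch context = new SelfObservationSketch();
        long thresholdMicros = 50; // made-up threshold

        Map<String, long[]> save = context.savePoint();
        context.record("com.acme.App.leaf", 20);
        context.record("com.acme.App.leaf", 25);
        context.record("com.acme.App.call", 60);
        long requestMicros = 60;

        // The comparison is only paid for when the request was slow.
        if (requestMicros > thresholdMicros) {
            Map<String, long[]> changes = context.changesSince(save);
            System.out.println("leaf calls: " + changes.get("com.acme.App.leaf")[0]);
        }
    }
}
```

Unlike the supervisory approach, the change set covers the whole request, including everything that happened before the threshold was crossed.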

All of the problems with the previous approach are mostly solved. The information is complete, allowing comparative analysis with normal behavior patterns which can be defined by aggregating, binning and classifying such collections. What is even more important is that the collection set can already be trimmed or, even better, a signal raised and all data discarded immediately. We seriously have to start (re)considering whether we, and not the machine, are the primary and most appropriate consumers of management data in this new era of computing in the cloud.

Here is how this is achieved using our Probes Open API, though it’s not necessary as our built-in transaction metering extension will perform the same function under the hood of the metering instrumentation inserted at runtime into classes.

With such self observation capabilities it is very easy to ask self reflecting questions of a thread’s code execution behavior between two points in time. Here is how to determine the number of times the method com.acme.App.leaf was called directly or indirectly by the com.acme.App.call method using the ChangeSet that was generated following completion of the call method.

Whilst this is a unique and novel approach it is still relatively simple to understand and incredibly powerful in its application to the many management tasks that applications, platforms and runtimes will be required to perform over the next coming years as the influence and usage of the cloud expands. Here is how the latency for each package, class or method executed can be determined.

Note: This is not exactly tracing as the collection only holds information on what has transpired in terms of measurement aggregation but not how such measured execution was called and in what order it was performed. As all professional performance engineers know, path tracing (especially distributed) does not scale in terms of overhead at runtime or analysis during offline viewing.

Video: Here is a screen recording showing behavioral introspection in action.

Software with embedded self observation can get a much better sense of the underlying execution behavior of classes and components up and down layers in the stack without ever knowing of such beforehand. A thread can use this self diagnosis during inflight processing of a request at particular check points in its execution, then reason on the behavior and measurements collected and take possible corrective action at that moment or in the immediate future such as holding back further processing of requests for a short time whilst an underlying resource is experiencing performance or reliability issues.

Building efficient and effective application performance management solutions on top of this approach and creating behavioral signatures from such change sets becomes relatively straightforward. Take a look at how change sets make up transactions in our management console.

Note: We firmly believe that it is this kind of innovation which is sorely missed in the Java runtime today and which could inspire a whole new crop of innovations in the area of self regulation, self correction, resource management and execution optimization once it is included.

Video: Here is a screen recording showing automatic collection of metering changes by the transaction metering extension without the need to explicitly use the Probes Open API.

Article: The Good and B(AD) of Application Performance Management Measurement

Using System Dynamics for Effective Concurrency & Consumption Control of Code

When we started out on the design of our Quality of Service (QoS) for Apps technology we were for the most part unaware of the field of system dynamics though we had come across references to it during our research into self aware, adaptive and self regulated software especially in the context of complex adaptive systems and emergent computing. So when we did finally get time to read Thinking in Systems we were pleasantly surprised to find such strong similarities in our thinking and resulting design for software system control and the model and approach used in system dynamics.

If you have not already read Thinking in Systems then please do so immediately as it’s going to play a critical part in the engineering of software and services in the cloud and beyond!!! In the meantime here is a definition from Wikipedia.

“System dynamics is an approach to understanding the behaviour of complex systems over time. It deals with internal feedback loops and time delays that affect the behaviour of the entire system. What makes using system dynamics different from other approaches to studying complex systems is the use of feedback loops and stocks and flows.”

In system dynamics a system consists of three kinds of things: elements, interconnections, and a function or purpose. In terms of modeling, elements are represented by stocks and interconnections by flows to and from stocks as well as feedback loops relaying signals and sensory measurements. Stocks are the foundation of the system. They are things that can be counted or measured. The value or quantity of a stock changes over time through the actions of a flow. Flows drain and fill stocks. In system dynamics feedback loops are seen as the overriding mechanism that controls the accumulation and depletion of stocks in a system that exhibits a consistent pattern of behavior over time. Such feedback loops cause changes in the flows into or out of a stock, based on changes in the stock itself.

“Feedback is a process in which information about the past or the present influences the same phenomenon in the present or future. As part of a chain of cause-and-effect that forms a circuit or loop, the event is said to “feed back” into itself.” – Wikipedia

Here is an example of such a model applied to populations. The reinforcing loop causes the population to grow (more people => more births) and the balancing loop causes it to shrink (more people => more deaths). If both the fertility rate and death rate remain constant the behavior is simple (and predictable).
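As a rough numerical illustration of that population model, the sketch below treats the population as a stock with births as the inflow and deaths as the outflow. The class name, method names and rates are assumptions chosen for the example and are not taken from any system dynamics tool.

```java
public class PopulationModel {

    /** Advances the stock by one time step using constant rates. */
    static double step(double population, double fertilityRate, double deathRate) {
        double births = population * fertilityRate; // reinforcing loop: more people => more births
        double deaths = population * deathRate;     // balancing loop: more people => more deaths
        return population + births - deaths;
    }

    /** Runs the model for a number of steps and returns the final stock level. */
    static double simulate(double initial, double fertilityRate, double deathRate, int steps) {
        double population = initial;
        for (int i = 0; i < steps; i++) {
            population = step(population, fertilityRate, deathRate);
        }
        return population;
    }

    public static void main(String[] args) {
        // With constant rates the stock grows, shrinks or holds steady in a predictable way.
        System.out.println("growing: " + simulate(1000, 0.03, 0.01, 10));
        System.out.println("steady:  " + simulate(1000, 0.02, 0.02, 10));
        System.out.println("falling: " + simulate(1000, 0.01, 0.03, 10));
    }
}
```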

Leaving aside feedback loops for now this looks very similar to the underlying reservation based mechanism that powers our QoS for Apps technology. Stocks in system dynamics become QoS Resources. Flows in system dynamics become QoS Services which on entering a method enhanced with QoS capabilities reserve (outflow) units from the resources pool (stock) and then release (inflow) such units back into the resource pool on completion (exit) of the method.

The ordering within the graphic is not natural in terms of how most would envisage a reservation process so another way is to imagine flows between stocks. The reservation depletes the resource stock resulting in an inflow into the reservation stock held by the service during its execution. Control is built into this model because generally there is an initial or fixed capacity defined for a resource and if there is no stock in the resource then no outflow will occur which means no execution of the method mapped to this outflow via service classification.

Applying this to thread concurrency control we would have a “pool” resource representing the maximum thread population allowed for a particular system (defined by flows/services and stocks/resources) and another resource representing the number of threads executing with reservations held on the “pool” resource.
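The reservation mechanics described above can be approximated in a few lines of plain Java. This is only a sketch of the stock and flow idea using java.util.concurrent.Semaphore; the ResourcePool class and its methods are hypothetical and say nothing about how the QoS runtime is actually implemented.

```java
import java.util.concurrent.Semaphore;
import java.util.function.Supplier;

public class ResourcePool {
    // The stock: units of capacity that have not been reserved.
    private final Semaphore stock;

    ResourcePool(int capacity) {
        this.stock = new Semaphore(capacity, true);
    }

    /** Wraps a method call with a reservation of a single unit. */
    <T> T call(Supplier<T> method) {
        stock.acquireUninterruptibly(); // reserve: outflow from the resource stock
        try {
            return method.get();        // the method only runs while holding a unit
        } finally {
            stock.release();            // release: inflow back into the resource stock
        }
    }

    /** Units currently left in the stock. */
    int available() {
        return stock.availablePermits();
    }

    public static void main(String[] args) {
        ResourcePool pool = new ResourcePool(10);
        int duringCall = pool.call(pool::available);
        System.out.println("available during call: " + duringCall);
        System.out.println("available after call:  " + pool.available());
    }
}
```

When the stock is empty the acquire blocks, so no further outflow (method execution) occurs until a unit is released, which is the control built into the model.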

Mapping such control dynamics to an internal system within an identifiable (by namespace) software package is as easy as adding the following system properties to a jxinsight.override.config file limiting execution concurrency to 10 threads for all methods in the com.acme.sysone package and its sub-packages.

Note: j.s.p is a short hand version of jxinsight.server.probes recognized by our measurement runtimes.

j.s.p.qos.enabled=true
j.s.p.qos.resources=pool
j.s.p.qos.resource.pool.capacity=10
j.s.p.qos.services=sysone
j.s.p.qos.service.sysone.resources=pool
j.s.p.qos.service.sysone.name.groups=com.acme.sysone

This is achieved without any code changes. The same code dynamically woven into the application codebase at load time for the purpose of resource (gc, wall clock, cpu) consumption metering is also used to control concurrency of the code. This is how concurrency control can be achieved without resorting to changing an application’s entire architecture to conform explicitly to some message/packet queueing paradigm and model. Calls in our model already embody messages and queues though granted such things are transient and the delay temporary (for liveness reasons).

This approach to code level concurrency control is extremely powerful. Let’s say you wanted to limit the number of concurrently executing requests at various layers in an application stack such as web, service and db. This can be done using a locally defined resource, l, that is not shared and visible only to each service.

j.s.p.qos.enabled=true
j.s.p.qos.services=w,s,d

j.s.p.qos.service.w.name.groups=com.acme.web
j.s.p.qos.service.w.timeout=1000
j.s.p.qos.service.w.resources=l
j.s.p.qos.service.w.resource.l.shared=false
j.s.p.qos.service.w.resource.l.capacity=20

j.s.p.qos.service.s.name.groups=com.acme.services
j.s.p.qos.service.s.timeout=1000
j.s.p.qos.service.s.resources=l
j.s.p.qos.service.s.resource.l.shared=false
j.s.p.qos.service.s.resource.l.capacity=15

j.s.p.qos.service.d.name.groups=com.acme.dao,oracle.jdbc
j.s.p.qos.service.d.timeout=1000
j.s.p.qos.service.d.resources=l
j.s.p.qos.service.d.resource.l.shared=false
j.s.p.qos.service.d.resource.l.capacity=10

The above will restrict the concurrency to 20 at the web layer, 15 at the services layer and 10 at the db layer but it does not restrict the global concurrency to 20 as a thread could call into the services layer without going to the web layer or call into the db layer without going through either of the other two layers. If you really need to limit concurrency globally across these services you can introduce a new shared resource (stock), g, into the configuration (model).

Note: You can’t change the local pool resource to a global resource to achieve such control as a reservation obtained at the web layer will be (re)used in the reservation negotiation at the service and database layers.

j.s.p.qos.enabled=true

j.s.p.qos.resources=g
j.s.p.qos.resource.g.capacity=20

j.s.p.qos.services=w,s,d

j.s.p.qos.service.w.name.groups=com.acme.web
j.s.p.qos.service.w.timeout=1000
j.s.p.qos.service.w.resources=g,l
j.s.p.qos.service.w.resource.l.shared=false
j.s.p.qos.service.w.resource.l.capacity=20

j.s.p.qos.service.s.name.groups=com.acme.services
j.s.p.qos.service.s.timeout=1000
j.s.p.qos.service.s.resources=g,l
j.s.p.qos.service.s.resource.l.shared=false
j.s.p.qos.service.s.resource.l.capacity=15

j.s.p.qos.service.d.name.groups=com.acme.dao,oracle.jdbc
j.s.p.qos.service.d.timeout=1000
j.s.p.qos.service.d.resources=g,l
j.s.p.qos.service.d.resource.l.shared=false
j.s.p.qos.service.d.resource.l.capacity=10

Again this unprecedented level of fine grain control is achieved without any code changes and more importantly it can be configured externally to the application and tuned at deployment time for a particular environment or workload. It is also far easier to understand than the actual underlying concurrency code that would be needed to achieve the same effect. It is far more productive in that control mechanisms can be very easily experimented (with iteratively), only requiring a restart of the process under control.

Video: Check out this screen recording demonstrating our “NO CODE CHANGES NEEDED FOR CONCURRENCY CONTROL” claims.

Note: In theory this configuration could be imported into a simulation tool and the dynamics of the system observed without actually running the code (assuming the presence and production of ingress flows).

What you are actually doing is taking a dynamic system defined externally then injecting it and its dynamics into the software, merging normal execution processing with resource control (and feedback loops). A wonderful unification of abstractions in system dynamics and concurrency control. This is what DevOps should be more about rather than automated deployment of packages.

The approach is incredibly versatile. For example, let’s say you wanted to limit concurrency at the db layer for code that does not call (flow) through the web layer.

j.s.p.qos.enabled=true

j.s.p.qos.resources=g
j.s.p.qos.resource.g.capacity=20

j.s.p.qos.services=w,d

j.s.p.qos.service.w.name.groups=com.acme.web
j.s.p.qos.service.w.timeout=1000
j.s.p.qos.service.w.resources=g

j.s.p.qos.service.d.name.groups=com.acme.dao,oracle.jdbc
j.s.p.qos.service.d.timeout=1000
j.s.p.qos.service.d.resources=g
j.s.p.qos.service.d.resource.g.capacity=10

Looking at the above configuration you might think that you have actually limited the overall concurrency at the db layer to 10 (threads) but remember reservations are held until the service completes its execution. So when a call comes into the web layer it reserves a single unit (if one is available; if not, it waits) from the g resource. Then when it arrives at the db layer it uses this unit to proceed immediately without any further delay, as the default reservation method is to calculate the required units as 1.

The capacity that has been redefined for the g resource within the d service is only applicable when additional units are required such as when a thread attempts to call into the db layer code base without having already obtained a reservation from the g resource at the web layer. It is important to note that this capacity restriction is in addition to the capacity restriction within the underlying resource so even if 10 threads not originating from the web layer called into the db layer they could still be blocked (not indefinitely) because of existing outstanding reservations at the web layer. The web layer, w service, is in effect able to over subscribe at the db layer on the g resource.
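The reuse of a reservation across layers can be pictured with the following sketch, in which a thread that already holds a unit of a shared resource does not reserve another when it enters a nested service. The SharedResource class is hypothetical, and the sketch deliberately leaves out the per-service capacity override discussed above; it only shows why a call arriving from the web layer passes through the db layer without consuming a second unit.

```java
import java.util.concurrent.Semaphore;

public class SharedResource {
    private final Semaphore stock;
    // How many nested services on the current thread share the one reservation.
    private final ThreadLocal<Integer> depth = ThreadLocal.withInitial(() -> 0);

    SharedResource(int capacity) {
        this.stock = new Semaphore(capacity, true);
    }

    /** Called on entry to a service mapped to this resource. */
    void enter() {
        if (depth.get() == 0) {
            stock.acquireUninterruptibly(); // only the outermost service reserves a unit
        }
        depth.set(depth.get() + 1);
    }

    /** Called on exit from a service mapped to this resource. */
    void exit() {
        int remaining = depth.get() - 1;
        depth.set(remaining);
        if (remaining == 0) {
            stock.release(); // the unit is held until the outermost service completes
        }
    }

    int available() {
        return stock.availablePermits();
    }

    public static void main(String[] args) {
        SharedResource g = new SharedResource(20);
        g.enter(); // web layer reserves one unit
        g.enter(); // db layer reuses the unit already held by this thread
        System.out.println("available inside db layer: " + g.available());
        g.exit();
        g.exit();
        System.out.println("available after request:   " + g.available());
    }
}
```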

Up to now attention has been on concurrency control but this model is also well suited to the control of resource consumption. That is not to say that consumption can be reduced but that the rate of consumption within a time window can be reduced using system dynamics defined using our QoS for Apps technology. This brings up an important aspect of system dynamics – delay.

“Delays are ubiquitous in systems. Every stock is a delay…A stock takes time to change, because flows take time to flow…Stocks usually change slowly. They can act as delays, lags, buffers, ballast, and sources of momentum in a system.” – Thinking in Systems

We like to say that you don’t manage cost you manage the cause of such cost. In the case of software and resource consumption that cause is the code execution (and its caller chain). But as already stated above the goal is not to change what is consumed but the rate at which it is consumed.

“Rate limiting is used to control the rate of traffic sent or received on a network interface. Traffic that is less than or equal to the specified rate is sent, whereas traffic that exceeds the rate is dropped or delayed.” – Wikipedia

This can be modeled in system dynamics as another inflow refilling a stock (resource) at regular fixed intervals up to a desired and maximum level.

“Renewable resources are flow-limited. They can support extraction or harvest indefinitely, but only at a finite flow rate equal to their regeneration rate.” – Thinking in Systems
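A renewable stock of this kind can be sketched as a counter that is topped back up to its maximum level at each interval boundary. The RateLimitedResource class below is an assumption made for illustration: it takes the current time as a parameter so that its behavior is easy to follow, and it rejects a reservation when the stock is empty whereas a real runtime would more likely delay the caller.

```java
public class RateLimitedResource {
    private final int limit;           // maximum level of the stock
    private final long intervalMillis; // how often the stock is refilled
    private int available;
    private long windowStart;

    RateLimitedResource(int limit, long intervalMillis, long nowMillis) {
        this.limit = limit;
        this.intervalMillis = intervalMillis;
        this.available = limit;
        this.windowStart = nowMillis;
    }

    /** Attempts to reserve one unit at the given time; returns false if the stock is empty. */
    synchronized boolean tryReserve(long nowMillis) {
        long elapsed = nowMillis - windowStart;
        if (elapsed >= intervalMillis) {
            // Refill inflow: move to the current interval and restore the maximum level.
            windowStart += (elapsed / intervalMillis) * intervalMillis;
            available = limit;
        }
        if (available == 0) {
            return false;
        }
        available--;
        return true;
    }

    public static void main(String[] args) {
        RateLimitedResource t = new RateLimitedResource(2, 1000, 0);
        System.out.println(t.tryReserve(0));    // first unit in the interval
        System.out.println(t.tryReserve(10));   // second unit in the interval
        System.out.println(t.tryReserve(999));  // stock exhausted until the refill
        System.out.println(t.tryReserve(1000)); // refilled at the interval boundary
    }
}
```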

Returning to the web layer example, the maximum number (let’s say 2,000) of requests (throughput) allowed to execute in 1 second intervals (1,000 ms) can be configured as follows:

j.s.p.qos.enabled=true

j.s.p.qos.resources=t
j.s.p.qos.resource.t.rate.limit=2000
j.s.p.qos.resource.t.rate.interval=1000

j.s.p.qos.services=w

j.s.p.qos.service.w.name.groups=com.acme.web
j.s.p.qos.service.w.timeout=1000
j.s.p.qos.service.w.resources=t

The degree of concurrency can still be controlled by simply adding a capacity specification to the t resource.

j.s.p.qos.resource.t.capacity=20

In the configuration examples above the dynamics of the flow and stock system were simplified in assuming that all services consume (reserve) and replenish (release) only a single unit of a resource’s capacity. This default reservation behavior can be changed from “one” to “meter” or “inc” or “lease” or “timer” giving us much more variability in the flow rate across different services or mapped methods as in the case of “meter” or “timer” which track the resource profile of the probes to determine the required reservation units. This can result in much more interesting and useful feedback loops as demonstrated in the article titled “Optimal Application Performance and Capacity Management via QoS for Apps”.

Video: In this screen recording rate limiting is used to delay the frequency of a call’s execution using multiple millisecond intervals.

There are some important differences in our approach that are specific to software systems. The first one is with regard to time-boxed delay. Each service can have its own timeout value which is the maximum delay that can be introduced by the injected dynamics in making a reservation. If this time expires then the code proceeds as normal, skipping over the resource release phase in its enhanced execution. Another is reservation prioritization, which allows some services to jump ahead in the queue when making (or waiting on) resource reservations under contention across threads. We support this via a priority.level setting at the service level and reservation lanes at the resource level.
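The time-boxed delay can be sketched with the timed variant of Semaphore.tryAcquire. The TimeBoxedReservation class and its callWithTimeout method are hypothetical names used only for this illustration; the point is that the method always runs, and the release phase is skipped when no reservation was obtained within the timeout.

```java
import java.util.concurrent.Semaphore;
import java.util.concurrent.TimeUnit;

public class TimeBoxedReservation {

    /** Runs the method after waiting at most timeoutMillis for a unit; returns whether one was reserved. */
    static boolean callWithTimeout(Semaphore resource, long timeoutMillis, Runnable method) {
        boolean reserved = false;
        try {
            reserved = resource.tryAcquire(timeoutMillis, TimeUnit.MILLISECONDS);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt(); // preserve the interrupt and proceed unreserved
        }
        try {
            method.run(); // the code proceeds as normal either way
        } finally {
            if (reserved) {
                resource.release(); // only release what was actually reserved
            }
        }
        return reserved;
    }

    public static void main(String[] args) {
        Semaphore exhausted = new Semaphore(0); // a resource with no stock left
        boolean reserved = callWithTimeout(exhausted, 50,
                () -> System.out.println("method executed after the timeout expired"));
        System.out.println("reservation obtained: " + reserved);
    }
}
```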

Further Reading:
Optimal Application Performance and Capacity Management via QoS for Apps
Fairer Web Page Servicing with QoS Performance Credits
Dynamic QoS Prioritization
Introduction to QoS for Applications

How (not) to design a Metrics API – Part 2: Delegation & Separation

This is part two in a series of articles on how best to design an application metrics monitoring library, in particular its API, providing versatility both in terms of application of the library across domains & environments and in its implementation by one or more vendors. Please read part 1 if you have not already done so in which we introduce the alternative libraries that will be compared with the JXInsight/OpenCore Metrics Open API.

There are many definitions of what constitutes an “Open API” (or “Openness in an API”) but in our opinion there are at least four key ingredients: delegation, separation, extension and contextualization. In this article the first two will be discussed. The remaining two will be discussed in a forthcoming posting.

Note: When it comes to openness of the implementation we need to look at configuration and integration.

Delegation

The single most important requirement for any Open API should be the ability to use an alternative implementation that adheres to the API contract whilst offering additional benefits over the default implementation in terms of performance, resource management, reliability, scalability, extensibility, serviceability, tooling, integration and so on. If you can’t switch to an entirely different implementation then it most certainly is not open, irrespective of the ability to access the source, and it’s debatable whether it really is an API (more so a library).

To use an alternative implementation of the JXInsight/OpenCore Metrics Open API all that is needed is the setting of a service provider interface (SPI) system property specifying the service provider factory class that should be used to create an instance of the actual service provider.

-Djxinsight.server.metrics.spi.factory=

The Metrics class, representing the entry point into the API and the only implementation class in the entire API, will on class initialization create an instance of a MetricsProvider via the class implementation of the MetricsProviderFactory specified in the above system property.

Below is a snippet of code showing what an implementation of the above SPI interfaces would look like.

And here is actual code from the Metrics class showing call delegation through to the provider.

Note: The Metrics class does not expose the underlying service provider factory or service provider.
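For readers who want to see the general shape of this delegation pattern, here is a minimal self-contained sketch. It is not the JXInsight/OpenCore source: the MetricsEntryPoint class, the Counter, Provider and ProviderFactory interfaces and their methods are hypothetical, and only the system property name is taken from the text above. For brevity the SPI interfaces are nested here, whereas the article describes them as separate from the entry point class.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.atomic.AtomicLong;

public final class MetricsEntryPoint {

    /** Hypothetical API interface exposed to callers. */
    public interface Counter { void inc(); long value(); }

    /** Hypothetical SPI interfaces implemented by a vendor. */
    public interface Provider { Counter counter(String name); }
    public interface ProviderFactory { Provider create(); }

    static final String SPI_PROPERTY = "jxinsight.server.metrics.spi.factory";

    // Created once on class initialization and never exposed to callers.
    private static final Provider PROVIDER = load();

    private MetricsEntryPoint() {}

    /** Every API call is delegated to whichever provider was loaded. */
    public static Counter counter(String name) {
        return PROVIDER.counter(name);
    }

    private static Provider load() {
        String factoryClass = System.getProperty(SPI_PROPERTY);
        if (factoryClass == null || factoryClass.isEmpty()) {
            return new DefaultProvider(); // assumed fallback for this sketch
        }
        try {
            ProviderFactory factory = (ProviderFactory)
                    Class.forName(factoryClass).getDeclaredConstructor().newInstance();
            return factory.create();
        } catch (ReflectiveOperationException e) {
            throw new IllegalStateException("Cannot create provider via " + factoryClass, e);
        }
    }

    /** Trivial in-memory provider standing in for a default implementation. */
    private static final class DefaultProvider implements Provider {
        private final Map<String, AtomicLong> counts = new ConcurrentHashMap<>();

        @Override
        public Counter counter(String name) {
            AtomicLong count = counts.computeIfAbsent(name, k -> new AtomicLong());
            return new Counter() {
                @Override public void inc() { count.incrementAndGet(); }
                @Override public long value() { return count.get(); }
            };
        }
    }

    public static void main(String[] args) {
        counter("requests").inc();
        counter("requests").inc();
        System.out.println("requests = " + counter("requests").value());
    }
}
```

Because callers only ever see the Counter interface and the static entry point, swapping the provider requires nothing more than setting the system property to the name of a different factory class.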

Jammer

When it comes to delegation this metrics library, hosted on Github, looks very much like a bunch of implementation classes cobbled together. Whilst the Metrics class does delegate most of its calls to the exposed MetricsRegistry class instance there is no means whatsoever to switch in an alternative implementation of the MetricsRegistry class as a way to bridge (route) calls to a more mature and professionally engineered library. Of course MetricsRegistry being a class and not an interface makes this all the more impossible but we will come back to this later.

Note: The Metrics class also hardwires in the creation of a JMX integration which ideally should be optional, delegated to the underlying implementation and configured externally by operations staff.

Netflicks

This metrics library, also hosted on Github, does offer the means to delegate calls to an alternative implementation of the MonitorRegistry interface specified by way of a system property though the mechanism is somewhat cumbersome, convoluted and costly.

The DefaultMonitorRegistry class which is the entry point into this library creates an instance of itself, with a reference to an alternative implementation of MonitorRegistry. The static method, getInstance(), in the DefaultMonitorRegistry class then returns a reference to the INSTANCE named field of type MonitorRegistry (actually itself) which in turn delegates calls to an instance field named registry of type MonitorRegistry.

This interface coupling is unwarranted and hinders development of utility methods in the entry point class, DefaultMonitorRegistry, without polluting the MonitorRegistry interface. Not having a distinct SPI interface also complicates bootstrapping which usually needs additional methods and call backs to be introduced once a library goes beyond toy usage and needs to better manage the lifecycle of the underlying service provider.

Separation

To truly achieve delegation in an API there needs to be a separation between the API and one or more possible implementations (library) of the API. Delegation is rarely achievable in practice without the designer ensuring that no signatures or call paths expose an implementation class that prevents (restricts) alternative implementations offering the optimal solution in line with the intended goal(s) of such a library. Excluding supporting struct (value) and enum classes all should be interfaces except for the entry point class itself which serves to bootstrap and initialize the runtime.

We believe so strongly in this separation that there is only one class, Metrics, in the JXInsight/OpenCore Metrics Open API and all others are inner interfaces within this class. Except for the SPI interfaces the whole Open API is enclosed in a single class (source) file. There is no implementation exposure or leaking of abstractions for that matter. The Metrics class ensures that the implementation is never exposed (at least “static”tically speaking).

Below is a class structure view of the JXInsight/OpenCore Metrics class within our favorite IDE. There are no other implementation (C) classes other than the Metrics class and no other packages needed by the client. All interface (I) classes are contained within this class.

Jammer

There is not a single interface used (referenced) in any of the many signatures in this library’s Metrics class. MetricName, Gauge, Counter, Histogram, Meter, Timer are all implementation classes. It would be impossible to offer any sort of service delegation with this design unless all alternative implementations used the very same namespace and class names unnecessarily (over) exposed (which has implications with regard to intellectual property).

Note: Java interfaces should be used to indicate very clearly what is expected of implementations, communicate to users the possibility of different behavioral characteristics of such across implementations, and afford implementations the opportunity to optimize and tailor to the fullest.

Netflicks

This metrics library also fails the separation test in exposing the MonitorContext implementation class and its inner Builder implementation class by way of the Monitor interface used in the MonitorRegistry interface. In addition the MonitorRegistry interface directly references the AnnotatedObject implementation class which in turn exposes the AnnotatedAttribute implementation class.

Note: Both open source libraries distribute various utility packages containing implementation classes which only serve to further lock-in client callers and effectively eliminate any possibility of competition in terms of implementation. Dependencies across such packages and the core API package are a mixed bag. It is highly unlikely that any client of either library would be portable to another implementation.

Note: If a library is truly designed to be “open” it should ship with two distributions. One that only includes the actual API itself and the other including both the API and the default (vendor) implementation. JXInsight/OpenCore ships with two libraries that do this very thing – opencore-api.jar and opencore.jar.

Part 3: Groups, Collections & Samples

How (not) to design a Metrics API – Part 1: “Millions of Metrics”

This is part one in a series of articles on how best to design an application metrics monitoring library, in particular its API, providing versatility both in terms of application of the library across domains & environments and in its implementation by one or more vendors. In the series we will discuss the underlying thought process, principles and patterns guiding the design and development of the JXInsight/OpenCore Metrics API and in turn compare it with other libraries that don’t exhibit the resulting qualities which have emerged from such software engineering discipline.

“Millions of Metrics” is a statement made by Adrian Cockcroft, Cloud Architect @ Netflix, in discussing the monitoring of applications deployed using their own proprietary PaaS solution. Granted it’s somewhat boastful (he was speaking on stage) but to those not very well experienced in application performance monitoring it not only sounds impressive, it sounds like a goal worth pursuing, which would be a mistake. The problem with this unqualified statement is that it does not make the distinction between an information model and a management model, a measure and a metric, a measurement and a sample. There is no software, model, process or human that can effectively use, scale and benefit from a set of such size, at least not from a management perspective. It would be terribly (cost) ineffective as the behavior of most applications and systems is driven (determined) and signaled (distinguished) by a very small number of measures.

Note: It’s not certain whether Adrian confused the (measure) instance count with the (metric) type count.

The information model is the (super)set of all measures that can be monitored. The management model is the (sub)set that is sufficient and suitable for monitoring purposes being sampled and saved. It should always be possible to inspect on an adhoc basis the values of the measures pertaining to the information model but only a few of such measures in this model should form the management model that is itself manageable. A metrics library that does not make this distinction very apparent in the design of its interface, interaction and implementation will ultimately be a failure forcing engineers and operations to make poor trade-offs at inappropriate points and limiting its usage beyond very simple use cases.

The JXInsight/OpenCore Metrics Open API makes this distinction in having a separate class for Metric and Measure. Whilst instances of Measure (Counter and Gauge) are managed by the runtime only those instances of Measure that are registered with the runtime for the purpose of monitoring (sampling & collection) are viewed (associated) as a Metric. A Measure becomes a Metric when it is both managed and monitored. The Measure need not be a Metric though it can still be accessible to callers for the purpose of update and access. This is a much better option than having static (global) fields scattered throughout the code base holding references to such measures.

A benefit of this design is that more than one Metric can be mapped to a particular Measure under a different Name. This design also allows for the composition of a Measure from other instances of Measure and then only for this composite to be registered (associated) as a Metric. It also allows for the registration of a Measure as a Metric to be eliminated from the code itself and instead configured externally. The design does not tie the lifecycle of both types and it does not expose state and functioning that would otherwise be made accessible if both types were combined.
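The distinction between the two models can be sketched as two separate maps: one holding every measure that exists and another holding only the names registered for sampling. The MetricModels class below is a hypothetical illustration and not the Metrics Open API; it uses AtomicLong as a stand-in for a Counter.

```java
import java.util.Map;
import java.util.TreeMap;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.atomic.AtomicLong;

public class MetricModels {
    // Information model: every measure that can be inspected on an ad hoc basis.
    private final Map<String, AtomicLong> measures = new ConcurrentHashMap<>();
    // Management model: metric name -> measure name, for the few that are sampled.
    private final Map<String, String> metrics = new ConcurrentHashMap<>();

    /** Looks up a counter measure, creating it if not present. */
    AtomicLong counter(String name) {
        return measures.computeIfAbsent(name, k -> new AtomicLong());
    }

    /** Registers a measure as a metric, optionally under a different name. */
    void register(String metricName, String measureName) {
        counter(measureName);
        metrics.put(metricName, measureName);
    }

    /** Samples only the management model. */
    Map<String, Long> sample() {
        Map<String, Long> sampled = new TreeMap<>();
        metrics.forEach((metric, measure) -> sampled.put(metric, measures.get(measure).get()));
        return sampled;
    }

    int measureCount() {
        return measures.size();
    }

    public static void main(String[] args) {
        MetricModels models = new MetricModels();
        models.counter("requests").incrementAndGet();
        models.counter("cache.hits").incrementAndGet();
        models.register("web.requests", "requests"); // one measure, mapped under another name
        System.out.println("measures: " + models.measureCount());
        System.out.println("sampled:  " + models.sample());
    }
}
```

Both counters remain available for update and inspection, but only the one registered as a metric is ever sampled, and nothing prevents the same measure from being registered again under a second name.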

Here is a snippet of code showing how the JXInsight/OpenCore Metrics Open API is used to lookup (and create if not present) an instance of Counter, increment it and then register it with the runtime for sampling and collection purposes using its own Name.

Note: The registration of a Counter as a Metric can be done automatically using the jxinsight.server.metrics.counters=${c1},${c2} system property.

The following snippet shows the registration of a Counter as a Metric with an alternative Name mapping.

The interaction story line for a Gauge is pretty much near identical.

Note: The registration of a Gauge as a Metric can be done automatically using the jxinsight.server.metrics.gauges=${g1},${g2} system property.

Gauges like Counters can also be registered as a Metric under different Name mappings.

Note: Use of Counter and Gauge is optional as an implementation of Measure can be registered as a Metric providing extra immutability safeguards.

Due to a deliberate and stylistic design choice in using largely inner interfaces to represent elements of the model we can do a static import on the Metrics class and increase the readability of the code.

Note: Except for some struct & enum classes and the Metrics entry point class we only make public (inner) interfaces. This allows for different implementations of the runtime to offer alternatives for Counter and Gauge.

Note: Since the release of our Metrics Open API over 3 years ago we have not broken backwards compatibility and have never deprecated any interfaces or classes or methods. It took us a year to design the Open API prior to its public release. That time was well spent in getting the model, concepts, names and signatures optimal, aligned and consistent.

A Note on Alternatives

Before comparing the OpenCore Metrics Open API with two open source alternatives found on GitHub it’s important to note that our Metrics Open API was made public on the 15th May 2009, the same day that Amazon AWS announced CloudWatch and its own Metrics API. It remained publicly accessible up until Oct 2011 when we moved all our developer content over to our http://developer.jinspired.com site.

Since we don’t view the alternatives used throughout this series as examples of good design we have changed the package namespace whilst keeping them phonetically similar.

Jammer (Dutch for pity)

This popular metrics library, first published in Feb 2010 (though it was much different then), is similar to OpenCore in using a main entry point class named Metrics within an enclosing metrics named package. Originally this class was named com.jammer.metrics.core.MetricsFactory but it was renamed in Dec 2010. Evidence of this factory heritage can be seen in its use of “new” as a prefix in methods which lookup and create a Counter.

There is no distinction between an information model and a management model. The Counter is a class (a poor design decision) that implements the Metric interface and which has one very peculiar method named processWith (which will be revisited in a future part).

In the creation of the Counter this intersection (joining) of the information model and management model is done automatically without recourse. Of course one could always instantiate the Counter without using the Metrics class but then it still needs to be added to some shared map structure to make it accessible to ad hoc inspection tools. This brings us to the next issue with this API: the exposure of an underlying registry data structure, MetricsRegistry, which can be used to determine whether a Metric is available without actually resulting in its creation. Incidentally the interface also requires the caller to know the class type just to be able to determine the current value. There are far too many aspects of the underlying implementation exposed – even MetricsRegistry is a class and not an interface. It’s a very bad code smell in a class, Metrics, that is only a few lines.

Note: Version 2 of this library released in 2012 was a complete rewrite breaking any sort of backwards compatibility. Expect the same to occur repeatedly for version 3, version 4 and onwards.

For measures of type Gauge the same issues are present along with others, including not offering an actual implementation of this abstract class in the library, usage of numeric (un)boxing and an alternatively named value accessor.
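The boxing concern can be illustrated with a small sketch that contrasts a generic abstract gauge with a primitive accessor. Both types below (BoxedGauge and LongGauge) are hypothetical and are not copied from either library; they only model the two styles being compared.

```java
import java.util.concurrent.atomic.AtomicLong;

final class GaugeBoxingSketch {

    // Generic style: the accessor returns an object, so a long is boxed on each read
    // and the caller must know the type parameter to use the value.
    abstract static class BoxedGauge<T> {
        abstract T value();
    }

    // Primitive style: no wrapper object and no type parameter to discover.
    interface LongGauge {
        long getValue();
    }

    // Hypothetical measured quantity.
    static final AtomicLong QUEUE_DEPTH = new AtomicLong(42);

    static final BoxedGauge<Long> BOXED = new BoxedGauge<Long>() {
        @Override
        Long value() {
            return QUEUE_DEPTH.get(); // autoboxing from long to Long happens here
        }
    };

    static final LongGauge PRIMITIVE = QUEUE_DEPTH::get;

    public static void main(String[] args) {
        long viaBoxed = BOXED.value();            // unboxing on the caller side
        long viaPrimitive = PRIMITIVE.getValue(); // stays primitive throughout
        System.out.println(viaBoxed + " " + viaPrimitive);
    }
}
```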

Netflicks

The main entry point into this metrics library is DefaultMonitorRegistry which allows for instances of Monitor (a metric in our management model) to be registered via cumbersome call sequences such as the creation of a MonitorContext (which is effectively a metric name) using the fluent builder call pattern. There is a Metric class but that is in fact named incorrectly as it actually represents a timestamped record of a sampled monitor value. Pretty much every primary class in this library is poorly named (and the underlying implementation is not any better).

It makes the same fundamental mistake that all newcomers to metric monitoring make in not distinguishing the management model from the information model. Amazingly there is no way to look up a Monitor (measure) and its value subsequent to it being added to the MonitorRegistry (model) without actually going through another management library, JMX, to which the underlying implementation is naively coupled (in a horrendous manner). Leaving aside the JMX access path there is in fact no information model, which is wrong as both models are needed though not always at the same time and under the same (incident mgmt) circumstances.

That said it does at least define a getValue method in the Monitor interface which is extended by both the Gauge and Counter interfaces though this is largely a reflection of it being simply an embellishment on JMX which has many other problems in particular in its name identifiers which this library has also inherited via what it calls “tags” in the MonitorContext.

Here is how a Counter is published and registered (and made inaccessible).

The Gauge has a similar call sequence though not without its own issues in numeric boxing.

Note: To make all metrics registered with the OpenCore Metrics Open API be accessible via JMX all that is needed is a single system property to enable this extension jxinsight.server.metrics.management.enabled=true.

Part 2: Delegation & Separation

Determining whether an Application Performance Hotspot is a Performance Limiter

In a recent post titled A Preliminary Performance Analysis of the Vert.x (vs Node.js) HTTP Benchmark a number of latency performance hotspots were identified. In this post we introduce a novel approach to determining how much of an impact a potential change (performance tuning or degradation) would have on the overall throughput of an application under test (and in an environment that may be oversubscribed in terms of resource capacity) using Quality of Service (QoS) for Apps.

Note: A significant benefit in doing so is we don’t have to actually spend hours/days looking for potential (narrowly focused) performance gains and then only to find out in testing such changes result in no measurable difference.

Let’s first revisit the latency hotspots identified after numerous performance instrumentation refinements based on previous run metering data in which we impressively whittled down the number of methods metered from 650 to just 13 without ever looking at the source code or using developer guesswork.

Here is a chart of the HTTP request throughput rate reported by the Vert.x counter process on an iMac Intel Core i7 (4 Core, 2.8 GHz) with Oracle’s Java 1.7. All processes were run on the same machine with both client and server runtimes having 4 set as the instance parameter.

To see whether any change in the performance of a hotspot impacts the overall throughput of the system we will introduce a number of different timed delays using the following Quality of Service (QoS) for Apps configuration which specifies that a particular probe (a metered method) be enhanced with QoS capabilities.

Note: j.s.p is a short hand version of jxinsight.server.probes recognized by our measurement runtimes.

j.s.p.qos.enabled=true
j.s.p.qos.resources=q
j.s.p.qos.resource.q.capacity=1
j.s.p.qos.services=vm,fs
j.s.p.qos.service.vm.name.groups=org.vertx.java.deploy.impl.cli.VertxMgr.main
j.s.p.qos.service.vm.resources=q
j.s.p.qos.service.fs.name.groups=org.vertx.java.core.file.impl.DefaultFileSystem$11.action
j.s.p.qos.service.fs.resources=q
j.s.p.qos.service.fs.timeout.unit=micro
j.s.p.qos.service.fs.timeout=5

What will happen is that on startup the main method will immediately grab and hold the only unit of capacity in the QoS resource associated with both QoS services defined. Then when the action method in the DefaultFileSystem$11 class is executed (method entered) it will request a resource reservation of 1 unit, timing out after 5 microseconds because none are available in the pool associated with the resource and then proceeding with its normal execution. All of this behavioral change comes with no code changes as the JXInsight/OpenCore agent need only instrument the specified classes at class load time.
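The starved reservation behavior described above can be approximated in plain Java with a semaphore. This is only a sketch under assumptions: java.util.concurrent.Semaphore stands in for the QoS resource, the class and method names are hypothetical, and nothing here reflects how the JXInsight/OpenCore agent is implemented.

```java
import java.util.concurrent.Semaphore;
import java.util.concurrent.TimeUnit;

final class StarvedReservationSketch {

    private final Semaphore resource; // stand-in for the QoS resource "q"

    StarvedReservationSketch(int capacity) {
        resource = new Semaphore(capacity);
    }

    // Stand-in for the main method: take a unit and never give it back.
    boolean holdUnit() {
        return resource.tryAcquire();
    }

    int available() {
        return resource.availablePermits();
    }

    // Stand-in for the metered method: wait up to the timeout for a unit, then run regardless.
    // Returns true only if a unit was actually reserved.
    boolean meteredCall(long timeoutMicros, Runnable body) {
        boolean reserved = false;
        try {
            reserved = resource.tryAcquire(1, timeoutMicros, TimeUnit.MICROSECONDS);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
        try {
            body.run();
        } finally {
            if (reserved) {
                resource.release(); // nothing to release after a timeout
            }
        }
        return reserved;
    }

    public static void main(String[] args) {
        StarvedReservationSketch q = new StarvedReservationSketch(1);
        q.holdUnit();
        long start = System.nanoTime();
        boolean reserved = q.meteredCall(5, () -> { });
        long micros = (System.nanoTime() - start) / 1_000;
        System.out.println("reserved=" + reserved + ", delayed about " + micros + " us");
    }
}
```

In this sketch the observed delay may be noticeably longer than the nominal timeout because very short timed waits are subject to scheduler and timer granularity.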

Below is the new request throughput rate (in orange) following this change to the agent’s jxinsight.override.config file. Surprisingly there is an actual increase in the overall throughput (which we will come back to later).

Note: With waiting caused by reservation on a starved virtual QoS resource we are introducing a time delay but not necessarily consuming precious processor time.

Here is a second re-run of the test this time with the (delay) timeout set to 10 microseconds. The green line shows a drop off in the throughput but not yet below the initial baseline.

j.s.p.qos.service.fs.timeout=10

With the timeout set to 100 microseconds the throughput now drops significantly, from approximately 45K/s to 20-25K/s.

j.s.p.qos.service.fs.timeout=100

We can repeat the above set of experiments for another identified hotspot this time the SocketSendBufferPool$PooledSendBuffer.transferTo method.

j.s.p.qos.enabled=true
j.s.p.qos.resources=q
j.s.p.qos.resource.q.capacity=1
j.s.p.qos.services=vm,nt
j.s.p.qos.service.vm.name.groups=org.vertx.java.deploy.impl.cli.VertxMgr.main
j.s.p.qos.service.vm.resources=q
j.s.p.qos.service.nt.name.groups=org.jboss.netty.channel.socket.nio.SocketSendBufferPool$PooledSendBuffer.transferTo
j.s.p.qos.service.nt.resources=q
j.s.p.qos.service.nt.timeout.unit=micro
j.s.p.qos.service.nt.timeout=5

The orange line shows that with a small time delay of 5 microseconds there is already an observable difference in the throughput.

Increasing the time delay to 10 microseconds resulted in a further reduction in throughput, though the additional drop was smaller than the one the 5 microsecond delay caused relative to the original baseline.

j.s.p.qos.service.nt.timeout=10

When the timeout is increased to 100 microseconds the overall throughput is greatly reduced – by more than two thirds of the baseline. This is the greatest reduction of the two primary hotspots though it should be noted that the frequency of the transferTo method is twice that of the action method.

j.s.p.qos.service.nt.timeout=100

A similar set of test runs was conducted with time delays introduced into the ReplayingDecoder.callDecode method. Here is the final chart which indicates that changes to this method have very little impact on the throughput of the system under test including the processing capacity (which was fully utilized). In fact the timeout setting had to be changed to 1,000 microseconds before a noticeable drop in the throughput was observed.

Granted this method is executed at 1/4 of the frequency of the transferTo method but even at 10,000 microseconds the throughput was still close to the transferTo rate at a 100 microsecond timeout. How can this be happening? Well the first clue is that when the instance count in both client and server is dropped down to 1 the throughput rate remains relatively high at 42K/s. The second clue is that overall processor utilization at this instance count level is just over 50%. Clearly the delays introduced above did not always impact throughput as much as expected because there was already implicit queuing and contention built up with the instance count at 4. This also explains why when a small delay was introduced there was a slight (contradictory) increase in the throughput as we indirectly eased contention on these implicit and explicit queues.

Because of the asynchronous nature of the underlying execution the methods classified above as latency hotspots are not exactly hotspots in the classical sense (response time) as their execution is not performed sequentially within a critical path. These methods are best classified as performance limiters in that they feed work to other hotspot methods (and worker threads) via queues. As long as such queues are never completely depleted the throughput is not adversely impacted, though as seen from the above charts some methods and their associated queues have a much more pronounced impact on the overall throughput than others, and this is not always reflected in their latency profile.

Note: When I first started writing this post I used “performance inhibitor” in the title but following discussions with Dr. Neil J. Gunther a leading expert in performance capacity planning I reverted to using “performance limiter” which he used in describing a similar behavior in his highly recommended Guerrilla Capacity Planning book in particular sections discussing his universal scalability law (USL) that is likely to become increasingly more relevant with increasing parallelism in request processing.

Further Reading:
Optimal Application Performance and Capacity Management via QoS for Apps
Fairer Web Page Servicing with QoS Performance Credits
Dynamic QoS Prioritization
Introduction to QoS for Applications

Optimal Application Performance and Capacity Management via QoS for Apps

We are often called in to help companies solve application performance (management) problems that are in fact capacity (management) problems – well to be more specific resource management problems. This generally entails profiling, protecting, policing, prioritization and predicting resource consumption requests (or reservation). In such cases the resolution of the performance problem, bottleneck, goes hand in hand with the management of application level resource capacity as it is common for the removal of one performance resource bottleneck to introduce a much greater bottleneck further downstream once the flood gates have been opened upstream. Unfortunately knowing that such constraints exist within an application (or system) is the relatively easy part. The hard part is deciding how to control the (work) flow that elicits such related resource consumption and execution behavior.

Note: A performance bottleneck is used here to refer to points in the execution in which throughput is decreased and/or latency increased.

Somewhat counter-intuitively we generally need to introduce delays (choke/throttle/shaping points) or buffers in order to meet overall performance objectives per some service level agreement (SLA). But again the task of setting parameters for such controlled delays is not straightforward and the result is not always (near) optimal, at least not initially. In this article I will show how our activity based resource metering technology helps alleviate some of the trial and error (and possible waste) involved in the introduction and configuration of such control points using the Quality of Service (QoS) for Apps technology in JXInsight/OpenCore and combining it with self adjusting feedback loops.

To demonstrate the application of QoS in the management of performance via the management of (resource) capacity I have constructed a test application that introduces one of the most common problematic constraints (bottlenecks) in many enterprise Java applications – sub-optimal heap memory management (capacity & usage) leading to frequent and possibly prolonged garbage collection cycles and other related issues (context switching and associated memory costs).

The Worker class listed at the bottom of this article mimics the servicing of a request in particular the general high allocation rates of many small objects (strings, collections). For each invocation of doWork 10,000 BigDecimal instances are added to a LinkedList.

Below is a chart depicting the total number of units of work (throughput) done across different numbers of concurrent threads starting from 1 and up to 64 (in powers of two). The tests were performed on an iMac Intel Core i7 (4 Core, 2.8 GHz) with Java 1.6.0. The throughput peaks at 2 and drops slightly at 4 threads and then plummets (buckles) at 8 threads.
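For readers who want to try a comparable experiment, here is a minimal sketch of such a workload. It is reconstructed only from the description above (10,000 BigDecimal instances added to a LinkedList per unit of work, thread counts from 1 to 64 in powers of two); it is not the author’s Worker class, and the class name, the measure method and the 2 second run length are assumptions.

```java
import java.math.BigDecimal;
import java.util.LinkedList;
import java.util.List;
import java.util.concurrent.atomic.AtomicLong;

final class ThroughputSketch {

    // One unit of work: allocate many small objects, as a request handler might.
    static int doWork() {
        List<BigDecimal> list = new LinkedList<>();
        for (int i = 0; i < 10_000; i++) {
            list.add(new BigDecimal(i));
        }
        return list.size();
    }

    // Runs doWork on 'threads' threads for roughly 'millis' ms and returns the units completed.
    // Every thread completes at least one unit before checking the deadline.
    static long measure(int threads, long millis) {
        AtomicLong units = new AtomicLong();
        long deadline = System.nanoTime() + millis * 1_000_000L;
        Thread[] workers = new Thread[threads];
        for (int t = 0; t < threads; t++) {
            workers[t] = new Thread(() -> {
                do {
                    doWork();
                    units.incrementAndGet();
                } while (deadline - System.nanoTime() > 0);
            });
            workers[t].start();
        }
        for (Thread w : workers) {
            try {
                w.join();
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
        }
        return units.get();
    }

    public static void main(String[] args) {
        for (int threads = 1; threads <= 64; threads *= 2) {
            System.out.println(threads + " threads: " + measure(threads, 2_000) + " units");
        }
    }
}
```

The absolute numbers will depend heavily on the heap size, collector and JVM version in use, so the shape of the curve matters more than any single figure.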

One way to address the problem would be to limit the concurrency level upstream in the request servicing or in the case of our simple application at the primary execution point – doWork.

This is very easy to do with JXInsight/OpenCore. We simply install the instrumentation agent which dynamically instruments loaded application classes and methods with metering probes. Then we enhance a smaller subset of these instrumentation points (probes) with QoS for Apps. Here is the jxinsight.override.config file I used to do this.

jxinsight.server.probes.strategy.enabled=false
jxinsight.server.probes.aggregates.enabled=false

jxinsight.server.probes.qos.enabled=true
jxinsight.server.probes.qos.filter.enabled=true
jxinsight.server.probes.qos.filter.include.name.groups=QoS4Apps$Worker.doWork

jxinsight.server.probes.qos.resources=cpu
jxinsight.server.probes.qos.resource.cpu.capacity=4

jxinsight.server.probes.qos.services=worker
jxinsight.server.probes.qos.service.worker.name.groups=QoS4Apps$Worker.doWork
jxinsight.server.probes.qos.service.worker.timeout=1000

With the above configuration every time the worker QoS (virtual) service, mapped to the doWork method, executes it will reserve a single unit of capacity from the cpu named QoS (virtual) resource. This reservation will be held until the worker service finishes its execution (the doWork method completes). In setting the capacity of the cpu named resource to 4 we have restricted the degree of concurrent execution of the doWork method beyond the immediate entry into it in which our metering instrumentation has been injected.
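The effect of a capacity of 4 can be imitated by hand with a counting semaphore wrapped around the unit of work. The sketch below is an assumption-laden stand-in: the class and method names are hypothetical, the timeout from the configuration is ignored (callers simply block until a unit is free), and the code change it requires is exactly what the agent based approach avoids.

```java
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Semaphore;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicInteger;

final class CapacityLimitSketch {

    private final Semaphore cpu; // stand-in for the "cpu" named resource
    private final AtomicInteger active = new AtomicInteger();
    private final AtomicInteger peak = new AtomicInteger();

    CapacityLimitSketch(int capacity) {
        cpu = new Semaphore(capacity);
    }

    // Wraps a unit of work: reserve one unit on entry, release it on exit.
    void service(Runnable doWork) {
        cpu.acquireUninterruptibly();
        try {
            int now = active.incrementAndGet();
            peak.accumulateAndGet(now, Math::max); // remember the highest concurrency seen
            doWork.run();
        } finally {
            active.decrementAndGet();
            cpu.release();
        }
    }

    int peakConcurrency() {
        return peak.get();
    }

    // Runs 'tasks' units of work on 'threads' threads and returns the peak concurrency observed.
    static int run(int capacity, int threads, int tasks) {
        CapacityLimitSketch limiter = new CapacityLimitSketch(capacity);
        ExecutorService pool = Executors.newFixedThreadPool(threads);
        for (int i = 0; i < tasks; i++) {
            pool.execute(() -> limiter.service(() -> {
                long sum = 0;
                for (int j = 0; j < 100_000; j++) {
                    sum += j; // placeholder work
                }
                if (sum < 0) {
                    throw new IllegalStateException("unreachable");
                }
            }));
        }
        pool.shutdown();
        try {
            pool.awaitTermination(1, TimeUnit.MINUTES);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
        return limiter.peakConcurrency();
    }

    public static void main(String[] args) {
        System.out.println("peak concurrency: " + run(4, 16, 200));
    }
}
```

However many threads are submitted, the peak concurrency inside the guarded section never exceeds the configured capacity.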

Note: QoS for Apps is extremely innovative in its approach to resource management that involves modeling the system dynamics in terms of services (flows) and resources (stocks) then mapping these logical/virtual modeling elements to named probes (qualified packages, classes or methods as well as web urls or sql strings) and combining it with feedback loops built around metering measurements.

Here is the revised work unit throughput chart from 8 concurrent threads up to 64. The sharp drop off in throughput at 8 concurrent threads has vanished and we did not have to change any code or introduce an explicit dependency on an elaborate work/resource management framework.

The above was a good start at introducing some late binding execution control but how can we be sure that 4, which I based on the number of cores on the machine, was the right setting? What if we had another workload characteristic that allowed higher concurrent thread levels? Well if you look back at the first chart you will see it was possible to get just over 10% more throughput with only 2 concurrent threads.

Ideally we need a way to introduce a feedback loop into the reservation system that automatically adjusts up and down the required reservation units based on the performance of the worker service execution. Here the units don’t equate to threads but time itself. Knowing the average time is less than 1,000 microseconds for a worker service request the capacity is now set at 4,000 on the cpu named resource. And instead of using the default "one" reservation strategy the resource is configured to use the "meter" reservation strategy which predicts the reservation requirements of a service wishing to proceed based on the metering profile of the probe associated with the service at that particular point in the thread’s execution.

jxinsight.server.probes.qos.resource.cpu.capacity=4000
jxinsight.server.probes.qos.resource.cpu.reservation=meter
jxinsight.server.probes.qos.resource.cpu.meter=clock.time

With the feedback loop if the initial reservations are understated we will get a build up in the contention for the resource (constraint) which will then over time increase the latency of the servicing (which is metered). This in turn increases the required reservation for future services leading to faster depletion of the resource capacity and more introduced delay as threads wait for the reserved capacity to be released by in progress probes. Likewise if predicted reservations are temporarily overstated the response time will drop with less contention leading to a reduction in the reservation requirements and further increasing the level of concurrency. Here is the system diagram.

The resulting throughput has increased for test runs with 8 and 16 threads but there is a slight dip at 32 and 64 which will be addressed shortly.

The dip above was more than likely caused by a large increase in the estimated (predicted) reservation units leaving some capacity in the pool (stock) but just not enough for another thread to proceed with its execution. Over a much longer time window (> 1 min) this would probably have automatically corrected itself. But it can be remedied by putting a maximum limit on the reservation that a worker service can make of the cpu resource ensuring we can always have at least 2 concurrent threads.

jxinsight.server.probes.qos.service.worker.resource.cpu.max=2000
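The arithmetic behind the time based reservations can be worked through with a few lines of code. The sketch assumes that one reservation unit corresponds to one predicted microsecond and that a call proceeds only when its whole reservation fits in the remaining pool; both are simplifications for illustration, and the predicted times used are hypothetical.

```java
final class MeterReservationSketch {

    // Reservation requested by one service call: the predicted time, capped at maxUnits.
    static long reservation(long predictedMicros, long maxUnits) {
        return Math.min(predictedMicros, maxUnits);
    }

    // How many calls can hold a reservation of this size at the same time.
    static long concurrentCalls(long capacity, long reservation) {
        return capacity / reservation;
    }

    // Units left in the pool that are too few for one more call to proceed.
    static long strandedUnits(long capacity, long reservation) {
        return capacity % reservation;
    }

    public static void main(String[] args) {
        long capacity = 4_000;
        long noCap = Long.MAX_VALUE;
        long[] predictions = {1_000, 1_500, 2_500}; // hypothetical predicted times in microseconds
        for (long p : predictions) {
            long uncapped = reservation(p, noCap);
            long capped = reservation(p, 2_000);
            System.out.println("predicted=" + p
                + " uncapped: calls=" + concurrentCalls(capacity, uncapped)
                + " stranded=" + strandedUnits(capacity, uncapped)
                + " | capped at 2000: calls=" + concurrentCalls(capacity, capped)
                + " stranded=" + strandedUnits(capacity, capped));
        }
    }
}
```

With a prediction of 2,500 units and no cap only one call fits and 1,500 units sit idle, whereas a cap of 2,000 units on a 4,000 unit pool means two calls can always hold reservations at once.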

Here is the revised system diagram with the "max=2000" influence on the reservation flow included.

The change in the configuration has delivered the desired (if near optimal) throughput results.

Here is a bar chart comparing the results across all test runs. The baseline represents the behavior of the system without any QoS enhancement.

Up to now the focus has been on throughput so what of the response time? Well during the testing I collected metering quantization data. Here is the distribution for the single threaded baseline execution.

Here is the baseline execution with 64 concurrent threads. The distribution has changed significantly with many of the requests above 500 microseconds and the frequency reduced to less than 25% of the previous.

Here is the service time distribution with 64 concurrent threads using the first QoS configuration. Much much better with a clustering around 256 and 512 microseconds.

Note: The metering measurements include any possible delay introduced by the QoS runtime during the waiting on reservation requests, which had a maximum timeout of 1000 milliseconds in the above configuration, after which a kind of barging occurs without reservation or release by the service.

With the "meter" reservation strategy and with the capping of the reservation requirements we get extremely close to the single thread execution behavior for a 64 thread test.

Here is the Worker class used in the testing.

And here is the remaining snippet of the class code used to generate the test workload.

Further Reading:
Fairer Web Page Servicing with QoS Performance Credits
Dynamic QoS Prioritization
Introduction to QoS for Applications

Prediction is the Future in Application Performance Management & Software Optimization

There are a few general strategies to improving the performance (speed) of a software application.

1. Not Working. You can’t beat this in terms of speed. Always try to avoid doing (or generating) work in the first place. Unfortunately most of the time we must do something to deliver a service but we should always keep this in mind when tackling any performance problem. This is one area in which computers could learn from some of the “best” of us. The computer says No.

2. Working Harder. Getting more out of the hardware and resources available to the software possibly upgrading, improving or aligning the hardware components to the nature of the workload. At present we have reached some physical limits.

3. Working Smarter. Improving the efficiency of the software in terms of algorithmic work needed to be done and possibly its ordering (coalescing) and operation (collocation). Sometimes in applying this we need to do more work but of a different nature to reduce the cost of current or near immediate more expensive work to be done (i.e. sorting => searching). The first strategy is this taken to an extreme.

4. Working in Parallel. Getting more work done in a much shorter time window by adding computing capacity and consumers of such capacity which process (map & reduce, split & stitch) smaller parts of the overall execution concurrently (simultaneously). A lot of the current focus is here both in a distributed and non-distributed form.

Even with the best programming models (message/actor based), languages and runtimes (Java virtual machine) in the world it’s still pretty hard to achieve the speedups required (or anticipated) whilst fully utilizing the ever growing computing capacity (100 -> 1000 cores) due to the sequential nature of our thought process (not the internal workings of the mind) in such endeavors and the obvious physical limits (bottlenecks) that lie elsewhere (IO) in the processing pipeline. Once we have hit the limits with parallelization of an activity, task, job, request, interaction, etc…we are left with prediction and dynamic adaptation in our performance arsenal.

So what can be done differently to move beyond such limits? We believe to make application software faster we need to combine and apply all of the above strategies but not within different phases in the application lifecycle, not at different places in the application architecture, not at a particular (static or reference) point in time, not with an expected performance model in mind. No, these strategies need to be applied all the time and just in time by the software itself (in parallel). To make things faster (in terms of the critical path) we need to have the software work far, far harder than it does today, but on work that is very different from its primary function. This “harder” secondary work will be done in parallel, even possibly replicated (duplicated) across many processing units (cores). This work will be much smarter in predicting the near realtime (immediate) needs (data, resource) of the primary function execution path with the optimal result that very little or no work is done at least along the primary critical performance path (a path that is very likely going to be adaptive, with such adaptation done in parallel). For every primary processing unit (or worker) there will be one or more secondary collocated processing units (or supervisors) observing the primary function and predicting its execution path, behavior and resource needs and using such predictions to reserve capacity, allocate and load resources, pre-compute intermediary results, make online adaptations to the software algorithms and execution paths as well as prioritizing work. Eventually we will get to the point that such secondary work in the form of profiling, protecting, policing, prioritizing, predicting and provisioning will dominate the actual capacity consumption profile of an application as users demand much lower response times. Strangely enough there is going to be more waste and cost in matching growing expectations of lower response times.

Note: We can have multiple primary processing units fulfilling a single service request and for each such unit multiple secondary processing units observing, supervising, controlling, predicting, prioritizing,….

Curling Concurrent Computation

A simple sport analogy would be curling “in which players slide stones across a sheet of ice towards a target area…The curler can induce a curved path by causing the stone to slowly turn as it slides, and the path of the rock may be further influenced by two sweepers with brooms who accompany it as it slides down the sheet, using the brooms to alter the state of the ice in front of the stone. A great deal of strategy and teamwork goes into choosing the ideal path and placement of a stone for each situation, and the skills of the curlers determine how close to the desired result the stone will achieve” – From Wikipedia, the free encyclopedia

“Curling is the only sport where the trajectory of the projectile can be influenced after release. Players “sweep” the ice directly in front of the curling stone to decrease the friction of the stone with the ice” – The Physics Of Curling

Another example of a sweeper, this one trailing, is garbage collection. We just now need to push such self regulated mechanisms further up the application stack as well as its abstractions and models.

Note: There is another way to address speed that does not actually lower the response times of requests themselves and that is for each interaction to do more “valued” work for the user in the same amount of time playing to our ever increasing parallel capacity and capability.

Real(& In Time) Management of Application Performance

To make such a future possible primary workers and secondary supervisors need a common collocated supporting runtime that exposes an observation (measurement) model and control (feedback) system. A realization of such a runtime we believe is found in JXInsight/OpenCore’s 3 key technologies: Probes (metering), Metrics (monitoring) and Signals (fingerprinting).

Today’s legacy application performance monitoring (APM) products add no value to the software they attempt to manage. This “management” occurs in the form of changes over many generations (revisions) of the application but never actually within the lifetime of a single incarnation of the application itself. Tomorrow’s application performance management solutions (which we are selling today ;-)) will optimize, control, coordinate, prioritize, even adapt the application being managed. There won’t be a management dashboard, at least not one with big RED and GREEN circles representing some condition on an observation of an application measurement (or metric). What will be communicated to operators is the effectiveness of the secondary processing in continually predicting and adapting the software. In terms of operations the secondary work becomes the primary concern.

Detecting hung threads in Java with call execution stack marking and tagging

The JVM (and probably many other languages & runtimes as well) sorely misses three very useful serviceability features with regard to call execution stacks:

  • ids: a stack frame is given a unique (within the thread context) identifier prior to being pushed onto the call execution stack
  • marking: recording the number of mark operations performed since a frame was pushed onto the call execution stack
  • tagging: a stack frame will take either a tag value set at the thread level or process level prior to being pushed onto the call execution stack

With identifying, marking and tagging the job of detecting hung or delayed threads (deadlocked, busy or spinning) is made far easier because it is possible to easily determine whether a repetitive call stack trace across multiple dumps is indeed the same execution call path instance or an entirely new execution (for the same thread). This is made all the more problematic with thread pools and workers (code blocks) that follow the same flow but which operate on different data types not discernible from the code itself.
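The three features can be modelled on a plain per-thread stack to see why they help. The sketch below is a hypothetical illustration (FrameStackSketch and its members are invented names) and is single threaded; it does not show how the metering runtime stores frames, and marking across threads would need synchronization that is omitted here.

```java
import java.util.ArrayDeque;
import java.util.Deque;

final class FrameStackSketch {

    static final class Frame {
        final long id;       // unique within this thread's stack
        final String method;
        final String tag;    // tag active when the frame was pushed (may be null)
        int marks;           // mark operations seen while the frame was on the stack

        Frame(long id, String method, String tag) {
            this.id = id;
            this.method = method;
            this.tag = tag;
        }
    }

    private final Deque<Frame> frames = new ArrayDeque<>();
    private long pushed; // total frames ever pushed on this stack
    private String tag;  // tag given to frames pushed from now on

    void setTag(String tag) {
        this.tag = tag;
    }

    Frame push(String method) {
        Frame f = new Frame(++pushed, method, tag);
        frames.push(f);
        return f;
    }

    void pop() {
        frames.pop();
    }

    // A mark touches every frame currently on the stack.
    void mark() {
        for (Frame f : frames) {
            f.marks++;
        }
    }

    // Frames pushed since the given frame was pushed (the work done beneath it).
    long countSince(Frame f) {
        return pushed - f.id;
    }

    public static void main(String[] args) {
        FrameStackSketch stack = new FrameStackSketch();
        stack.setTag("startup");
        Frame run = stack.push("run");
        Frame first = stack.push("handle");
        stack.pop();
        stack.mark();
        stack.setTag(null);
        Frame second = stack.push("handle");
        stack.mark();
        // Two dumps would both show run -> handle, yet the ids reveal a new execution.
        System.out.println("handle ids: " + first.id + " vs " + second.id);
        System.out.println("run marks=" + run.marks + " count=" + stack.countSince(run)
            + " tag=" + run.tag);
    }
}
```

A frame whose mark count keeps rising across refreshes while its count stays flat is a candidate for a hung or blocked call, whereas a changing id under the same method name indicates fresh executions.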

Fortunately it is possible to enrich a Java runtime with these capabilities using JXInsight/OpenCore’s dynamic instrumentation agent and extensible activity resource metering technology. It gets even better in that we can limit these enhancements to particular namespaces in the code base (that which is of most interest). This can be done remotely from within the JXInsight/OpenCore management console (@see operation toolbar) or locally using the Probes Open API which is accessible to the application as well as metering plugins.

To demonstrate the enrichments to the runtime I downloaded DataStax Cassandra distribution along with its Pricing Portfolio demo application.

Before launching the Cassandra server process on my Mac OS X I set the following environment variable.

export JVM_OPTS="-Xmx256M -agentpath:/OpenCore/bin/osx-32/libjxinsight.jnilib=prod -javaagent:/OpenCore/bundle/java/default/opencore-ext-aj-javaagent.jar"

In the root installation directory (from which the bin/cassandra script was executed) I created an empty jxinsight.aspectj.filters.config file since the instrumentation agent by default excludes org.apache.* code.

I then created a jxinsight.override.config file with the following metering extensions enabled.

jxinsight.server.probes.console.enabled=true
jxinsight.server.probes.stack.enabled=true

Here is the Stack Table view on startup of the server. There are 5 threads listed as currently executing incomplete metered method invocations. The Id column is the unique identifier for the frame within the context of the thread. The # column is the call path depth from a root caller perspective with 0 being the bottom of the stack. The Count column is the number of frames that have been pushed onto a thread’s execution call stack since a particular frame was pushed (this calculation is deferred until the collection point). The Count Δ column is the change in the Count column since the last mark. No mark had been performed at this stage. The last column, Δ, represents the number of marks that have occurred during the metering period of the frame.

Here is a snapshot taken after starting the bin/pricer script that comes with the demo application with the parameters -o UPDATE_PORTFOLIOS. Many new threads have been started and are busy executing. From the Count column we can see the amount of work, frames pushed (and popped), that has gone on underneath (within) a call frame’s execution (scope). The Thrift-3 named thread has already executed 61,188 instrumented and metered method invocations.

The Count column tells us how busy a thread or frame has been in terms of executed instrumented methods but that is not enough as we need to be able to see whether it changes over time on a per frame basis. This is what marking attempts to remedy. Below is a snapshot taken using the Refresh command following the execution of the Mark command and a pause of 1 second. Those frames with non-zero Count Δ values have called (directly or indirectly) other instrumented methods. Those with a value in the Δ column were pushed onto the execution stack prior to the mark and have remained on the stack since.

The next snapshot was taken following a Mark, a pause of 1 second, and a Reset which is a Mark and Refresh combined. From the table we can see that the *.run methods have reached 3 marks. But we can also see in thread Thrift-8 that called frames have been active prior to the Mark and up to (and possibly following) the Reset. We can now say that these frames have at minimum an execution time of 1 second. If we had paused for 10 seconds and had similar results we could then claim a 10 second minimum execution time (delay).

Here is a snapshot following a Mark, pause, Mark, pause and Reset (Mark and Refresh). We don’t see any call frames other than the *.run methods go above 3 marks.
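The column semantics described above can be modeled with a small toy. This is a minimal sketch under stated assumptions, not the JXInsight implementation: the class and method names (MarkModel, Frame, push, mark) are hypothetical, and it assumes that Count Δ simply equals Count for a frame that has never been marked.

```java
import java.util.ArrayDeque;
import java.util.Deque;

// Toy model of one thread's metered call stack. Hypothetical; not JXInsight code.
class MarkModel {
    static final class Frame {
        final String name;
        final long pushedAt; // value of the thread's push counter when this frame was pushed
        long countAtMark;    // Count as recorded at the most recent mark
        int marks;           // marks seen while this frame was on the stack (the Δ column)

        Frame(String name, long pushedAt) {
            this.name = name;
            this.pushedAt = pushedAt;
        }
    }

    private final Deque<Frame> stack = new ArrayDeque<>();
    private long pushes; // total frames pushed by this thread so far

    Frame push(String name) {
        Frame f = new Frame(name, ++pushes);
        stack.push(f);
        return f;
    }

    void pop() { stack.pop(); }

    // Count: frames pushed since this frame was pushed, computed only when asked for.
    long count(Frame f) { return pushes - f.pushedAt; }

    // Count Δ: change in Count since the last mark (assumed equal to Count if never marked).
    long countDelta(Frame f) { return count(f) - f.countAtMark; }

    // Mark: remember the current Count of every live frame and bump its mark tally.
    void mark() {
        for (Frame f : stack) {
            f.countAtMark = count(f);
            f.marks++;
        }
    }

    public static void main(String[] args) {
        MarkModel thread = new MarkModel();
        Frame run = thread.push("Worker.run");
        thread.push("a"); thread.pop();
        thread.mark();
        thread.push("b"); thread.pop();
        Frame c = thread.push("c");
        System.out.println(run.name + " count=" + thread.count(run)
            + " countDelta=" + thread.countDelta(run) + " marks=" + run.marks);
        System.out.println(c.name + " count=" + thread.count(c)
            + " countDelta=" + thread.countDelta(c) + " marks=" + c.marks);
    }
}
```

In this run the root frame ends with a Count of 3, a Count Δ of 2 and one mark, while the frame pushed after the mark has no marks at all. That is the property used above: a frame showing a mark was already on the stack when the mark was taken, so the pause between Mark and Refresh is a lower bound on its execution time.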

Instead of marking to detect execution delay we can use tagging. I added the following system properties to the jxinsight.override.config to enable tagging in the metering engine and to set the process tag to “startup” before the very first frame is pushed.

jxinsight.server.probes.tag.enabled=true
jxinsight.server.probes.tag.global=startup

Below is a snapshot taken following the completion of the server startup with tagging enabled. In the management console I then used the Tag command to clear this value for subsequent frames (and threads).

After starting the bin/pricer script as above I set three tags, “first”, “second” and “third”, with a very slight pause between each setting. Those with a tag were pushed onto the stack when that particular tag was active (set).
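The tagging behavior can be sketched in the same spirit. This is a toy model, not the metering engine: TagModel and the frame names are hypothetical, and the only rule it encodes is the one stated above, namely that a frame carries whichever tag was active when it was pushed.

```java
import java.util.ArrayDeque;
import java.util.Deque;

// Toy model of frame tagging. Hypothetical; not the JXInsight metering engine.
class TagModel {
    static final class Frame {
        final String name;
        final String tag; // tag that was active at push time, or null if none

        Frame(String name, String tag) {
            this.name = name;
            this.tag = tag;
        }
    }

    private final Deque<Frame> stack = new ArrayDeque<>();
    private volatile String activeTag; // plays the role of the process-wide tag

    TagModel(String initialTag) { this.activeTag = initialTag; }

    // Corresponds to using the Tag command; null clears the tag.
    void setTag(String tag) { activeTag = tag; }

    Frame push(String name) {
        Frame f = new Frame(name, activeTag);
        stack.push(f);
        return f;
    }

    void pop() { stack.pop(); }

    public static void main(String[] args) {
        TagModel process = new TagModel("startup"); // like ...tag.global=startup
        Frame serverRun = process.push("Server.run");
        process.setTag(null);                       // cleared after startup
        Frame idle = process.push("Acceptor.accept");
        process.setTag("first");
        Frame request = process.push("Handler.handle");
        for (Frame f : new Frame[] {serverRun, idle, request}) {
            System.out.println(f.name + " tag=" + f.tag);
        }
    }
}
```

Because the tag is captured at push time and never updated, a frame still showing an old tag must have been on the stack since before the tag was changed.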

Here is a snapshot following termination of the bin/pricer script which generated the server workload. The number of threads and frames has shrunk back down, but we also now know that thread Thread-3, which was present from the start, is now at a different execution flow point with the “third” tag set.

JXInsight/OpenCore 6.3.5 Released – Dynamic QoS Prioritization

Today we have published the fifth update to JXInsight/OpenCore 6.3 on our developer site which includes the ability to dynamically set the QoS priority level for an executing thread invoking instrumented methods.

Prior to this release prioritization was defined at the QoS service level mapped to one or more instrumented methods. Methods with different service classifications but common mapped QoS resource pools would then be prioritized differently in making required reservations on such resource pools. But it was not possible to change the prioritization of the same method (and in turn QoS service) based on some dynamic aspect of the execution context such as the customer type or service entry point. This can now be achieved by setting the named value, qos.priority.level, in the Probes.Environment interface associated with the current thread context.

To demonstrate this exciting new capability I have created a basic small test app. Here are the main executing classes mocking a service request dispatcher, a transaction and its units of work on an exclusive resource. The Worker.run method continuously executes Transaction.doWork until stop has been signaled. The Transaction.doWork method in turn calls the Unit.doWork method for a predefined number of times and the Unit.doWork method spins for a fixed time period.
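The structure described above can be reconstructed roughly as follows. This is a sketch written from the description only, not the original listing: the class name BasicSketch, the doTest signature, the use of one worker thread per counter, and the constants (5 units of about 4 microseconds each, giving roughly 20 microseconds per transaction) are assumptions, and the priority value is merely stored because nothing is instrumented here.

```java
import java.util.concurrent.atomic.AtomicLong;

// Reconstruction of the described test app. Names and constants are assumptions.
class BasicSketch {
    static final int UNITS_PER_TX = 5;          // assumed: 5 units per transaction
    static final long UNIT_SPIN_NANOS = 4_000L; // assumed: about 4 microseconds per unit

    static final class Unit {
        // Busy-spins for a fixed period to simulate work on a resource.
        void doWork() {
            long start = System.nanoTime();
            while (System.nanoTime() - start < UNIT_SPIN_NANOS) {
                // spin
            }
        }
    }

    static final class Transaction {
        private final Unit unit = new Unit();

        void doWork() {
            for (int i = 0; i < UNITS_PER_TX; i++) {
                unit.doWork();
            }
        }
    }

    static final class Worker implements Runnable {
        private final AtomicLong counter;
        final int priority; // carried along only; unused without instrumentation
        private volatile boolean stop;

        Worker(AtomicLong counter, int priority) {
            this.counter = counter;
            this.priority = priority;
        }

        @Override public void run() {
            Transaction tx = new Transaction();
            while (!stop) {
                tx.doWork();
                counter.incrementAndGet(); // one completed transaction
            }
        }

        void stop() { stop = true; }
    }

    // Runs a "fast" (priority 1) and a "slow" (priority 0) worker; returns {fast, slow}.
    static long[] doTest(long millis) {
        AtomicLong fast = new AtomicLong();
        AtomicLong slow = new AtomicLong();
        Worker f = new Worker(fast, 1);
        Worker s = new Worker(slow, 0);
        Thread tf = new Thread(f, "fast");
        Thread ts = new Thread(s, "slow");
        tf.start();
        ts.start();
        try {
            Thread.sleep(millis);
            f.stop();
            s.stop();
            tf.join();
            ts.join();
        } catch (InterruptedException e) {
            f.stop();
            s.stop();
            Thread.currentThread().interrupt();
        }
        return new long[] {fast.get(), slow.get()};
    }

    public static void main(String[] args) {
        long[] r = doTest(1_000);
        System.out.println("fast=" + r[0]);
        System.out.println("slow=" + r[1]);
    }
}
```

Without instrumentation the two counters should end up close to each other, since nothing in this code distinguishes the two workers beyond the stored priority value.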

Here is the enclosing class, environment.Basic, which drives the execution of the above code in our tests reported below. In the doTest method there are two different constructor calls made. One that passes in the “fast” counter with a priority level of 1. The other passes in the “slow” counter with a priority level of 0.

Note: The transaction time is approximately 20,000 nanoseconds (20 microseconds), ignoring resource contention, which is extreme to say the least for any application of QoS but we like to push ourselves and our technology to the limits.

Executing the above without any instrumentation produces the following throughput results with unsurprisingly no real differences in the fast or slow work counters.

fast=1,988,256
slow=1,975,659

Let’s change the Worker class to set the priority level for any QoS enabled method invoked by the current executing thread.

Note: The actual setting of the priority level need not be done explicitly in the code. It could have been done using the many extension points in our metering runtime including ProbesPlugin[Factory] and ProbeInterceptor[Factory].
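The shape of such a change can be sketched with a stand-in for the thread's environment. The Env class below is hypothetical and is not the OpenCore Probes.Environment API, whose method signatures are not shown in this post; only the named value, qos.priority.level, is taken from the text. The point is simply that the worker records its priority against its own thread before doing any work.

```java
import java.util.HashMap;
import java.util.Map;

// Stand-in sketch only. Env is hypothetical and NOT the OpenCore Probes.Environment API.
class EnvironmentSketch {
    // Hypothetical per-thread store of named values.
    static final class Env {
        private static final ThreadLocal<Map<String, Object>> VALUES =
            ThreadLocal.withInitial(HashMap::new);

        static void set(String name, Object value) { VALUES.get().put(name, value); }

        static Object get(String name) { return VALUES.get().get(name); }
    }

    static final String PRIORITY = "qos.priority.level"; // name taken from the post

    static final class Worker implements Runnable {
        private final int priority;
        volatile Object seen; // what a QoS layer on this thread would read back

        Worker(int priority) { this.priority = priority; }

        @Override public void run() {
            // Set once per thread, before any QoS enabled method is invoked.
            Env.set(PRIORITY, priority);
            seen = Env.get(PRIORITY);
        }
    }

    // Runs a worker on its own thread and returns the priority visible on that thread.
    static Object runWorker(int priority) {
        Worker w = new Worker(priority);
        Thread t = new Thread(w);
        t.start();
        try {
            t.join();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
        return w.seen;
    }

    public static void main(String[] args) {
        System.out.println("fast worker sees " + runWorker(7));
        System.out.println("slow worker sees " + runWorker(0));
        System.out.println("main thread sees " + Env.get(PRIORITY));
    }
}
```

Because the value lives with the thread rather than with the method, the same instrumented method can be prioritized differently depending on who is calling it.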

There are two points in the execution in which QoS prioritization can be applied – Transaction.doWork and Unit.doWork. Let’s start with Transaction.doWork.

[jxinsight.aspectj.filters.config]
environment.Basic$Transaction.doWork

[jxinsight.override.config]
jxinsight.server.probes.strategy.enabled=false
jxinsight.server.probes.qos.forwarding.enabled=false
jxinsight.server.probes.qos.enabled=true
jxinsight.server.probes.qos.environment.priority.level.enabled=true
jxinsight.server.probes.qos.resources=db
jxinsight.server.probes.qos.resource.db.queue.priority.enabled=true
jxinsight.server.probes.qos.services=tx
jxinsight.server.probes.qos.service.tx.name.groups=environment.Basic$Transaction.doWork
jxinsight.server.probes.qos.service.tx.resources=db
jxinsight.server.probes.qos.service.tx.timeout=0
jxinsight.server.probes.qos.service.tx.timeout.unit=micro

Three tests were performed, changing the timeout (the wait for a QoS resource reservation) in the jxinsight.override.config and the fast priority level in the code, shown here set to 7.

Here is a comparison of the throughput results with the priority level for fast set to 1 and 7 and the timeout set to 0 and 100. OpenCore’s metering engine is now actively controlling the execution of metered method invocations based on the setting of the qos.priority.level named value in the Environment of the executing thread. More importantly this enhancement was done dynamically at runtime – no new frameworks, no new programming models, no new dependencies. And it is all driven by an externalized policy.

It’s incredibly easy to change the execution point at which the QoS enhancement is applied. Here is a revised configuration for injecting prioritization into the Unit.doWork method instead.

[jxinsight.aspectj.filters.config]
environment.Basic$Unit.doWork

[jxinsight.override.config]
jxinsight.server.probes.strategy.enabled=false
jxinsight.server.probes.qos.forwarding.enabled=false
jxinsight.server.probes.qos.enabled=true
jxinsight.server.probes.qos.environment.priority.level.enabled=true
jxinsight.server.probes.qos.resources=db
jxinsight.server.probes.qos.resource.db.queue.priority.enabled=true
jxinsight.server.probes.qos.services=unit
jxinsight.server.probes.qos.service.unit.name.groups=environment.Basic$Unit.doWork
jxinsight.server.probes.qos.service.unit.resources=db
jxinsight.server.probes.qos.service.unit.timeout=0
jxinsight.server.probes.qos.service.unit.timeout.unit=micro

Here are the throughput results compared and charted. There is a slight degradation in the overall total due to the application of QoS prioritization to a method with an execution time of just 4 microseconds and a QoS reservation frequency increase of 80 per transaction (20 x (1-5)).

QoS for Applications – An Introduction

Visions of Cloud Computing – PaaS Everywhere

Just came back from the CloudConnect conference in Santa Clara and after visiting a very large number of vendor booths on the expo floor I was somewhat disappointed at the lack of progress in the solutions being offered. There was pretty much no differentiation. Every vendor had their “me too” management console focused on VM provisioning and monitoring with some degree of AWS integration. There were some sparks of innovation in the storage space but nothing radical that would really take us all to the next stage in cloud computing – getting rid of the computer (or at least the VM image). In fact some of the vendors present had pretty much nothing to offer specific to cloud computing, except perhaps that they use the cloud themselves to generate load to performance/stress test other sites (not necessarily in the cloud). There was a lot of talk but its translation to the cloud and its resulting transformation in the engineering community was largely absent. I don’t even recall seeing any PaaS signage on the expo floor.

Maybe CloudConnect is not the right conference for this, or maybe PaaS is viewed as a market that will eventually be dominated by only a few very big vendors, none of them present except for Microsoft (who offered the nicest t-shirt). It’s a great conference to connect with others pushing ahead (in their minds and hallway chats) but the conference itself really has a looming identity crisis beyond this point (at least for me), which is probably a result of the cloud encompassing so many technologies, processes, service delivery models and even ideologies (Ops => DevOps => NoOps).

On the long haul home I decided it was time to blog on some of the visions I have of cloud computing that I would like to eventually see become subject matter at such a conference. This first one looks briefly at PaaS, but not as most see it today, which is largely focused on taking existing enterprise apps deployed to a fixed number of application server targets in some data center and pushing them out to the cloud with a potentially unlimited number of application server targets, in many cases switching in cloud-service-based versions of services offering access to a resource pool of storage and computing.

After spending considerable time engineering CORBA and grid based solutions, the scalability problems the above approach presents (on many levels) are all too familiar. But what if, in designing such interfaces, engineers offered a means to see and touch more of the data (form and function), the data hidden behind the remote interface because of matters related to this mode/style of interaction? What if there was a remote interface as well as a set of local interfaces which could be accessed by code pushed to (or pulled from) the remote service? The code would execute within some managed (and metered) shell (or container), collocated with the data and functions, and controlled by the context within its own payload. What if the code could transmit responses or signals back to its source whilst it migrated the flow of execution (or control) from one service end point to another? Every cloud service provider would then become a PaaS solution provider but in a narrow domain specific way. Instead of the application (or web app) being the deployment unit it would be an activity containing data, code as well as context, which would expose metadata for the purpose of security, metering, quality of service (QoS), billing, routing and service delegation (a subject of the next vision blog).

This is not as far fetched as one might initially think. We already have client browsers on end user devices performing some aspects of this today as well as SaaS solutions offering ways to execute custom business logic under very controlled (governed) conditions. The future of cloud computing is not about provisioning VM instances…it’s about being able to move our (transient) data, code and workflow (process) from one computing/storage device to another…it’s about abstracting the consumption of computing power and in doing so expanding the potential capacity beyond that of a single public provider, even one as big as Amazon AWS. It’s about creating a large service economy in the cloud and applying service supply chain management in the design and deployment of applications and services.

If this could be realized we might actually come back to the situation in which software vendors don’t have to ship proprietary (black box) processors and storage devices with their software, which is kind of like what SaaS is today without the ability to hug such devices. In theory a software vendor would make available their software at some point on the grid with resource consumption (or “provisioning”) done via one or more services (delegates) exposed in the activity context and owned by (or billed to) the calling client (or app or user). Do we really want every service provider needing to provision and be billed for computing capacity when the cause of such consumption rests elsewhere in the caller chain?

In forthcoming blogs I would like to explore in greater detail how this could be achieved and the challenges the industry faces without some standardization of the service context metadata needed for safe, efficient and (cost) controlled execution and flow migration.

Metering In The Cloud – Visualizations Part 1

The Open Group’s Service-Oriented Cloud Computing Infrastructure (SOCCI) Framework

The Open Group has published a new industry standard, Service-Oriented Cloud Computing Infrastructure (SOCCI) Framework, that outlines the “concepts and architectural building blocks necessary for infrastructures to support SOA and cloud initiatives”. In the standard, metering is given much more prominence than in any other proposed standard to date, starting with the key characteristics of cloud computing:

http://www.opengroup.org/soa/source-book/socci/cc_char.htm
Measured service: Cloud systems automatically control and optimize resource use by leveraging a metering capability at some level of abstraction appropriate to the type of service (e.g., storage, processing, bandwidth, and active user accounts). Resource usage can be managed, controlled, and reported providing transparency for both the provider and consumer of the utilized service.

In the chapter, Enabling Applicable Architectural Layers, metering is seen as an architecture building block, which very much aligns with our view of metering in the context of PaaS, though we take a much more ambitious view of metering as the primary source, the cortex, of signals for controllers and sensors present within the other architecture building blocks.

Article: OpenCore – A Cloud Cortex for PaaS

Article: Tackling the performance analysis of startup times

In this article, Tackling the performance analysis of startup times (PDF), posted on our developer site, we show a novel approach we use in the performance measurement of software startup times, in this case JRuby. For most profiling solutions this is hard work if not impossible, presenting challenges to even the most advanced performance management solution that is adaptive and cost aware.

The Complexity and Challenges of IT Management within the Cloud

Following on from Feedback & Control signals its arrival in the Cloud & Enterprise, which touched on complex adaptive systems and feedback loops (or adaptive control), there are two dimensions that should be considered in the management of applications in the cloud – time and space. In addition, consideration needs to be given to such characteristics as diversity and dynamism, which compound the problems created by such dimensions in the context of computing and its management.

Feedback & Control signals its arrival in the Cloud & Enterprise

Judging by the postings this weekend from prominent cloud bloggers on the management of complex systems via control & feedback loops it appears that our shared vision of application software, in the cloud or enterprise, that is inherently self-aware (in particular cost-aware) and self-regulated, is gaining supporters, promoters and soon more early adopters.

Article: Metering JVM Jitter without a Hiccup

Recently Azul Systems, a technology partner, announced an open source tool, jHiccup, that is claimed to “measure the pauses and stalls (or “hiccups”) associated with an application’s underlying Java runtime platform…[it] captures the aggregate effects of the Java Virtual Machine (JVM), operating system, hypervisor (if used) and hardware on application stalls and response time”.