How (not) to design a Metrics API – Part 4: Naming and Names
This is part four in a series of articles on how best to design an application metrics monitoring library, in particular its API, providing versatility both in terms of application of the library across domains & environments and in its implementation by one or more vendors. Please read part 1, part 2 and part 3 if you have not already done so.
Following on from domain modeling, the hardest task in designing an API is the naming of classes and methods. In our experience this activity consumes an inordinate amount of time, yet it still deserves far more attention (especially in engineering books) than it is given today, considering its importance in forming a user’s conceptual/mental model of an API, in communicating usage and intent, and in lessening ambiguity. Time spent here benefits not only the users of an API but also its implementors. One good rule of thumb is consistency.
Whilst there are many slightly different ways to measure and manage application performance, we have found many common and similar concepts across these techniques and approaches, and as a result we have reused interface names (and in some cases cloned implementations). Here is a table listing some (but not all) of the interfaces common to both the Metrics Open and Probes API in JXInsight/OpenCore.
The commonality in naming of classes continues to extend to newer technologies we are developing to address sub-microsecond software performance execution analysis.
We are not simply reusing names but also the associated structural and behavioral patterns, albeit in a slightly different observation and measurement context. This gives the underlying implementation the ultimate flexibility whilst still affording some limited form of internal reuse, even if just at the conceptual level.
A user who is familiar with one of our Open APIs will feel very much at home with the others, though type-wise they are different, as all interfaces are defined as inner classes within each of our main entry point classes:
Anyone who has worked with the javax.management API knows all too well that if you don’t get the means to identify (look up) metrics right, everything else becomes laborious, unusable, and error prone. Whilst we understand the thinking that went into the javax.management.ObjectName class, we believe it is one of the API’s more fundamental flaws, and its implementation has never been able to adequately compensate for the design decision to use an unordered map of name-value string pairs as a means of identification. The amount of engineering that has gone into maintaining such an otherwise simple class (if not for the parsing and mapping) is extraordinary, especially in light of the huge execution cost of constructing names and the resulting memory footprint. The implementation is still flawed today in that string representations of the name-value pair map are interned, with maps created even when a string is passed into the constructor. JMX is a design disaster in a local client context, though admittedly its original design was driven by the needs of legacy system management vendors for the purpose of remote management component lookup. One could very easily argue that it is not exactly a Metrics API, though it does have a GaugeMonitor class, which has led many libraries to perceive it as one and to use it as the basis for their own designs and implementations (see alternatives below). It is because of this that today Java still does not ship with a well-designed and scalable metrics measurement runtime.
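To make the identification problem concrete, here is a minimal sketch using the standard javax.management.ObjectName class. The domain and key names ("app", "type", "name") are made up for illustration. It shows that two names written with their keys in a different order are equal, because identity is defined over a canonical, key-sorted form that has to be derived by parsing the key properties.

```java
import javax.management.MalformedObjectNameException;
import javax.management.ObjectName;

final class ObjectNameOrderDemo {

    // Parses a string into an ObjectName, converting the checked exception
    // so the helper is convenient to call from examples.
    static ObjectName parse(String text) {
        try {
            return new ObjectName(text);
        } catch (MalformedObjectNameException e) {
            throw new IllegalArgumentException(e);
        }
    }

    public static void main(String[] args) {
        // Same key properties, written in a different order (illustrative names).
        ObjectName a = parse("app:type=Counter,name=requests");
        ObjectName b = parse("app:name=requests,type=Counter");

        // Equal, because equality is defined over the canonical form.
        System.out.println(a.equals(b));

        // The canonical form lists the key properties sorted by key.
        System.out.println(a.getCanonicalName());

        // Individual parts are looked up by key, not by position.
        System.out.println(a.getKeyProperty("type"));
    }
}
```

Every lookup of a metric through such a name pays for this parsing and normalization, whether or not the caller ever needs the individual key properties.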
In the design of the Name interface in our Metrics API (as well as Probes and Signals) we have opted for a hierarchical namespace approach, with Name instances representing an ordered sequence of components similar to the javax.naming.Name interface (though immutable) and with ‘.’ acting as the delimiter in string form.
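For readers unfamiliar with javax.naming.Name, the following sketch uses the standard javax.naming.CompoundName implementation, configured through its syntax properties to split on ‘.’ from left to right. The path "app.http.requests" is an invented example. It shows the ordered-component model, and also the mutability that our own Name design deliberately avoids.

```java
import java.util.Properties;
import javax.naming.CompoundName;
import javax.naming.InvalidNameException;
import javax.naming.Name;

final class CompoundNameDemo {

    // Syntax properties telling CompoundName to split on '.' from left to right.
    private static Properties dottedSyntax() {
        Properties syntax = new Properties();
        syntax.setProperty("jndi.syntax.direction", "left_to_right");
        syntax.setProperty("jndi.syntax.separator", ".");
        return syntax;
    }

    // Parses a dotted path into an ordered sequence of components.
    static Name parse(String dotted) {
        try {
            return new CompoundName(dotted, dottedSyntax());
        } catch (InvalidNameException e) {
            throw new IllegalArgumentException(e);
        }
    }

    // Appends a component and returns the string form of the ORIGINAL instance,
    // demonstrating that javax.naming names are modified in place.
    static String appendInPlace(String dotted, String component) {
        Name original = parse(dotted);
        try {
            original.add(component);
        } catch (InvalidNameException e) {
            throw new IllegalArgumentException(e);
        }
        return original.toString();
    }

    public static void main(String[] args) {
        Name name = parse("app.http.requests");
        System.out.println(name.size());   // number of components
        System.out.println(name.get(0));   // first component
        System.out.println(appendInPlace("app.http", "requests"));
    }
}
```

Because add changes the instance it is called on, a javax.naming.Name cannot safely be shared or used as a stable key, which is one reason to prefer an immutable design for metric names.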
Here are the few methods in the
Metrics.Name instances are constructed using one of the following static utility methods in the Metrics class or by calling name(String) on an existing instance returning an extended
Because Metrics.Name is an interface and all construction of instances is under the control of the underlying Metrics SPI implementation, instances of Metrics.Name can be compared using reference equality. We have found this allows for performance optimizations in the caller code space as well as in the underlying implementation, which can use this for efficient metric lookup and memory management (see benchmarks below).
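One way such a guarantee can be provided is by interning: the implementation hands out exactly one instance per distinct path. The sketch below is an assumption about how this could be done, not the JXInsight/OpenCore implementation; the Name interface shown here is a hypothetical stand-in with invented parent() and component() methods, and only name(String) is taken from the text above.

```java
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ConcurrentMap;

final class NameSketch {

    /** Hypothetical stand-in for a hierarchical name; not the actual Metrics.Name interface. */
    interface Name {
        Name name(String component); // this name extended by one component
        Name parent();               // null for a root name
        String component();          // last component of this name
    }

    private static final class Node implements Name {
        private final Node parent;
        private final String component;
        private final String path;
        private final ConcurrentMap<String, Node> children = new ConcurrentHashMap<>();

        Node(Node parent, String component) {
            this.parent = parent;
            this.component = component;
            this.path = parent == null ? component : parent.path + '.' + component;
        }

        @Override public Name name(String child) {
            // computeIfAbsent yields one instance per (parent, component) pair.
            return children.computeIfAbsent(child, c -> new Node(this, c));
        }

        @Override public Name parent() { return parent; }
        @Override public String component() { return component; }
        @Override public String toString() { return path; }
    }

    private static final ConcurrentMap<String, Node> ROOTS = new ConcurrentHashMap<>();

    /** Resolves a dotted path such as "app.http.requests" to its single interned instance. */
    static Name name(String path) {
        String[] parts = path.split("\\.");
        Name current = ROOTS.computeIfAbsent(parts[0], c -> new Node(null, c));
        for (int i = 1; i < parts.length; i++) {
            current = current.name(parts[i]);
        }
        return current;
    }

    public static void main(String[] args) {
        Name extended = name("app.http").name("requests");
        Name parsed = name("app.http.requests");
        System.out.println(extended == parsed); // same instance, so == is sufficient
        System.out.println(parsed);
    }
}
```

With instances interned this way, a caller can hold a Name in a field and the runtime can key its lookup structures on object identity rather than on string hashing and comparison.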
We have also found this to be extremely useful in passing meta information, labels, from metric extensions hidden below the API surface up to client callers especially when such extensions automatically create metrics themselves from base registered metrics in the event of some condition or operation (tagging).
Note: Name labels are implemented as bit masks, which is why a label is first looked up by its string name. This is much more efficient than arbitrary tag sets.
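The bit mask idea can be sketched as follows. This is an illustration under our own assumptions rather than the actual implementation: the registry, the method names label and hasAll, the label strings and the 64-label limit are all hypothetical. Each label string is resolved once to a single bit, after which testing a name for labels is a single bitwise operation.

```java
import java.util.LinkedHashMap;
import java.util.Map;

final class LabelMaskSketch {

    // Hypothetical label registry: each distinct label string is assigned one bit.
    private static final Map<String, Long> BITS = new LinkedHashMap<>();

    /** Looks a label up by its string name and returns its single-bit mask. */
    static synchronized long label(String name) {
        Long bit = BITS.get(name);
        if (bit == null) {
            if (BITS.size() >= Long.SIZE) {
                throw new IllegalStateException("this sketch supports at most 64 labels");
            }
            bit = 1L << BITS.size();
            BITS.put(name, bit);
        }
        return bit;
    }

    /** True if the mask carried by a name contains every requested label bit. */
    static boolean hasAll(long nameMask, long labels) {
        return (nameMask & labels) == labels;
    }

    public static void main(String[] args) {
        long derived = label("derived"); // string lookup happens once, up front
        long rate = label("rate");

        long nameMask = derived | rate;  // labels attached to some name
        System.out.println(hasAll(nameMask, derived));
        System.out.println(hasAll(derived, derived | rate));
    }
}
```

The string lookup is the only step that costs more than a few instructions, so callers would typically resolve labels once and keep the resulting masks.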
You will notice that Name instances do not contain environment-specific values. Such information is best located in and accessed from an Context interface, both of which are available in our Probes API as well as our Signals API.
Whilst the Metrics class offers utility methods to create Name instances from Class references, Name instances are not in any way tied to a particular source, such as a Class, nor do they have a fixed number of composite parts.
This metrics library expects that all metrics can be identified by 4 string parameters (group, type, name, scope) passed into a MetricName constructor, which are then concatenated to form an identifier referred to as “mBeanName”. This is the same design flaw highlighted above in the JMX API (which is hinted at in the naming), possibly inherited because of an underlying dependency on JMX to expose data to remote clients, though the author does state in his documentation that the JMX RPC API is “bonkers and fragile”.
The mBeanName field, which is accessible from the MetricName class via the getBeanName() method, is used to implement (via call delegation) the equals methods for the MetricName class. It is possible to pass in a fifth constructor parameter, the value for the mBeanName field, which could have an entirely different value for the scope attributes within its string representation (a decision that invites trouble). The limits imposed and the dependencies introduced are completely unnecessary.
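The trouble this invites is easy to reproduce. The class below is a hypothetical sketch that mimics the design described here; it is not the library’s source, and the class name and sample values are invented. Only the default string format is taken from the one quoted in this article. Since equality is delegated to the derived string, two names with entirely different parts compare as equal once the fifth parameter is supplied.

```java
import java.util.Objects;

final class DerivedIdentitySketch {

    /** Hypothetical name class mimicking the design discussed; not any library's actual code. */
    static final class SketchMetricName {
        final String group;
        final String type;
        final String name;
        final String scope;
        final String mBeanName;

        // Four-parameter form: the identity string is derived from the parts.
        SketchMetricName(String group, String type, String name, String scope) {
            this(group, type, name, scope,
                 group + ":type=" + type + ",scope=" + scope + ",name=" + name);
        }

        // Five-parameter form: the identity string is supplied independently of the parts.
        SketchMetricName(String group, String type, String name, String scope, String mBeanName) {
            this.group = group;
            this.type = type;
            this.name = name;
            this.scope = scope;
            this.mBeanName = Objects.requireNonNull(mBeanName);
        }

        // Equality and hashing are delegated entirely to the identity string.
        @Override public boolean equals(Object other) {
            return other instanceof SketchMetricName
                && mBeanName.equals(((SketchMetricName) other).mBeanName);
        }

        @Override public int hashCode() {
            return mBeanName.hashCode();
        }
    }

    public static void main(String[] args) {
        SketchMetricName derived = new SketchMetricName("app", "http", "requests", "eu");
        SketchMetricName overridden = new SketchMetricName(
            "db", "jdbc", "queries", "us", "app:type=http,scope=eu,name=requests");

        // The parts disagree completely, yet the two names are "equal".
        System.out.println(derived.equals(overridden));
    }
}
```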
If the fifth parameter were not to be passed in then the mBeanName field would take the value “group:type=type,scope=scope,name=name”. This is the basic syntax of a JMX
To create a metric name in this library you must use
MonitorContext is actually the metric name itself holding two fields: tags. Both fields are used to uniquely identify a metric, though a metric in this library is referred to as a Monitor, similar to JMX. The similarities with JMX don’t end there, as the tags field is actually the JMX ObjectName attribute map revisited; as such, its construction is expensive in terms of execution and allocation cost as well as footprint. There is actually a Metric class, but that is in fact a metric sample.
Here is another version with counter tags added.
Creation and lookup of measures and metrics need to be implemented very efficiently in a metrics API because such activity can occur at high frequency. Here are results from a quick benchmark test comparing the JXInsight/OpenCore metrics runtime with the Jammer metrics library.
Note: We did try to run the NetFlicks metrics library with our tests, but it kept crashing with OutOfMemory errors, and efforts to circumvent this only resulted in other exceptions being thrown related to duplicate counter registration. These could not be avoided entirely because the API does not offer the means to test whether a metric has already been registered.
Here is the Jammer code snippet benchmarked.
Here is the JXInsight/OpenCore code snippet benchmarked.
With static imports on the Metrics class the above code can be simplified.
Considering there is not much to actually creating and looking up a counter in a local context, the performance difference when executing 1,000,000 loops is astonishing: OpenCore is approximately 13 times faster.
Note: For a lookup of a Counter without creating the Name, but instead using an existing reference held in a static field, the call cost for JXInsight/OpenCore drops down to low single-digit nanosecond timings (naturally with the reference being already in the CPU memory cache).
One of the reasons behind this series of articles on Metrics API design is the belief that the art of good API design is being lost due to a number of growing trends in the industry: (1) a preference for open source over standards even when standards allow both commercial and open source implementation (there are positives and negatives of both approaches), (2) a growing willingness of developers to accept a single implementation dependency, (3) a shift in focus from local (client) API design to remote (service) API design, (4) the need to get things done (and updated) today in an ever-changing environment, reflecting largely current usage, needs and demands (in line with growing uncertainty and unpredictability), and (5) the sidelining of any sort of technical due diligence in favor of “if it is good enough for X (high profile cloud/web service) and it’s free then it is probably good enough for us”. This last trend rests on the assumption that those successful and scalable companies able to keep services from falling over must also be good at engineering the underlying software, when in fact it is more likely down to the heroics of a few people in operations fighting daily fires.
Developing an ad hoc solution that meets current usage and scale within a particular context is not the same as designing and developing an API (and reference implementation) that meets not just current but future needs and scales, in ways that are not completely known or fully understood at the time of creation, and for a much more diverse user/consumer group. API designers must always leave sufficient flexibility, versatility, capacity and adaptability in their design to allow innovation to flourish above the API as well as below it at the implementation level, and in different contexts and environments. Good designers know when to procrastinate in the present so as not to constrain productivity at a later time, when more information is available and other enhancements in technology have occurred.