How (not) to design a Metrics API – Part 3: Groups, Collections & Samples
This is part three in a series of articles on how best to design an application metrics monitoring library, in particular its API, providing versatility both in terms of application of the library across domains & environments and in its implementation by one or more vendors. Please read part 1 and part 2 if you have not already done so.
In part 1 we introduced the concept of an information model (the set of all possible measurements) and a management model (the set of measures used for management purposes), showing how creation of a Measure is distinct from its registration as a Metric within the JXInsight/OpenCore Metrics Open API. One benefit cited for keeping these models distinct is that it allows us to register the same Measure under different Name instances, which can be useful in supporting legacy reporting clients and plugins.
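The split between creating a measure and registering it under a name can be sketched in a few lines of Java. This is an illustration only: the Measure interface, the Registry class and the metric names below are hypothetical stand-ins and are not the actual Metrics Open API types.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical sketch: these types are NOT the Metrics Open API.
final class AliasSketch {

    /** Information model: something that yields a value when sampled. */
    interface Measure {
        long sample();
    }

    /** Management model: binds names to measures that already exist. */
    static final class Registry {
        private final Map<String, Measure> metrics = new LinkedHashMap<>();

        /** Registration is a separate step from creating the measure. */
        void register(String name, Measure measure) {
            if (metrics.putIfAbsent(name, measure) != null) {
                throw new IllegalArgumentException("already registered: " + name);
            }
        }

        /** Samples every registered name; intended for the collector only. */
        Map<String, Long> collect() {
            Map<String, Long> samples = new LinkedHashMap<>();
            metrics.forEach((name, measure) -> samples.put(name, measure.sample()));
            return samples;
        }
    }

    static Map<String, Long> demo() {
        long[] requests = {42};
        Measure requestCount = () -> requests[0];                 // created once
        Registry registry = new Registry();
        registry.register("http.requests.count", requestCount);   // current name
        registry.register("legacy.HttpRequests", requestCount);   // name an older client expects
        return registry.collect();
    }
}
```

Both names resolve to the same underlying measure, so a legacy reporting client and a newer one see the same value without the measurement code being duplicated.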
Two common problems arise when registering multiple measurements that obtain their values from an external component or resource: access cost and value integrity. To allow a single (possibly expensive) access to be amortized across multiple related measurement samplings, the Metrics Open API provides a Group interface containing just a single method, prepare, that is invoked before any associated measurements are sampled in a collection cycle.
The Metrics API provides an overloaded register method which takes a Group parameter to make this association.
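As a rough sketch of how such a group could amortize access cost, consider the following. Everything here is assumed for illustration: the Group and Measure interfaces are modelled on the description above rather than copied from the library, and ConnectionPool, PoolGroup and the metric names are invented.

```java
import java.util.LinkedHashMap;
import java.util.LinkedHashSet;
import java.util.Map;
import java.util.Set;

// Hypothetical sketch: none of these types are the real Metrics Open API.
final class GroupSketch {

    interface Measure { long sample(); }

    /** Single-method contract: called once before the group's measures are sampled. */
    interface Group { void prepare(); }

    static final class Registry {
        private final Map<String, Measure> measures = new LinkedHashMap<>();
        private final Set<Group> groups = new LinkedHashSet<>();

        void register(String name, Measure measure) {
            measures.put(name, measure);
        }

        /** Overload that also associates the measure with a group. */
        void register(String name, Measure measure, Group group) {
            groups.add(group);
            register(name, measure);
        }

        Map<String, Long> collect() {
            groups.forEach(Group::prepare);   // one resource access per group
            Map<String, Long> samples = new LinkedHashMap<>();
            measures.forEach((name, m) -> samples.put(name, m.sample()));
            return samples;
        }
    }

    /** Stand-in for an expensive external resource; counts how often it is hit. */
    static final class ConnectionPool {
        int accesses;

        long[] stats() {
            accesses++;
            return new long[] {8, 3};   // {size, active}
        }
    }

    static final class PoolGroup implements Group {
        private final ConnectionPool pool;
        private long[] snapshot = {0, 0};

        PoolGroup(ConnectionPool pool) { this.pool = pool; }

        @Override public void prepare() { snapshot = pool.stats(); }

        // Measures read the shared snapshot instead of hitting the pool again.
        Measure size()   { return () -> snapshot[0]; }
        Measure active() { return () -> snapshot[1]; }
    }
}
```

With two measures registered against one PoolGroup, each collection cycle touches the pool once rather than twice, and the saving grows with the number of measures in the group.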
Group also ensures value integrity in that, for a single collection cycle, the same UsageMemory instance is used. Without this facility it is possible to have heap measurement samples that as a whole are inconsistent, such as the total which is the case for the two alternative libraries at the bottom of this article.
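The value integrity point can be illustrated with the JDK's own java.lang.management.MemoryUsage class, which is an immutable snapshot of heap figures. The sketch below assumes a hypothetical Group interface and HeapGroup class; only ManagementFactory, MemoryMXBean and MemoryUsage are real JDK types.

```java
import java.lang.management.ManagementFactory;
import java.lang.management.MemoryMXBean;
import java.lang.management.MemoryUsage;

// Hypothetical sketch: Group and HeapGroup are illustrative, not library types.
final class HeapGroupSketch {

    interface Group { void prepare(); }

    static final class HeapGroup implements Group {
        private final MemoryMXBean bean = ManagementFactory.getMemoryMXBean();
        private volatile MemoryUsage snapshot = bean.getHeapMemoryUsage();

        /** Take one snapshot per collection cycle. */
        @Override public void prepare() {
            snapshot = bean.getHeapMemoryUsage();
        }

        // Every measure in the group reads the same snapshot, so the figures
        // describe the heap at a single point in time.
        long used()      { return snapshot.getUsed(); }
        long committed() { return snapshot.getCommitted(); }
    }
}
```

If each measure called getHeapMemoryUsage() on its own, a garbage collection or heap resize between two calls could leave a "used" sample paired with a "committed" sample taken from a different moment.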
Of course this only works if access to the registered Measure beyond the scope of the registration code is prevented, which is the case for the Metrics API. Even with a reference to a Metric instance looked up using a Name instance, it is not possible to access the underlying Measure, or to indirectly cause the Measure to provide an immediate value, unless the Measure itself is a Gauge accessible via the Metrics.gauge(Name) methods. It is, however, possible to access summary information about the measurement samplings performed up to that point for that Metric.
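One way to get this kind of encapsulation is to hand clients a read-only summary view and keep the measure itself private to the registry. The following is a minimal sketch under that assumption; the Metric methods shown (count, min, max, last) are invented for illustration and are not claimed to match the real interface.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical sketch: type and method names are illustrative only.
final class SummarySketch {

    interface Measure { long sample(); }

    /** Read-only view given to clients; it has no accessor for the measure. */
    interface Metric {
        String name();
        long count();
        long min();
        long max();
        long last();
    }

    static final class Registry {
        private final Map<String, Entry> entries = new LinkedHashMap<>();

        void register(String name, Measure measure) {
            entries.put(name, new Entry(name, measure));
        }

        /** Lookup returns the summary view, never the measure. */
        Metric metric(String name) { return entries.get(name); }

        /** Only the collector triggers sampling. */
        void collect() { entries.values().forEach(Entry::record); }
    }

    /** Private, so client code cannot cast a Metric back to reach the measure. */
    private static final class Entry implements Metric {
        private final String name;
        private final Measure measure;
        private long count;
        private long last;
        private long min = Long.MAX_VALUE;
        private long max = Long.MIN_VALUE;

        Entry(String name, Measure measure) {
            this.name = name;
            this.measure = measure;
        }

        void record() {
            last = measure.sample();
            count++;
            min = Math.min(min, last);
            max = Math.max(max, last);
        }

        @Override public String name() { return name; }
        @Override public long count()  { return count; }
        @Override public long min()    { return min; }
        @Override public long max()    { return max; }
        @Override public long last()   { return last; }
    }
}
```

Reading the summary never invokes the measure, so a client holding a Metric cannot force an out-of-cycle measurement.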
How and when collections are performed is left to the underlying implementation of the Metrics SPI, though it is expected that most implementations will use one or more polling threads to perform a single sample collection cycle across all registered metrics at fixed intervals, with the contractual constraint that all Group instances have their prepare method invoked once in a cycle before their associated measurements are sampled.
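A single polling thread honoring that contract might look roughly like the sketch below. It is not the JXInsight/OpenCore implementation: the Collector class and its method names are assumptions, and the only real API used is the JDK's ScheduledExecutorService.

```java
import java.util.LinkedHashMap;
import java.util.LinkedHashSet;
import java.util.Map;
import java.util.Set;
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.TimeUnit;

// Hypothetical sketch of an SPI-side collector; not the reference implementation.
final class PollerSketch {

    interface Measure { long sample(); }
    interface Group { void prepare(); }

    static final class Collector {
        private final Map<String, Measure> measures = new LinkedHashMap<>();
        private final Set<Group> groups = new LinkedHashSet<>();
        private ScheduledExecutorService scheduler;

        synchronized void register(String name, Measure measure, Group group) {
            measures.put(name, measure);
            if (group != null) groups.add(group);
        }

        /** One collection cycle: prepare each group once, then sample everything. */
        synchronized Map<String, Long> cycle() {
            for (Group group : groups) group.prepare();
            Map<String, Long> samples = new LinkedHashMap<>();
            measures.forEach((name, m) -> samples.put(name, m.sample()));
            return samples;
        }

        /** Runs cycle() on one background thread at a fixed interval. */
        synchronized void start(long intervalMillis) {
            if (scheduler != null) return;
            scheduler = Executors.newSingleThreadScheduledExecutor();
            scheduler.scheduleAtFixedRate(this::cycle, intervalMillis, intervalMillis,
                    TimeUnit.MILLISECONDS);
        }

        synchronized void stop() {
            if (scheduler != null) {
                scheduler.shutdownNow();
                scheduler = null;
            }
        }
    }
}
```

Because only the collector thread calls prepare and sample, measures do not need to defend against concurrent sampling. Note that scheduleAtFixedRate cancels further runs if a cycle throws, so a production collector would catch and report exceptions inside cycle().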
Note: JXInsight/OpenCore by default uses one polling thread initially. This polling is automatic with system properties used to externally configure its behavior in terms of collection interval and history management.
The Metrics API does allow access to recently created Collection instances and the Sample instances contained within them. Memory management and persistent storage concerns are left to the Metrics SPI implementation. We expect implementations to offer integrations with external system management solutions where such collections can be analyzed and archived.
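To make "recently created" concrete, one possible arrangement is a bounded history that drops the oldest collection when a new one arrives. The Collection, Sample and History classes below are hypothetical shapes chosen for the sketch; the fields and the capacity policy are assumptions, not the library's documented behavior.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Collections;
import java.util.Deque;
import java.util.List;

// Hypothetical sketch: Collection here is a nested type, not java.util.Collection.
final class HistorySketch {

    /** One sampled value for one named metric. */
    static final class Sample {
        final String name;
        final long value;

        Sample(String name, long value) {
            this.name = name;
            this.value = value;
        }
    }

    /** All samples taken in one collection cycle. */
    static final class Collection {
        final long sequence;
        final List<Sample> samples;

        Collection(long sequence, List<Sample> samples) {
            this.sequence = sequence;
            this.samples = Collections.unmodifiableList(new ArrayList<>(samples));
        }
    }

    /** Keeps only the most recent collections; older ones are discarded. */
    static final class History {
        private final int capacity;
        private final Deque<Collection> window = new ArrayDeque<>();
        private long sequence;

        History(int capacity) {
            if (capacity < 1) throw new IllegalArgumentException("capacity must be >= 1");
            this.capacity = capacity;
        }

        synchronized void add(List<Sample> samples) {
            if (window.size() == capacity) window.removeFirst();
            window.addLast(new Collection(++sequence, samples));
        }

        /** Snapshot of the retained collections, oldest first. */
        synchronized List<Collection> recent() {
            return new ArrayList<>(window);
        }
    }
}
```

Anything older than the window would be the responsibility of an integration that archives collections externally.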
Note: JXInsight/OpenCore offers a number of integration extensions and plugins that offer this capability – all are configuration driven.
It is also possible to add an event Listener to obtain access to the Collection instances before and after a cycle occurs, which can be useful for performing global preparation work as well as for dispatching (replicating) data to an integration without needing to implement the entire Metrics SPI, using the reference implementation instead.
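A before/after listener of this kind could be wired into a collector along the following lines. The Listener method names and parameters are guesses made for the sketch (here the callbacks receive a cycle number and a read-only map of samples instead of a Collection object), so treat the shape as an assumption.

```java
import java.util.Collections;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.concurrent.CopyOnWriteArrayList;
import java.util.function.LongSupplier;

// Hypothetical sketch: the Listener signature is assumed, not taken from the API.
final class ListenerSketch {

    interface Listener {
        /** Called before any measure is sampled; a place for global preparation. */
        void before(long cycle);

        /** Called after sampling; a place to forward data to an integration. */
        void after(long cycle, Map<String, Long> samples);
    }

    static final class Collector {
        private final Map<String, LongSupplier> measures = new LinkedHashMap<>();
        private final List<Listener> listeners = new CopyOnWriteArrayList<>();
        private long cycle;

        void register(String name, LongSupplier measure) {
            measures.put(name, measure);
        }

        void addListener(Listener listener) {
            listeners.add(listener);
        }

        void collect() {
            long id = ++cycle;
            for (Listener l : listeners) l.before(id);

            Map<String, Long> samples = new LinkedHashMap<>();
            measures.forEach((name, m) -> samples.put(name, m.getAsLong()));

            // Listeners get a read-only view so they cannot alter the cycle's data.
            Map<String, Long> view = Collections.unmodifiableMap(samples);
            for (Listener l : listeners) l.after(id, view);
        }
    }
}
```

An integration only needs to implement the two callbacks; scheduling, sampling and history stay with the existing collector.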
Here is a visualized representation of the Sample instances in the JXInsight/OpenCore management with consecutive sample measurement values omitted.
This metrics library on GitHub has no support for the grouping needed for access cost reduction and value integrity guarantees, though it does introduce a somewhat strange signature, processWith, in its Metric interface, which is implemented by both the Gauge classes to support hardwired visitor pattern based pollers. This method has no place in a Metric interface that is extended by measures such as Gauge. It most certainly should never be visible to, or callable from, a client of such a Metric. The whole metric polling support seems to have been an afterthought in the design of this library.
It has no explicit support for collections and samples as actual elements of its domain model.
This metrics library on GitHub has very similar failings: it has no group support for access cost reduction and value integrity guarantees, and its MetricPoller interface is implemented by various incomplete or unused classes that need to be explicitly instantiated and scheduled outside of the API itself. Confusingly, the single method in this interface, poll, returns a list of Metric instances when in fact what is collected is a set of measurement samples.
In summary, a well designed metrics library must have facilities that eliminate access cost as much as possible, ensure value integrity across interrelated measurements, and define a clear contract for the actual metric collection that increases thread safety and eliminates as much of the work related to shared access as possible. It should also provide summary metric information while still preventing the underlying measure from executing a measurement that is not under the direct control of the library.