
How (not) to design a Metrics API – Part 3: Groups, Collections & Samples

This is part three in a series of articles on how best to design an application metrics monitoring library, and in particular its API, so that it is versatile both in how the library can be applied across domains & environments and in how it can be implemented by one or more vendors. Please read part 1 and part 2 if you have not already done so.

In part 1 we introduced the concept of an information model (the set of all possible measurements) and a management model (the set of measures used for management purposes), showing how the creation of a Measure was distinct from its registration as a Metric within the JXInsight/OpenCore Metrics Open API. One benefit of these model distinctions is that the same Measure can be registered under different Name instances, which can be useful in supporting legacy reporting clients and plugins.

Two common problems arise when registering multiple measurements that obtain their values from an external component or resource: access cost and value integrity. To allow a single (possibly expensive) access to be amortized across multiple related measurement samplings, the Metrics Open API provides a Group interface containing just a single method, prepare, which is invoked before any associated measurements are sampled in a collection cycle.

The Metrics API provides an overloaded register method which takes a Group parameter to make this association.
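A minimal Java sketch of this association follows. The Group, Measure, HeapGroup and Registry types and the register(String, Measure, Group) overload are hypothetical stand-ins modeled on the description above, not the actual Metrics Open API (which registers under Name instances rather than strings). The heap group reads java.lang.management.MemoryMXBean once in prepare, and every measure registered with it derives its value from that single reading.

```java
import java.lang.management.ManagementFactory;
import java.lang.management.MemoryMXBean;
import java.lang.management.MemoryUsage;
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical: invoked once per collection cycle before any associated measure is sampled.
interface Group {
    void prepare();
}

// Hypothetical: a measure yields a single value each time it is sampled.
interface Measure {
    double value();
}

// Hypothetical group that amortizes one (possibly expensive) heap read across several measures.
final class HeapGroup implements Group {
    private final MemoryMXBean bean = ManagementFactory.getMemoryMXBean();
    private MemoryUsage usage; // shared snapshot; prepare() must run before any sampling

    @Override
    public void prepare() {
        usage = bean.getHeapMemoryUsage(); // the single access for this cycle
    }

    Measure used()      { return () -> usage.getUsed(); }
    Measure committed() { return () -> usage.getCommitted(); }
}

// Hypothetical registry with an overloaded register method recording the group association.
final class Registry {
    final Map<String, Measure> measures = new LinkedHashMap<>();
    final Map<String, Group> groups = new LinkedHashMap<>();

    void register(String name, Measure measure) {
        measures.put(name, measure);
    }

    void register(String name, Measure measure, Group group) {
        register(name, measure);
        groups.put(name, group); // remember which group to prepare before sampling this measure
    }
}
```

Because both measures read the same MemoryUsage snapshot, the sampled used value can never exceed the sampled committed value within a cycle; MemoryUsage itself rejects such a combination when constructed.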

The Group also ensures value integrity: within a single collection cycle, all associated measures read from the same UsageMemory instance. Without this facility, heap measurement samples can be inconsistent as a whole, for example used + free != total, which is the case for the two alternative libraries discussed at the end of this article.
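To illustrate the integrity issue, the sketch below uses java.lang.Runtime, where used heap is derived as totalMemory() - freeMemory(). The class names are hypothetical, and the snapshot fields merely stand in for the UsageMemory instance mentioned above.

```java
import java.util.function.LongSupplier;

// Without a group: each measure reads the runtime independently, so the three
// values may come from different instants and need not satisfy used + free == total.
final class IndependentHeapMeasures {
    private final Runtime rt = Runtime.getRuntime();
    final LongSupplier total = rt::totalMemory;
    final LongSupplier free = rt::freeMemory;
    final LongSupplier used = () -> rt.totalMemory() - rt.freeMemory();
}

// With a group: prepare() takes one snapshot per cycle and every measure reads from it,
// so used + free == total holds for every sample taken within that cycle.
final class SnapshotHeapGroup {
    private final Runtime rt = Runtime.getRuntime();
    private long total; // hypothetical stand-in for the UsageMemory snapshot
    private long free;

    void prepare() {
        total = rt.totalMemory();
        free = rt.freeMemory();
    }

    final LongSupplier totalMeasure = () -> total;
    final LongSupplier freeMeasure = () -> free;
    final LongSupplier usedMeasure = () -> total - free;
}
```

In the independent version the JVM may grow or shrink the heap, or allocate, between the three reads, so a sampled used value need not equal the sampled total minus the sampled free.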

Of course, this only works if access to the registered Measure beyond the scope of the registration code is prevented, which is the case for the Metrics API. Even with a reference to a Metric instance looked up using a Name instance, it is not possible to access the underlying Measure or indirectly cause the Measure to produce an immediate value, unless the Measure itself is a Counter or Gauge accessible via the Metrics.counter(Name) and Metrics.gauge(Name) methods. It is, however, possible to access summary information about the measurement samplings performed up to that point for the Metric.
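The following hypothetical sketch shows one way to enforce this encapsulation in Java: clients receive only a read-only Metric view exposing summary statistics, while the measure is held privately and sample() is package-private, so only a collector living in the same package can trigger a measurement. None of these names come from the actual Metrics Open API.

```java
import java.util.function.DoubleSupplier;

// Hypothetical read-only view handed to clients: summary statistics only,
// with no accessor for the underlying measure and no way to trigger a sample.
interface Metric {
    String name();
    long count();
    double min();
    double max();
    double mean();
}

// Hypothetical registry-side holder. The class and sample() are package-private,
// so code outside the package can neither name the type nor force a measurement.
final class RegisteredMetric implements Metric {
    private final String name;
    private final DoubleSupplier measure;
    private long count;
    private double min = Double.POSITIVE_INFINITY, max = Double.NEGATIVE_INFINITY, sum;

    RegisteredMetric(String name, DoubleSupplier measure) {
        this.name = name;
        this.measure = measure;
    }

    // Called by the collection cycle only; not part of the client-facing Metric interface.
    synchronized double sample() {
        double v = measure.getAsDouble();
        count++;
        sum += v;
        min = Math.min(min, v);
        max = Math.max(max, v);
        return v;
    }

    @Override public String name() { return name; }
    @Override public synchronized long count() { return count; }
    @Override public synchronized double min() { return min; }
    @Override public synchronized double max() { return max; }
    @Override public synchronized double mean() { return count == 0 ? Double.NaN : sum / count; }
}
```

The summary is maintained incrementally, so a client can inspect count, min, max and mean at any time without causing the measure to execute.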

How and when collections are performed is left to the underlying implementation of the Metrics SPI. We expect most implementations to use one or more polling threads that perform a single sample collection cycle across all registered metrics at fixed intervals, subject to the contractual constraint that every Group instance has its prepare method invoked once per cycle before its associated measurements are sampled.
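A minimal sketch of such a polling implementation, assuming a single ScheduledExecutorService thread and hypothetical Group and PollingCollector types (not the JXInsight/OpenCore SPI), might look like this. Each cycle first prepares every distinct group exactly once, identified by reference, and only then samples the measures.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.IdentityHashMap;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.TimeUnit;
import java.util.function.DoubleSupplier;

// Hypothetical group contract: prepare() runs once per cycle before its measures are sampled.
interface Group {
    void prepare();
}

// Hypothetical polling collector: one thread runs a full cycle across all metrics at a fixed interval.
final class PollingCollector {
    private record Entry(String name, DoubleSupplier measure, Group group) {}

    private final List<Entry> entries = new ArrayList<>();
    private final ScheduledExecutorService poller = Executors.newSingleThreadScheduledExecutor();

    synchronized void register(String name, DoubleSupplier measure) {
        register(name, measure, null); // ungrouped measure
    }

    synchronized void register(String name, DoubleSupplier measure, Group group) {
        entries.add(new Entry(name, measure, group));
    }

    // One collection cycle: every distinct group is prepared exactly once,
    // and only then are the measures sampled.
    synchronized Map<String, Double> runCycle() {
        Set<Group> prepared = Collections.newSetFromMap(new IdentityHashMap<>());
        for (Entry e : entries) {
            if (e.group() != null && prepared.add(e.group())) {
                e.group().prepare();
            }
        }
        Map<String, Double> samples = new LinkedHashMap<>();
        for (Entry e : entries) {
            samples.put(e.name(), e.measure().getAsDouble());
        }
        return samples;
    }

    void start(long intervalMillis) {
        poller.scheduleAtFixedRate(this::runCycle, intervalMillis, intervalMillis, TimeUnit.MILLISECONDS);
    }

    void stop() {
        poller.shutdown();
    }
}
```

Tracking prepared groups in an identity-based set means a group shared by many measures is prepared once per cycle rather than once per measure. Calling runCycle directly exercises the same contract as the scheduled poller, which is convenient for testing.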

Note: By default, JXInsight/OpenCore initially uses a single polling thread. Polling starts automatically, and system properties can be used to configure its collection interval and history management externally.

The Metrics API does allow access to recently created Collection instances and the Sample instances they contain. Memory management and persistent storage concerns are left to the Metrics SPI implementation. We expect implementations to offer integrations with external system management solutions where such collections can be analyzed and archived.
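The shape of such a model can be sketched with two immutable records and a bounded in-memory history. Sample, Collection and CollectionHistory are hypothetical names, and the fixed-capacity eviction policy is just one possible memory management choice an SPI implementation could make.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.List;

// Hypothetical value type: one Sample per metric per collection cycle.
record Sample(String metric, double value) {}

// Hypothetical collection of samples for one cycle (named after the article's Collection,
// not java.util.Collection, which is deliberately not imported here).
record Collection(long cycle, long timestampMillis, List<Sample> samples) {
    Collection {
        samples = List.copyOf(samples); // immutable once published
    }
}

// Hypothetical bounded history: keeps only the most recent collections in memory,
// leaving archiving of older ones to an external integration.
final class CollectionHistory {
    private final int capacity;
    private final Deque<Collection> recent = new ArrayDeque<>();

    CollectionHistory(int capacity) {
        if (capacity <= 0) throw new IllegalArgumentException("capacity must be positive");
        this.capacity = capacity;
    }

    synchronized void add(Collection c) {
        if (recent.size() == capacity) {
            recent.removeFirst(); // evict the oldest collection
        }
        recent.addLast(c);
    }

    // Copy of the recent collections, oldest first.
    synchronized List<Collection> recent() {
        return new ArrayList<>(recent);
    }
}
```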

Note: JXInsight/OpenCore offers a number of integration extensions and plugins that provide this capability, all of them configuration-driven.

It is also possible to add an event Listener to obtain access to the Collection instances before and after a cycle occurs. This can be useful for performing global preparation work, as well as for dispatching (replicating) data to an integration without having to implement the entire Metrics SPI, relying instead on the reference implementation.
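A hypothetical sketch of such a listener follows; CollectionListener, ListenerSupport and ReplicatingListener are illustrative names, not the Metrics Open API, and samples are passed as a simple name-to-value map rather than Collection and Sample instances. The replicating listener hands each completed collection to a queue that an exporter could drain, the kind of integration that would otherwise require implementing the full SPI.

```java
import java.util.List;
import java.util.Map;
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.CopyOnWriteArrayList;
import java.util.concurrent.LinkedBlockingQueue;

// Hypothetical listener callbacks around a collection cycle.
interface CollectionListener {
    default void beforeCycle(long cycle) {}                            // e.g. global preparation work
    default void afterCycle(long cycle, Map<String, Double> samples) {} // e.g. replication
}

// Hypothetical dispatcher a reference implementation might use to notify listeners.
final class ListenerSupport {
    private final List<CollectionListener> listeners = new CopyOnWriteArrayList<>();

    void add(CollectionListener l) {
        listeners.add(l);
    }

    void fireBefore(long cycle) {
        for (CollectionListener l : listeners) l.beforeCycle(cycle);
    }

    void fireAfter(long cycle, Map<String, Double> samples) {
        Map<String, Double> view = Map.copyOf(samples); // listeners get an immutable copy
        for (CollectionListener l : listeners) l.afterCycle(cycle, view);
    }
}

// Example integration: replicate each completed collection to a queue drained by an exporter.
final class ReplicatingListener implements CollectionListener {
    final BlockingQueue<Map<String, Double>> outbound = new LinkedBlockingQueue<>();

    @Override
    public void afterCycle(long cycle, Map<String, Double> samples) {
        outbound.offer(samples);
    }
}
```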

Here is a visualized representation of the Collection and Sample instances in JXInsight/OpenCore management, with consecutive sample measurement values omitted.

Jammer

This metrics library on GitHub has no support for grouping, which is needed for access cost reduction and value integrity guarantees, though it does introduce a somewhat strange method, processWith, in its Metric interface, implemented by both the Counter and Gauge classes to support hardwired, visitor-pattern-based pollers. This method has no place in a Metric interface that is extended by measures such as Counter and Gauge, and it certainly should never be visible to, or callable from, a client of such a Metric. Metric polling support appears to have been an afterthought in the design of this library.

It has no explicit support for collections and samples as actual elements of its domain model.

NetFlicks

This metrics library on GitHub has very similar failings: no group support, which is needed for access cost reduction and value integrity guarantees, and a MetricPoller interface implemented by various incomplete or unused classes that must be explicitly instantiated and scheduled outside of the API itself. Confusingly, the single method in this interface, poll, returns a list of Metric instances when what is actually collected is a set of measurement samples.

In summary, a well-designed Metrics library must have facilities that minimize access cost, ensure value integrity across interrelated measurements, and define a clear contract for metric collection that improves thread safety and eliminates as much of the work related to shared access as possible. It should also provide summary metric information while still preventing the underlying measure from executing a measurement that is not under the direct control of the library.

Part 4: Naming and Names