
Google Engineering on Performance Variability – The Tail at Scale

Google has published a fascinating article titled The Tail at Scale on the ACM website. In the article, senior engineers detail common causes of performance variability in large-scale, wide fan-out web service systems and offer various techniques to mitigate such variability at different levels and scales. Here is an excerpt from the introduction.

“Here, we outline some common causes for high-latency episodes in large online services and describe techniques that reduce their severity or mitigate their effect on whole-system performance. In many cases, tail-tolerant techniques can take advantage of resources already deployed to achieve fault-tolerance, resulting in low additional overhead. We explore how these techniques allow system utilization to be driven higher without lengthening the latency tail, thus avoiding wasteful over-provisioning.”

After reading the article you might be surprised that there is no mention of application performance monitoring. The reason is that performance monitoring is viewed not as a distinct and separate activity but as an integral part of the service, its architecture, and that of its many dependent services. In fact, performance monitoring drives many of the techniques outlined in the article under headings classified by Google as “adaptations”. Sound familiar?

One of the techniques Google uses to reduce performance variability is service classification and the prioritization of the classified service requests through queueing. These terms are common in network traffic management but are used here in the context of application service requests.

“Differentiating service classes and higher-level queuing. Differentiated service classes can be used to prefer scheduling requests for which a user is waiting over non-interactive requests. Keep low-level queues short so higher-level policies take effect more quickly; for example, the storage servers in Google’s cluster-level file-system software keep few operations outstanding in the operating system’s disk queue, instead maintaining their own priority queues of pending disk requests.”
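To make the idea in that excerpt concrete, here is a minimal Python sketch of an application-level priority queue that hands only a few requests at a time to a lower layer. It is an illustration under assumptions, not Google's implementation and not the JXInsight/OpenCore API: the names ClassedQueue, INTERACTIVE, BATCH and max_outstanding are all hypothetical.

```python
import heapq
import itertools

# Hypothetical service classes: a lower number is served first.
INTERACTIVE = 0
BATCH = 1


class ClassedQueue:
    """Application-level priority queue in front of a short low-level queue."""

    def __init__(self, max_outstanding=2):
        self._heap = []
        self._order = itertools.count()  # tie-breaker: FIFO within a class
        self.max_outstanding = max_outstanding  # cap on work handed downstream
        self.outstanding = []  # stands in for the short low-level queue

    def submit(self, service_class, request):
        heapq.heappush(self._heap, (service_class, next(self._order), request))

    def dispatch(self):
        """Issue pending requests, best class first, until the cap is reached."""
        issued = []
        while self._heap and len(self.outstanding) < self.max_outstanding:
            _, _, request = heapq.heappop(self._heap)
            self.outstanding.append(request)
            issued.append(request)
        return issued

    def complete(self, request):
        """Mark a request as finished, freeing a slot in the low-level queue."""
        self.outstanding.remove(request)


queue = ClassedQueue(max_outstanding=2)
queue.submit(BATCH, "compact-logs")
queue.submit(BATCH, "reindex")
queue.submit(INTERACTIVE, "user-search")
first_batch = queue.dispatch()  # the interactive request is issued first
```

Because the cap on outstanding work is small, a newly arrived interactive request only ever waits behind the one or two requests already issued, not behind the whole batch backlog.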

This is pretty much what is offered by the Quality of Service (QoS) for Apps technology included as an optional metering extension to JXInsight/OpenCore. Our QoS technology goes beyond this, augmenting QoS-defined virtual resources with adaptive control valves, pre-reservation, quotas, barriers, and latches, as well as rate limiting, which is another technique advocated by Google.

“Managing background activities and synchronized disruption. Background tasks can create significant CPU, disk, or network load; examples are log compaction in log-oriented storage systems and garbage-collector activity in garbage-collected languages. A combination of throttling, breaking down heavyweight operations into smaller operations, and triggering such operations at times of lower overall load is often able to reduce the effect of background activities on interactive request latency.”
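The three ingredients in that excerpt (throttling, smaller operations, and waiting for lower load) can be combined in a few lines. The Python sketch below is only an illustration: run_in_slices, load_is_low, slice_size and pause are hypothetical names, and how load is actually measured is left to the caller.

```python
import time


def run_in_slices(items, process, slice_size=100,
                  load_is_low=lambda: True, pause=0.01, sleep=time.sleep):
    """Run a heavyweight background job as small, throttled slices.

    load_is_low is a caller-supplied (hypothetical) probe that returns True
    when interactive traffic is light enough for background work to proceed.
    """
    done = 0
    for start in range(0, len(items), slice_size):
        while not load_is_low():
            sleep(pause)  # defer: wait for a quieter moment
        for item in items[start:start + slice_size]:
            process(item)  # one small unit of the larger operation
            done += 1
        sleep(pause)  # throttle: leave a gap between slices
    return done
```

The sleep function is passed in as a parameter so that the pacing can be replaced, for example with a scheduler hook, without touching the loop itself.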

Clearly the future of service management involves influencing service runtime behavior via (the late binding and dynamic injection of) system dynamics.

What is worth noting is that the software services Google engineers build are self-aware in the observational sense. They measure. They relate. They regulate. This view is shared in our vision of the New APM.

“Latency-induced probation. By observing the latency distribution of responses from the various machines in the system, intermediate servers sometimes detect situations where the system performs better by excluding a particularly slow machine, or putting it on probation.”
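As a rough illustration of this measure, relate, regulate loop, the Python sketch below keeps a sliding window of latencies per machine and puts a machine on probation when its median is far above the fleet's. The class name ProbationTracker, the parameters window, slow_factor and min_samples, and the median-based rule are all assumptions made for the example; the article does not prescribe a specific detection rule.

```python
from collections import defaultdict, deque
from statistics import median


class ProbationTracker:
    """Flags machines whose recent latency is far above the fleet median."""

    def __init__(self, window=20, slow_factor=3.0, min_samples=5):
        # Sliding window of the most recent latencies for each machine.
        self.samples = defaultdict(lambda: deque(maxlen=window))
        self.slow_factor = slow_factor  # how much slower than the fleet counts as slow
        self.min_samples = min_samples  # do not judge a machine on too little data

    def record(self, machine, latency_ms):
        self.samples[machine].append(latency_ms)

    def on_probation(self):
        medians = {m: median(s) for m, s in self.samples.items()
                   if len(s) >= self.min_samples}
        if len(medians) < 2:
            return set()  # nothing to compare against
        fleet = median(medians.values())
        return {m for m, v in medians.items() if v > self.slow_factor * fleet}

    def eligible(self):
        """Machines that should currently receive normal traffic."""
        return sorted(set(self.samples) - self.on_probation())
```

Note that a machine can only leave probation in this sketch if new latency samples keep arriving for it, so a real system would need to keep sending it some traffic while it is excluded from normal routing.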

Two in-flight request performance optimization techniques that are not mentioned in the article, but which have been covered extensively here on our site, are QoS performance credits for conversational interactions (web sessions) and just-ahead prediction.


For a more macro-level (architectural) approach to this we should also reconsider our current approach to Web Open API design, moving away from coarse-grained, but still highly frequent, remote procedure calls (the interaction approach) to services that act more as platforms, allowing us to teleport code into some local execution container for more efficient data acquisition and processing: PaaS Everywhere. There are many adaptations to choose from, and it is probably best to let the software itself experiment, select, refine, and revise these as it learns and observes its execution environment. Signals facilitates this cross-boundary learning by providing a condensed historical memorization of relevant behavior and advice.


We can probably expect a flurry of interest in tail-tolerant techniques over the coming year following publication of this article. For such techniques to be successful, though, there must be more dynamic runtime adaptation and self-regulation, and that is where our groundbreaking Signals technology could play an important role, even if only as a conceptual design for other languages and runtimes that offer self-adaptive capabilities.


Further Reading