Using System Dynamics for Effective Concurrency & Consumption Control of Code
When we started out on the design of our Quality of Service (QoS) for Apps technology we were, for the most part, unaware of the field of system dynamics, though we had come across references to it during our research into self-aware, adaptive, and self-regulating software, especially in the context of complex adaptive systems and emergent computing. So when we finally found time to read Thinking in Systems we were pleasantly surprised to find such strong similarities between our thinking, and the resulting design for software system control, and the model and approach used in system dynamics.
If you have not already read Thinking in Systems then please do so soon, as it is going to play a critical part in the engineering of software and services in the cloud and beyond! In the meantime, here is a definition from Wikipedia.
“System dynamics is an approach to understanding the behaviour of complex systems over time. It deals with internal feedback loops and time delays that affect the behaviour of the entire system. What makes using system dynamics different from other approaches to studying complex systems is the use of feedback loops and stocks and flows.”
In system dynamics a system consists of three kinds of things: elements, interconnections, and a function or purpose. In terms of modeling, elements are represented by stocks, and interconnections by flows to and from stocks as well as by feedback loops relaying signals and sensory measurements. Stocks are the foundation of the system. They are things that can be counted or measured. The value or quantity of a stock changes over time through the actions of a flow. Flows drain and fill stocks. In system dynamics feedback loops are seen as the overriding mechanism controlling the accumulation and depletion of stocks in a system that exhibits a consistent pattern of behavior over time. Such feedback loops cause changes in the flows into or out of a stock based on changes in the stock itself.
“Feedback is a process in which information about the past or the present influences the same phenomenon in the present or future. As part of a chain of cause-and-effect that forms a circuit or loop, the event is said to “feed back” into itself.” – Wikipedia
Here is an example of such a model applied to populations. The reinforcing loop causes the population to grow (more people => more births) and a balancing loop causes it to shrink (more people => more deaths). If both the fertility rate and death rate remain constant, the behavior is simple (and predictable).
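The population model can be sketched in a few lines of Java (our own illustrative sketch, not from the original material): the stock is the population, and both flows depend on the stock itself, which is what makes them feedback loops.

```java
// Minimal stock-and-flow sketch of the population model described above.
// The stock (population) is filled by a births inflow (reinforcing loop)
// and drained by a deaths outflow (balancing loop).
public class PopulationModel {
    public static double simulate(double population, double fertilityRate,
                                  double deathRate, int years) {
        for (int year = 0; year < years; year++) {
            double births = population * fertilityRate; // inflow depends on the stock
            double deaths = population * deathRate;     // outflow depends on the stock
            population += births - deaths;              // the stock accumulates the net flow
        }
        return population;
    }

    public static void main(String[] args) {
        // With constant rates the behavior is simple exponential growth or decay.
        System.out.println(simulate(1000, 0.03, 0.01, 10));
    }
}
```

With constant rates the net effect is a fixed multiplier per step, which is exactly the simple, predictable behavior noted above.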
Leaving aside feedback loops for now, this looks very similar to the underlying reservation based mechanism that powers our QoS for Apps technology. Stocks in system dynamics become QoS Resources. Flows in system dynamics become QoS Services, which on entering a method enhanced with QoS capabilities reserve (outflow) units from the resource pool (stock) and then release (inflow) such units back into the resource pool on completion (exit) of the method.
The ordering within the graphic is not natural in terms of how most would envisage a reservation process, so another way is to imagine flows between stocks. The reservation depletes the resource stock, resulting in an inflow into the reservation stock held by the service during its execution. Control is built into this model because there is generally an initial or fixed capacity defined for a resource, and if there is no stock in the resource then no outflow will occur, which means no execution of the method mapped to this outflow via service classification.
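As a rough conceptual sketch of this reserve-on-entry, release-on-exit dynamic (ours, not the actual QoS runtime), a counting semaphore can stand in for the resource stock:

```java
import java.util.concurrent.Semaphore;

// Illustrative sketch only: the semaphore's permits are the resource stock.
// Entering an enhanced method is the outflow (reserve); exiting is the
// inflow (release). With zero permits available, no outflow occurs and the
// mapped method does not execute until stock returns.
public class ReservationSketch {
    private final Semaphore resourceStock;

    public ReservationSketch(int capacity) {
        this.resourceStock = new Semaphore(capacity);
    }

    public int available() {
        return resourceStock.availablePermits();
    }

    public void execute(Runnable method) {
        resourceStock.acquireUninterruptibly(); // outflow: deplete the resource stock
        try {
            method.run();                       // reservation held for the duration of the call
        } finally {
            resourceStock.release();            // inflow: refill the resource stock on exit
        }
    }
}
```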
Applying this to thread concurrency control we would have a “pool” resource representing the maximum thread population allowed for a particular system (defined by flows/services and stocks/resources) and another resource representing the number of threads executing with reservations held on the “pool” resource.
Mapping such control dynamics to an internal system within an identifiable (by namespace) software package is as easy as adding the following system properties to a jxinsight.override.config file, limiting execution concurrency to 10 threads for all methods in the com.acme.sysone package and its sub-packages. Note that j.s.p is a shorthand version of jxinsight.server.probes recognized by our measurement runtimes.
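By way of illustration only, such a configuration fragment might look something like the following. The property names below are our guesses at plausible syntax based on the j.s.p prefix mentioned above, not taken from the product documentation:

```properties
# hypothetical jxinsight.override.config fragment -- property names are illustrative
j.s.p.qos.enabled=true
j.s.p.qos.resources=pool
j.s.p.qos.resource.pool.capacity=10
j.s.p.qos.services=sysone
j.s.p.qos.service.sysone.name.groups=com.acme.sysone.*
j.s.p.qos.service.sysone.resources=pool
```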
This is achieved without any code changes. The same code that is dynamically weaved into the application codebase at load time for the purpose of resource (gc, wall clock, cpu) consumption metering is also used to control concurrency of the code. This is how concurrency control can be achieved without changing an application's entire architecture to conform explicitly to some message/packet queueing paradigm and model. Calls in our model already embody messages and queues, though granted such things are transient and the delay temporary (for liveness reasons).
This approach to code level concurrency control is extremely powerful. Let's say you wanted to limit the number of concurrently executing requests at various layers in an application stack such as web, service, and db. This can be done using a locally defined resource, l, that is not shared and is visible only to each service.
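A hypothetical configuration fragment for this layered setup might read as follows. The syntax and the com.acme package names are assumptions for illustration, not the product's documented format:

```properties
# hypothetical fragment: one local (unshared) resource l per layer
j.s.p.qos.services=w,s,d
j.s.p.qos.service.w.name.groups=com.acme.web.*
j.s.p.qos.service.s.name.groups=com.acme.services.*
j.s.p.qos.service.d.name.groups=com.acme.db.*
j.s.p.qos.service.w.resource.l.capacity=20
j.s.p.qos.service.s.resource.l.capacity=15
j.s.p.qos.service.d.resource.l.capacity=10
```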
The above will restrict the concurrency to 20 at the web layer, 15 at the services layer, and 10 at the db layer, but it does not restrict the global concurrency to 20, as a thread could call into the services layer without going through the web layer, or call into the db layer without going through either of the other two layers. If you really need to limit concurrency globally across these services you can introduce a new shared resource (stock), g, into the configuration (model).
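An illustrative fragment for such a shared resource might look like this (again, the property names are our assumed syntax, not the documented one):

```properties
# hypothetical fragment: a shared resource g capping global concurrency at 20
j.s.p.qos.resources=g
j.s.p.qos.resource.g.capacity=20
j.s.p.qos.service.w.resources=g
j.s.p.qos.service.s.resources=g
j.s.p.qos.service.d.resources=g
```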
Note: You can’t simply change the local pool resource to a global resource to achieve such control, as a reservation obtained at the web layer will be (re)used in the reservation negotiation at the service and database layers.
Again, this unprecedented level of fine grained control is achieved without any code changes and, more importantly, it can be configured externally to the application and tuned at deployment time for a particular environment or workload. It is also far easier to understand than the actual underlying concurrency code that would be needed to achieve the same effect. It is far more productive in that control mechanisms can be very easily experimented with iteratively, requiring only a restart of the process under control.
Video: Check out this screen recording demonstrating our “NO CODE CHANGES NEEDED FOR CONCURRENCY CONTROL” claims.
Note: In theory this configuration could be imported into a simulation tool and the dynamics of the system observed without actually running the code (assuming the presence and production of ingress flows).
What you are actually doing is taking a dynamic system defined externally and then injecting it, and its dynamics, into the software, merging normal execution processing with resource control (and feedback loops). A wonderful unification of the abstractions in system dynamics and concurrency control. This is what DevOps should be more about, rather than just automated deployment of packages.
The approach is incredibly versatile. For example, let's say you wanted to limit concurrency at the db layer for code that does not call (flow) through the web layer.
Looking at the above configuration you might think that you have actually limited the overall concurrency at the db layer to 10 (threads), but remember that reservations are held until the service completes its execution. So when a call comes into the web layer it reserves a single unit (if one is available; if not, it waits) from the g resource, and then when it arrives at the db layer it uses this unit to proceed immediately without any further delay, as the default reservation method is to calculate the required units as 1.
The capacity that has been redefined for the g resource within the d service is only applicable when additional units are required, such as when a thread attempts to call into the db layer code base without having already obtained a reservation from the g resource at the web layer. It is important to note that this capacity restriction is in addition to the capacity restriction within the underlying resource, so even if 10 threads not originating from the web layer called into the db layer they could still be blocked (though not indefinitely) because of existing outstanding reservations at the web layer. The web layer, the w service, is in effect able to oversubscribe at the db layer on the g resource.
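A hypothetical fragment for such a per-service capacity redefinition (illustrative syntax only, not taken from the product documentation) might read:

```properties
# redefine the capacity of the shared g resource within the d service only
j.s.p.qos.service.d.resource.g.capacity=10
```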
Up to now attention has been on concurrency control, but this model is also well suited to the control of resource consumption. This is not to say that consumption itself can be reduced, but that the rate of consumption within a time window can be reduced using system dynamics defined with our QoS for Apps technology. Which brings up an important aspect of system dynamics: delay.
“Delays are ubiquitous in systems. Every stock is a delay…A stock takes time to change, because flows take time to flow…Stocks usually change slowly. They can act as delays, lags, buffers, ballast, and sources of momentum in a system.” – Thinking in Systems
We like to say that you don't manage cost; you manage the cause of such cost. In the case of software and resource consumption, that cause is the code execution (and its caller chain). But as already stated above, the goal is not to change what is consumed but the rate at which it is consumed.
“Rate limiting is used to control the rate of traffic sent or received on a network interface. Traffic that is less than or equal to the specified rate is sent, whereas traffic that exceeds the rate is dropped or delayed.” – Wikipedia
This can be modeled in system dynamics as another inflow refilling a stock (resource) at regular fixed intervals up to a desired and maximum level.
“Renewable resources are flow-limited. They can support extraction or harvest indefinitely, but only at a finite flow rate equal to their regeneration rate.” – Thinking in Systems
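The "inflow refilling a stock at fixed intervals up to a maximum level" idea can be sketched as a simple token bucket (our illustrative sketch, not the vendor's implementation):

```java
// Minimal token-bucket sketch: the tokens are the stock, the periodic
// refill is the inflow (the regeneration rate), and each reservation is
// an outflow. An empty stock means the caller must delay (or drop).
public class TokenBucket {
    private final long capacity;
    private final long refillPerInterval;
    private long tokens;

    public TokenBucket(long capacity, long refillPerInterval) {
        this.capacity = capacity;
        this.refillPerInterval = refillPerInterval;
        this.tokens = capacity; // start with a full stock
    }

    // Called by a timer at each interval boundary: the inflow, capped at capacity.
    public synchronized void refill() {
        tokens = Math.min(capacity, tokens + refillPerInterval);
    }

    // Called on service entry: the outflow. Returns false when the stock
    // is empty, signalling the caller to delay the request.
    public synchronized boolean tryReserve() {
        if (tokens == 0) return false;
        tokens--;
        return true;
    }
}
```

The finite refill rate is what bounds the extraction rate, mirroring the regeneration-rate quote above.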
Returning to the web layer example, the maximum number (let's say 2,000) of requests (throughput) allowed to execute in 1 second intervals (1,000 ms) can be configured as follows:
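An illustrative fragment for such an interval-based refill (the property names below are assumptions, not the documented syntax):

```properties
# hypothetical: refill the w resource up to 2,000 units every 1,000 ms interval
j.s.p.qos.resource.w.capacity=2000
j.s.p.qos.resource.w.interval=1000
```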
The degree of concurrency can still be controlled by simply adding a capacity specification to the resource definition.
In the configuration examples above, the dynamics of the flow and stock system were simplified by assuming that all services consume (reserve) and replenish (release) only a single unit of a resource's capacity. This default reservation behavior can be changed from “one” to “meter”, “inc”, “lease”, or “timer”, giving us much more variability in the flow rate across different services or mapped methods, as in the case of “meter” or “timer”, which track the resource profile of the probes to determine the required reservation units. This can result in much more interesting and useful feedback loops, as demonstrated in the article titled “Optimal Application Performance and Capacity Management via QoS for Apps”.
Video: In this screen recording, rate limiting is used to delay the frequency of a call's execution using multiple millisecond intervals.
There are some important differences in our approach that are specific to software systems. The first is with regard to the time boxed delay. Each service can have its own timeout value, which is the maximum delay that can be introduced by the injected dynamics in making a reservation. If this time expires then the code proceeds as normal, skipping over the resource release phase in its enhanced execution. Another is reservation prioritization, which allows some services to jump ahead of the queue in making (or waiting on) resource reservations under contention across threads, which we support via a priority.level setting at the service level and reservation lanes at the resource level.
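The time boxed delay can be sketched as follows (an illustrative sketch under our own assumptions, not the actual runtime): the reservation wait is bounded, and on expiry the method proceeds anyway, with the release phase skipped because no reservation was actually held.

```java
import java.util.concurrent.Semaphore;
import java.util.concurrent.TimeUnit;

// Illustrative sketch of a time boxed reservation: the wait for a unit of
// the resource is bounded by timeoutMs; when it expires the method runs
// anyway, and the release phase is skipped since nothing was reserved.
public class TimeBoxedReservation {
    public static void execute(Semaphore resource, long timeoutMs, Runnable method) {
        boolean reserved = false;
        try {
            // bounded wait for a reservation; gives up after timeoutMs
            reserved = resource.tryAcquire(timeoutMs, TimeUnit.MILLISECONDS);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt(); // treat interruption like a timed-out wait
        }
        try {
            method.run(); // proceeds normally even when no reservation was obtained
        } finally {
            if (reserved) {
                resource.release(); // release only what was actually reserved
            }
        }
    }
}
```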
Optimal Application Performance and Capacity Management via QoS for Apps
Fairer Web Page Servicing with QoS Performance Credits
Dynamic QoS Prioritization
Introduction to QoS for Applications