Simz – The Promise of Near Real-Time Application Simulation in the Cloud
I was recently asked to write an article highlighting some of the reasons why a Platform as a Service (PaaS) solution needs simulation technology like Simz. My first thought was: why limit the article to PaaS when the technology underlying Simz has far wider implications for application management and monitoring? Yes, the actual application/activity simulation engine is written in Java and runs in a JVM, but the binary metering feed protocol can be implemented in other languages. That makes it possible to simulate, in near real-time, execution across services implemented in different languages and executed in different runtimes on different platforms.
In our Simz 1.0 product announcement we listed seven key features: simulated, simplified, singular, speed, standardized, secure and scalable. But how do these relate to the cloud, and why are they so critical in the long run? Hopefully the following will help explain their importance with a little more context.
Capacity and Scaling
When we talk of “The Cloud” we immediately think of unlimited computing power and the elasticity of consumption (or reservation) of such power. The cloud makes it very easy to reserve and release capacity and, in turn, to scale the consumption of that capacity up and down. In the case of computing offered by IaaS there is still a need to deal with VM instances and operating system processes. But with PaaS such things move into the background, becoming nearly invisible to the management of the application, except when things go wrong, which I will come back to later.
Auto scaling forces us to see the application as distinct from its instantiation, execution and (data) flows in the cloud. In theory an application can exist in the cloud yet have no capacity reserved or assigned. This is the “nothing” within the cloud universe, and it presents a problem when we consider application management. How do we know of this nothing without creating something to tell us? And how can we distinguish one form of nothing from another and relate the causes of such happenstances?
“All things appear and disappear because of the concurrence of causes and conditions. Nothing ever exists entirely alone” - Buddhism
Clearly the application needs to exist independent of its many instances, which are merely vehicles for execution and consumption. But what is an application when it has no running processes? We would still like to view it as if it were running (living), in some form, even when it is doing nothing, owning no resources and holding no space. This is where Simz comes in. Simz is the application. It is not one particular application process; it is all of the processes that have been created as well as those that have been destroyed. It offers a continuity beyond the lifecycle of any process. More importantly, Simz offers a succinct memorization of the past behavior of an application via the metering model, which includes activities (probes) and resources (meters). Simz is not some placeholder in a cloud management dashboard used for grouping resources, or a record in a database, or an entry in some log. It is the application, both when there is something happening and when there is nothing happening. It always looks like one whole even though it represents many moving parts, including nodes, processes, threads and activities.
Observation and Monitoring
Cloud platforms aim to simplify management of the many moving parts within an application through automated processes driven by policies. But when things go wrong, those responsible for managing the application are forced to tear away the veil and look at the moving parts in a way that is neither efficient nor entirely transparent. For example, when a user or a group of users reports a performance problem with a particular service entry point, such as a hung request, there is no easy way to identify which instance is servicing that particular request (in order to collect diagnostics for development). Operations teams are invariably forced to run diagnostics on each individual application runtime, even though the problem might not lie within the application tier at all. If you have hundreds of application runtimes this becomes an impossible task, and there is no guarantee that the offending process will still exist by the time you have iterated through them all. Remember, many performance monitoring agents don’t collect and report data until after, and in some cases long after, the incident has occurred (if they have detected it at all). To reach the application activities executed by threads we must first navigate through each process, when what we really want is to go from the activity (service operation) to the individual processes. We want to identify the activities that are delayed and then inspect the associated container, or at least determine which other activities are associated with the same container and assess whether they too are impacted. Yes, the very model and access pattern that Simz offers.
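To make the activity-first access pattern described above concrete, here is a minimal sketch in Java. It is not the Simz API: the `ActivityIndex` and `Activity` types, their fields and the method names are all hypothetical, and the sketch assumes some metering feed reports when an activity begins and in which process. Completion of activities is left out to keep it short.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.TreeSet;

// Hypothetical sketch only: none of these types belong to the Simz API.
final class ActivityIndex {

    // One in-flight activity as a metering feed might report it (assumed shape).
    static final class Activity {
        final String name;            // service operation, e.g. "checkout"
        final String processId;       // runtime (container) executing it
        final long startedAtMillis;   // when the activity began

        Activity(String name, String processId, long startedAtMillis) {
            this.name = name;
            this.processId = processId;
            this.startedAtMillis = startedAtMillis;
        }
    }

    private final Map<String, List<Activity>> byName = new HashMap<>();
    private final Map<String, List<Activity>> byProcess = new HashMap<>();

    // Index the activity both by its name and by the process running it.
    void begin(Activity a) {
        byName.computeIfAbsent(a.name, k -> new ArrayList<>()).add(a);
        byProcess.computeIfAbsent(a.processId, k -> new ArrayList<>()).add(a);
    }

    // Activity -> processes: which runtimes have been executing the named
    // activity for longer than thresholdMillis?
    Set<String> delayedProcesses(String name, long nowMillis, long thresholdMillis) {
        Set<String> result = new TreeSet<>();
        for (Activity a : byName.getOrDefault(name, Collections.emptyList())) {
            if (nowMillis - a.startedAtMillis > thresholdMillis) {
                result.add(a.processId);
            }
        }
        return result;
    }

    // Process -> activities: what else is running in a suspect container?
    Set<String> activitiesIn(String processId) {
        Set<String> names = new TreeSet<>();
        for (Activity a : byProcess.getOrDefault(processId, Collections.emptyList())) {
            names.add(a.name);
        }
        return names;
    }
}
```

With an index like this, the question "which runtime is stuck on this request?" becomes a lookup by activity name rather than a walk over every process.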
“If you change the way you look at things, the things you look at change.” - Wayne Dyer
Isolation and Introspection
Another issue with application monitoring is that many providers don’t want to (or simply can’t) allow customers to poke around within the application runtimes they have automatically created to deliver the agreed service level. This is a very big sore point with PaaS. To gain agility you give up visibility and increase risk. But this need not be the case. Here again Simz shines, because it gives both the provider and the customer what they need. Providers get runtime isolation for the actual application runtimes, and customers get a unified simulated application runtime to probe for behavior and resource consumption anomalies. Better still, Simz makes it look as if there is only one application runtime. Changes can still take place within this runtime even as the customer interacts with it. And if that does not already sound wonderful, Simz can expose the most important aspects of the underlying application behavioral model not only to the customer but also to the platform provider. The provider can use this to improve the predictability of its own capacity management, load balancing and runtime provisioning, without having to gain access to the runtime itself, which holds sensitive application and user state.
Record and Recall
Coming back to the dynamism of the cloud, a challenge for anyone managing applications is perceiving, understanding and relating the changes (movements) that occur in the environment (capacity, runtimes) and in the activities (threads, flows, code) performed in that context; changes that can occur very rapidly and, at least from our viewpoint, simultaneously. Anyone who has studied dance knows all too well that one must first watch a movement many times, mentally or digitally recording and replaying it, and then map out each step and direction of movement (using a notation such as Labanotation) before one begins to understand it, sense it and reproduce it. We learn very effectively by observing things play out and unfold in front of our own eyes. Recording and replaying gives us the opportunity not only to observe a sequence of events after they have occurred but also to process them at the speed best suited to our cognitive capabilities. Because Simz is a discrete event simulation engine, it can act as a recorder as well as a replayer. Simz is a time machine with time dilation capabilities.
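The idea of replaying a recording at a speed that suits the observer can be sketched in a few lines of Java. This is an illustration under stated assumptions, not the Simz recording format or API: the `Recording` class, its `Event` type and the `dilation` parameter are hypothetical, and events are assumed to be recorded in time order.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.BiConsumer;

// Hypothetical sketch only: not the Simz recording format or API.
final class Recording {

    // A timestamped event captured during the original run (assumed shape).
    static final class Event {
        final long atMillis;   // time of occurrence in the original run
        final String label;    // what happened, e.g. "begin checkout"

        Event(long atMillis, String label) {
            this.atMillis = atMillis;
            this.label = label;
        }
    }

    private final List<Event> events = new ArrayList<>();

    // Append an event; callers are assumed to record in time order.
    void record(long atMillis, String label) {
        events.add(new Event(atMillis, label));
    }

    // Replay events in recorded order. A dilation of 2.0 stretches the gap
    // between consecutive events to twice its original length; 0.5 halves it.
    // The sink receives the delay in milliseconds to wait before each event,
    // so the caller decides whether to actually sleep or simply log.
    void replay(double dilation, BiConsumer<Long, String> sink) {
        long previous = events.isEmpty() ? 0L : events.get(0).atMillis;
        for (Event e : events) {
            long delay = Math.round((e.atMillis - previous) * dilation);
            sink.accept(delay, e.label);
            previous = e.atMillis;
        }
    }
}
```

Passing the delay to the sink, rather than sleeping inside `replay`, keeps the sketch usable both for slowed-down human observation and for as-fast-as-possible batch processing.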
“Time stays long enough for those who use it.” - Leonardo da Vinci
Analytics and Integration
Do backend service integrations really need to be performed in real time and within the critical request processing path? Maybe not. With Simz, analytics and many service integrations can be isolated completely from the actual application runtime, with no code changes to the application. Instead, integration plugins are loaded into the Simz environment, and these need not run in real time; they can be driven by replaying a recording. And because the same metering model and Open API are available in both the application runtime and the simulated runtime, we can move an integration back and forth between these runtimes without change.
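As a rough illustration of taking integrations off the critical path, the Java sketch below separates the cheap act of metering an event from the later delivery of that event to integration plugins. Everything here is an assumption made for illustration: `DeferredIntegration`, `meter`, `addPlugin` and `drain` are hypothetical names, not part of the Simz Open API, and the sketch is single-threaded for brevity.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.List;
import java.util.Queue;
import java.util.function.Consumer;

// Hypothetical sketch only: not the Simz Open API. Single-threaded for brevity.
final class DeferredIntegration {

    private final Queue<String> recorded = new ArrayDeque<>();
    private final List<Consumer<String>> plugins = new ArrayList<>();

    // Called on the request path: only appends, never calls a backend service.
    void meter(String event) {
        recorded.add(event);
    }

    // Register an integration (analytics, alerting, and so on).
    void addPlugin(Consumer<String> plugin) {
        plugins.add(plugin);
    }

    // Called later, off the request path: hands every recorded event to every
    // plugin in recorded order and returns the number of events delivered.
    int drain() {
        int delivered = 0;
        String event;
        while ((event = recorded.poll()) != null) {
            for (Consumer<String> plugin : plugins) {
                plugin.accept(event);
            }
            delivered++;
        }
        return delivered;
    }
}
```

Because the plugin only ever sees events through the same `Consumer<String>` interface, the same plugin could in principle be attached to a live feed or to a replayed recording without modification, which is the portability the paragraph above describes.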