Future Research Directions in Software Engineering of Self-Adaptive Systems
Over the last few weeks I have been on a research sabbatical exploring future directions in the engineering of self-managed and self-organizing software systems via control, adaptation and regulation. In the course of my research I came across the following academic paper, which outlines a “second” research roadmap for self-adaptive systems and which I found extremely interesting and insightful.
Software Engineering for Self-Adaptive Systems: A Second Research Roadmap
Abstract. The goal of this roadmap paper is to summarize the state-of-the-art and identify research challenges when developing, deploying and managing self-adaptive software systems. Instead of dealing with a wide range of topics associated with the field, we focus on four essential topics of self-adaptation: design space for self-adaptive solutions, software engineering processes for self-adaptive systems, from centralized to decentralized control, and practical run-time verification & validation for self-adaptive systems. For each topic, we present an overview, suggest future directions, and focus on selected challenges.
“In addition to the ever increasing complexity, software systems must become more versatile, flexible, resilient, dependable, energy-efficient, recoverable, customizable, configurable, and self-optimizing by adapting to changes that may occur in their operational contexts, environments and system requirements. Therefore, self-adaptation – systems that are able to modify their behavior and/or structure in response to their perception of the environment and the system itself, and their goals – has become an important research topic in many diverse application areas.”
What I found intriguing about the above statement is the use of the word “perception”, which indicates a form of self-awareness in software. This is something we have written about under the research category of self-observing software.
The word also hints at the possibility that what is sensed may not be entirely real, or may not reflect current reality in terms of time and space, an idea we explored in Simulation and Time Synchronization of Application Memories as well as in Going Beyond Actor Process Supervision with Simulation and Signaling. This last article includes the following statement:
In this context you can think of Simz as the “brain in a vat”. Its perception of reality is via the metering feed. Here the worker takes the role of the body, of which we can have many over time.
“To explain the concepts, we separate self-adaptive systems into two elements: the Managed System, which is responsible for the system’s main functionality, and the Adaptation System, which is responsible for altering the Managed System as appropriate.”
We consider this separation to be extremely important during design and development, as it allows some degree of deferred decision making about how best to manage the system via various adaptation and regulation techniques. That said, customers today are looking to have both the managed system and the adaptation system pre-packaged for ease of deployment and operation. This is a need we are preparing to address in the very near future via a new “autoletics” entity that will offer augmented distributions of popular open source software, such as Apache Tomcat and Apache Cassandra, that are self-managed and self-optimized by our adaptive technologies, including intelligent activity metering (IAM), QoS for Apps, adaptive control in execution (ACE), Signals, as well as Simz in a partially distributed or centrally controlled context.
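The separation the paper describes can be made concrete with a small sketch. The class names (`ManagedSystem`, `AdaptationSystem`), the `worker_threads` knob and the queue threshold below are all illustrative assumptions, not the API of any actual product; the point is only that the adaptation layer alters the managed system from the outside, without touching its core logic.

```python
class ManagedSystem:
    """Carries the main functionality; exposes observable state and tunable knobs."""
    def __init__(self):
        self.worker_threads = 4   # a knob the adaptation layer may change
        self.queue_depth = 0      # an observable symptom of load

    def handle_request(self):
        self.queue_depth += 1     # simulate load building up

class AdaptationSystem:
    """Observes the managed system and alters it as appropriate."""
    def __init__(self, managed, max_queue=10):
        self.managed = managed
        self.max_queue = max_queue

    def adapt(self):
        # If the queue is backing up, add capacity; otherwise leave the system alone.
        if self.managed.queue_depth > self.max_queue:
            self.managed.worker_threads += 1
            self.managed.queue_depth = 0
```

Because the two elements only meet at the observed state and the knobs, the decision of how (or whether) to adapt can be deferred, swapped, or pre-packaged later, which is exactly the design freedom discussed above.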
“A key design decision about self-adaptive systems is “what information will the system observe?” In particular, “what information about the external environment and about the system itself will need to be measured or estimated?” To make these measurements, the system will need a set of sensors; these determine what the system can observe.”
Observation is paramount to the success of any self-adaptive software initiative, which is why the first piece of the puzzle we designed and delivered was a truly innovative approach to performance measurement and behavioral analysis: an adaptive (intelligent) activity metering engine that cleanly separates and decouples instrumentation, measurement and collection, for the purpose of online (dynamic) adaptation driven by cost-benefit analysis.
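A minimal sketch of that separation, under stated assumptions: `Meter` (measurement) and `Collector` (collection) are hypothetical names, and the `enabled` flag is a simplified stand-in for cost-benefit-driven adaptation, in which a probe whose overhead outweighs its value can be switched off at runtime.

```python
import time

class Collector:
    """Collection: where samples end up, decoupled from how they are taken."""
    def __init__(self):
        self.samples = []

    def collect(self, name, value):
        self.samples.append((name, value))

class Meter:
    """Measurement: wraps an instrumented call site and times it."""
    def __init__(self, collector, enabled=True):
        self.collector = collector
        self.enabled = enabled    # the adaptation knob: metering can turn itself off

    def measure(self, name, fn):
        if not self.enabled:      # disabled probe: the call runs unmeasured
            return fn()
        start = time.perf_counter()
        result = fn()
        self.collector.collect(name, time.perf_counter() - start)
        return result
```

Because instrumentation (the call to `measure`), measurement (the `Meter`) and collection (the `Collector`) are distinct, each can be adapted or replaced online without changing the others.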
“Given a way to observe, there are two important decisions that relate to timing: “what triggers observation?” and “what triggers adaptation?” The system could be continuously observing or observation could be triggered by an external event, a timer, an inference from a previous observation, deviation or error from expected behavior, etc.”
From our years of experimentation we have come to the conclusion that both observation and adaptation must be performed online, continuously, and inline within the execution context, to ensure shorter reaction times as well as more accurate impact assessment of control-related adaptations. This is how both our QoS for Apps and Adaptive Control Valve technologies work. With Signals the approach, whilst still inline, does allow the adaptation and its signals to flow upwards and outwards across scoped boundaries, which represent the point of adaptation (or of influence, in the case of inbound signals on boundary entry).
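The inline, continuous style described above can be sketched as follows. This is not any product's implementation: the controller name, the latency target and the backoff step are assumptions chosen for illustration. The essential property is that observation and adaptation happen inside the same call, so reaction is immediate rather than deferred to an external loop.

```python
class InlineController:
    """Observes each call's latency and adapts a control parameter inline."""
    def __init__(self, target_latency=0.01):
        self.target = target_latency
        self.backoff = 0.0        # actuator: extra delay injected to shed load

    def execute(self, fn, observed_latency):
        # Observation and adaptation occur in the execution path itself.
        if observed_latency > self.target:
            self.backoff += 0.001             # react immediately to the deviation
        elif self.backoff > 0:
            self.backoff = max(0.0, self.backoff - 0.001)  # relax when healthy
        return fn()
```

Here `observed_latency` is passed in to keep the sketch deterministic; in a real inline controller the measurement would be taken around the call itself.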
“Feedback loops play an integral role in adaptation decisions. Thus, key decisions about a self-adaptive system’s control are: “what control loops are involved?” and “how do those control loops interact?”.”
Feedback is what drives our intelligent activity metering (IAM). It is also the primary mechanism used within our QoS for Apps and Adaptive Control Valve technologies. With Signals we also offer feed-forward capabilities.
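A feedback loop of the kind the paper refers to can be reduced to a few lines. This is a textbook proportional controller, not the IAM mechanism itself; the setpoint, gain and toy "plant" below are arbitrary illustrative values.

```python
def feedback_step(setpoint, measured, knob, gain=0.5):
    """One iteration of a proportional feedback loop: nudge the knob by the error."""
    error = setpoint - measured
    return knob + gain * error

# Drive a toy system toward a throughput setpoint of 10.0.
knob = 0.0
measured = 0.0
for _ in range(20):
    knob = feedback_step(setpoint=10.0, measured=measured, knob=knob)
    measured = knob * 0.8   # toy plant: output is proportional to the knob
```

After a handful of iterations `measured` converges close to the setpoint. A feed-forward element, by contrast, would adjust `knob` from a prediction of incoming load before the error is ever observed.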
“Some forms of control can be explicitly expressed in the design, whereas other forms are emergent. It is also possible to create hybrid explicit-implicit self-adaptive systems.”
We are often asked why we offer both QoS resource (reservation) management and adaptive valves as control mechanisms. The reason is that QoS is traditionally defined much more explicitly and restrictively, especially in its policy specification and enforcement, whereas adaptive control valves are far more flexible and goal-driven. Because multiple adaptive valves can be installed within a single managed system, some competing with each other to a degree, the overall system behavior is emergent, less predictable, and transient (temporal), though we expect it to be optimal with regard to the environment, system and workload. Generally an adaptive management system will employ both explicit and implicit approaches, which is why our two primary control technologies work side by side and integrate with each other.
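The explicit/implicit contrast can be sketched side by side. Both classes and their policies are illustrative assumptions, not the actual QoS for Apps or valve APIs: the reservation enforces a fixed, explicitly specified cap, while the valve tunes its own limit toward a goal using an AIMD-style rule (a stand-in for whatever goal-driven policy a real valve would carry).

```python
class QosReservation:
    """Explicit control: admit work only while a fixed reservation has capacity."""
    def __init__(self, reserved_slots):
        self.reserved = reserved_slots
        self.in_use = 0

    def admit(self):
        if self.in_use < self.reserved:
            self.in_use += 1
            return True
        return False              # policy is enforced, never renegotiated

class AdaptiveValve:
    """Implicit, goal-driven control: the limit itself adapts to outcomes."""
    def __init__(self, limit=4):
        self.limit = limit

    def on_success(self):
        self.limit += 1           # additive increase toward the goal

    def on_failure(self):
        self.limit = max(1, self.limit // 2)   # multiplicative decrease
```

With several such valves installed in one system, each reacting to its own outcomes, the combined behavior is emergent in exactly the sense the quote describes.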
“In selecting the adaptation mechanisms, it is important to consider the states or events that can trigger adaptation. Example triggers include not satisfying the adaptation targets that relate to non-functional requirements (e.g., response time, throughput, energy consumption, reliability, fault tolerance), behavior, undesirable events, state maintenance, anticipated change, and unanticipated change.”
Our adaptive control valves automatically trigger adaptations of their own internal control structures and policies, which influence the behavior of the processing that flows through them, looking to optimize response time or throughput as well as to improve reliability.
“A self-adaptive software system operates in a highly dynamic world and must adjust its behavior automatically in response to changes in its environment, in its goals, or in the system itself. This vision requires shifting the human role from operational to strategic…the vast number of systems makes the operations task too complex for a single centralized machine or a system operator. One answer to these advances is to instrument software systems with managing systems that make them more autonomous. This autonomy means that systems take over some of the responsibilities previously performed by other roles in the software lifecycle, such as sensing failures and automatically recovering from them.”
This is very much in line with our own thoughts on complexity of application management in the cloud, which we outlined in the article The Complexity and Challenges of IT Management within the Cloud. The article closes with the following line:
It’s time to let the machines (at least the software) regulate themselves, in line with the policies and priorities we set.
“In traditional maintenance, the failure report is analyzed by developers while in a self-adaptive software system, the managing subsystem analyzes the failure to find alternative workarounds.”
Our metering technology, with its Open API, allows both the managed software system and the managing software system to go beyond basic runtime state introspection to runtime behavioral reflection. This ability can enable software to self-diagnose behavioral anomalies as well as to improve performance via behavioral predictions.
“In a decentralized system there is no single component that has the complete system state information, and the processes make adaptation decisions based only on local information…control in a self-adaptive software system can be centralized or decentralized, independent of whether the managed software is distributed…Control systems with local information scale well in terms of size, and also regarding performance as the collection of information and control implementation are local.”
This is very much aligned with the last statement on our vision page which is reprinted here:
Being able to observe behavior remotely with a time lag that a machine would consider years is not being in control. The first step in scaling and managing the ever increasing complexity in applications and runtimes starts with local (software) autonomy: self-observation (monitoring / awareness), self-judgement (analysis / evaluation) and self-reaction (change / adjustment).
And from the article The Complexity and Challenges of IT Management within the Cloud we have the following closing paragraphs:
The answer lies in emergent computing and adaptive control that is local and immediate. Local in that observation, judgement and reaction are collocated with the normal processing via embedded controllers and sensors weaved into applications (at runtime). Immediate in that the time interval between measuring, sensing and signaling (possibly to a remote station) and the actuating is at the same resolution as the underlying task/transaction processing that is being monitored, managed and controlled.
For this to happen we need IT to change, starting with how it (or its systems) observes. Moving from logging to signaling. Moving from monitoring to metering. Moving from correlation to causation. Moving from process to code then context. Moving from state to behavior then traits. Moving from delayed to immediate. Moving from past to present. Moving from central to local. Moving from collecting to sensing. When that has occurred we can then begin to control via built-in controllers and supervisors.
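The local autonomy described in the passages above can be sketched with a node that observes, judges and reacts using only its own state, with no component holding the global picture. The node name, capacity and load-shedding rule are illustrative assumptions; the overflow it returns corresponds to a signal flowing outwards rather than a decision made centrally.

```python
class LocalNode:
    """Decentralized control: each node adapts on local information only."""
    def __init__(self, capacity=10):
        self.capacity = capacity
        self.load = 0

    def observe_and_adapt(self, incoming):
        self.load = incoming
        # Local judgement and local reaction: shed load when over capacity.
        if self.load > self.capacity:
            shed = self.load - self.capacity
            self.load = self.capacity
            return shed   # overflow signalled outwards, not centrally decided
        return 0

# Many such nodes scale without any shared state.
nodes = [LocalNode(capacity=10) for _ in range(3)]
```

Because collection and control are collocated with the processing, each node's reaction is immediate in the sense the quote demands; a hybrid design would then let these local signals feed a higher-level (e.g. Simz-style) view.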
An upcoming article will look at centralized and decentralized feedback controls and how a hybrid approach can be achieved that offers the best of both approaches based on Simz and Signals, which is something we touched on in the article Going Beyond Actor Process Supervision with Simulation and Signaling.