Lies, Damned Lies, and Logging
Almost 16 years ago, when I started working on GSM billing systems, I came across a piece of C code that had a comment written in German. I turned to my Dutch colleague and asked whether he could assist with a basic translation of the comment. His response was “I don’t know what this code does or how it works”. My immediate reaction was “Just tell me what the comment is saying”. He laughed and said “That was the translation!”.
I suspect my conservative approach to logging, and to some extent inline code commentary, has been tainted by this event. Logging, as I have seen it applied, has always seemed like commentary a developer wanted to echo back by way of an expensive file write. And much like the comment in that short dialogue, it is written in a different language to the code itself, and it reflects what the developer thought the code did rather than what it actually did, which in this case was not entirely understood.
A good few years later, when I was called into a bank to help identify a performance issue with a particular transaction, I was reminded of this first encounter with logging as a form of fiction writing. In the course of the performance investigation the problem was found in a different transaction than the one that had been indicated; it was in some way similar to the transaction identified by operations, but not the same. The initial reaction of management to my findings was that my performance monitoring product was broken (the joys of performance consultancy). When I asked how they had come to associate the performance problem with the other transaction, the response was “It is written in the log file.”. Instead of asking to see the code for that “problem transaction”, I asked for the code related to the transaction I had identified. There in the code was a log statement that appeared to have been copied directly from the other transaction without any change in the labeling. A similar issue occurred recently when a log statement misclassified a transaction label by stringifying (Object.toString) state prior to the proper setup and execution of the transaction. In that case we did not have to look at the code, because our metering change set showed clearly that the behavior logged was incorrect.
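To make that second incident concrete, here is a minimal, hypothetical Java sketch (the Transaction class and all its names are invented for illustration, not taken from the actual system): a log statement stringifies transaction state via Object.toString before the transaction has been set up, so the log records a label the transaction never really carried.

```java
public class PrematureLogging {
    static class Transaction {
        String label = "UNINITIALIZED";   // not yet assigned at logging time
        void setUp(String name) { this.label = name; }
        @Override public String toString() { return "txn[" + label + "]"; }
    }

    static String logEntry(Transaction t) {
        // toString() runs here, capturing whatever state exists right now
        return "executing " + t;
    }

    public static void main(String[] args) {
        Transaction t = new Transaction();
        String line = logEntry(t);   // logged before setUp() was called
        t.setUp("PAYMENT");          // the transaction that actually executed
        System.out.println(line);    // prints: executing txn[UNINITIALIZED]
    }
}
```

The log line is not lying about what toString returned; it is lying about what the transaction turned out to be.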
Here are a few observations I’ve made over the years which I believe contribute to the disconnect that can occur between what a log file says occurred and what actually happened.
- Maintenance: Logging is treated much like commenting, drifting from fact to fiction as code changes are experimented with and then committed.
- Translation: Logging uses natural language instead of the host programming language to describe what is happening. It is unstructured.
- Hypothesis: Logging is based on a developer's understanding of the code's behavior and its side effects; it is susceptible to misconception bugs.
- Copy & Paste: Logging statements can be copied from one source file to another in the course of copying code, thereby invalidating an enclosing context.
- Verification: Whilst code can be thoroughly tested by a team, rarely does anyone look at and verify whether what is logged is factual.
- Validation: Testing for the production of particular log statements is rare in practice. Logging is not treated as a side effect, the way a database transaction would be.
- Locality: Many statements reflect an expectation not just of what the immediate code does but also of what its direct and indirect callers do.
- Prematurity: Developers tend to place logging at the start of a code block expecting the block will execute to completion and not fail.
- Disconnectivity: Log statements can be written and later invalidated by another statement, without any traceability between the two.
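Several of the pitfalls above, notably Copy & Paste and Prematurity, can be sketched in a few lines of Java. All method and label names here are invented for illustration:

```java
import java.util.ArrayList;
import java.util.List;

public class LogPitfalls {
    static final List<String> LOG = new ArrayList<>();

    static void transferFunds() {
        // Copy & Paste: the label was copied from checkBalance() and never updated
        LOG.add("checkBalance: started");
        // Prematurity: "completed" is logged before the work that can still fail
        LOG.add("checkBalance: completed");
        throw new IllegalStateException("transfer failed after logging success");
    }

    public static void main(String[] args) {
        try {
            transferFunds();
        } catch (IllegalStateException expected) {
            // the transaction failed...
        }
        // ...yet the log claims a different transaction ran, and succeeded
        LOG.forEach(System.out::println);
    }
}
```

Anyone diagnosing an incident from this log would investigate checkBalance, exactly the misdirection described in the bank story above.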
There are other issues with logging, including poor performance, a poor signal-to-noise ratio (lost in space), and unstructured, non-standard formatting.
Most of the above concerns can be remedied if we eliminate the coding of most log statements within the application code itself. Instead, logging is dynamically injected into the runtime by way of an extension to an activity metering runtime and/or an adaptive signaling runtime. Both approaches offer a mechanism to generate a change set reflecting execution behavior between two execution points. Each change set contains a concise history of execution behavior, obtained by change delta analysis of the thread-specific metering model or signal boundary model. Logging is then invoked based on a rule-based classification of the behavioral model represented by the change set. Because this classification is itself coded, it can be maintained and tested independently (to some degree) of changes in the actual application code. It is actual rather than assumed or premature. It is deep rather than local and shallow in its observation and classification. It is contained (within the change set) rather than dispersed. It is efficient and allows noise to be filtered out. It has structure and is exposed, allowing it to be used by the application runtime for self-regulation or self-adaptation purposes.
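A minimal sketch of the idea, assuming a hypothetical change-set and rule API (ChangeDelta, Rule, and applyRules are invented names, not an actual metering product's API): log records are produced by classifying observed deltas between two execution points, rather than by statements embedded in the application code.

```java
import java.util.ArrayList;
import java.util.List;

public class ChangeSetLogging {
    // a delta records observed (not assumed) behavior between two points
    record ChangeDelta(String probe, long count, long elapsedMicros) { }

    // a rule classifies a change set and emits a log record when it matches
    interface Rule {
        String classify(List<ChangeDelta> changeSet);  // null = no match
    }

    // logging becomes: run the rules over the change set, collect records
    static List<String> applyRules(List<Rule> rules, List<ChangeDelta> changeSet) {
        List<String> records = new ArrayList<>();
        for (Rule rule : rules) {
            String record = rule.classify(changeSet);
            if (record != null) records.add(record);
        }
        return records;
    }

    public static void main(String[] args) {
        // example rule: flag any probe whose elapsed time exceeds 10 ms
        Rule slow = cs -> cs.stream()
            .filter(d -> d.elapsedMicros() > 10_000)
            .map(d -> "SLOW " + d.probe() + " " + d.elapsedMicros() + "us")
            .reduce((a, b) -> a + "; " + b)
            .orElse(null);

        List<ChangeDelta> changeSet = List.of(
            new ChangeDelta("db.query", 3, 42_000),
            new ChangeDelta("cache.get", 120, 900));

        applyRules(List.of(slow), changeSet).forEach(System.out::println);
    }
}
```

The rule set lives and is tested outside the application code, so a refactoring of db.query's implementation cannot silently turn the log record into fiction.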
What is more, we can have log statements written after the code has executed, in a separate runtime, using playback and simulation to reconstruct and replay the recorded software execution behavior. This can be used to correct mistakes in previous log rules as well as to test new log rules, especially when a particular recorded behavior can be used as a predictor of abnormal and detrimental execution (or imminent termination).
Finally, if you ever wanted to log a statement (record) based on a condition that spanned multiple application runtimes, you can use the same techniques above along with Simz, which operates as a metered application runtime, simulating in real-time the metered behavior across multiple processes and threads. This is also useful in removing a runtime dependency on a particular log library and version from your applications, as well as in isolating applications from changes to a log level, which now only needs to be applied to a single application runtime, albeit a simulated one.
It is time we revisited logging and looked at other models and approaches to collection and communication of observations and events.
Introducing Signals – The Next Big Thing in Application Management
Going Beyond State/Structure Reflection in Java with Behavioral Tracking
From Anomaly Detection to Root Cause Analysis via Self Observation
Simz – The Promise of Near Real-Time Application Simulation in the Cloud