The Rise of Log-Driven Development in DevOps
Development and test cycles are becoming quicker and more efficient, so organizations need to use log-driven development for stricter monitoring
"There is no perfect code and no perfect unit testing or integration testing. The testing and monitoring of a feature continues throughout its lifecycle"
Developers frequently find themselves under the radar, working behind the scenes to create a flawless end user experience. The devops approach to development and delivery has played a key role in changing that, presenting developers with a way to express themselves through enhanced agility and innovation.
Along with empowering developers to find their creative voices, devops holds developers accountable for their code from development to support -- and this methodology requires the right control and management tools. Agile, continuous integration, and continuous delivery processes need checks and balances in this modern age of automation. Development and test cycles are becoming quicker and more efficient, so organizations need increasingly strict monitoring to check code, usage, and service behavior.
To do this, my company, Logz.io, uses an approach called LDD (log-driven development). I see LDD as the natural manifestation of TDD (test-driven development) in production. It equips R&D teams with the transparency needed to enable code to move to production at a faster pace, while maintaining the necessary checks and balances.
Monitor code with metrics, logs, and alerts
At Logz.io, developers who want to push code to production are required to implement monitoring components as part of their initial commits. They must not only write the feature's code, develop unit and integration tests, and submit pull requests, but also write descriptive log messages, generate comprehensive metrics, set alerts on critical log messages, and create relevant monitoring dashboards within their troubleshooting platforms.
Pushing code to production without strict monitoring and comprehensive visibility is simply too risky. Yes, this approach takes more time, but as a result, developers and ops teams will immediately have quantitative data on the quality of the features once they are released.
To integrate monitoring that tracks a feature's behavior and performance, several metrics and types of log data need to be collected and analyzed. These fall into two categories:
• Numerical data. This includes such metrics as the number of system errors, failures, and user downloads.
• Log data. This is more detailed and descriptive information that includes types of system errors, exceptions, warnings, and even unpredictable user experiences (which should "never happen").
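The distinction between the two categories can be sketched with Python's stdlib logging and a plain counter dict standing in for a metrics backend. The function name, metric keys, and field values below are invented for illustration; they are not Logz.io's actual code:

```python
import json
import logging

logging.basicConfig(level=logging.INFO)
logger = logging.getLogger("report_export")  # hypothetical feature name

# Numerical data: raw counters a metrics backend would aggregate over time.
metrics = {"export.success": 0, "export.error": 0}

def export_report(report_id, fail=False):
    """Illustrative feature code that emits both metrics and log data."""
    if fail:
        metrics["export.error"] += 1
        # Log data: descriptive, structured context alongside the raw count.
        logger.error(json.dumps({
            "event": "export_failed",
            "report_id": report_id,
            "reason": "upstream timeout",  # illustrative value
        }))
        return False
    metrics["export.success"] += 1
    logger.info(json.dumps({"event": "export_ok", "report_id": report_id}))
    return True

export_report("r-1")
export_report("r-2", fail=True)
```

The counters answer "how often?" while the structured log lines answer "what exactly happened?", which is why both are collected.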
Our developers define the log data and metrics that need to be captured in the design phase. Next, in the code review phase, our developers attach unit and integration tests as well as a dashboard that visualizes metrics and log data during testing and later in production.
Alerts are just as important as the captured logs and metrics because critical issues in a feature need to be addressed immediately. Developers are responsible for defining the alerts and notifications as well as their levels of priority. If events do occur in production, the first ones to be notified are the respective developers -- making them more accountable for the software they create. LDD prevents our developers from shirking responsibility in the event of system failure, and it rewards correct implementation.
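One way to picture such alert definitions is a rule table that maps event severity to a priority level and a notification target, with the most severe rule matched first. This is a hypothetical sketch, not the article's actual platform configuration:

```python
import logging

# Hypothetical alert rules; in practice these live in the monitoring platform.
ALERT_RULES = [
    {"level": logging.CRITICAL, "priority": "P1", "notify": "feature-owner"},
    {"level": logging.ERROR,    "priority": "P2", "notify": "team-channel"},
]

def route_alert(event_level):
    """Return the first rule whose severity threshold the event meets."""
    for rule in ALERT_RULES:
        if event_level >= rule["level"]:
            return rule
    return None  # below every threshold: no alert fires

# A critical event pages the developer who owns the feature first.
rule = route_alert(logging.CRITICAL)
```

Ordering the rules from most to least severe makes the routing deterministic: a critical event never falls through to the lower-priority channel.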
The two Rs: Responsibility and recognition
Devops balances developer creativity and accountability. In that context, we consider the following Rs to be the two most important values of the LDD approach:
• Responsibility. A developer who develops a feature and pushes it into production must be held accountable for the outcome of that feature. Implementing devops the correct way puts developers closer to their feature releases and naturally makes them aware of the influence that their code has on system behavior and end users. Finally, whether a feature succeeds or fails, responsibility for its performance rests solely on the shoulders of its developer.
• Recognition. Beyond the burden of accountability, developers must also be allowed the spoils of a successful feature. A developer who successfully implements a requirement should be recognized as its owner, and that recognition stands as an accolade for a valuable contribution to the developer's organization and its users.
A real-life example
By practicing LDD, our teams can look back with clarity at features delivered six months ago. The process allows them to understand any issues that arise and then make thoughtful, informed decisions based on data.
Last month, we released a new feature after developing it for a couple of weeks. As part of the release, our developers prepared the logs and metrics that would make up the new feature's monitoring. These included potential system errors and exceptions, feature usage metrics, and two dashboards. Every ounce of information was collected, and alerts were then enabled to notify the relevant developers whenever certain events occurred.
A couple of hours after pushing the feature to production, we started to get alerts on one of these "this should never happen" log messages. The team immediately looked at the relevant log streams, saw a trend in the relevant dashboard that had been prepared beforehand, and was able to push a fix to production shortly after discovering the problem.
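A "this should never happen" message is typically an invariant guard in the feature code: a condition that is logically impossible if the system behaves, logged at critical severity so that an alert fires the moment it occurs. A minimal sketch, with an invented billing example rather than the feature from the article:

```python
import logging

logger = logging.getLogger("billing")  # hypothetical feature name

def apply_discount(price, rate):
    """Apply a percentage discount; guard an invariant that must always hold."""
    discounted = price * (1 - rate)
    # Invariant: a discount can never raise the price. If this line ever
    # logs, an alert on the message immediately notifies the responsible
    # developer -- exactly the flow described above.
    if discounted > price:
        logger.critical(
            "this should never happen: discount raised price "
            "(price=%s, rate=%s)", price, rate,
        )
    return discounted
```

Because the message is written before release, the alert, the dashboard trend, and the log stream are all in place when the impossible case finally happens.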
There is no perfect code and no perfect unit testing or integration testing. The testing and monitoring of a feature continues throughout its lifecycle. The LDD approach means that R&D leverages the same native logging tools as operations and takes responsibility for end-to-end feature development and deployment. And that makes LDD a great devops capability.