Thursday, 17 April 2014

Use Root Cause Analysis (RCA) in Sprint Retrospectives to Improve Quality

A couple of years ago, I wrote an exhaustive piece about how software and systems projects need to focus on effective quality management across the board, covering in some fair amount of detail the various aspects to quality management, drawing on the experiences from a past project of mine as a case study.

In that paper, I closed by touching on how we, the management team, used as an intervention to improve quality of releases, by injecting Root Cause Analysis into the project stream, by embedding a Quality Manager as part of the team (~250 people, 20 odd development teams), whose role was to monitor various aspects of quality, covering escaped defects, quality of defect information, component code quality, build & release process quality, etc. This manager met with component teams on a weekly basis, instigating interventions such as code reviews, mandatory fields to input information into the defect tracking (for better metrics / data mining), proposing static analysis tools, and so on. The result of this intervention saw a marked improvement across the board, right from individual component quality to final system release quality. This was a full time role, the person was deeply technical, very experienced engineer, and had to have the soft skills to deal with multiple personalities and cultures. Because we were distributed across UK (2 sites), Israel, France, India (2 sites), we had in-country quality champions feeding into the quality manager.

In the above-mentioned project, for example, we ended up with something like this (at a point in the project, Release 19, not the end), the snapshot shows progress over 11 System Releases, capturing 33 weeks of data, where each release was time boxed to a three week cycle:

Starting off, in order to do a reasonable job of RCA, we had to trust the quality of data coming in. Our source was the quality of defect information captured in the defect tracking tool, Clearquest. The stacked bar graph on the top of this picture shows how we measured the quality of defect info: Dreadful, Poor, Partial & Good. The goal was to achieve Good, Reliable defect information - over time, through lots of communication and other interventions, we saw an improvement such that there was barely any track of Partial / Poor / Dreadful defects.  The radial graph shown in the bottom, measures the trend of root causes per release, and movement of causes from one release to another. At the end of Release 19, we still had some "Design" causes to get under control, but we'd seen a substantial improvement in the level of "Coding", "Architecture" & "Unknown" issues. For each of the causes, the team implemented interventions, as described here.
In this post, I drill down into a component development team that's only just starting with agile / scrum, to show how the team can adopt similar Root Cause Analysis (RCA) principles in managing the quality of their own component delivery, by leveraging off the Scrum Retrospective as a platform to implement an Inspect & Adapt quality improvement feedback loop.