Monday, 29 April 2013

Pragmatic Set-Top-Box QA Testing - Don't just trust Requirements-to-Test Coverage scripts

This post might be a little edgy - I think it's important to understand alternative perspectives around Set-Top-Box (STB) testing, how it is uniquely different to other forms of IT Testing; to understand the dynamics of STB testing that promotes agility and flexibility, whilst simultaneously understanding the risks associated with pragmatic testing.

The Headline:

Last year, I wrote quite an in-depth paper on Effective Defect & Quality Management in typical DTV projects. It covered many topics, and touched briefly on Project Metrics reporting. This post expands on the subject of QA Metrics tracking, focusing on how this reporting can help change the direction of the project and shift the focus of the overall QA effort, particularly around Set-Top-Box QA testing. I advocate that, once stability is achieved with Requirement-to-Test coverage maps, the project should change its QA focus to include more Exploratory, Risk-based & User-based testing.

When STB projects kick off, regardless of whether you're implementing Agile or any other methodology, there's the concept of requirements of the EPG / Application, in terms of user experience, usability and user interface requirements. The STB application requirements actually drive the requirements of the technical components of the system, such as Middleware & Headend. The Testing/QA elements of the project are generally considered at the outset: we kick off QA teams for the different components, define what the QA Entry/Exit criteria are, debate Sanity Tests, and ultimately define the product's overall acceptance criteria - when do we launch?? Which stage of QA decides the product is suitable for launch??

If this is a brand new EPG from the ground up, or the inclusion of a massive new feature where the work is quite significant, we generally track the requirements, and associate test cases with the requirements. Our measurement of "Done" or "Fit-for-Purpose" is the result of test-case execution against the requirements.

A feature will have a collection of requirements. Every requirement will have at least one test case mapped against it. When test cases fail, a defect is associated with that particular requirement. The readiness of a feature is thus gained by the pass/fail rate of the associated test cases. If all test cases are passing, then the feature is good to go. We also use the pass/fail of test cases to track regressions, which is quite an important element of tracking during the typical development/integration phase of the project.
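The requirement-to-test mapping described above can be sketched in a few lines of Python. This is only an illustration under my own assumptions: the names `CaseResult`, `Requirement`, `feature_pass_rate`, `unmapped_requirements` and `feature_ready`, as well as the requirement IDs, are hypothetical and not taken from any particular test management tool.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class CaseResult:
    """Outcome of one scripted test case in the last test cycle (hypothetical type)."""
    name: str
    passed: bool


@dataclass
class Requirement:
    """A requirement with the test cases mapped against it (hypothetical type)."""
    req_id: str
    cases: List[CaseResult] = field(default_factory=list)


def feature_pass_rate(requirements: List[Requirement]) -> float:
    """Fraction of mapped test cases that pass across a feature's requirements."""
    results = [case.passed for req in requirements for case in req.cases]
    if not results:
        return 0.0
    return sum(results) / len(results)


def unmapped_requirements(requirements: List[Requirement]) -> List[str]:
    """Requirements with no test case mapped: a gap in coverage."""
    return [req.req_id for req in requirements if not req.cases]


def feature_ready(requirements: List[Requirement]) -> bool:
    """Ready only if every requirement is covered and every mapped test passes."""
    return not unmapped_requirements(requirements) and feature_pass_rate(requirements) == 1.0


# Illustrative data only: a made-up "PVR" feature with two requirements.
pvr = [
    Requirement("REQ-1", [CaseResult("TC-1", True), CaseResult("TC-2", False)]),
    Requirement("REQ-2", [CaseResult("TC-3", True)]),
]
print(f"pass rate: {feature_pass_rate(pvr):.0%}, ready: {feature_ready(pvr)}")
```

Note that "ready" here only means ready against the scripted test cases, which is exactly the limitation this post goes on to discuss.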

As the project matures, the QA team provides regular (weekly) reports on the progress being made by the QA workstream. The factors worth reporting are usually:
  • Number of Test Cases Defined
    • Depending on the requirements, product backlog and high level use cases, this is generally a best effort estimation of the total number of test cases expected to be written to cover the testing of a particular feature
  • Number of Test Cases Written
    • To date, what is the progress of the number of test cases written - i.e. progress against the test case backlog
  • Number of Test Cases Executed
    • During the last execution of a test cycle, how many of the written test cases were run?
  • Number of Test Cases Failing
    • How many test cases are failing per feature area?
  • Number of Test Cases Passing
    • What is the pass rate of the test cases per feature area?
This information generally helps the project team understand where the QA workstream is at. It is generally communicated as the following:
[Figure: Summary of Test Coverage Progress: STB Main Functional Areas]
[Figure: Example tracking of Middleware Tests Progress]
The first figure above is an overall summary of progress made by an STB SI QA team in mapping an STB product's main feature/functional areas against test suites. It basically shows that good progress has been made with defining test cases, some test cases are still to be written, a fair number of the executed test cases are failing -- and the project has some way to go yet to reach the overall coverage goals.

The second set of pictures above shows an example of an STB Middleware team tracking progress through various stages of the project, from 12 months before launch to 3.5 months before launch. It shows that eventually all test cases that had been defined were actually written and, most importantly, that the number of test failures has been decreasing -- a telltale sign that quality is being achieved, though it confirms just one measure of quality tracking.
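The five weekly counts reported per feature area (defined, written, executed, failing, passing) lend themselves to a simple record per area. Here is a minimal sketch, assuming a hypothetical `FeatureAreaMetrics` type and invented numbers; it is not the format of any real reporting tool.

```python
from dataclasses import dataclass


def pct(part: int, whole: int) -> float:
    """Percentage rounded to one decimal place; 0.0 when the denominator is zero."""
    return round(100.0 * part / whole, 1) if whole else 0.0


@dataclass
class FeatureAreaMetrics:
    """Weekly QA counts for one feature area (hypothetical structure)."""
    area: str
    defined: int   # best-effort estimate of test cases needed
    written: int   # test cases written so far
    executed: int  # written test cases run in the last cycle
    passing: int
    failing: int

    def written_pct(self) -> float:
        """Progress against the test case backlog."""
        return pct(self.written, self.defined)

    def executed_pct(self) -> float:
        """Share of the written test cases that were actually run."""
        return pct(self.executed, self.written)

    def pass_rate(self) -> float:
        """Pass rate of the executed test cases."""
        return pct(self.passing, self.executed)

    def summary(self) -> str:
        return (f"{self.area}: written {self.written_pct()}% of defined, "
                f"executed {self.executed_pct()}% of written, "
                f"pass rate {self.pass_rate()}%")


# Invented numbers, for illustration only.
pvr_week = FeatureAreaMetrics("PVR", defined=200, written=150,
                              executed=120, passing=90, failing=30)
print(pvr_week.summary())
```

Tracking the same record week after week gives the kind of trend shown in the Middleware example: written converging on defined, and failing shrinking over time.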

But, Don't Blindly Trust Scripted QA Test Case Execution Results!
Now, to get to the core of my message. Eventually the project's QA team will reach a stage where all QA test cases have been written and executed. The QA team will have settled into the rhythm of executing test cycles almost automatically (unconsciously competent), running one test cycle after another almost robotically, and thus run the risk of not seeing the wood for the trees.

Once the milestone of reaching test coverage for the major product requirements is reached, the QA team have executed a few test cycles, and the project is entering the last mile of the stabilization/launch phase, blindly trusting the QA scripted test case execution results offers a false level of confidence - because scripted test cases only test what has been scoped at the requirements level.

Even with Agile, a Product Owner is unable to flesh out, in the definition of done, every test scenario to consider. In a sprint, the team might have implemented to death the scenarios they've identified per user story, but when the product is then used by real people, real users - defects will naturally surface.

Every STB project will enter a phase where the initial confidence is achieved through basic execution of test cases - but the project will see an influx of defects not from failed test cases, but from odd user testing: Exploratory testing by the QA team themselves; Field Trial testing as well as defects inbound from core focus groups.

When this happens, that is, when more failures are coming from Exploratory Testing than from scripted test cases, it behoves the QA team to rethink their testing strategy: Does every QA cycle start off by dogmatically executing test cases that have been passing all along (release-after-release), or does the QA team do more selective, risk-based testing: focus first on verifying previously reported fixes, then exploratory testing, and selectively testing areas that could be impacted by changes in the latest release??
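To make the selective, risk-based option a little more concrete, here is a rough sketch of how the scripted part of a cycle could be trimmed. Everything in it is my own assumption for illustration: the `ScriptedTest` fields, the `select_for_cycle` function, and the `stable_after` threshold of three consecutive passes are hypothetical, and a real project would tune these rules to its own risk appetite.

```python
from dataclasses import dataclass
from typing import List, Set


@dataclass
class ScriptedTest:
    """A scripted test case with the history needed to judge risk (hypothetical)."""
    name: str
    area: str
    consecutive_passes: int     # cycles in a row this test has passed
    verifies_fix: bool = False  # linked to a defect fixed in this release


def select_for_cycle(tests: List[ScriptedTest], changed_areas: Set[str],
                     full_run: bool = False, stable_after: int = 3) -> List[ScriptedTest]:
    """Pick the scripted tests worth running in this cycle.

    A full run (planned at key points in the project) returns everything.
    Otherwise keep tests that verify a fix, sit in an area touched by the
    release, or have not yet passed `stable_after` cycles in a row.
    """
    if full_run:
        return list(tests)
    selected = [
        t for t in tests
        if t.verifies_fix
        or t.area in changed_areas
        or t.consecutive_passes < stable_after
    ]
    # Fix verification first, then changed areas, then the remaining unstable tests.
    return sorted(selected, key=lambda t: (not t.verifies_fix, t.area not in changed_areas))


# Invented test names and history, for illustration only.
suite = [
    ScriptedTest("epg_grid_scroll", "EPG", consecutive_passes=10),
    ScriptedTest("pvr_record_series", "PVR", consecutive_passes=8),
    ScriptedTest("pvr_playback_resume", "PVR", consecutive_passes=1, verifies_fix=True),
    ScriptedTest("vod_purchase", "VOD", consecutive_passes=0),
]
cycle = select_for_cycle(suite, changed_areas={"PVR"})
print([t.name for t in cycle])
```

The time saved by skipping the long-stable tests is what funds the exploratory sessions; the `full_run` flag is there because the complete suite should still be run at key points as a regression check.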

This is how the story is told through pictures:
1. QA Team makes good progress with Executed Test Scripts - Features working nicely
[Figure: Scripted Test Case Coverage looks Good]
The above trend shows the QA team have met their targets in defining, writing and executing all their test cases. From the test cases that were executed on the last software release, the results paint a very positive picture indeed - the features scoped and defined (documented as requirements or user stories in the backlog) look ready, with at least a 90% pass rate on average per feature - so surely the product is ready to ship? Ship It!! Not so...

2. Oddly enough, the project still sees a high influx of Defects per Feature area, which is at odds with the QA Test Report - where are these defects coming from??

[Figure: Weird - Defects not coming from QA Test Scripts]
This trend now takes into account the actual defects reported against the same release that was tested using test case scripts, but also includes other testing groups such as QA Exploratory, Field Trials and Focus User Group testing. Combined with the test script failures per feature, it shows the real status of the product's feature set -- the product is not ready!!! Don't Ship it. Is there more value in QA script execution now, or do we change tack and focus on Exploratory??

3. Project now understands that the bulk of defects reported come from Exploratory Testing - points to changing QA strategy??
[Figure: Spread of Defects Discovery Areas]
Ah, this is now a much clearer picture for the project - it seems that the scripted QA test cases account for less than 5% of the overall defects reported.
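The spread of defects by discovery area is easy to derive once each defect is tagged with where it was found. A minimal sketch follows, assuming a hypothetical source tag on each defect; the counts are invented purely to illustrate the shape of the result and are not figures from a real project.

```python
from collections import Counter
from typing import Dict, Iterable


def defect_share_by_source(defect_sources: Iterable[str]) -> Dict[str, float]:
    """Percentage of reported defects per discovery source."""
    counts = Counter(defect_sources)
    total = sum(counts.values())
    if total == 0:
        return {}
    return {source: 100.0 * n / total for source, n in counts.items()}


# Invented counts: one entry per reported defect, tagged by where it was found.
reported = (
    ["scripted"] * 4
    + ["exploratory"] * 56
    + ["field_trial"] * 25
    + ["focus_group"] * 15
)
shares = defect_share_by_source(reported)
for source, share in sorted(shares.items(), key=lambda item: -item[1]):
    print(f"{source:12s} {share:5.1f}%")
```

When the scripted share drops this low while the overall defect influx stays high, that is the signal to rebalance effort towards exploratory and field testing.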

With this information in hand, the project team must consider different strategies for addressing the issues reported, and manage the stability & quality requirements of realising the product launch...

The way I've seen this go, bearing in mind I've worked with many STB projects in my lifetime:
  • Achieving test coverage of all requirements is just the beginning, not the end
  • The bulk of failures reported comes from user testing, i.e. Exploratory testing
  • QA scripted test cases act as a very useful measure for tracking regression - and should be automated as far as possible
  • The Product Owner or Business Analyst in charge of Requirements Specification doesn't always get stuff right
  • The most value comes from real world testing, hence I've always recommended parallel field trials and early user group testing
  • As projects get closer to launch, risk-based testing is much more preferable
    • You're unlikely to uncover new areas of failure by repeating the same scripted test cases if the test cases have been consistently passing each round
    • Don't start risk-based testing if you haven't achieved a measure of confidence from scripted test case execution
    • Plan to run the complete test suite at key points in the project; you don't have to repeat the same thing (i.e. re-run test cases from zero to full suite) every test cycle
  • The expected QA priorities would be: QA Sanity, Verifying bug fixes, Tracking Regression, Exploratory Testing, Supporting Field Trials and Focus Group defects (help recreate problems reported in the field)
  • Allow for a fair amount of time to support Field Trials / User Group Testing
  • The debate of Formal VS Informal testing is well understood in the QA world
    • Informal Testing or Exploratory testing, in my view, should feed back into formal test scripts - there must be that feedback loop to reduce the number of "escaped defects"
    • The same needs to be fed back into unit testing & regression spaces
