Mo Khan's Outlet!

Wednesday 23 January 2013

SI QA Sanity Tests debate...round one

Today, 19 months after joining the company and about 18 months after getting involved with ongoing projects, we finally held a discussion on the topic of "Sanity testing". I had a tendency to always comment on why we continued with downstream testing even though "sanity" had failed, but I was always overridden on the grounds that it was deemed acceptable for the current stage of the project. Since I wasn't directly involved in managing that project, I let it be - the timing wasn't right yet...

But even today, on a project that I do have direct involvement in as the overall programme manager, having influenced & steered much of the development / integration / test process improvements to bring the project back on track again, there is still some confusion around what "sanity testing" really means. The technical director managing the launch, whose position in the corporate hierarchy is one level above mine (and the QA manager's), has maintained a different view of Sanity Testing from mine. But it seems I'm not alone in my view, having recently gained the support of the QA Manager as well as the Product Owner. So we decided it was time to meet and discuss this topic in an open forum, to try and reach an understanding, to avoid confusing the downward teams, and to establish a strategy for possible improvements going forward...easier said than done!

One of the challenges I face almost daily is that of enabling change across the project team, sticking my nose into the Development / Integration / Test departments with the aim of steering them in the right direction, to help increase the chances of the project delivering. I do this even though, strictly speaking, I am positioned within the organisation as a "Programme Manager or Strategic Planner", wearing the hat of Project Management. As if applying PM pressure across the team isn't enough of a thorny subject already, I still go and push for process changes in Dev / Int / QA - am I a sucker for punishment or what?? But I can't help myself really, because of my technical background: I've been through all the stages of software engineering as both engineer and manager, and have worked for the best companies in the industry, which, through the years, evolved to using best practices. So when I see things being done that deviate from what I consider the norm in the industry, I just can't help but intervene and provide recommendations, because I can help short-circuit the learning curve and help the team avoid repeating expensive mistakes!

And so with the topic of QA Sanity, we've never respected the term, and continue, despite failing sanity, to proceed with testing an unstable product, in the hope of weeding out more failures:

What does Sanity mean?
What do we do when Sanity fails?
Do we forsake quality until later in the project?

Types of Digital TV Projects, Decision Factors & Typical Duration

In this post I will discuss the various types of Digital TV projects that I've come across over the last twelve (12) years. Although the DTV system is a complex one, and in theory, there are many different permutations and combinations of possible projects, we should not forget that on the part of the Pay TV Operator, it is quite an expensive affair, and therefore these guys are quite resistant to change. Initial costs in setting up the broadcast network must be made up in as short a time as possible (ROI in less than 5 years maybe), subscriber growth must be on the increasing trend, competitive threats kept at bay.

Depending on the market dynamics, some PayTV operators (especially in Europe & North America) face strong competition: the market is quite open, and those regions usually have a strong regulatory body that promotes a free market, competitiveness and, most important of all, the right of consumers to choose. In other markets such as Africa and the Middle East, the PayTV operators that were first to get in when the time was right are usually the market leaders, the dominant players, and lack any strong competition.

Regulatory bodies in these regions are either non-existent or immature in their abilities in governance and in promoting a free market based on consumer choice. South Africa is a good example: it is not generally consumer driven, although this has recently seen a change with the introduction of the Consumer Protection Act, and it is still far from having an SA-equivalent of the UK's Ofcom (ICASA, for example, doesn't come close). In the Asia/Pacific markets, there is a growing number of PayTV operators, competition is rife, and time-to-market is more important than, say, user experience or innovative features (I've seen this with my own eyes: in India, for example, there are some operators that have products that look like they're built on eighties technology)...

I've set myself quite an ambitious task (as usual). The nature of DTV projects varies, depending on which side of the fence you're sitting on. Of course, all projects that generate revenue are driven by the PayTV operators themselves. If you're a software vendor, say Middleware or EPG Application, you will have a mix of customer delivery projects and internal R&D projects. Both types of project have their own challenges and involve various factors that influence decision-making; and both can largely be estimated using general guidelines (that are usually based on the performance of past projects).

In this post, I'll discuss projects from the PayTV Operator's point-of-view, in terms of typical use cases. In a future post I will touch on typical R&D projects and what drives primarily software vendors operating in the DTV product space.

Breakdown:

Process for Fast-Tracking Issues - The Daily Scrub

In this post I'd like to share a simple, yet effective process for managing the defect discovery phase, typically experienced during the early stabilisation phase of launching a STB product. The process does not only apply to STB projects: it can be used for Headend projects, or even general Software Projects that involve a number of components requiring system integration. Since the bulk of my experience comes from STB projects, the information presented here has a bias towards STB projects.

The material presented here is not that ground breaking, really. If you've been in the business of delivering DTV projects for a long time, then you probably have a tried and tested way of managing the system integration processes. However, if this is the first time you're involved in system integration, you are either a PayTV Operator taking more ownership of SI (System Integration), or you might be a professional services outfit that has won its first big project. If you fall into either of these categories, then I have no doubt that you'll find this useful or, at the very least, a trigger to review your currently planned strategy.

As an aspiring independent consultant, the Scrub Process is yet another addition to my PM Toolbox. Recently I had to present this process to a project team whose processes were a bit old-fashioned, rather rigid, and slow. I needed a medium to express the concepts of gaining more efficiencies out of the SI Process, and thus put together this presentation to communicate the idea, which I'll only briefly touch on here. For more details, please download the presentation.

Set-Top-Box (STB) BootLoader Management Template

The Set-Top-Box Bootloader is a crucial piece of software (actually firmware that is generally burnt into Read Only Memory (ROM), usually One Time Programmable (OTP)) that not only performs the vital job of the classic bootstrap (power on, boot, launch application) for the device, but is also fundamental to applying software updates to the system. As in PCs, the bootstrap is the initial code that is loaded; it generally performs a self-test to check that the device configuration is OK, then looks for an image to load that boots the device into operation, ultimately loading the EPG application.

The Bootloader is generally implemented as very low-level code, with a limited set of drivers to enact the basic hardware functionality, with the focus on occupying as small a footprint as possible. There are different types of Bootloaders implemented in STBs, generally known as "n-stage" Loaders. Typically you find Single-Stage Bootloaders and Two-Stage Bootloaders. A Single-Stage Bootloader is an all-encompassing, self-sufficient loader that does not require the help of another bootstrap to continue the boot process. A Two-Stage Bootloader, as the name suggests, includes a secondary stage that kicks off a higher-level bootstrap to facilitate the loading of subsequent system components.

As a STB Project Manager, there will be occasions where the Bootloader requires special attention. This is especially true for projects involving brand-new hardware, different Middleware or Drivers. Generally though, the Bootloader is closely coupled with the Target Hardware: a change in Hardware implies Bootloader changes. However, if implemented correctly, the software architecture for a Bootloader can be designed like any other software stack, with generic components, as a Middleware would have, leaving a HAL (Hardware Abstraction Layer) open to port to different Hardware devices, thus making the job of supporting Bootloader code more efficient. In some cases, the Bootloader is a product in its own right, and many companies and consultants have made a business out of it.
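To make the HAL idea a little more concrete, here is a minimal sketch in Python (real Bootloaders are low-level firmware, so this is purely illustrative). The names BootHal, FakeHal and boot() are hypothetical and not taken from any vendor's code; the point is only that the generic boot sequence stays the same while the hardware-specific part is swapped per device.

```python
from abc import ABC, abstractmethod
from typing import Optional


class BootHal(ABC):
    """Hypothetical hardware abstraction layer: the only part re-ported per STB."""

    @abstractmethod
    def self_test(self) -> bool:
        """Return True if the basic hardware configuration checks out."""

    @abstractmethod
    def find_image(self) -> Optional[bytes]:
        """Return a bootable software image, or None if none is present."""


def boot(hal: BootHal) -> str:
    """Generic boot sequence: self-test, locate an image, hand over control."""
    if not hal.self_test():
        return "halt: self-test failed"
    image = hal.find_image()
    if image is None:
        return "recovery: no image, wait for software download"
    return f"launch: image of {len(image)} bytes"


class FakeHal(BootHal):
    """Stand-in for a real hardware port, used here only to exercise boot()."""

    def __init__(self, healthy: bool, image: Optional[bytes]):
        self.healthy = healthy
        self.image = image

    def self_test(self) -> bool:
        return self.healthy

    def find_image(self) -> Optional[bytes]:
        return self.image


print(boot(FakeHal(healthy=True, image=b"epg-stack")))
```

Supporting a new box would then, in principle, mean writing another BootHal implementation rather than touching boot() itself.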

STB Bootloaders could be provided by the CA Vendor, the Middleware Vendor, or traditionally the STB Manufacturer, because the code is so low-level and tightly linked to the physical hardware. In the early days, when the traditional medium of delivering software to the STB was through the physical tuner cable, the role of the Bootloader was relatively simple. It gets complicated when we move to hybrid STBs, especially with an IP connection, as this opens up another channel for distributing software updates. IP Bootloaders in STBs are relatively new, and haven't reached the state of maturity of, say, the traditional Satellite STB Bootloader. Bootloader testing is extremely technical and complicated; the security, reliability and resiliency testing is really thorough. For example, when testing a Satellite STB, the Bootloader must deal with all kinds of Signal Quality conditions: removing the signal feed in the middle of the download process, recovering signal, time-outs, and general reliability scenarios such as pulling out the power during the final stages of the write process.

In an IP Bootloader, the same scenarios are considered, but at the TCP/IP networking level. First and foremost, the STB IP Loader must implement a robust IP networking stack. We had a problem in one of our projects where the Bootloader had the MAC address hard-coded to zero! We only picked this up late in the project as we entered the final testing. Such basic problems should not get past the first stage. Once we fixed that problem, we then hit stability problems: network jitter, packet loss & general robustness w.r.t. IP-connectivity. The Bootloader software component must be ready and verified well in advance of formal production of the hardware. Remember this component is burnt into the STBs at the time of manufacture. It has to be defect-free: any rework post production is going to be next to impossible, or will involve severely time-constrained manual reworking that will cost the PayTV operator tons of money. You screw up the Bootloader, you're fired!
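A check as basic as the all-zero MAC address can be automated and run in the very first test stage. The sketch below is one way such a check might look; the function name is_usable_unicast_mac is my own invention for illustration, and the rules it applies (reject all-zero, reject broadcast, reject addresses with the group bit set) are general Ethernet addressing facts rather than anything specific to a particular Bootloader.

```python
import re

# Six two-digit hex octets separated by ':' or '-'
_MAC_PATTERN = re.compile(r"[0-9A-Fa-f]{2}([:-][0-9A-Fa-f]{2}){5}")


def is_usable_unicast_mac(text: str) -> bool:
    """Return True if text looks like a MAC address a device could sensibly use."""
    if not _MAC_PATTERN.fullmatch(text):
        return False
    mac = bytes.fromhex(text.replace(":", "").replace("-", ""))
    if mac == bytes(6):          # all zeros: the hard-coded value we tripped over
        return False
    if mac == b"\xff" * 6:       # broadcast address
        return False
    if mac[0] & 0x01:            # group (multicast) bit set in the first octet
        return False
    return True


for candidate in ("00:00:00:00:00:00", "00:1a:2b:3c:4d:5e"):
    print(candidate, is_usable_unicast_mac(candidate))
```

The sample address 00:1a:2b:3c:4d:5e is made up for the example.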

The above experience thus triggered me to instigate a more formal, rigorous process in managing the Bootloader Workstream, which is presented below.

Template for STB Programme Manager: Bootloader Workstream

If your STB project involves a hardware element then it's most likely to impact the Bootloader. It is essential to highlight this dependency in your project plan, even though you might not be directly responsible for the development & delivery of the said platform. If the project was initiated well, or your organisation is quite mature and has been delivering STB projects for many years, then this workstream happens almost like clockwork, automatically in the background, most likely run by your company's STB Hardware / Drivers Porting team.

Nonetheless, defining the key roles and assigning owners to the management of the Bootloader delivery will help clarify and add structure to the overall programme:

Product Owner - If you're a broadcaster or PayTV Operator, you need a good understanding of the feature-set and requirements of this component that meets the needs of your business objectives & operations. If you're a component supplier, you still need someone to own the product specification for this component.

Technical Owner - Is someone with technical responsibility, generally an architect, who will translate the Product Owner's requirements into detailed technical requirements specification, that Bootloader vendors must implement.

Component Owner - The party responsible for implementing, testing & delivery of the Bootloader component.

System Integrator: Low-Level & System High Level. It is the responsibility of the assigned STB System Integrator to prove the integration of the said software component.

Quality Assurance: Security & Customer. Generally CA Vendors provide a certificate of approval for the Bootloader, as it has to pass strict security criteria (this is generally known as Low-Level Code testing). Remember the CA is responsible for generating millions in revenue, so the Bootloader has to be secure to prevent intrusion that compromises the integrity of the STB and leads to piracy. The PayTV operator also performs ATP, as the Bootloader must adhere to various business scenarios for software download, upgrade, resiliency & recovery modes. Operator testing is generally referred to as High-Level Code or Customer Code testing.

Project Team. There must be a Project Manager assigned to manage this Workstream. There are generally many dependencies at the System Integration level, so this is best assigned to an SI Project Manager.

Simple Template for Bootloader Workstream

Download the Template!

Some other useful references around Bootloaders
S3's Whitepaper on STB Migration

Tutorial on the inner workings of a STB

Friday 9 November 2012

Modeling the Climb to Launch for STB Delivery Projects

Continuing with my aim of sharing the tools I use (and come to value) in running my own projects, in this post I'd like to share a simple, yet powerful tool, that can be used for scenario planning a typical Set-Top-Box Product Launch, which I've come to term as "The Climb to Launch".

To understand what I'm about to present in this post, you should at least familiarize yourself with some preliminary info:

The last mile of any DTV project is the STB Delivery Project. It is widely accepted practice to deliver the components of the system (a.k.a. Backend or Headend), the core infrastructure that provides essential services to the STB, well in advance of initiating the formal Test/Integration/Release phase of the STB project. Some might argue this is all very well in theory, and that in a complicated system, when delivering a brand new feature like Video On-Demand where the entire value-chain is impacted, one cannot avoid the big-bang Integration/Test approach. I maintain this is just a very lazy approach to programme management that should be avoided at all costs! But that's a topic for another post...

A STB Delivery Project Manager has to implement a plan taking into account all the inputs and resulting outputs. Although there is always variability in these projects, the variability reduces as we approach the end of your typical Development phase, and enter the Stabilisation phase of the project.

We can therefore, quite easily quantify the planning variables involved in the STB Delivery Plan, and use these variables as inputs into producing an overall delivery plan, as I'll explain below.

The STB-Planning Model

Based on the Release Campaign Process outlined in earlier posts, the STB Delivery Plan needs to target specific milestones in the Climb to Launch. I call it the Climb to Launch because it is literally that: a strenuous climb by the entire project team, working through instabilities, performance issues and errors in functionality and user experience, that must all be fixed in order to Launch. The resulting Excel Tool implementing my model produces a Gantt View that takes the shape of Steps leading up to the Launch Milestone, hence I've coined the term "Climb to Launch":

Output from Planning Model showing "The Climb to Launch"

Key Planning Variables for the STB Delivery Manager

Again, I'm not advocating that STB Delivery PM is easy and can be solved by using a simplified modeling tool in Excel! A STB PM adds value by having a keen appreciation for the various systems impacted, development & integration processes and most importantly building effective relationships with various teams and stakeholders. It is also important the Delivery PM is well equipped with relevant domain knowledge, can hold technical conversations with ease, as well as have the ability to prioritize issues, despite ambiguity, conflicts & uncertainty.

Planning is no less an important activity for a Delivery PM or even Programme Manager, but spending too much time maintaining a plan on paper, instead of active involvement in the shaping, driving and maintaining of focus - is another recipe for disaster. Having said that, at the high level though, essentially all STB projects follow typical patterns and involve the following key planning milestones or variables:

Availability of the STB Target Hardware to the rest of the Core Development team

There is only so much a development team can do on a typical reference development platform
The sooner the target hardware is made available, the sooner SI can start stabilization
Software components of the STB must also be available, i.e. fully developed, tested & certified - this is especially true for the STB Loader, less so for accompanying Drivers

Availability of Target Hardware Drivers

This is the foundation of the STB platform. It is generally expected practice that Drivers are delivered, as far as possible, independent from related Middleware components. As long as the interface specification is solid and stable, this development can happen in a parallel stream.
STB SI team must verify and accept the Drivers are fit-for-purpose and can sustain ongoing Development & Integration

STB User Interface Application Component Development Complete

Applicable if there's any new EPG development happening, impacts STB SI timelines

STB Middleware Component Development Complete

Applicable if there's any new Middleware development happening, impacts STB SI timelines

Number of Cycles for STB SI to Stabilize and recommend suitable build on Target Hardware

Assuming your SI team operate in fixed time-boxed iterations, this variable is a useful timeline multiplier

Headend Components Development Complete

Generally, the real STB stabilization phase can only begin once a suitable Headend System is in place that supports the majority of use cases required from the STB product

End-to-End SI Deployment of Headend to Live

Milestone that is useful for tracking - useful for Programme Management, informative for STB Delivery PM

Number of days to time box STB SI Bug Fix / Release Cycle

The length of SI release / bug fix cycles impacts the timeline

Number of days STB SI require to create a full stack build for further testing

Necessary and useful to capture so that you don't underestimate the work involved by SI team in the initial bring up of the Software Stack. This doesn't happen by magic, and consumes time.

Number of Cycles to reserve for reaching First Major Milestone (Functionally Complete)

For initial planning, and based on past projects - this is a variable that determines how long the initial stabilization phase is expected to take to hit the first real milestone of being fully Functionally Complete and ready for Closed-User-Group Testing

Number of Cycles to reserve for reaching subsequent milestones, leading up to Wider Field Trials

As above, similarly pencilling in a slot to reserve for External Home User Testing

Number of Cycles to allocate to Closed-User-Group (CUG) Testing

As above - how long do we run the CUG stage for?

Number of Cycles to allocate to Wider Field Trials (External Testing)

As above - how long do we run the External Home Testing for?

Earliest Indication of Launch Build Candidate

Based on all the above, what then becomes the outlook for Product Launch?
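To show how these variables might combine into a stepped plan, here is a rough Python sketch. It is not the Excel tool itself, and it makes simplifying assumptions of my choosing: stabilisation starts once the latest of the input deliveries has landed plus the SI bring-up time, phases run strictly one after the other, and cycles are counted in calendar days. All names and dates are hypothetical.

```python
from dataclasses import dataclass
from datetime import date, timedelta
from typing import List, Tuple


@dataclass
class Scenario:
    inputs_ready: List[date]        # hardware, drivers, EPG, middleware, headend
    bring_up_days: int              # days SI needs to build the first full stack
    cycle_days: int                 # length of one time-boxed SI cycle
    phases: List[Tuple[str, int]]   # (milestone name, number of cycles reserved)


def climb_to_launch(s: Scenario) -> List[Tuple[str, date]]:
    """Return milestone dates, each phase stacked on the previous one."""
    current = max(s.inputs_ready) + timedelta(days=s.bring_up_days)
    milestones = [("Stabilisation starts", current)]
    for name, cycles in s.phases:
        current += timedelta(days=cycles * s.cycle_days)
        milestones.append((name, current))
    return milestones


most_likely = Scenario(
    inputs_ready=[date(2010, 2, 1), date(2010, 2, 15), date(2010, 3, 1)],
    bring_up_days=10,
    cycle_days=14,
    phases=[
        ("Functionally Complete", 4),
        ("CUG Testing complete", 3),
        ("Field Trials complete / earliest Launch Candidate", 4),
    ],
)

for name, when in climb_to_launch(most_likely):
    print(f"{when.isoformat()}  {name}")
```

Running the same function over Best Case, Most Likely and Worst Case scenarios gives three sets of steps to compare side by side.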

The Modeling Tool - Example Use

Based on the above planning variables, the PM can model different scenarios. Typically one caters for Best Case, Worst Case and Most Likely scenarios. There is also the Three-Point Estimate technique that some PMs use to better quantify the estimates. I use this technique as guidance, but generally don't go down into the detail of mapping the probabilities of the different outcomes.
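For those who do want to put a number on it, the commonly used PERT form of the Three-Point Estimate weights the Most Likely value four times as heavily as the two extremes. The short sketch below shows the arithmetic; the function name and the sample figures (8, 10 and 18 cycles) are made up for illustration.

```python
def three_point_estimate(best: float, most_likely: float, worst: float):
    """PERT weighting: expected = (O + 4M + P) / 6, spread = (P - O) / 6."""
    expected = (best + 4 * most_likely + worst) / 6
    spread = (worst - best) / 6
    return expected, spread


expected, spread = three_point_estimate(best=8, most_likely=10, worst=18)
print(f"expected cycles: {expected:.1f} (+/- {spread:.1f})")
```

Note how a long Worst Case tail pulls the expected value above the Most Likely figure, which is usually the useful message for stakeholders.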

Suppose you were given a Programme to manage (but weren't involved from the beginning) and the project itself has changed along the way to the latest Scenario. It's January 2010, almost a quarter of the way through the project, and the business decides to change its mind: Brand new STB Hardware, New User Experience EPG, new Middleware and also new Headend. You're told that you need to deliver the project in 10 months, or else...

Clearly if you've been involved in STB projects as long as I have, you'll know immediately this project is a pipe dream. How do you report back to the Stakeholders, in a diplomatic way, that they must all be smoking something or living in La-La land, and that they've never run a Product Launch from the ground up on this scale before? Do you reject the deadline outright, without any planning or analysis?? No matter how good your gut feel or hunches are, you won't be taken seriously unless you have some proof that you've applied your mind to the scenarios, and that things just don't add up.

But you don't have the luxury of time on your side to get into a detailed planning / analysis phase, with work breakdown estimations, etc. You need to summarize the plan at a high level, working through the various scenarios - this is where a tool comes in handy.

Below is a snapshot from the tool:

The Modeling tool

Step 1: Input the dates / values for variables for the Tasks/Milestones offering the three Estimates: Best Case, Most Likely, Worst Case
Step 2: Click on one of the Buttons to produce the Resulting Plan
Step 3: Select the Variables you want to Fine Tune (using the + / - buttons)
Step 4: Repeat / Settle on an Acceptable View to Share with your Stakeholders

In the example project, you can work backwards from your deadline, adjusting all the variables such that the plan meets the deadline - and then review the reality of achieving the plan with your stakeholders. You then sit with the primary stakeholders, along with the respective Project Managers, to work through estimates that are more realistic. Use that as the baseline plan going forward.
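Working backwards from a deadline boils down to a simple question: how many SI cycles actually fit between the start of stabilisation and the launch date, and how does that compare with the cycles you believe you need? A minimal sketch, with hypothetical function names and dates chosen only for illustration:

```python
from datetime import date


def cycles_that_fit(stabilisation_start: date, deadline: date, cycle_days: int) -> int:
    """Whole time-boxed cycles available before the deadline (never negative)."""
    days_left = (deadline - stabilisation_start).days
    return max(0, days_left // cycle_days)


def shortfall(stabilisation_start: date, deadline: date,
              cycle_days: int, cycles_needed: int) -> int:
    """Number of cycles the plan is short by; 0 means the deadline is feasible."""
    available = cycles_that_fit(stabilisation_start, deadline, cycle_days)
    return max(0, cycles_needed - available)


start, deadline = date(2010, 6, 1), date(2010, 10, 31)
print("cycles available:", cycles_that_fit(start, deadline, 14))
print("cycles short:", shortfall(start, deadline, 14, cycles_needed=11))
```

A non-zero shortfall is the kind of evidence that turns "my gut says no" into a conversation about which variable has to give.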

The tool can also be used to summarize the key milestones, by automatically producing a milestones table:

Milestones Table Automatically Generated

Download the Tool Now!

Click on the above link to download a copy of the tool, please ensure you have Excel / Office 2010 or higher installed. As I use a Mac, the tool is best viewed on a Mac, preferably a 32 inch HD display. I will "port to Windows" when I have the time. Remember to enable Macros.

Concluding Remarks

This is a tool I keep in my own PM Toolbox. I use it as guidance to not only plan any new STB project, but also as a way of communicating not only to Stakeholders, but the entire Project Team, of the various scenarios and milestones we need to keep track of in achieving a successful Product Launch, on time or to realistically meet expectations. I have really made this into a generic tool only this year (2012), but again, the concepts and milestones are indeed common and applicable to any STB delivery project. I am hoping this could help some of the new PMs entering the DTV space, improving their ramp-up times.

Lastly, if you do take time to download and experiment with the tool, or even start using this in your own work, please get in touch, offering feedback as appropriate!

Tuesday 6 November 2012

ORITs: One Roof Integration Teams for DTV/STB Projects

In my previous post I introduced the concept of Release Campaigns that are a series of cycles executed to reach stability & maturity of the STB Product, which largely happens during the Integration / Stabilisation phase of the project's life cycle.

During these campaigns it is expected to uncover a number of bugs or defects, especially during the first release campaign, where the various system components are actually being stressed for the first time, end-to-end. Expect a reasonable avalanche of defects reported against STB functionality, stability & performance areas. The team that faces the brunt of these defects is usually the STB SI team, who need to investigate, reproduce the problems, and characterise the defect assigning to a relevant vendor to fix. This process is known as "triage" in some circles.

How does the project team manage dealing with these defects? How do we control the quality of the system ensuring the focus is maintained, that we're not spinning our wheels? How do we mitigate the project from further slippage, removing the burden on the SI team? How do we ensure we focus on the areas that are the pain points, the core features and functionality that add value, and affect the bottom-line: customer experience?

Even though the project is likely to have defined clear guidelines for acceptance criteria & defect severity & priorities as discussed in this post, the project will still have to be directed in a manner to maintain a sense of urgency to get the burning issues fixed.

Two things are almost certain to happen on every STB/DTV delivery project:

Business / Product owners have their own view of describing the product, as a separate list of feature areas, despite what is defined in all the product documentation
STB SI team will become the bottleneck and on the critical path if the standard processes are maintained - the project will slip, unless some other practical solution is found

Enter the Hit-Squad or ORIT Teams

So we set up dedicated teams to address key functional areas that cause us pain. Teams are focused on particular areas, with the sole remit of clearing away all problems. Typical areas that cause pain in a STB PVR product are shown below:

Common Areas to Focus Early during Stabilisation Phase of STB PVR

STB SI Team Bandwidth is Limited

The idea behind the one roof / hit squad is to get everyone in the same room, forming one team, focused and dedicated to closing down issues. Typically you have vendors supplying different components of the software stack, with their own processes & priorities - to expedite issues much faster, removing delays that come from communications, ping-pong of emails and phone calls - it's best to get the engineers on-site and deal with the issues face-to-face. This sounds like a no-brainer to those from the Agile or XP camp - but in reality some companies can be pretty rigid in their support service agreements. Hence it is critical this is driven quite hard by the customer, and pains should be taken to agree on the concept that projects will call for people (resources) to be available on-site during Integration/Stabilisation or even final Acceptance Testing.

Advantages of ORITs

ORITs definitely add focus and bring a sense of urgency to the project, across the board. They get attention from senior management, and commitment from teams to resolving the issues. In essence, we have the full participation of the vendors.
Communication paths are reduced significantly by being under one roof or in the same room. Communication is otherwise often time consuming, and is generally impacted by timezones and vendors being geographically dispersed.
Ensures the right level of support is provided - i.e. all required technical experts are available to support, and easily accessible. You have their full attention, no distractions.
Improved turnaround time for fixing issues - The ability to fix code on-site, in real time, providing engineering builds adds tremendous value and cuts a lot of the red tape associated with vendors build/release processes.
Customer is kept happy - all stakeholders have confidence that there is focus, attention & drive. Having a dedicated owner for each area helps build confidence, maintains pace and momentum. Of course, interventions can be applied much sooner than later - no delay in management decisions.

Some Challenges with ORITs

Almost certain to disrupt existing teams and structures. People will have to be seconded to other (virtual) teams, and will generally be interacting with each other for the first time, which means that these teams might not initially gel well.
Requires focus, participation and attention from vendors. Generally vendors are hard pressed to support multiple project deliveries; taking out a key engineer and dedicating them to one customer for several weeks puts stress on the vendor's other commitments.
Engineers allocated to ORITs need to be sufficiently skilled and competent. We are looking not only for senior, hard-core technical experts, but for people who can deal with stress and pressure from senior management, and whose communication & reporting is excellent. It is very hard to find people of such calibre.
Logistics challenges: The premises for the one roof sessions must support the physical & personal needs of engineers (Equipment setup, kitchen, personal workspace, etc).

Concept of an ORIT Owner is Imperative

For ORITs to return value and actually improve the outcome of the project, you have to assign a strong technical leader overall to manage the various ORITs holistically, as well as strong technical leads per ORIT area, a.k.a. Owners.

An Owner is essentially responsible & accountable for ensuring all critical issues for a particular area (e.g. Recordings) are resolved to closure, proven on-site, and tracked through completion in a formal release, all timely.
This means not only investigating & triaging, but also working closely with - and driving the component vendor engineers & teams for fixes or patches including committing to target fix dates for final implementation. If progress is not going as expected, escalate to senior management as appropriate.
Requires sound technical knowledge, domain experience & appreciation for problems, ideally having worked or experienced previous issues on past projects of similar nature.
An owner has freedom to prioritise issues depending on severity without having to wait for approval from Product Owner
An owner also takes responsibility for leading the team assigned, dealing with people as well as performance related issues
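As a small illustration of what an Owner's working view might look like, the sketch below groups open defects per ORIT area and orders each queue by severity, with the longest-open issues first within a severity. The Defect fields, the severity scale (1 = most severe) and the sample data are all assumptions for the example, not a prescribed defect-tracking schema.

```python
from collections import defaultdict
from dataclasses import dataclass
from typing import Dict, Iterable, List


@dataclass(frozen=True)
class Defect:
    key: str
    area: str        # ORIT area, e.g. "Recordings"
    severity: int    # assumed scale: 1 = most severe
    days_open: int


def orit_queues(defects: Iterable[Defect]) -> Dict[str, List[Defect]]:
    """Group defects by area; most severe first, then oldest first."""
    grouped: Dict[str, List[Defect]] = defaultdict(list)
    for d in defects:
        grouped[d.area].append(d)
    return {
        area: sorted(items, key=lambda d: (d.severity, -d.days_open))
        for area, items in grouped.items()
    }


backlog = [
    Defect("D1", "Recordings", severity=2, days_open=3),
    Defect("D2", "Recordings", severity=1, days_open=1),
    Defect("D3", "Playback", severity=1, days_open=5),
    Defect("D4", "Recordings", severity=1, days_open=7),
]

for area, queue in orit_queues(backlog).items():
    print(area, [d.key for d in queue])
```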

Does it work?

Like all other topics I've shared in my PM Toolbox, I wouldn't be sharing stuff that is fluffy and woolly - not implemented or experienced first-hand by me, directly. So, yes - ORITs do work, if managed correctly. It is no easy feat, and requires mindsets & personalities that can maintain a steady pace of rigour, fortitude and relentless determination to get to the root cause of problems - seeking out the best practical solutions in the time allowed. I have managed ORIT teams myself, observed other projects from afar, and been an engineer on-the-ground as a Hit Squad member. It can work well, or it can be a complete failure. There needs to be a unifying voice from the top, harmony across the entire project team, and complete focus and attention... It is not an easy or straightforward intervention...

A little more detail behind the mechanics

In the attached slide pack, I try to summarise the key points of this process, including notes on the ideal team sizes, reporting expectations including templates.