Monday, 19 May 2014

Achieving continuous product delivery in Set Top Box (STB) software projects: Challenges & Expectations


The story all software product development teams face at some point in their journey, not necessarily unique to digital Set Top Box (STB) software development: So you entered the product development space, set up a small team that delivered the unimaginable - you got your product out in record time. Customers are pouring in, marketing & strategy are imposing an aggressive roadmap. You've set the pace, possibly surviving death-march-like project deliveries, and now you're expected to do more: Scale! You need to scale your product development & operations to support an aggressive business roadmap, the competition is intense, and you're charged with figuring out how to deliver releases to market as frequently as possible in one calendar year - so you set the target of releasing every quarter, or rather, inherited a Product Roadmap that showed Market Launches every quarter…

You barely survived getting one release to market with quite a small team, how do you scale your operations to satisfy the business demands? What behaviours and habits do you change? What changes can you make to your current practices? You must've been using some form of Lean/Agile technique to meet the original aggressive delivery, is the process that was used before enough, can it scale to get you from where you are now, to where you want to be? What are the implications of achieving continuous product release flow, in an environment, that is typically unique to Set Top Box hardware & software development?

In this post I highlight just one possible framework that cuts through all the areas involved in product engineering of Set Top Box software releases. I will not go into detail on the intricacies of this environment (search my blog to learn about that) - instead I will use a few pictures to map out the scenarios around achieving continuous product releases to market.

The pay TV space is becoming highly competitive, especially with online / OTT (over-the-top) players like Hulu, Netflix, Amazon, etc - such that traditional operators are hard pressed to up their game from providing the clunky, almost archaic standard TV experience to a slicker user experience, offering advanced features that integrate traditional linear services with additional services aimed at stifling the modern competition. No longer can these traditional pay TV providers get away with releasing new software once a year at best: users have become used to frequent updates with new features on their smart devices every few months, and people are becoming more and more impatient with bugs that go unfixed for some time… Nowadays, it is expected that set top boxes are updated with new features as often as possible.

With this in mind, how does one structure a product engineering department to cope with this demand? How should projects be managed? What do we expect from teams? What are the implications of reaching continuous delivery? Where does a typical agile/scrum team fit in? Is Scrum even a preferred method, or do we choose a hybrid model?

[This is still very much a work in progress, made public to get feedback/advice from other practitioners out there...]
The Demands from Business / Strategy - Roadmap
In the world of STB software product delivery, the customer is really the broadcaster or operator who owns the set top box product. The product itself consists of software components supplied by various vendors, including the Application / EPG / UX, the operating system or middleware, and hardware platform device drivers. There is usually a primary systems integrator involved that is responsible for integrating & delivering the various components, producing a stable software stack / build for release. The customer usually does its own customer acceptance testing (QA) by way of test teams as well as including real customers in field trials. The customer manages its own product management team responsible for defining features & requirements, which the vendors then need to implement & deliver in their various components.

The forecast or roadmap for the year typically tends to take the form shown in the following picture:
Generic Picture of a Typical Product Roadmap
In Digital TV projects, a product roadmap of twenty four (24) months is quite typical, segmented by stages of clarity:
  • Short / Immediate term (Up to 6 months) - Scope is quite certain, and can be frozen / locked in
  • Medium term (6-12 months) - Fair amount of certainty but still some loose ends to tie up
  • Longer term (12-24 months) - Busy strategising, assessing competitive threats, looking at trends & mitigating technology / architecture / implementation options
The intent is that there is clarity within a six month rolling window. Within the six months, there is a further intent to launch to market at least two releases, as shown by GTMx. GTM refers to "Go To Market" releases for final commercial launch, the intent is to achieve these market releases with a frequency of at least once a quarter, every three months.
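
To make the rolling window concrete, here is a minimal sketch of how a planned release could be mapped to one of the three horizons above, together with the quarterly GTM cadence. The names (Horizon, classify, quarterly_gtm_months) are my own illustration and not part of any tool, and treating the exact 6 and 12 month boundaries as belonging to the nearer horizon is an assumption.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Horizon:
    name: str
    end_month: int  # last month (from today) still inside this horizon
    clarity: str


# Boundaries follow the three roadmap bullets. Assumption of this sketch:
# a release exactly 6 (or 12) months out belongs to the nearer horizon.
HORIZONS = [
    Horizon("short", 6, "scope quite certain, can be frozen"),
    Horizon("medium", 12, "fairly certain, some loose ends"),
    Horizon("long", 24, "strategising and assessing options"),
]


def classify(months_out: int) -> Horizon:
    """Return the roadmap horizon for a release planned `months_out` months ahead."""
    if not 0 <= months_out <= HORIZONS[-1].end_month:
        raise ValueError(f"{months_out} months is outside the 24-month roadmap")
    for horizon in HORIZONS:
        if months_out <= horizon.end_month:
            return horizon
    raise AssertionError("unreachable: the last horizon covers the full range")


def quarterly_gtm_months(window_months: int = 24, every: int = 3) -> list[int]:
    """Months (from today) at which a GTM release falls on a quarterly cadence."""
    return list(range(every, window_months + 1, every))


if __name__ == "__main__":
    for month in quarterly_gtm_months():
        print(f"GTM at +{month:>2} months -> {classify(month).name} term")
```

With a three month cadence, two GTM releases land inside the six month window, which is the "at least two releases" intent described above.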

The picture also tries to show the different teams involved with processing the roadmap. Product Management is responsible for translating strategy into concrete product features and allocating the features to releases, which are used as input into the delivery pipeline. The delivery pipeline is managed by Release Managers, who are essentially delivery project managers that take ownership of planning and tracking the delivery plan to completion. Development vendors, including Systems Integration & QA / Testing, commit to this delivery plan. It is assumed the impacted teams have the capability of supporting multiple releases at the same time, reporting to multiple Release Managers. In the background of all of this is the Technical / Architecture team, which is responsible for designing the technical solutions for future roadmap releases while simultaneously providing technical support to the releases under development for current/next GTM releases. What's not shown is the role of Program Management, the silent machine that drives co-ordination of the various release streams involved to get the release to market, co-ordinating, steering, guiding and shaping product management, engineering development, testing & systems integration, including deployment, operations & go-to-market teams...

This release frequency is highly ambitious and assumes a well-oiled delivery machine that can churn out quality releases. The assumption is a highly focused delivery team, working harmoniously in a sustainable rhythm, with efficient, flawless execution.

On paper this looks great, the business executives naturally love the picture, and see the roadmap as committed deliverables. In practice however, it places some significant challenges on your delivery organisation, especially when the team has just come out of an intensive delivery period and has not had a chance to catch its breath, or to appreciate the scaling challenges of meeting this continuous release flow!

View that Management See (What is Expected?)
When this high level roadmap intent is translated into a plausible strategic plan (usually orchestrated by a technical program manager) that shows the constraints and targets the delivery organisation must achieve, the plan could look like this:
High Level Strategic Planning: Hinting on Kanban / Continuous Flow Expectation
The picture shows the outlook for a full year (current), including a view from the preceding year (previous) and next (following) year. If the current year is 2014, you'd see the whole calendar year for 2014, the last quarter of 2013 (Q4), and somewhat into the first quarter of 2015 (Q1).

This template can be a powerful aid in guiding the organisation to focus on the expectations, showing in clear terms what is required to translate the business strategy of releasing new features to market into a constant flow of releases, almost every quarter. To achieve this, it highlights the milestones from the various teams and indicates the points in the year where teams will be required to support multiple release streams in parallel.

This is shown by the multi-coloured swim-lanes that represent a logical segmentation of the teams impacted:

  • Product Marketing & Infield - are responsible for all marketing, communications, advertising & technical support for moving from one release to the next. When the Launch Build (LB) release is infield (meaning that all set top box customers / subscribers have received new software via an over-the-air upgrade), the Infield support team owns the release from that point, feeding all customer-related issues back into the development teams by way of the system integration interfaces.
  • Customer Acceptance Testing - consists of various teams specialising in testing. Some of the testing is lab-focused: stress, performance and scripted testing - these are your basic requirements / features / regression coverage testing. Quite an important test stream is the real-customer field trials, where a segment of real customers trial the software in the field. The trigger to enter the Field Trials is a stable Release Candidate (RC), and feedback to the development / systems integration teams is continuous until the point of reaching LB. Once LB hits the market, the Field Trials team continue to soak test, supporting the release Infield until the next available release is ready for testing.
  • Systems Integration & Delivery / Ops - this is a core activity where the sole aim is to deliver a stable software stack to market. Usually this area involves supporting multiple work streams in parallel, from the parent release level to actual child-dependent work streams for integrating components. Engineering field trials is like customer field trials, except that this home-testing is limited to the working staff on the project, as well as some external stakeholders, and in some odd cases, includes an elite squad of real customers who are tech / gadget geeks who just love giving feedback. The trigger to enter engineering trials is a fully functional / feature complete (FC) release from the Development stream.
  • Product Engineering Team: Architects, Product Management, Development - this area is also logically broken into areas focusing specifically on Development, Product & Technical Management. Development owners need to support work requests coming from Infield team from the last GTM release, issues to fix for the next release under Field Trials (for completed work), issues from SI Engineering testing (for completed & new work under development), as well as focus on developing features for the following release (after next).
    • Product Management team need to be supporting issues from all releases under Infield or currently under test, as well as work in advance of clarifying the product backlog for the next two GTM releases, ahead of time.
    • Technical Architecture team need to be working on breaking down the technical challenges of supporting new features ahead of time, formulating the architecture, doing proof-of-concepts and feeding the design patterns back into the mainstream product engineering / development team.
By way of illustration, if we consider what's happening at the start of the new year for example:
  • The business has recently gone live with a GTM release, Rel 0, in Q4 the previous year.
  • Rel 0 is under full ownership of Infield team that front real customer problems. Product Management team supports this, as well as Systems Integration (SI Ops), Development (Dev Ops).
  • Rel 1 would have been feature complete and in an RC state to start Field Trials in Q3 the previous year (noting that Rel 1 GTM is targeting March/Q1).
    • Rel 1 would remain in engineering testing all through Q4 the previous year till early Jan/Feb the current year, until an RC is produced for Field Trials that triggers the drive to deliver the LB
    • Soon after this Rel 2 enters SI Engineering testing (switch from Rel 1 to Rel 2)
  • Rel 2 would already have started feature development in Q4 the previous year by the dev team, whilst still supporting the completion of Rel 1 to market in parallel.
  • Rel 3 would also have reached clarity on both technical architecture and product requirements in Q3 the previous year, even though Rel 3 is targeted for later in Q3 that year
  • and so on...
As a note to the management and co-ordination expectations for delivering this kind of roadmap, it expects the following:
  • Product Management: Scoping, Analysis & Feature Backlog must be completed at least two releases in advance (e.g. Work on Release 3 scoping when you're in Release 1 Delivery)
  • Architecture: End-to-End, STB Architect & Component Architects must work in advance and be in sync with the Product Management Team
  • Development Owners must be able to support three Release Streams at the same time for Feature Development
  • Development Owners must also support concept of Dev Ops to support issues from SI / Engineering Testing & Infield Support without disrupting the new Feature Development streams
  • System Integration Operations must support Release InField & Release Under Formal Field Trials.
  • System Integration must also support Release in Engineering Testing (SI Eng Field Trials).
This definitely smells like a classic Kanban/Lean system doesn't it?? It might even smell a little like Waterfall - so what?? Agile/Scrum evangelists run far away when it comes to upfront scoping across requirements, architecture and design: the key is to understand the nature of the system, that's why you must have different tools in your toolbox: use the right tool for the right job!! Scrum probably can't be forced on this program at the end-to-end level, but there may still be a place for Scrum at the individual software vendor level...
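
The staggering described above (Rel 0 infield, Rel 1 heading to market, Rel 2 in development, Rel 3 being scoped) can be sketched as a simple table of offsets relative to the release currently in delivery. This is only an illustration under my own naming; the stream names and the active_streams function are hypothetical, and a real plan would have overlaps that a fixed offset cannot capture.

```python
# Offsets relative to the release currently being delivered to market.
# They mirror the start-of-year illustration: the previous release is infield,
# the next one is in feature development, and scoping runs two releases ahead.
STREAM_OFFSETS = {
    "infield_support": -1,
    "field_trials_and_delivery": 0,
    "feature_development": 1,
    "scoping_and_architecture": 2,
}


def active_streams(release_in_delivery: int) -> dict[str, str]:
    """Map each work stream to the release it is busy with right now."""
    return {
        stream: f"Rel {release_in_delivery + offset}"
        for stream, offset in STREAM_OFFSETS.items()
    }


if __name__ == "__main__":
    for stream, release in active_streams(release_in_delivery=1).items():
        print(f"{stream:<28} {release}")
```

Every time a release launches, the whole table shifts by one, which is exactly why each team ends up supporting more than one release at any point in the year.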

It is highly likely that features must be worked on in advance. To think that feature development only starts on day one of a three month release cycle is being rather over optimistic & somewhat naive. Some upfront work is still required, even in the case where the team are aiming to split the three months into a development phase, followed by an integration phase, then customer testing. Unless all features are fairly small and uncomplicated, or the scoping / analysis / user stories and features have been broken down nicely to fit in the cycle, then the following picture might be almost impossible to achieve in practice:
Three Month Release Window for Small-Medium Features
The above picture shows how Product Management & Release Management (Project Managers) work in tandem to prepare for the next GTM release. This template is somewhat too optimistic because it assumes that it takes six weeks for development to reach Feature Complete and kick off both Engineering Testing & Field Trials at almost the same time, which is a stretch too far - unless quality is guaranteed from upstream development teams.

The aim of this picture however is not to present a realistic implementation, but instead to show how various activities are required in advance of, and during, the current release cycle, such that, if the teams get it right, the cycle can be repeated in practice so that the entire team synchronises on a rhythm that ticks every 12 weeks, continuously flowing from one release cycle to another.
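
As a rough sketch of that 12 week heartbeat, the snippet below lays out back-to-back cycles from a given start date, using the (optimistic) six weeks to Feature Complete mentioned above. The function name and the dictionary keys are my own, and the start date in the usage lines is purely hypothetical.

```python
from datetime import date, timedelta

CYCLE_WEEKS = 12               # the release heartbeat
WEEKS_TO_FEATURE_COMPLETE = 6  # the optimistic assumption in the template


def release_cycles(first_start: date, count: int) -> list[dict[str, date]]:
    """Lay out `count` back-to-back release cycles starting at `first_start`."""
    cycles = []
    for index in range(count):
        start = first_start + timedelta(weeks=CYCLE_WEEKS * index)
        cycles.append(
            {
                "start": start,
                "feature_complete": start + timedelta(weeks=WEEKS_TO_FEATURE_COMPLETE),
                "gtm": start + timedelta(weeks=CYCLE_WEEKS),
            }
        )
    return cycles


if __name__ == "__main__":
    # Hypothetical start date, chosen only for illustration.
    for number, cycle in enumerate(release_cycles(date(2014, 1, 6), 4), start=1):
        print(f"Rel {number}: FC {cycle['feature_complete']}, GTM {cycle['gtm']}")
```

Each cycle's GTM date is the next cycle's start date, so any slip in one release pushes directly into the next one unless work has been started in advance.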

I believe this is possible and indeed doable but there are various factors to consider. The three month cycle where everything begins on this heartbeat is probably okay for small-enough features, isolated to just say, the EPG / UI application enhancements with no dependencies on hardware & middleware. Application-specific releases are where Scrum thrives: there are no serious dependencies on the underlying operating system, browser APIs, new standards, changing database design, etc - which is why so many application development teams in the webapp / smartphone / tablet space can sing wholeheartedly about agile/scrum being the silver bullet. Indeed, it certainly becomes more difficult to apply Scrum where there is a strong dependency on foundational changes (say, a new operating system interface, a new database, an update to the HTML standard, requests for a new Android/iOS SDK, etc, etc) - then the application team is really pinned on component dependencies that fall outside the scope of their Scrum team -- such is the nature of set top box software projects.

The circle of influence is small, as soon as features start hitting system-wide boundaries, we enter long lead project streams. We can still maintain a constant flow, using longer time periods, but expecting massively complicated features to be done in three-to-six months from scratch is madness, and a job for the product team to convince the business otherwise...

Impact on Product Engineering: Development & Delivery
The entire organisation needs to be working in a rhythm that creates a continuous flow of activity in all teams. If we zoom in on the core activity of systems integration for example, the outlook for the year looks as follows:
Delivery Team's Outlook for the Year
This table shows the flow from one release to another reaching the GTM milestones, at the same time, context switching to support the next releases in the pipeline. The challenges & expectations are made quite clear in this table: your delivery team's ability to support multiple pipelines of work at the same time. This becomes compounded on the development vendors as well, because as a developer, you not only have to support the last two releases just delivered, but also work on the next release, as well as future releases at the same time. I've already written about the challenges of maintaining a common development team for multiple customers [click here].

What the plan does not impose however, is how the respective teams organise themselves. They could choose to adopt the Lean / Kanban model, having separate lines / benches / batches to support the pipelines, or for instance, the development team could maintain one team or many teams; they take full responsibility for handling all the code base challenges either way.

Set Top Box projects are largely around delivery. A lot of work happens way before the STB needs to deliver. I've worked on projects where we've spent a couple of years just in architectural POC-stages, rationalising systems requirements and architectures, with the STB being the last mile component in the chain. It is quite difficult to manage a complex system, for example, a Targeted Advertising / Recommendations / IP-Streaming system with the one-team agile/scrum approach: too many product vendors are involved (topic for another day...)

Challenges / Impact on Team's Capacity
It is quite clear that capacity and resource planning is going to be a major constraint, and as such, there is a strong reliance on the product's priorities being made crystal clear to the point of being frozen within the release cycle time window - failure to do so will result in too much disruption of flow. It would be silly not to take a page out of the Kanban book...

Releases could be summarised as being in one of four states (high level) of development/delivery:
  • In the product backlog with the product management & technical architecture team
  • Actively being built by development team, with support from systems integration
  • Done - Development complete - feature complete from a technical point of view
  • Being validated - in the process of customer testing / field trials. When validation is done, it means the release has launched to market.
Now a Kanban rule permits only so many releases in each of the four states. As releases flow from one state to the other, the queues / buckets fill up. Once a bucket becomes full, it can't accept more releases. Only when a release has been validated can it move off the Kanban line.
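
The four states and the work-in-process rule can be sketched as a tiny board, where a release may only advance if the next bucket has room. This is a minimal illustration, not a real tool: the class, the state names and the limits used in the usage lines are all hypothetical.

```python
# The four high-level states listed above, in flow order.
STATES = ["backlog", "building", "done", "validating"]


class FlowBlocked(Exception):
    """Raised when the target state is already at its work-in-process limit."""


class ReleaseBoard:
    def __init__(self, wip_limits: dict[str, int]) -> None:
        self.wip_limits = wip_limits
        self.columns: dict[str, list[str]] = {state: [] for state in STATES}
        self.launched: list[str] = []

    def _require_capacity(self, state: str) -> None:
        if len(self.columns[state]) >= self.wip_limits[state]:
            raise FlowBlocked(f"'{state}' is full (limit {self.wip_limits[state]})")

    def add(self, release: str) -> None:
        """Put a new release into the product backlog, if there is room."""
        self._require_capacity("backlog")
        self.columns["backlog"].append(release)

    def state_of(self, release: str) -> str:
        for state, releases in self.columns.items():
            if release in releases:
                return state
        raise KeyError(release)

    def advance(self, release: str) -> str:
        """Move a release one state forward, or off the board once validated."""
        current = self.state_of(release)
        position = STATES.index(current)
        if position == len(STATES) - 1:
            # Validation finished: the release has launched and leaves the line.
            self.columns[current].remove(release)
            self.launched.append(release)
            return "launched"
        target = STATES[position + 1]
        self._require_capacity(target)  # check before removing, so a blocked release stays put
        self.columns[current].remove(release)
        self.columns[target].append(release)
        return target


if __name__ == "__main__":
    board = ReleaseBoard({state: 1 for state in STATES})  # hypothetical limits
    board.add("Rel 1")
    board.advance("Rel 1")  # Rel 1 is now being built
    board.add("Rel 2")
    try:
        board.advance("Rel 2")
    except FlowBlocked as blocked:
        print(f"Rel 2 cannot start yet: {blocked}")
```

With a limit of one release per state, Rel 2 has to wait in the backlog until Rel 1 is development complete, which is the blocking effect the work-in-process limits are meant to expose.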

By way of another illustration, the picture could look like the flow below - showing the impact of work-in-process limits:
Illustrating potential scenarios where flow can be blocked due to capacity or work-in-process-limits*
* Credit goes to The Lean Startup: How Constant Innovation Creates Radically Successful Businesses
for providing a similar example (Part 2, Chapter 7: Measure) that directed my usage in the above template

Concluding thoughts & reflections on achieving this desire of continuous release flow in practice
Whilst it is important to consider the competitive threats of the business landscape, promoting a sense of urgency with aggressive roadmap intentions that, if delivered, would allow the business to remain ahead of the competition, it is critical that all stakeholders involved understand the implications of such a request. As almost all of the books I've read on the subject of Lean/Agile/Scrum as well as Scaling point out, this requires a change in mindset across the entire organisation, from CTO/CEO right down to engineers in the delivery team. Changing a mindset is extremely challenging, but not impossible - albeit be prepared that it will take time: from my experience with other teams, it's been a journey of at least 3-5 years. To get a good insight on scaling challenges, I strongly recommend Sutton's Scaling up Excellence.

A lot rides on the context, nature and culture of the organisation as well. If you've come out from quite an aggressive project delivery, leaving little time to get basic principles of quality in place at all levels (no time for technical debt, architecture refactoring, etc.), including the required infrastructure (automation, CI, unit testing, build master, etc.), then scaling to do more in less time is going to be quite problematic, next to impossible, unless the leadership team is able to convince the business otherwise.

There is a strong dependence and reliance on a leader to manage the expectations across the business. The delivery leader should protect the team from unrealistic goals, but in order to do so, needs to make a sound case for showing what is being requested from the team, against what their current capabilities are. To do this, one needs to start collecting measurements that make sense. Use these measurements to go to business / strategy, and start the negotiations. If the business is relentless and forces the delivery team into a corner, which is often the case in STB delivery projects, the delivery team will have to consider alternative strategies such as a Lean / Kanban continuous release flow as presented in this case.

Implementing this strategy will still call for a massive change in mindset & behaviours. It relies on a fully focused team, end-to-end, across business units and disciplines. Having the right people, with attributes such as rigour, grit, diligence, focus, drive, attention to detail, self-management, great communication, etc -- will definitely help. Continuous delivery implies bolstering up the technical organisation with state-of-the-art practices including, but not limited to: continuous integration, automated testing, scalable infrastructure, and a highly skilled technical team. Ideally the technical teams should be self-managing, empowered to make decisions within their circle of influence, and also allowed to reflect upwards, triggering changes in management expectations, etc.

In short, the entire team needs to bring their A game. Alpha managers all around, alpha engineers - and a passion for adopting continuous process improvements, driven by self-reflection & feedback loops...

[Sorry for the waffle - will clean up later...]
