Friday, 5 September 2014

Hit Squads as a bridge to agility

If you've read some of my previous posts, you will know that I write mostly about software projects in the world of digital TV: set-top box (STB) and broadcast headend systems, including internet TV, Video-On-Demand (VOD) and over-the-top (OTT) projects. There is a tremendous amount of software running in these components, end-to-end, from the STB device (which in itself is a complicated system), to the headend/backend server-side components.

The development teams are usually not under one roof and rarely come from a single supplier; each has its own methods of working and its own release / test cycles. It is difficult, but not impossible, to establish a regular cadence for the overall delivery stream. Some teams may be following agile/scrum, without continuous delivery, while others prefer to work in more staged cycles: requirements up front, then development, test and integration.

There are cases, however, where a large part of the development, test, integration & delivery teams are in-house, but segmented by classic functional organisation structures that result in a silo-based mentality. I've seen this in a few places, especially when, for example, PayTV operators take ownership of product development in-house. The application development team, for instance, may be following scrum/agile, while other teams don't.

The situation is almost always the same in such projects: the project is driven by a hard deadline. Typical development cycles run until feature complete. Then you enter the test/integration cycle (this is usually the first time you know the true status of all the key features and functional areas - expect trouble). You find out there are issues, and you're not even close to being feature complete.

The deadline isn't going to move and you've eaten up your development schedule (you're now into the time set aside to stabilise, in what should be surgical mode). Sequential processes with siloed teams are a hindrance - you need a quick way to uncover issues and resolve them, providing quick feedback into the project stream. There are pockets of agile/scrum in some teams, but not everyone is convinced that is the way to go. You don't want to disrupt the teams completely, yet at the same time you want to promote a different style of working.

What can you use as a bridge-to-agility without getting into the whole methodology debate?

Enter the Hit Squad Team (also known as a Tiger Team) - a concept that I've used on more than one occasion. If managed well, the benefits of having cross-functional teams are obvious. The lessons you take from here are used in your next delivery project, and are likely to become the preferred way of working. I've seen this transition take shape & become the norm, without having to religiously convert people to a new mindset - I've also learnt it ain't that easy. As many before me have warned, what worked for one organisation isn't necessarily going to work for others. Still, if you're operating in a similar technology space or problem domain like STB development, it wouldn't hurt to try this out...

What's a Hit Squad then?
A Hit Squad Team is a fully autonomous, self-functioning team that is set up on-the-fly to help resolve issues, both known & unknown, in an accelerated time frame, rescuing the project from technical failure. It usually consists of a cross-functional team of seasoned technical people who thrive in high-pressure situations. The gist of it is: We have a problem, put a team together, help us figure it out, doing whatever it takes to solve it.

The members of this team may not necessarily even have been a core part of the development/integration project, or possibly don't even own any development components, but are known to be subject matter experts with a great history of getting stuff done. This is the extreme case of course; what you actually want is for people from various parts of the project to come together and fight for a common cause. What transpires is fascinating: the us/them/silo/competitor mentality and the "not my code / broke my design" syndromes fall away, because the Tiger team is expected not only to find problems known & unknown, but also to hack fixes together, sometimes altering the design to solve the issue at hand, prove the change works, and recommend the fix back to the component development team. When people put down their shields, the knowledge transfer just flows, a sense of camaraderie develops, and respect is earned...

This is not just a team of technical people with no people skills or discipline in managing communications or managing expectations. Some admin & control will be required, leading towards defining some sense of structure. It takes the form shown in this picture:

How does it Work?
  • Similar to an ORIT (One Roof Integration Team), but more controlled, with extreme focus on detail, management & control: Daily tracking, Weekly Release / Build Planning
  • Cross-functional Teams are set up for each key area (called “Hit Squad Team” or HST)
  • Each HST is led by a Squadron Lead (SL) who has deep domain knowledge
    • The SL is a senior person who leads the team in day-to-day workload, priorities, removes blockages, escalates upwards, etc.
  • HSTs are centrally controlled by a Squadron Commander (CSC)
    • This is usually the Technical Launch Director / Delivery Owner / Chief Architect / Chief Product Owner
  • HST tracking and co-ordination is done by the Hit Squad Planning Officer (HSPO)
    • This is usually a Release Manager / Project Manager
  • No more than nine people in a team, including Squad Lead (small & lean)
    • Call on external service teams like Automation/Infrastructure/SysAdmin on an as-needed basis
  • Each Team has its own Backlog agreed between SL & CSC to burn through
    • New issues that surface are prioritised and fed back into the backlog
  • All Teams work off the same build and release timeline (NO Branches - zero)
  • HSPO tracks daily working with Inventory (SI Build Manager)
    • Availability of fixes, component release pipeline, Weekly Release Build Contents (Bill of Materials)
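
As a rough illustration only, the squad rules above (a Squad Lead, no more than nine people, a backlog that is re-prioritised as new issues surface) could be modelled like this. It is a minimal Python sketch, not a tool from any project described here; the names HitSquad, Issue, add_member, add_issue and next_issue are hypothetical, and it assumes a lower priority number means more urgent.

```python
from dataclasses import dataclass, field

# "No more than nine people in a team, including Squad Lead"
MAX_SQUAD_SIZE = 9


@dataclass(order=True)
class Issue:
    priority: int                       # assumption: lower number = more urgent
    title: str = field(compare=False)   # title is ignored when ordering issues


@dataclass
class HitSquad:
    area: str
    squad_lead: str
    members: list = field(default_factory=list)   # excludes the Squad Lead
    backlog: list = field(default_factory=list)

    def add_member(self, name: str) -> None:
        # The Squad Lead counts towards the limit of nine.
        if len(self.members) + 1 >= MAX_SQUAD_SIZE:
            raise ValueError(f"{self.area} squad is full")
        self.members.append(name)

    def add_issue(self, issue: Issue) -> None:
        # New issues are prioritised and fed back into the backlog.
        self.backlog.append(issue)
        self.backlog.sort()             # stable sort, most urgent first

    def next_issue(self) -> Issue:
        # The squad always burns through the most urgent item next.
        return self.backlog.pop(0)


# Hypothetical usage with made-up issue titles
squad = HitSquad(area="networking stability", squad_lead="SL")
squad.add_issue(Issue(2, "intermittent stream stalls"))
squad.add_issue(Issue(1, "connection drops after standby"))
print(squad.next_issue().title)
```

The point of the sketch is simply that the backlog belongs to the squad and is re-sorted every time something new surfaces, while the size limit keeps the team small & lean.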
What makes it Work?
  • Central Control/Technical Authority removes ambiguity and sets overall focus and direction
  • Common Heartbeat of Release Cycles maintained by one Build Manager (System Integration)
  • All Hit Squads aligned to common heartbeat
  • Hit Squads are fully functional, self-contained units
    • Backlog is clearly managed by Squad Lead
    • New issues are prioritised daily
    • Ownership and accountability delegated / empowered to Squad Lead
    • Squad takes ownership of fixes
    • Squad puts their own builds together, however many are required (they can branch)
    • Fixes are consolidated and rationalised
    • Fixes are delivered back to mainstream build
    • Fixes must make the delivery timetable, managed and co-ordinated by Release Manager with Build Master
    • SI Triage, fast-tracking, investigations are done inside the squad
    • Fixes are made in near real-time and fed back to the squad
  • Other teams continue as normal
    • Main project QA/Test cycles tick along in the background
    • Customer Testing continues as before
Isn't this too Disruptive?
  • No, it is a natural part of the Delivery / Operational Stage
    • Development teams must adapt to operations mode: when a delivery is imminent, the plan-estimate-do-feedback-check-capacity loop doesn't work too well
    • Fixes must flow smoothly into release trains 
  • Iterations/Sprints can still be maintained, although the frequency might change
  • Does NOT disrupt Scrum/Agile process, except:
    • Teams will have to lend people to hit squads
    • Hit Squads may need support:
      • Scrum Masters could facilitate Scrum of Scrums, Team Backlogs, etc.
      • Build and Release Management from SI needs tight control & co-ordination
      • Might have to temporarily re-purpose Scrum Masters
  • Future strategic streams can still progress in parallel if capacity permits
  • Must accept that no single entity, such as a Product Manager, can own all streams; they need to work together with the subject matter experts who usually take the lead on hit squads
Surely there must be challenges with this approach?
Indeed, there are challenges aplenty. Hit squads require an elite task force and assume a high level of competency. Finding people with the right mix of skills, experience and personality can be difficult.

For example, I recently asked one of my project streams to set up an HST for networking stability. My requirements for the team (based on my own development experience, and excluding the people-behaviour elements) are shown below. It proved quite difficult to find people.
  • Must have coding experience with networking
    • Native C/C++ with BSD sockets API at driver level as well as application level (preferably Linux sockets, but Windows Sockets API would be fine as well)
    • Must have written networking code covering client/server, streaming, keep-alive, heartbeats
    • Developed Network manager-like components from scratch: HTTP client / server stack, UDP streaming client, etc.
    • Must understand the flow from kernel space-to-user space, including DMA transfers, mem copies across the stack (generally impacts performance), etc.
    • Understanding of set top box implementation compared with generic network stacks in PCs
    • Familiar with how Java abstracts networking APIs from JVM SDK to lower layers, and the associated logging that comes with it
    • At least 3 years' coding experience with the above topics
  • In terms of tools / skills / knowledge:
    • Good understanding of protocols: TCP/IP, UDP, HTTP, multicast, CIDR, Firewalls, Routing
    • Wireshark – how to use, write plugins to filter for specific traffic / headers / payloads
    • Microsoft Network Monitor
    • netstat, ipconfig/ifconfig, ICMP tables
    • Load balancers, Apache web server, Microsoft IIS, nginx, Varnish, proxies, reverse proxies, ACLs, TCPView
Depending on the organisation and profile of the project, extreme measures can be taken by sourcing people from outside your immediate project, department, or even country. If the stakes are high, a solution will be found. I was once in a project that was essentially one big HST from day one of the project. We sent out alerts to all departments, looking for people with specific skills, seconded them to the project, then disbanded the team when the project delivered.

In other instances, some companies have specialists floating around, acting as if on-call to support emergency requests for hit squads.

Does this thing really work or add value?
I wouldn't spend my own personal time writing about this if I didn't think it was a useful concept. To the people who have got agile/scrum working to a tee, this HST is so obvious that it's considered default, nothing special. In the context of STB projects that have yet to make the leap to a fully cross-functional team domain, the HST concept could be experimented with. When there's extreme pressure for delivery, it calls for extreme measures.

I seeded this concept on a past project (see here). At one stage we had close to 30 HSTs running in parallel, in a massive drive to recover the project and hit the delivery deadline: it worked, it stuck, and it then triggered subsequent projects to plan time for staged integration / HST efforts before critical milestones...
