In a series of posts this year, I plan to write about how I led the transformation of a technology platform and engineering team. The results: scaling to 10X+ growth on KPIs such as users and devices, higher user engagement, enhanced personalisation & content discovery, reduced platform instability (availability increased from 97 to 99%), a 20X+ reduction in core operating costs (saving R100m+) and, simultaneously, a scalable leadership team built to take over. All in 3.5 years.
This draws on my professional experience from March 2017 to October 2020, when I was CTO (Chief Technology Officer) of an OVP (Online Video Platform) for Africa's largest VE (Video Entertainment) provider. In this short period my team delivered an end-to-end transformation (not just software development) that set up IT/Engineering to scale for future growth. I also left a scalable leadership pipeline, a technology roadmap delivery plan and sustainable processes in place for at least another 2 years, which allowed me to transition comfortably to my next role outside of video systems. Since my departure, I have remained in contact with the team, who continue to thank me not only for the roadmap but also for the opportunities I helped create to grow their own careers as leaders who are themselves on track to become CTOs & CIOs.
Context
The tech stack we evolved to
Technical & Leadership Challenges
- How do I bring structure into a chaotic engineering team, stabilise the platform and simultaneously scale to 10X+ MAUs and double user engagement, with a small team, and still deliver an experience comparable to Netflix?
- How do I do this without micromanaging my team, while building their self-confidence in their engineering ability, despite sceptics considering them not “world-class”? (We were the underdogs - I hate it when (international) folks doubt local African engineering talent, but that's a rant for another day.)
- How to change slow corporate processes into fast innovation and take chances on migrating to cloud - without having all the answers yet or being able to tick all the due-diligence boxes?
- How to maintain customer excellence when the stakes are high and, when the chips are down, how to deal gracefully with a crisis (almost every weekend)?
- How do I build credibility back when the team’s reputation has taken a hit?
- How to maintain some level of platform stability where known bottlenecks exist that will cause an outage under high load or peak demand like a Black Friday event?
- How to decide whether the existing platform needs a full rewrite, should be replaced by COTS (commercial off-the-shelf) software, or can safely continue through technical-debt re-engineering?
- How do you transform a "POC" prototype into an industrial-strength video platform that can stand up to the giants?
- How do I transform the mindset of an engineering team and change the culture to one that is high-performing and consciously focused on customers, operations and security (DevSecOps)?
- How do I transition the org to a DevOps model by changing mindsets, introducing habits and behaviours, then tooling and ultimately transforming team process and composition?
- How do I create a credible technology vision & long-term view, along with a practical roadmap for delivery?
- How do I repair dysfunctional relationships between product and engineering teams where trust is low, cynicism is high and the partnership is unhealthy?
- How do I manage business and marketing operations, building sincere relationships to work around the constraints and limits of the platform (shared context)?
- How do I communicate the true state of the platform to stakeholders without causing panic, backed by a credible action plan that instils confidence?
- How do I build a leadership team when headcount is frozen and we still need to deliver? How do I remove doubt from senior stakeholders about the engineering skillset and the competency of managers?
- How do I convince people to take a chance on diverse individuals who may not be experienced or "qualified enough"?
- How to handle political conflict, fear and uncertainty when merging two tech platforms, create a vision for the future and manage communications with non-technical stakeholders about "it won't work" scenarios?
- How do I leave a leadership succession pipeline in place, given that my goal was to leave after 3 years anyway?
Topics Covered
- Take time to understand current reality & publish to-be aspirations - get an independent opinion by way of a system audit & architecture review
- How I designed the technical org with a view for future organic transformations in mind. There's the ideal and then there's the "play the cards you're dealt", but never lose sight of the bigger picture.
- Being bold with data centre optimisations - down with data centres, embrace peering!
- First things first, stabilise your infrastructure and networking, build in redundant routing
- Sometimes you need to take a step backwards to move two steps forwards - enhance platform software for multiple datacentres. First principles matter, a lot.
- Stabilise hosting infrastructure and step-change your virtualisation model - move to containers
- Embrace cloud content delivery networks (CDNs) - don't build your own, partner. Introduce a multi-CDN strategy, experiment, iterate.
- Have the leadership will to experiment, fail and re-invent - leverage experimentation with partners. Change the paradigm (cache-at-edge, build safe modes, borrow metaphors from other domains).
- Choose your cloud partners wisely but don't analyse to death. Move fast, choose an appropriate cloud platform that aligns with your culture.
- Factors I consider important for partnerships
- Going cloud native - out with the old, in with the new, skip lift-and-shift altogether - content discovery & personalisation
- Keeping the bean counters happy - introduce econometrics - how I introduced a platform economics model for Total Cost of Ownership, cost per user and cost of networking per user to prove the logic of transformation. Speak the language of finance to finance people. A CTO without financial intelligence is limited.
- How to improve technical operations, introducing habits, discipline, accountability & ownership.
- Nurturing a culture of automated testing, continuous testing on production, load & performance testing, chaos testing, failover & disaster recovery testing. Testing is not just a job for the QA department.
- Managing large-scale events, scaling for peak usage & crisis management when things go wrong
- Leveraging off-the-shelf capabilities - don't reinvent the wheel unless you have no other option.
- Willingness to learn from mistakes, taking responsibility for outages & setting direction on way forward - leadership.
- Know your metrics - how to go from zero to 100+ (and perhaps overkill) in creating a command centre for monitoring issues in real-time. Why every technology platform needs its own Platform Intelligence Portal outside of Business Intelligence reports or a Data Warehouse team. Engineering teams must own their platform metrics, don't pass responsibility over-the-wall.
- How to put a succession pipeline in place such that the CTO position you leave behind is ready to be filled by your number one?
Results Delivered
- Inherited technology stack with weak foundational architecture, massive technical debt causing instability
- Successfully turned around a previously distressed, dysfunctional group, bringing clarity of focus & team cohesion
- Redesigned the organisational structure & technical platform strategy aligned to future business growth
- Introduced new leadership roles, set the vision, mission & objectives for the division thus bringing order to chaos
- Improved relationships and working agreements with core customers by agreeing SLAs & metrics for performance
- Improved platform availability from 97-98% (roughly 7-11 days of outage per year) to 99.8-99.9% (roughly 9-18 hours per year)
- Scaled the platform to 10X+ on monthly active users, total active users/devices
- Increased engagement at least 2X year-on-year
- Increased network throughput 10X breaking internet streaming records for Africa, up to 800Gbps on one event
- Expanded application device footprint 4X (introduced smart TVs, game consoles, new set top boxes, IP-only STB)
- Enhanced project and release management processes and ways of working, resulting in on-time completion of projects
- Transitioned teams from textbook agile scrum methods to more fluid, generalised project execution tracks improving flexibility
- Introduced a Software Engineer in Test capability modelled on Google’s, driving automation & completely removing manual testing
- Instilled a sense of ownership and accountability for monitoring operations through upskilling & training programmes
- Celebrated people by winning group-worldwide recognition awards: Innovation in AI/ML, Test automation & App development
- Improved overall people engagement in the division, scoring the highest management metric as measured by the OfficeVibe tool
- Created architectural platform vision for the new technology stack thus managing expectations of journey timelines
- Delivered on key project & product roadmap items meeting business objectives for all fiscal years to date
- Introduced new operational behaviours & mindset changes, dress rehearsals, large event preparation management processes
- Introduced new SOPs, albeit manual, that improved platform stability & uptime and reduced the number of incidents raised
- Transformed software engineering processes to full stack, paving the way for DevSecOps, removing silos across teams
- Enabled development of in-house platform monitoring and intelligent dashboard tools improving NOC, 24/7 & command centre
- Enhanced testing coverage driving load, performance & scalability testing, automated on production during off-peak times
- Closed security gaps in architecture/implementation, reducing risk in revenue leakage & subscription management
- Catalysed change in thinking in managing total cost of ownership: buy versus build, partner more, improving focus
- Steered group's transformation to cloud services, improving stability & uptime e.g. Microsoft Azure Media Services delivery, AWS S3/CloudFront, Lambda, AWS Shield, R53, etc.
- Reduced costs by up to 70% on core infrastructure, saving group in the order of R100m over three years
- Resolved legacy technology components and services by offloading, de-supporting and transferring to other divisions
- Demonstrated cloud innovation in streaming by enabling Africa’s first 4K/UHD streams over Azure Media Services for FIFA’18
- Scaled platform capability doubling YoY to 10X+ MAUs with 50%+ increased engagement meeting KPIs
- Introduced cloud transformation journey roadmap of at least 3 years: multi-DC app hosting, microservices, containers to AWS
- Managed a large budget in excess of $30m, operating largely within budget as well as delivering on cost-saving KPIs
- Created technical & financial model for forecasting platform costs, introducing cost per user economics never seen before
- Introduced multi-CDN partnering capabilities for redundancy, cost & improving risk management for disaster recovery scenarios
- Reduced total cost of ownership by moving away from purpose-built in-house systems to partnerships & leveraging cloud
- Instigated enterprise technology transformation by optimising and transferring people/skills to other departments
- Made difficult technology decisions that negatively impacted team morale but nevertheless maintained best business interests
- Deftly & tactfully negotiated contracts resulting in better relationships, support plans & drove additional cost savings
- Grew & improved relationships with vendors to be partner-focused, thus enjoying more symbiotic relationships than before
- Turned around governance for both internal and external audits, resulting in Green report within 12 months of a red report
- Improved monitoring & execution of antipiracy initiatives by developing tools that enabled quick takedowns of pirate streams
- Shaped the strategic direction of technical platform consolidation, at the risk of disrupting the status quo
- Key contributor to enterprise-wide governance & steering councils: Risk, Governance, Architecture, Procurement & Cloud
- Drove cross-group consolidation of competencies, tools & technologies promoting reuse and improving synergies
- Drove modern application development methods like feature flagging, A/B testing & application logging, as well as a common cross-platform application development framework on React.JS
- Set the org on a DevOps path, introducing key concepts of "Mission Control", operational excellence, dashboarding, tooling, large event planning, crisis management runbooks etc.
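The availability percentages quoted in the results above can be sanity-checked with simple arithmetic: yearly downtime is the unavailable fraction multiplied by the hours in a year. Below is a minimal Python sketch, assuming a 365-day year and counting all unavailability as downtime; the function name `yearly_downtime_hours` is hypothetical and only for illustration, not part of any platform tooling.

```python
# Assumption: a non-leap year of 365 days, i.e. 8760 hours.
HOURS_PER_YEAR = 365 * 24


def yearly_downtime_hours(availability_pct: float) -> float:
    """Return the hours of downtime per year implied by an availability percentage."""
    return (100.0 - availability_pct) / 100.0 * HOURS_PER_YEAR


if __name__ == "__main__":
    # The availability levels mentioned in the results list.
    for pct in (97.0, 98.0, 99.8, 99.9):
        hours = yearly_downtime_hours(pct)
        print(f"{pct}% -> {hours:.2f} hours ({hours / 24:.2f} days)")
```

Under these assumptions, 97% availability works out to about 10.95 days of downtime a year and 98% to about 7.3 days, while 99.8% and 99.9% come to roughly 17.5 and 8.76 hours respectively. The jump from "two nines" to "three nines" is therefore a move from days of outage to hours.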