
Saturday, 14 September 2024

How I used ChatGPT o1-preview to build a rough interactive software engineering resource forecasting model

Seed Prompt for tool (click here to play with it): I am a software engineering manager. A large part of my job is understanding the engineering resource utilization for my engineering team. I manage multiple engineering teams of varying sizes from 5 people to 20 people in a team. When estimating software projects, understanding availability of productive engineers is important for planning purposes. Help me build an interactive resource forecasting tool, that can be fully self-contained within browser app, embedded in a google blogspot page. The tool must include inputs for: 1\ Funded team size; 2\ Current Available engineers (i.e. actual current size of team). The tool must account for the time it takes for hiring new engineers, ramp-up time, learning, etc. before they can be deemed as productive for committing to project work. The tool must show a table the resource plan over 52 weeks in the year, and include a visual graph to show how the team size grows to meet funded size. Research the best methods in software engineering management, and create the tool that can help software managers have a good model for resource utilization. The tool can store information in the local user browser storage as needed. Remember, I need the full code to be fully self-contained so I can run the tool from within a google blogspot page.

Prompt for this article: Great I think we have good baseline version to start with. Now using your PhD capabilities, write up an accompanying blog post that details the model, the thinking behind it, and call out the limitations using a critical eye, suggesting improvements for the next version.

Prompt for Image using Dall-E: "An isometric illustration of a diverse engineering team collaborating in a modern office space. They are gathered around a large table filled with charts, graphs, and laptops, discussing and planning. Behind them, a wall-sized calendar and a whiteboard display timelines, Gantt charts, and upward-trending graphs. Include elements like gears and clock icons subtly integrated into the design to symbolize engineering and time management. Use a professional, clean art style with vibrant yet muted colors to convey a serious but engaging atmosphere."

Engineering Resource Forecasting: A Comprehensive Model for Team Planning

By ChatGPT o1-Preview (took about 4 hours to get the tool to version 1)

Effective resource planning is crucial for engineering teams aiming to meet project deadlines, optimize productivity, and manage budgets efficiently. The Engineering Resource Forecasting Tool is designed to help managers and team leads forecast their engineering resources over a year, considering various real-world factors such as hiring rates, attrition, ramp-up times, efficiency, leave days, and public holidays.

In this blog post, we'll delve into the model behind the tool, explore the thinking that shaped its development, critically analyze its limitations, and suggest improvements for future iterations.

Understanding the Model

1. Objectives of the Tool

The primary goals of the Engineering Resource Forecasting Tool are to:

Forecast Team Size: Predict how the engineering team will grow or shrink over time.
Calculate Effective Capacity: Determine the actual productive capacity of the team, considering factors that reduce availability.
Assist in Planning: Provide insights for hiring strategies, resource allocation, and project scheduling.

2. Key Components of the Model

The model incorporates several essential elements:

Funded Team Size: The maximum number of engineers that the budget allows.
Current Available Engineers: The starting point for the forecasting, representing the existing team.
Hiring Rate and Time: The rate at which new engineers are hired and the average time it takes for them to join.
Ramp-Up Time: The period new hires require to reach full productivity.
Attrition Rate: The expected percentage of engineers who will leave the team annually.
Efficiency Factor: The proportion of time engineers spend on productive work.
Leave Days and Public Holidays: Time when engineers are not available due to vacations and holidays.

3. Model Calculations

a. Effective Engineers

The number of effective engineers is calculated weekly by adjusting the total number of fully ramped-up engineers for efficiency and availability:

Effective Engineers = Total Ramped-Up Engineers × Efficiency Factor × (Weekly Available Days / 5)

b. Weekly Available Days

Engineers aren't available all five working days each week due to leave and public holidays:

Weekly Available Days = 5 - ((Annual Leave Days + Public Holidays) / 52)
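As a minimal sketch, the two formulas above can be written in Python as follows. The function and parameter names are illustrative assumptions and are not taken from the tool's actual code.

```python
def weekly_available_days(annual_leave_days: float, public_holidays: float) -> float:
    """Average working days available per week, spreading leave and holidays evenly over 52 weeks."""
    return 5 - (annual_leave_days + public_holidays) / 52


def effective_engineers(ramped_up: float, efficiency: float,
                        annual_leave_days: float, public_holidays: float) -> float:
    """Fully ramped-up engineers scaled by efficiency and by the share of the week they are available."""
    available = weekly_available_days(annual_leave_days, public_holidays)
    return ramped_up * efficiency * (available / 5)


if __name__ == "__main__":
    # Hypothetical inputs: 10 ramped-up engineers, 80% efficiency, 20 leave days, 12 public holidays.
    print(round(weekly_available_days(20, 12), 2))        # about 4.38 days per week
    print(round(effective_engineers(10, 0.8, 20, 12), 2))  # about 7.02 effective engineers
```

With these example inputs, a team of 10 ramped-up engineers yields roughly 7 effective engineers, which is the kind of gap between headcount and capacity the model is meant to expose.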

c. Attrition Modeling

Attrition reduces the team size over time:

Weekly Attrition Rate:

Weekly Attrition Rate = Annual Attrition Rate / 52

Cumulative Attrition: The total number of engineers who have left the team up to a given week.
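A rough sketch of how the weekly rate and the cumulative figure could be computed is shown below. It assumes fractional headcount and applies the weekly rate to whoever is still on the team each week; whether the tool does exactly this is an assumption, and the names are hypothetical.

```python
def weekly_attrition_rate(annual_attrition_rate: float) -> float:
    """Linear approximation used by the model: annual rate spread evenly over 52 weeks."""
    return annual_attrition_rate / 52


def cumulative_attrition(start_headcount: float, annual_attrition_rate: float,
                         weeks: int = 52) -> list:
    """Running total of (fractional) engineers who have left by the end of each week."""
    rate = weekly_attrition_rate(annual_attrition_rate)
    headcount = float(start_headcount)
    total_left = 0.0
    running_totals = []
    for _ in range(weeks):
        leavers = headcount * rate   # assumption: rate applies to the remaining team
        headcount -= leavers
        total_left += leavers
        running_totals.append(total_left)
    return running_totals


if __name__ == "__main__":
    # Hypothetical inputs: 10 engineers and a 10% annual attrition rate.
    totals = cumulative_attrition(10, 0.10)
    print(round(totals[-1], 2))  # slightly under 1 engineer lost over the year
```

Because the rate is applied to a shrinking team, the year-end total comes out a little below the simple product of headcount and annual rate.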

d. Hiring and Ramp-Up Process

Hiring Pipeline: Represents engineers who have accepted offers but haven't started yet.
Ramping Engineers: New hires who are in the process of ramping up to full productivity.
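The pipeline and ramp-up stages can be sketched as two fixed-length queues that engineers move through week by week. This is an illustration under stated assumptions, not the tool's implementation: the hiring rate is a constant cap per week, attrition is left out for brevity, and ramping engineers contribute nothing until the ramp-up period ends (the binary view described in the limitations further down). All names are hypothetical.

```python
from collections import deque


def simulate_headcount(funded: int, current: int, hires_per_week: float,
                       hiring_weeks: int, ramp_weeks: int, weeks: int = 52) -> list:
    """Week-by-week counts of engineers in the pipeline, ramping up, and fully ramped."""
    pipeline = deque([0.0] * hiring_weeks)  # offers accepted, not yet started
    ramping = deque([0.0] * ramp_weeks)     # started, not yet fully productive
    ramped = float(current)
    rows = []
    for week in range(1, weeks + 1):
        # Only open new roles that are not already covered by people on the team or in flight.
        committed = ramped + sum(ramping) + sum(pipeline)
        new_offers = max(0.0, min(hires_per_week, funded - committed))
        pipeline.append(new_offers)
        starters = pipeline.popleft()       # offers made hiring_weeks ago start now
        ramping.append(starters)
        ramped += ramping.popleft()         # starters from ramp_weeks ago are now productive
        rows.append({"week": week, "pipeline": sum(pipeline),
                     "ramping": sum(ramping), "ramped": ramped})
    return rows


if __name__ == "__main__":
    # Hypothetical scenario: funded for 10, 6 on the team, 1 hire per week,
    # 4 weeks from offer to start, 8 weeks of ramp-up.
    for row in simulate_headcount(10, 6, 1, 4, 8)[:16]:
        print(row)
```

In this scenario the first hire becomes fully productive in week 13 and the team reaches its funded size in week 16, which shows how hiring and ramp-up delays push productive capacity well behind the funding decision.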

e. SDE-Weeks and SDE-Days

SDE-Weeks: Sum of effective engineer contributions per week within a month.
SDE-Days: Sum of effective engineer contributions per day within a month.

These metrics provide a more granular view of the team's productive capacity.
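One possible way to roll weekly effective-engineer figures up into monthly SDE-Weeks and SDE-Days is sketched below. Two assumptions are made that the post does not confirm: weeks are bucketed into 12 months by position in the year (not by calendar dates), and one effective engineer-week counts as 5 effective days because availability is already folded into the weekly figure.

```python
def monthly_capacity(weekly_effective: list) -> list:
    """Aggregate weekly effective engineers into 12 monthly SDE-week and SDE-day totals."""
    months = [{"month": m + 1, "sde_weeks": 0.0, "sde_days": 0.0} for m in range(12)]
    n_weeks = len(weekly_effective)
    for i, effective in enumerate(weekly_effective):
        month_index = i * 12 // n_weeks       # assumption: even bucketing, not calendar-based
        months[month_index]["sde_weeks"] += effective
        months[month_index]["sde_days"] += effective * 5
    return months


if __name__ == "__main__":
    # Hypothetical input: a steady 7 effective engineers for all 52 weeks.
    for month in monthly_capacity([7.0] * 52):
        print(month)
```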

Thinking Behind the Model

1. Realistic Representation

The model aims to mirror real-world scenarios by considering:

Time Delays: Hiring and ramp-up times introduce delays between initiating a hire and gaining full productivity.
Non-Productive Time: Leave, holidays, and non-productive activities reduce actual working time.
Attrition Effects: Engineers may leave the team, impacting capacity and necessitating replacement hires.

2. Balancing Simplicity and Complexity

While the model incorporates multiple factors, it maintains usability by:

Using Average Rates: Distributes leave, holidays, and attrition evenly over the year for simplicity.
Assuming Linear Processes: Hiring and ramp-up are modeled as linear processes without accounting for fluctuations.

3. Providing Actionable Insights

By visualizing the forecast through charts and tables, the tool helps managers:

Identify Hiring Needs: Determine the hiring rate required to meet team size goals.
Plan for Attrition: Anticipate capacity reductions due to attrition.
Adjust Strategies: Modify inputs to explore different scenarios and their outcomes.

Limitations and Critical Analysis

Despite its usefulness, the model has several limitations:

1. Uniform Distribution Assumptions

Leave and Holidays: The model assumes that leave days and public holidays are evenly distributed throughout the year. In reality, leave may cluster around certain periods (e.g., summer vacations, year-end holidays), leading to temporary drops in capacity.
Attrition Timing: Attrition is modeled as a steady rate, but actual departures may occur unpredictably and in clusters.

2. Lack of Granular Ramp-Up Modeling

Binary Productivity: The model assumes engineers are either ramping up or fully productive after the ramp-up period. It doesn't account for gradual increases in productivity during ramp-up.
Individual Variations: All new hires are treated identically in terms of ramp-up time and efficiency, ignoring individual differences.

3. Hiring Constraints and Real-World Challenges

Hiring Rate Feasibility: The model calculates the hiring rate needed to meet targets but doesn't consider the practical challenges of achieving that rate, such as market conditions or recruitment resource limitations.
Candidate Pipeline Variability: It doesn't account for fluctuations in candidate availability or acceptance rates.

4. Simplified Efficiency Factor

Static Efficiency: The efficiency factor is constant over time, not accounting for potential improvements or declines due to training, process changes, or team dynamics.
Team Composition: It doesn't differentiate between engineers with different skill levels or specializations.

5. Absence of Project Dependencies

No Task Modeling: The model focuses on resource capacity without linking it to specific project tasks or timelines.
Overtime and Workload Peaks: It doesn't account for periods when engineers might work overtime or when workload exceeds capacity.

6. Economic and External Factors

Budget Changes: The funded team size is assumed constant, ignoring potential budget cuts or increases.
External Events: Unforeseen events like economic downturns, pandemics, or natural disasters are not modeled.

Suggestions for Improvement

To enhance the model's accuracy and applicability, future versions could consider:

1. Incorporating Seasonal Variations

Variable Leave and Holiday Distribution: Model leave and public holidays based on historical patterns to reflect periods of low and high availability.
Attrition Peaks: Adjust attrition rates to account for known periods of higher turnover.
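As an illustration of the seasonal idea, the sketch below replaces the flat weekly average with per-week weights for leave and an explicit list of public holidays per week. The weighting scheme and all names are assumptions made for this example; a real version would derive the weights from historical leave data.

```python
def seasonal_available_days(annual_leave_days: float, holidays_by_week: list,
                            leave_weights: list) -> list:
    """Available working days per week when leave follows a seasonal pattern."""
    total_weight = sum(leave_weights)
    available = []
    for weight, holidays in zip(leave_weights, holidays_by_week):
        leave_this_week = annual_leave_days * weight / total_weight
        available.append(max(0.0, 5 - leave_this_week - holidays))
    return available


if __name__ == "__main__":
    # Hypothetical pattern: leave is five times more likely in the last two weeks
    # of the year, and two public holidays fall in the final week.
    weights = [1.0] * 50 + [5.0, 5.0]
    holidays = [0.0] * 51 + [2.0]
    days = seasonal_available_days(20, holidays, weights)
    print(round(days[0], 2), round(days[51], 2))
```

The total leave taken over the year is unchanged, but the final week's availability drops sharply, which the evenly distributed version of the model would hide.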

2. Enhanced Ramp-Up Modeling

Gradual Productivity Increase: Implement a ramp-up curve where productivity increases incrementally over the ramp-up period.
Individualized Ramp-Up: Allow different ramp-up times for different roles or experience levels.
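A gradual ramp-up could be sketched as follows. The linear curve is only one possible shape, chosen here for simplicity, and the cohort structure (start week, headcount, ramp-up weeks) is a hypothetical way of allowing different ramp-up times per role or experience level.

```python
def ramp_productivity(weeks_since_start: float, ramp_weeks: float) -> float:
    """Fraction of full productivity, rising linearly from 0 to 1 over the ramp-up period."""
    if ramp_weeks <= 0:
        return 1.0
    return min(1.0, max(0.0, weeks_since_start / ramp_weeks))


def ramping_contribution(cohorts: list, week: int) -> float:
    """Engineer-equivalents contributed in a given week by cohorts of new hires.

    Each cohort is a (start_week, headcount, ramp_weeks) tuple.
    """
    return sum(headcount * ramp_productivity(week - start_week, ramp_weeks)
               for start_week, headcount, ramp_weeks in cohorts)


if __name__ == "__main__":
    # Hypothetical cohorts: 2 juniors starting in week 5 with an 8-week ramp-up,
    # and 1 senior starting in week 9 with a 4-week ramp-up.
    cohorts = [(5, 2, 8), (9, 1, 4)]
    print(ramping_contribution(cohorts, 9))   # juniors are halfway, senior just started
    print(ramping_contribution(cohorts, 13))  # everyone is fully ramped
```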

3. Realistic Hiring Constraints

Recruitment Capacity Limits: Introduce maximum hiring rates based on recruitment team capacity.
Candidate Pipeline Dynamics: Factor in time-to-fill variations, candidate drop-off rates, and competition in the job market.

4. Dynamic Efficiency Factor

Efficiency Over Time: Allow the efficiency factor to change over time, reflecting process improvements or team fatigue.
Role-Based Efficiency: Differentiate efficiency factors for various roles or seniority levels.

5. Integration with Project Planning

Task-Level Planning: Link resource capacity to project tasks and milestones to assess whether capacity meets project demands.
Risk Modeling: Incorporate risks and uncertainties that could impact timelines and capacity.

6. Economic Scenario Planning

Budget Flexibility: Allow for adjustments in the funded team size based on potential budget changes.
External Shocks: Introduce scenarios for external events that could affect hiring, attrition, or productivity.

7. User Input Flexibility

Custom Time Frames: Enable users to adjust the time frame of the forecast beyond a single year.
Data Import: Allow users to import historical data to better tailor the model to their organization's patterns.

Conclusion

The Engineering Resource Forecasting Tool provides a solid foundation for planning and visualizing engineering team capacity over time. By accounting for critical factors like hiring, ramp-up, attrition, efficiency, leave, and holidays, it offers valuable insights for managers.

However, like any model, it simplifies complex real-world dynamics to remain practical and user-friendly. Recognizing its limitations is essential for interpreting the results accurately and making informed decisions. By incorporating suggested improvements, future versions can offer even more robust and precise forecasting, aiding organizations in navigating the challenges of resource planning.

About the Author

[Your Name] holds a PhD in [Your Field] and has extensive experience in project management, resource planning, and organizational development. Passionate about leveraging data-driven models to solve practical problems, [Your Name] contributes to advancing tools and methodologies that enhance operational efficiency.

Note: The Engineering Resource Forecasting Tool is intended as a planning aid and should be used in conjunction with professional judgment and other planning tools. Always consider your organization's specific circumstances when making staffing decisions.

Saturday, 1 June 2024

The Hawthorne Effect, a short story

So I was going through my Trello board today, catching up on my growing list of todos and things to learn, be curious about and explore (from years ago), and tackled this card: "Check out Hawthorne effect". As usual, Googling takes you to Wikipedia to read... Instead, I went to ChatGPT and prompted it with the following:

Explain the Hawthorne effect through a short-story, framing the narrative in a style, similar to Goldratt's book "The Goal". Limit the short-story to no longer than two A4 pages

The Workshop Challenge

Chapter 1: A Problem Arises

Alex Raines, the plant manager of Evergreen Manufacturing, walked into his office to find a stack of reports on his desk. The company had been struggling with productivity issues for months. Despite numerous initiatives and new policies, the output remained stagnant. Frustration mounted as Alex tried to pinpoint the problem.

Chapter 2: A Visit from HQ

One afternoon, an unexpected visit from Emma Collins, a senior executive from headquarters, threw Alex off his routine. Emma, known for her innovative problem-solving skills, had a knack for seeing through complexities.

“Alex, I’ve been reviewing the numbers,” Emma began, taking a seat in his office. “We need to figure out why the productivity is lagging. Let’s start by observing the workshop directly.”

Together, they walked to the workshop floor, where the hum of machinery mixed with the chatter of workers. Emma suggested they conduct a series of interviews and observe the workflow discreetly.

Chapter 3: The Observation Begins

Emma and Alex set up a temporary observation station in a corner of the workshop. Over the next week, they meticulously noted the workers’ behaviors, machine efficiency, and workflow patterns. To Alex’s surprise, there was a noticeable uptick in productivity during this period.

“I don’t get it,” Alex said, scratching his head. “The only change we made was being present here.”

Emma smiled knowingly. “That, my friend, is the Hawthorne effect.”

Chapter 4: Unveiling the Mystery

Emma explained, “In the 1920s and 30s, the Western Electric Company conducted a series of experiments at their Hawthorne Works plant. They discovered that workers’ productivity improved when they were being observed. It’s a psychological phenomenon where individuals alter their behavior due to the awareness of being watched.”

Alex pondered this revelation. “So, our presence here made the workers more conscious and thus more productive?”

“Exactly,” Emma confirmed. “But this effect is usually temporary. The key is to understand why the workers felt the need to change their behavior and how we can sustain productivity improvements without constant observation.”

Chapter 5: Digging Deeper

Determined to find a long-term solution, Alex and Emma conducted anonymous surveys and held informal discussions with the workers. They discovered several underlying issues: lack of clear communication, inadequate recognition of hard work, and an outdated incentive system.

Emma suggested a few strategic changes:

1. Improve Communication: Implement regular team meetings to discuss goals and progress.

2. Recognize Achievements: Introduce a recognition program to celebrate workers’ contributions.

3. Revamp Incentives: Align the incentive system with productivity and quality metrics.

Chapter 6: Implementing Change

With a clear action plan, Alex rolled out the changes. They started with weekly team huddles, where workers shared updates and suggestions. A “Star of the Week” program was introduced, celebrating top performers. The incentive structure was reworked to reward not just quantity but also the quality of output.

Chapter 7: Sustained Improvement

Over the next few months, the changes bore fruit. The workshop buzzed with newfound energy. Productivity rose steadily, and the workers felt more engaged and valued.

During a follow-up visit, Emma observed the transformed environment with satisfaction. “This is the real impact we were aiming for, Alex. By addressing the core issues and fostering a supportive environment, you’ve achieved sustainable improvement.”

Alex nodded, grateful for the insights. “It all started with the Hawthorne effect, but we’ve turned it into something lasting.”

Epilogue: Reflecting on the Journey

Alex reflected on the journey. The initial spike in productivity from mere observation had been a catalyst. By addressing the root causes and implementing meaningful changes, they had created a culture of continuous improvement. The Hawthorne effect was just the beginning—a lesson in understanding human behavior and leveraging it for lasting success.

In this narrative, the short-lived productivity boost due to observation, known as the Hawthorne effect, served as a springboard for deeper changes that ultimately led to long-term improvements in the workplace. The story mirrors the style of Goldratt’s “The Goal,” emphasizing problem-solving and continuous improvement.

Wednesday, 24 April 2024

Advice for Technology Executives eyeing out AWS L7 Snr Manager role

As I write this post, my phonetool (a cool internal service where employees can look up people) says I've been working at Amazon for 3 years, 3 months and 13 days (it actually feels much longer than that!!).

I thought it might be a good idea to provide some advice for people who are considering applying for L7 Senior Software Engineering Manager roles at Amazon, even though I am just a sample of one. Amazon is a huge, gigantic, complex entity, so by no means is my experience or advice representative of the company. Rather, I'm sharing my very own personal experience, aimed mostly at South African professionals who might be considering interviewing for senior management roles -- and are curious to learn from others who walked the path before. We are so lucky to have a global high-tech giant like Amazon operating in South Africa, offering engineering opportunities. I was even contemplating relocating back to the UK in 2020, but decided to stick it out in this beautiful country, try Amazon and live & work from Cape Town. As far as I'm aware, no other global hi-tech company, apart from Amazon, offers software engineering roles in South Africa.

If you're considering leaving a C-level or Executive Head position behind, coming from a South African corporate, and are now eyeing out senior engineering manager roles in Amazon/AWS locally or internationally, then this post might just be of use to you.

DISCLAIMER: I'm sharing my own personal experiences and recommendations in the hope my story could help others make an informed decision. Yes, I'm currently employed at AWS as I write this, and so far enjoyed my journey of transition, pretty settled in now as an Amazonian, despite a bit of a rough start of complete change/disruption to both professional and life streams.

My experience going into AWS L7 Senior Manager Role

Before Amazon, from May 2017 to October 2020, I was the single threaded owner (STO aka CTO) for a video streaming platform serving 50+ countries in Africa. Reporting to me were a group comprising executive head of departments and senior managers responsible for:

1\ Software Delivery & Testing (Backend + Apps running on Smart TVs, Set Top Boxes, Browser Web Apps (Chrome, IE, Firefox, Safari), iOS and Android mobile phones and tablets, Apple TVs, Android TVs. In this org, reporting to the senior manager were: 3x Software Dev Managers (Frontend + Backend - 10 dev teams), 1x Test/QA Manager.

2\ Platform, Networking & Video Infrastructure, DevOps: Data Centres in Johannesburg, Cape Town, Durban, Nigeria, London, Ghana. AWS & Azure cloud. CDN infrastructure. Networking backbone, internet transit infrastructure. Devops. Application Web hosting.

3\ Site Reliability Engineering and 24/7/365 Platform/Application Operations monitoring, 2nd and 3rd line technical support team.

4\ Platform Intelligence Health Dashboards, Alerting & Analytics, Content Discovery, Search and Recommendations. An AI/ML team responsible for building video and content discovery services. This team was also responsible for building Infra/Platform/App Health Analytics consoles.

5\ Enterprise & Solution Architecture. A group of architects cutting across software architecture, video streaming, encoding and media processing platform architects. Live streaming video headends, CDN architecture.

6\ Agile Program and Project Management Office. A group of cross-functional project management experts, including program managers, release managers, scrum masters and general program management.

7\ Cyber Security, Piracy and Governance - risk management, legal, compliance and regulatory ownership. Overall IT/Security and Compliance.

I owned all the above (I designed the org structure around cloud modernization), reporting directly to the CEO, and was part of the Top Leadership team. I had direct P&L responsibilities in the order of billions of rands, having authority to manage commercial relationships with technology vendors, suppliers, etc., signing licensing agreements, SOPs, reviewing RFPs, etc. I serviced business, technology and customer operations (marketing, content, acquisition, HR tech, customer support). Under my leadership, the platform scaled to 10X scale, paved the way for a cloud-first future and created NextGen AI streams for personalized content discovery. I had delivered significant cost-savings, created a cloud-first transformation strategy and clear roadmap and built a strong leadership succession plan before leaving. I had a team of 200+ people, which extended to partner contractors & professional services. I decided to leave and was attracted to Amazon because of my exposure to AWS as an enterprise customer: I wanted to learn about what happens in the engine room of AWS.

Between 2013 and May 2017, before becoming a CTO, I was a freelance senior management consultant, taking on program director contracts where I worked with C-level stakeholders as my primary customer, running very large enterprise programs that cut-across multiple lines of business. Effectively having a virtual seat at the C-suite table, but without skin-in-the-game, helping executives deliver their top business, technology and product transformational goals. For these programs, I directed and executed large-scale program deliveries across the African continent, with project team sizes from 350-1000 people, most of my programs were tagged "billion rand programs" costing "a million rand a day". I enjoyed working 4-day weeks, took my time off according to my schedule, and worked just over 10 months a year, during that time period.

What made me leave executive roles behind and take a risk with a mid-senior management position? In this post, I shared a deep reflection that motivated the change. I had set my target on a new domain, i.e. cloud services. An opportunity presented itself with AWS, I applied, and the rest is history. I had made a one-way door decision and there was no looking back, but I did have a bumpy start. In retrospect, if I had access to these pointers beforehand, who knows...

Be the leader you wish you had


I use this saying often in my 1:1s with my directs and in my private coaching sessions. It is a powerful way to make one pause for a moment, reflect, adjust to the discomfort, then embrace the excitement of a new energy that is created.

Adopting this mindset has transformed me from standard "manager" to empathetic "leader". Reading Seth Godin's "The Song of Significance" reinforced my instinctual leadership practices.

Quoting from "13. Let's Get Real or Let's Not Play", Seth says this:

<quote> No one goes to the gym to willingly get punched in the face by the senior vice president of boxing. But some folks eagerly pay for a sparring partner when it's time to get better. The difference is obvious, but we've forgotten to say it out aloud. No grades, no check marks, no badges. I'm not in charge of you, and I'm not manipulating you. I'm simply establishing the conditions for you to get to where you said you wanted to go. You tell me where you're going and what you need. You make promises about your commitment and skills development. I'll show up to illuminate, question, answer, spar with, and challenge you. I'll make sure you're part of a team of people who are ready to care as much as you do. We can get real. Or let's not play. </quote>

This is not some leadership mumbo jumbo. Some time ago, I developed a model for personal development that borrowed concepts from agile product management by way of user stories (search RAGE tag on this blog). I then used the same methods in the way I work with my direct reports. HR people might call this "contracting with the employee" but I take it further. I get real. It's not about objectives, KPIs & deliver results. I put myself on the line. I reach out. And so when it comes to performance reviews, my reviews are a two-way conversation. My direct also evaluates Mo's performance - because as a leader, I believe leaders mirror & contribute to the performance of their direct reports.

What's my mechanism then?

I ask each person to write a user story in this format:

In order for me, [Name] to do [XYZ] (e.g. my job | grow | be inspired | learn | etc.) I need my manager (Mo) to support me by doing [....insert your wish-list here] so that I can ....

So I start the year with level setting on our contracts together, and in our 1:1s, we check-in and inspect, comment, re-calibrate, adjust.

Guess what?

This mechanism might seem simple but it's quite challenging for people. Usually, it's the first time they're experiencing a manager doing it this way. There's a hook both ways. Often, it takes a few iterations to get the user stories crafted in a way that is mutually relatable and agreeable. My mechanism goes beyond the standard business SMART goal setting. I make it human. Real. Personal. For me, this is my song of significance.

Here are some real-world examples in play, from senior managers that report into me. See how doing so puts me, Mo, on the hook?

* In order for me to do my job, I need my manager (Mo) to support me by throwing me in the deep end and exposing me to as much as possible so that I can quickly learn and understand this business

* In order for me to do be inspired, I need my manager (Mo) to support me by leading by example so that I can learn from his vast experience

* In order for me to do grow, I need my manager (Mo) to support me by pushing me out my comfort zone so that I can grow in all directions.

* In order for me to do my job. I need my manager (Mo) to support me by throwing me in the deep end and exposing me to as much as possible so that I can quickly learn and understand this business

* In order for me to grow my skillset, I need my manager to support me in blocking out time on my calendar so I can complete the ‘make great hiring decisions’ course (5hrs)

* In order for me to get promoted to L7, I need my manager to support me by identifying key opportunities so that I can start building a roadmap of promotional milestones

Why I never ran a program without a Project Charter

Lessons on large-scale delivery program management ...

I continue to dig into my past artefacts to showcase my work portfolio. I'm using a multi-pronged approach here: 1\ Showcase my work to prospective employers; 2\ Openly share my work so that others (people I coach, my colleagues and boss, etc.) can benefit; and 3\ Act as my own living knowledge repository.

I spent a decade climbing up the project management ladder, in the same way I climbed up the software engineering ladder (from junior engineer to principal engineer) - I first started project managing small software product development (2-4 teams with 10 services), then scaled up to large middleware services (20+ teams, 50+ services) as lead delivery owner, then up to full stack systems integration (full stack of all major components: kernel, middleware, integration layers, applications), then program managed a full go-to-market product launch scaling out to include Tech, Business (Finance, Marketing, Supply Chain) & Operations (Customer Care, Retention, Content, Legal, Regulatory) - as senior program manager. I also owned the full plan of starting a business from scratch to launch (a video streaming company) in 8 months. I did a stint in management consulting, running the top 5 business projects for a $3 billion run-rate business, which some companies might call Tier 0/1 initiatives - where I co-ordinated these large-scale programs, as Chief Program Director - delivering through multiple business lines, multiple project management offices and multiple product and engineering teams. In a sense, I served as the CxO program manager, advisor and delivery owner.

It is with this experience and knowledge, that I dare to share about my work experiences - and I'm not making these things up - you can check my LinkedIn recommendations page for proof.

During my tenure as the lead program director mentioned above, I often found myself picking up and repairing distressed programs - and along the way, I'd help improve team processes and coach the management teams as well. I also ran new business & technology initiatives from scratch, start-to-finish-then-handover. So with this diverse experience, I developed a simple method that helped me navigate both types of program scenario, whether resetting or starting from scratch: the simple, powerful mechanism of a Project Charter document. To this day, I'm surprised to see many program & project managers failing to use the Project Charter in the way it was meant to be used (clarifying the essence), and often find less-experienced, newly minted PMP/Prince2 certified professionals, doing it "by the book". My approach to project charters went much deeper than that...

So what do I mean by using the Project Charter in "clarifying the essence" then?

The job of a seasoned, experienced project leader, chief program director, end-to-end project manager, senior technical program manager, etc. (call the roles what you like) is, in my view, not just about putting a plan together and working backwards from a deadline or target completion date. No, I believe senior program leaders must apply their minds to appreciate the bigger picture and create a program structure that becomes the north star in guiding and leading multiple delivery teams. I never started a program without first establishing my project charter, which at the top level, focuses on the following:

Start by understanding the why. Why is this program needed? Why is it important?
Move on to understanding the who. Who are the sponsors, stakeholders and teams impacted? Who will be working on the program? "First who, then what"

A program manager must be sufficiently well-versed with all the roles expected from the program, and work hard to secure the roles needed. Yes, this means the program manager must escalate to get the people needed for the program (on the bus, as well as off the bus). A responsible program manager would raise all these risks & concerns up-front, before officially kicking off the program.

Clarify the what, including calling out what's missing - Set up the mental model for the program. What is this program about? What is it not about? What's in scope? What's not in scope? What workstreams make up the program? How do all pieces come together?
Agree, Align, Action - The 3 As of project execution involve agreement on the deliverable, alignment of all parties involved which includes acceptance of their workstreams and ultimately agreeing on the action plan to execute.

Project charters don't necessarily have to be communicated in a written document; a slide deck is more than adequate to communicate the essence. The right format depends on the business environment and culture: some business cultures prefer slide decks over detailed documents to save reading time, whilst others, like Amazon, insist on detailed text narratives. So a seasoned project leader must adapt their style to suit the particular business need & culture of the teams.

In this post, I'm sharing a version of a Project Charter as a slide deck. In a future post, I will share a detailed 50+ page document project charter that involved the launch of a consumer electronics device, the program covered a mix of engineering, business and operations workstreams.

Example Program: Transform Digital Self-Service of a $3 billion run-rate business

I was called in to help reset and kickstart an overarching cross-cutting program to improve a selection of key metrics that would result in increased usage of digital self-service channels, improving customer satisfaction and overall reducing operational costs. This program covered the full value chain delivering the service: 3rd party technology vendors developing phone "mobi" apps using USSD, iOS/Android self-service app, Website, Payments, a hardware kiosk station, set-top box interactive application, integration with internal & 3rd party CRM/Billing systems - and resulting business workflows: finance, customer care & banking channels. Technology teams were spread between the CTO/CIO lines (3 IT pillars), and business teams reported separately to the CEO. The program also served the needs of Group Strategy, Risk & Regulatory. Bringing all these things together requires a steady hand, a tactful negotiator, a strategic and business mindset as well as a strong technology leader. This is why I enjoyed such challenges, as these programs were never boring and never limited to only tech/engineering.

Enough said, let the slides talk and let me know what you think in the comments!

Sunday, 9 April 2023

How I scaled engineering ops excellence to ±10X with Mission Control

Continuing with sharing my experiences as CTO, in this post I share the actions I took to help improve an engineering organisation's operational health in our journey of scaling an online video streaming platform from 1X to 10X, from May 2017 to October 2020. Getting to a 10X improvement is a journey, one I completed in under 3 years, and after reaching the goal, I decided I'd learnt enough of the CTO experience and exited, after having put a strong leadership succession pipeline in place.

To get an idea of some of the major themes that I tackled during this time, as a leader I had to lead from the front, back, left, right setting the direction of my managers to follow (as all of these interventions were new to them), whilst doing my best to respect what came before:

Establishing the team despite constant re-orgs going on at parent company - getting the right people in the right roles at the right time
Transforming a rag-tag undisciplined team to a disciplined, clear-headed, focused organised unit
Introducing laser focus on product engineering by unbundling non-core video apps to other businesses
Being critical on the technology platform by establishing a baseline of the architecture, using third party auditors to rate the scalability of the platform
Improving physical infrastructure: networking, compute, storage and data centres. Moving away from self-hosted and self-managed data centres to partnering, shutting down data centres as needed.
Building an industrial-grade networking stack, leveraging modern peering facilities and overhauling the server infrastructure
Setting the roadmap for cloud by transitioning first from single region data centres, to multiple data centre deployments, to running multiple stacks simultaneously, introducing containers and microservices then finally getting ready for cloud and leaping first into serverless paradigms
Embracing cloud partnerships with big players: Akamai, Microsoft, AWS, etc.
Improving product and engineering delivery by revamping and overhauling the agile work processes and backlog management.
Introducing communications mechanisms that helped remove doubt and earned trust across the many different business units and teams (we were known as the online pirates doing their own thing)
Improving risk, governance and security - bringing it to the top, raising awareness
Creating strategic partnerships internally and externally to leverage skills and expertise I couldn't get in-house or afford to build or manage ourselves
Introducing technical operations controls - Mission Control: more active management of operations daily, 24/7, with increased focus, planning and prep for peak times like weekends and major events.
Aggressively reducing costs on key platform components whilst capitalising on gains through economy of scale

In this post, I'm going to share some of the early context and interventions I introduced in my first 3 months on the job, that remain effective to this day, more than five years later: Mission Control Ops.

The dreaded 403 We're sorry, something went wrong

I took over a team that wasn't prepared for the intense discipline needed to run and operate a highly available 24/7/365 platform. There were many reasons for this which I might touch on in another post. I recall coining the term "Bloody May" as the month of so many outages, that I wondered: 1\ What on earth have I taken on? 2\ Is my life going to be consumed by work from now on? 3\ Is there any hope for this platform? 4\ How am I going to turn this platform around? 5\ How much is the job worth to me?

It turned out there were going to be many more "Bloody Mays" in 2017 whilst my team set about improving stability. In 2017, the platform experienced outages that racked up about 20 days of downtime in one year. This equates to ± 95% availability, which is unacceptable for a video streaming platform. When I left the team in Oct 2020, we had turned the platform around to reach 99.5% availability, trending higher. Today, 5 years on, I'm told the availability is much higher but their usage profile has drastically changed (reduced the number of concurrent streams to one device only, also reduced their devices supported, moved most of the services to AWS).
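For readers who want to check the availability arithmetic themselves, here is a minimal Python sketch. It assumes a 365-day year and treats all downtime as full outages; the function names are my own for illustration and are not taken from any monitoring tool.

```python
def availability_pct(downtime_days: float, period_days: float = 365.0) -> float:
    """Percentage of the period during which the platform was up."""
    if not 0 <= downtime_days <= period_days:
        raise ValueError("downtime must be between 0 and the period length")
    return 100.0 * (1.0 - downtime_days / period_days)


def downtime_hours_allowed(target_pct: float, period_days: float = 365.0) -> float:
    """Downtime budget, in hours, implied by an availability target."""
    if not 0 <= target_pct <= 100:
        raise ValueError("availability must be a percentage between 0 and 100")
    return period_days * 24.0 * (1.0 - target_pct / 100.0)


if __name__ == "__main__":
    # About 20 days of outages in a year
    print(f"{availability_pct(20):.1f}% available")              # 94.5% available
    # Downtime budget at the 99.5% level
    print(f"{downtime_hours_allowed(99.5):.1f} hours per year")  # 43.8 hours per year
```

Put differently, moving from roughly 20 days of downtime a year to 99.5% availability means shrinking the yearly outage budget to under two days.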

An example of resource planning a 100 person technology team

Here's an example of how an engineering leader can go about managing headcount and resource planning to deliver on both the technology roadmap as well as business goals. For context, this is a resource plan for a 100-person engineering organisation that had end-to-end responsibility for building and managing a live-streaming and video-on-demand online video platform "Delta".

I was the CTO responsible for managing everything, pretty much a CIO in my own right, overseeing: 1\ Physical Infrastructure services: Datacentres, Networking, Compute, Storage; 2\ Internet backbone transit links; 3\ Cloud integrations and peering; 4\ Telecoms integrations; 5\ Software development & testing; 6\ Video broadcast & streaming infrastructure; 7\ CDN management; 8\ Personalisation & Content Recommender systems development; 9\ Agile Program Office; 10\ Technical Operations Command Centre offering 24/7/365 first/second/3rd line support; 11\ Enterprise & Solutions Architecture; 12\ Security, Anti-piracy, Risk & Governance; 13\ Platform Intelligence Dashboards & Analytics; 14\ Vendors.

Whilst I had my own technology strategy to deliver for the underlying platform to serve the scaling needs of the business (we were aggressively targeting a growth phase), my teams also had to continue building out product features and enhancements for all devices we supported (sometimes the apps varied in user experience depending on the device itself, e.g. Smart TV versus iOS vs Web app). We also had to deliver new business product offerings and deliver across group-wide projects and initiatives as shared goals.

With these challenges, an engineering leader must have a firm handle on the "resource" allocation. Sorry, agile folks - I myself cringe at the mere mention of "resources" - but heck, since joining Amazon, I was amazed at how naturally the term "resources" is used by management. There are two camps: people who belong to "I am NOT a resource" and "I am a resource"!!

Anyway, I digress. As a leader, you have to keep an eye on your workforce. Based on the existing headcount allocated to you, how do you best organise your people and plan around the various delivery demands placed on your team? How do you show your stakeholders where your resources are allocated? How do you motivate for additional headcount or escalate a change of priorities if you don't have the data that accompanies your story? How do you show that although you might have 100 people under your org, the capacity for doing project work is only 80 or less, because you need to account for management, team leads, shared service work and other internal projects? See below pivot summary table:

As tedious as it might be, an engineering leader has to do some "management" work - no matter how boring it might feel. This stuff is important. The finance folks need this information. The business owners need to have a view of what impact their projects have on your teams.

A resource allocation plan can be a powerful tool - my advice - use it often, maintain it and let it become a key mechanism for you to have productive, collaborative conversations with your stakeholders.

Yes, I maintained my own list, accounting for each person in the org.

No, I don't always support the notion that people are fungible resources who can be chopped and changed or "allocated" using numbers on a spreadsheet. I also don't believe the fractional numbers help either, but they are meant to be rough allocations to help with accounting - which I'm afraid is unavoidable - part of any software manager's job really.

The picture below shows what goes into headcount planning / forecasting for the year, trying to keep the headcount to 100 people due to the hiring freeze. Just imagine, a 100-person team responsible for building the full infrastructure and consumer applications for a streaming video platform that served the whole African continent (50+ countries)! Amazon has the Frugality leadership principle - we were frugal on people but wise on partnerships, using another of Amazon's LPs - Invent & Simplify.
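To make the accounting idea concrete, here is a rough Python sketch of how a per-person list with fractional allocations could be rolled up into a pivot-style summary. The names, workstreams and fractions are made up for illustration, and the "Project:" prefix is just a convention I've assumed to separate project work from everything else; my real plan lived in a spreadsheet.

```python
from collections import defaultdict

# Hypothetical rows: (person, workstream, fraction of that person's time).
ALLOCATIONS = [
    ("Engineer A", "Project: new product offering", 0.5),
    ("Engineer A", "Shared services", 0.5),
    ("Engineer B", "Project: device apps", 1.0),
    ("Team Lead C", "Management", 0.6),
    ("Team Lead C", "Project: device apps", 0.4),
]


def summarise(allocations):
    """Return (FTE per workstream, people whose fractions don't add up to 1.0)."""
    by_stream = defaultdict(float)
    by_person = defaultdict(float)
    for person, stream, fraction in allocations:
        by_stream[stream] += fraction
        by_person[person] += fraction
    misallocated = [p for p, total in by_person.items() if abs(total - 1.0) > 1e-9]
    return dict(by_stream), misallocated


def project_capacity(by_stream, prefix="Project:"):
    """Total FTE on workstreams that count as project work (assumed naming prefix)."""
    return sum(fte for stream, fte in by_stream.items() if stream.startswith(prefix))


if __name__ == "__main__":
    by_stream, misallocated = summarise(ALLOCATIONS)
    headcount = len({person for person, _, _ in ALLOCATIONS})
    print(f"Headcount: {headcount}")                                      # Headcount: 3
    print(f"Project capacity: {project_capacity(by_stream):.1f} FTE")     # Project capacity: 1.9 FTE
    print(f"People not fully allocated: {misallocated}")                  # People not fully allocated: []
```

The same roll-up at org scale is what lets you say "100 people, but only 80 or less available for project work" with data behind it.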

As an experienced engineering leader, my advice to software managers: If you're not maintaining a resource allocation plan, you're missing a vital tool of software management. Be careful!!

Saturday, 18 March 2023

Visualising a technology roadmap and year plan on just one page

Here's something that other engineering leaders might find useful: Showing all the work under one's ownership, a technology roadmap on a single page that nicely prints to A3 paper.

In this example, I capture everything that was important to me as the leader responsible for what was, at the time, one of Africa's largest online video streaming platforms (Live TV + VOD). I was responsible for the full platform end-to-end:

Physical Platform Infrastructure and Networking - Data Centres (Compute, Storage, Networking)
Cloud Services commercials & workloads
Software Engineering - multi-platform, multi-device software development & testing
Enterprise & Solution Architects
Video Streaming Hardware & Software infrastructure integration & management
Recommendations & Content Discovery Engines - AI/ML scientists and engineering
Agile Program Management Office - Agile Specialists
Technical Operations & Integrations - Command Center & Mission Control 24/7/365 support
Security, Anti-piracy, Risk & Governance streams

How do I show on one page, the focus areas for the above teams in terms of our year plan, showcasing:

The themes that categorise the work so non-technical customers & stakeholders can understand
The cadence of releases for product feature delivery for product & marketing teams
The key KPIs our work drives - growth targets for monthly active users
The key events happening during the year that would put a strain on the platform load/stability
What the tech team will be producing month-on-month
The owners and points-of-contact for each work stream

So if you're a CTO or Head of Engineering responsible for an overarching technical platform, and you're not visualising your work in your own roadmap view, then IMHO, you're missing a trick. I used Excel to craft my roadmap because I was being scrappy - but it is possible to create such views if you invest in a solid product/program planning tool.

Click to enlarge: An example of a technology leader's roadmap / year plan

Thursday, 16 March 2023

How I led the turnaround of a tech platform from 1X to ±9X in just 3 years

In a previous post, I shared how I ended up transitioning from a program manager to being the leader of a large engineering organisation as CTO. I then followed up with another post sharing my context in this leadership position that was beset with highly volatile, uncertain, complex and ambiguous challenges - the world of online video streaming. A dynamic, tough, competitive space, especially as Africa was only then waking up to the streaming video wars that started in 2015.

[Disclaimer: I write about my past work experiences, this post dating back to 2017-2020 referring to entities that no longer exist today (in 2023). Previous mentions of such entities are widely in the public domain through news media outlets, press briefings, launch announcements, etc. I take time to ensure that nothing I share exposes commercially sensitive material. My intent is to share my learnings and experiences to teach and mentor people on a similar path to me; with the side benefit of showcasing my professional work portfolio to current and future prospective employers & head hunters, through my writing].

In my 3.5 years as CTO, I would see the business change executive leadership multiple times - I think by 2020, we'd been through at least six changes in executive leadership (CEO/Exco-heads). This meant that almost every year, I had a new boss, whose strategy shifted to suit their new vision as they aimed to leave their mark on the business goals of an aggressive growth strategy.

From a CTO perspective, I had to navigate different expectations of these new leaders, who didn't grasp the nature of technology and engineering challenges. Quite often, my team would be compared with the big players like Netflix, Amazon, Hulu - citing the incredible numbers of concurrent users (millions), comparing against our fledgling platform built in Africa, by Africans, for Africans - a platform that emerged not natively online, but an extension of the existing PayTV broadcast ecosystem. Despite whatever truths be valid or not from the engineering perspective, my job as CTO was to manage expectations upwards as best as I could whilst working with my engineering team on the necessary upgrades to scale the platform accordingly.

Once an app is live and already being used daily, with a steady increase in monthly active users - despite technical debt and platform instability - customers expect to receive feature updates and, when it comes to consuming video, a stable viewing experience. Add an aggressive marketing and acquisition strategy, and my technology team didn't have the luxury to pause, stop and fix things. Business goes on and IT tech folks just have to get on and deliver, smartly.

One of the first things I did with my team, in addition to resetting their mental model - was to create a simple graph of our state - not so dissimilar to Amazon's infamous flywheel. Based on the business growth strategy, the key drivers impacting my technology team pivoted around two fundamental missions: 1\ Increase the number of active users (growth) and 2\ Increase Engagement. So I created this map, as our North Star - that would remain, un-erased on the whiteboard for 3 years since creating. This picture would become my constant reference during my management meetings (reminding my directs and technical leads why we exist, what areas are important) as well as my all-hands updates:

Technology Strategy Map - North Star / Key Focus Areas

A video streaming business is about reaching as many customers as possible, providing engaging content that captivates users so they return and consume more and more content, staying longer on the app, remaining more engaged. The app user experience needs to be seamless, easy, intuitive and on-par with other well known players (à la Netflix). The tech platform then becomes core in providing the experience. What the picture highlights are the key pillars, which became the driving force for our improvement goals. I focused our attention to execute on these and only these: 1\ Product development (software delivery of app features across as many devices and platforms as possible); 2\ Streaming & Networking infrastructure - CDN & cost optimisation; 3\ Platform Scaling challenges (big area of technical debt - transition to AWS cloud); 4\ Platform Intelligence (don't fly blind); 5\ Content Discovery (AI/ML recommenders); 6\ Telco partnerships (we didn't own the full network). Controlling all of this was technical operations excellence, something I called "Mission Control" - active, preventative and proactive monitoring of the live platform (technical alerting and social media monitoring).

A large part of this challenge was managing cost. Based on the business growth targets, the cost to serve the technology platform per customer had to remain low. In parallel with turning around a distressed platform and keeping the engineering teams intact (because of layoffs and re-org), I had to work on delivering cost-savings whilst armed with my own sizeable budget. We could no longer operate on a shoestring budget, and at the same time, we couldn't afford to be wasteful - frugality was still top-of-mind. My budget over my tenure would amount to R1.5 billion (±100 million USD) covering people, capital expenditure, and operations expenditure (video streaming is a costly game), aligned to a Buy / Partner strategy instead of Build-your-Own.

As a result of this strategy, my team were able to execute year-on-year, exceeding expectations and turning around what was once a platform on the brink of extinction, rebooting and refreshing in 2017 to deliver a nett ±9X improvement on active users and availability of the platform:

Platform growth over 3 years under my leadership

As a result of this experience, I wrote a white paper that dived quite deep into the technical challenges a CTO in the online video streaming business must not overlook - check it out here.

My engagement as a CTO was timeboxed from the start - I wasn't expecting to work longer than 3 years as I'd set myself the challenge of learning and setting up a leadership succession plan that would continue to lead the team and deliver on the technology transformation strategy I created. At the end of that experience in 2020, just around Covid times, it was clear I needed to reset myself, and after 20+ years career in video technology, I sought a change...and decided to leave the TV world behind.

Looking back, more than two years later, I somewhat miss the thrill of high-stakes leadership, and the intensity of the experience. More so, leading a large group of people (150+), rallying them to raise the bar and turn around their tech platform (that I had no prior experience in building, nor was I familiar with the code) and also themselves, in trusting me to lead them - and get past their stigma of not trusting a business program manager...

My next posts will dive deeper into some of the streams on how we got to 9X, covering some hits and also some big misses too!!
