
Flow, value, culture, delivery — measuring agility at ASOS (part 2)

The second in a two-part series where we share how at ASOS we are measuring agility across our teams and the wider tech organisation…

Recap on part one

For those who didn’t get a chance to read part one, I previously introduced our holistic approach to measuring agility, covering four themes:

  • Flow — the movement of work through a team’s workflow/board.

  • Value — the outcomes/impact of what we do and alignment with the goals and priorities of the organisation.

  • Culture — the mindset/behaviours we expect around teamwork, learning and continuous improvement.

  • Delivery — the practices we expect that account for the delivery of Epics/Features, considering both uncertainty and complexity.

Now let’s get into how the results are submitted and visualised, what the rollout/adoption has been like, along with our learnings and future direction.

Submitting the results

We advise teams to submit a new assessment every six to eight weeks, as experience tells us this gives enough time to see the change in a particular theme. When teams are ready to submit, they go to an online Excel form and add a new row, then add the rating and rationale for each theme:

A screenshot of a spreadsheet showing how users enter submissions

Excel online file for capturing ratings and rationale for each theme

Visualizing the results

Team view

By default, all teams and their current rating/trend, along with the date when the last assessment was run, are visible upon opening the report:

A screenshot of all the team self-assessment results on one page

Note: all Team/Platform/Domain names anonymised for the purposes of this blog!

Viewers can then filter to view their team — hovering on the current rating provides the rationale as well as the previous rating and rationale. There is also a history of all the submitted ratings for the chosen theme over time:

An animation showing the user filtering on the report for a specific team

Note: all Team/Platform/Domain names anonymised for the purposes of this blog!

Filters persist across pages, so after filtering you can also then click through to the Notes/Actions page to remind yourself of what your team has identified as the thing to focus on improving:

An animation showing the user who has filtered on a team then viewing the improvement actions a team has documented

Note: all Team/Platform/Domain names anonymised for the purposes of this blog!

Platform/Domain view

Normally, we facilitate a regular discussion at ‘team of teams’ level which, depending on the size of an area, may be a number of teams in a platform or all the teams and platforms in a Domain:

A screenshot showing the self-assessment results for a particular platform

Note: all Team/Platform/Domain names anonymised for the purposes of this blog!

This helps leaders in an area understand where the collective is, as well as letting them focus on a particular team. It can also highlight opportunities for teams in an area to learn from each other, rather than relying solely on an Agile Coach to advise. Again, filtering persists, allowing leaders to have a holistic view of improvements across teams:

A screenshot showing the improvement actions for a particular platform

Note: all Team/Platform/Domain names anonymised for the purposes of this blog!

This is key for leaders, as it informs how they can support an environment of continuous improvement and agility. For example, if a team is experimenting with WIP limits to improve their rating for Flow, a leader pushing more work onto them probably isn’t going to help that theme improve!

Tech Wide View

The Tech Wide View provides an overview of the most recent submissions for all teams across the organisation. We feel this gives us the clearest ‘measurement’ and holistic view of agility in our tech organisation, with the ability to hover on a specific theme to see if ratings are improving:

An animation showing the tech-wide view of results for the four themes and the trend

As coaches, this also helps inform us as to what practices/coaching areas we should be focusing on at scale, rather than trying to go after everything and/or focusing on just a specific team.

In turn, this data helps inform things like our own Objectives and Key Results (OKRs), guiding us on what we should be focusing on and, more importantly, whether we are having an impact:

A screenshot showing the OKRs for the Agile Coaching team and how they use the data from the self-assessment in their key results

Rollout, adoption and impact

In rolling this out, we were keen to stick to our principles and invite teams to complete it, rather than mandating (i.e. inflicting) it across all of our technology organisation. We used various channels (sharing directly with teams, presenting in Engineering All Hands, etc.) to advertise and market it, as well as having clear documentation around the assessment and the time commitment needed. After launching in January, this is the rate at which teams have submitted their first assessment:

In addition to this, we use our internal Team Designer app (more on this here) to cross-reference our coverage across domains and platforms. This allows us to see in which areas adoption is good and in which areas we need to remind/encourage folks around trialling it:

A screenshot of the Team Designer tool showing the percentage of teams in a platform/domain that have completed a self-assessment

Note: numbers may not match due to the dates the screenshots were taken!

With any ‘product’, it’s important to consider appropriate product metrics, particularly as we know measurable changes in user behaviour are typically what correlate with value. One of the ways to validate whether the self-assessment is adding value for teams is whether they continue to use it. One-off usage may give them some insight, but if it doesn’t add value, particularly with something they are ‘invited’ to use, then it will gradually die and we won’t see teams continue to complete it. Thankfully, this wasn’t the case here, as we can see that 89% of teams (70 out of 79) have submitted more than one self-assessment:

The main thing we are concerned with in demonstrating the impact/value of this approach, though, is whether teams are actually improving. You could still have plenty of teams adopt the self-assessment yet stay the same for every rating and never actually improve. Here we visualise each time a team has seen an improvement between assessments (note: teams are only counted the first time they improve, not counted again if they improve further):

Overall we can see the underlying story is that the vast majority of teams are improving, specifically that 83% of teams (58 out of 70) who have submitted >1 assessment have improved in one (or more) theme.

Learnings along the way

Invitation over infliction

In order to change anything around ways of working in an organisation, teams have to want to change or “opt-in”. Producing an approach without accepting from the outset that you may be completely wrong, and without being prepared to ditch it, leads to the sunk cost fallacy. It is therefore important with something like this that teams can “opt-out” at any time.

Keep it lightweight yet clear

We have seen many agility assessments across different organisations and the industry over the years, and these almost always fall foul of not being lightweight. You do not need to ask a team 20 questions to “find out” about their agility. Having said this, lightweight is not an excuse for a lack of clarity, so supporting documentation on how people can find out where they are, or what the themes mean, is a necessity. We used a Confluence page with some fast-click links to specific content so people could quickly get to what they needed:

A screenshot showing a Confluence wiki with quick links

Shared sessions and cadence

Another way to increase adoption is to have teams review the results together, rather than just getting them to submit and leaving it at that. In many areas we, as coaches, would facilitate a regular self-assessment review for a platform or domain. In these sessions each team talks through their rationale for a rating whilst the others listen in and ask questions or give ideas on how to improve. There have been times, for example, when ratings have been upgraded because a team felt they had been too harsh on themselves (which, surprisingly, I also agreed with!), but the majority of the time there are suggestions teams can make to each other. In terms of continuous improvement and learning, this is far more impactful than hearing it from just an Agile Coach.

Set a high bar

One of the observations we made when rolling this out was how little ‘green’ there was in particular themes. This does not automatically equate to teams being ‘bad’, more that they are not yet where we think ‘good’ is from an agility perspective, relative to our experience and industry trends.

One of the hard parts with this is not compromising in your view of what good looks like, even though it may not be a message that people particularly like. We leaned heavily on the experience of Scott Frampton and his work at ASOS to stay true to this, even if at times it made for uncomfortable viewing.

Make improvements visible

Initially, the spreadsheet did not contain the column about what teams are going to do differently as a result of the assessment; it was only after a brief chat with Scott, and drawing on his learnings, that we added it. Whilst it does rely on teams adding in the detail about what they are going to do differently, it helps us see that teams are identifying clear actions to take, based on the results of the assessment.

Culture trumps all when it comes to improvement

This is one of the most important things when it comes to this type of approach. One idea I discussed with Dan Dickinson from our team was around recognising a ‘most improved’ team: the team that had improved the most from their initial assessment to where they are now. In doing this, one team was a clear standout, yet they remained at ‘red’ for culture. This isn’t the type of team we should be celebrating, even if all the other factors have improved. Speedy delivery of valuable work with good rigour around delivery practices is ultimately pointless if people hate being in that team. All factors in the assessment are important but, ultimately, you should never neglect culture.

Measure its impact

Finally, focus on impact. You can have lots of teams regularly assessing but ultimately, if it isn’t improving the way they work it is wasted effort. Always consider how you will validate that something like an assessment can demonstrate tangible improvements to the organisation.

What does the future hold?

As a coaching team we have a quarterly cadence of reviewing and tweaking the themes and their levels, sharing this with teams when any changes are made:

A screenshot of a Confluence page showing change history for the self-assessment

Currently, we feel that we have the right balance between the number of themes and the lightweight nature of the self-assessment. We have metrics/tools that could bring in other factors, such as predictability and/or quality:

A screenshot showing a Process Behaviour Chart (PBC) for work in progress and a chart showing the rate of bug completion for a team

Left — a process behaviour chart highlighting where WIP has become unpredictable | Right — a chart showing the % of Throughput which are Bugs which could be a proxy for ‘quality’

Right now we’ll continue the small tweaks each quarter with an aim to improve as many teams, platforms and domains as we can over the next 12 months…watch this space!

Flow, value, culture, delivery — measuring agility at ASOS.com (part 1)

The first in a two-part series where we share how at ASOS we are measuring agility across our teams and the wider tech organisation. This part covers the problems we were looking to solve, what the themes are, as well as exploring the four themes in detail…

Context and purpose

As a team of three coaches with ~100 teams, understanding where to spend our efforts is essential to be effective in our role. Similarly, one of our main reasons to exist in ASOS is to help understand the agility of the organisation and clearly define ‘what good looks like’.

Measuring the agility of a team (and then doing this at scale) is a difficult task, and in lots of organisations this is done through the use of a maturity model. Whilst maturity models can provide a structured framework, they often fall short in addressing the unique dynamics of each organisation, amongst many other reasons. The rigidity of these models can lead to a checklist mentality, where the focus is on ticking boxes (i.e. ‘agile compliance’) rather than fostering genuine agility. Similarly, they assume everyone follows the same path, when we know context differs.

Such as this…

Unfortunately, we also found we had teams at ASOS focusing on the wrong things when it comes to agility, such as:

  • Planned vs. actual items per sprint (say-do ratio)

  • How many items ‘spill over’ to the next sprint

  • How many story points they complete / what’s our velocity / what is an “8-point story” etc.

  • Do we follow all the agile ceremonies/events correctly

When it comes to agility, these things do not matter.

We therefore set about developing something that would allow our teams to self-assess, focused on the outcomes agility should lead to. The main problems to solve were:

  • Aligning ASOS Tech on a common understanding of what agility is

  • Giving teams a lightweight approach to self-assess, rather than relying on Agile Coaches to observe and “tell them” how agile they are

  • Having an approach that is more up to date with industry trends, rather than how people were taught Scrum/Kanban 5, 10 or 15+ years ago

  • Having an approach to self-assessment that is framework agnostic yet considers our ASOS context

  • Allowing senior leaders to be more informed about where their teams are agility wise

Our overarching principle being that this is a tool to inform our collective efforts towards continuous improvement, and not a tool to compare teams, nor to be used as a stick to beat them with.

The Four Themes

In the spirit of being lightweight, we restricted ourselves to just four themes, these being the things we think matter most for the outcomes teams should care about in their way of working.

  • Flow — the movement of work through a team’s workflow/board.

  • Value — focusing on the outcomes/impact of what we do and alignment with the goals and priorities of the organisation.

  • Culture — the mindset/behaviours we expect around teamwork, learning and continuous improvement.

  • Delivery — the practices we expect that account for the delivery of features and epics, considering both uncertainty and complexity.

Each theme has three levels, which are rated on a Red/Amber/Green scale — mainly due to this being an existing scale of self-assessment in other tools our engineering teams have at their disposal.

Flow

The focus here is around flow metrics, specifically Cycle Time and Work Item Age, with the three levels being:

Teams already have flow metrics available to them via a Power BI app, so are able to quickly navigate and understand where they are:

The goal with this is to make teams aware of just how long items are taking, as well as how long the “in-flight” items have actually been in progress. The teams that are Green are naturally just very good at breaking work down, and/or have already embedded looking at this on a regular basis into their way of working (say in retrospectives). Those teams at Red/Amber have since adopted techniques such as automating the age of items on the kanban board to highlight aging items which need attention:
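That board automation boils down to a simple age check. Here’s a minimal sketch in Python (the items and the threshold are hypothetical, not our actual board automation or Power BI logic):

```python
from datetime import date

# Hypothetical in-progress items: (id, date work started) - not real ASOS data.
in_progress = [
    ("PBI-101", date(2024, 5, 1)),
    ("PBI-102", date(2024, 5, 20)),
    ("PBI-103", date(2024, 5, 24)),
]

AGE_THRESHOLD_DAYS = 7  # assumed threshold for 'needs attention'

def aging_items(items, today, threshold=AGE_THRESHOLD_DAYS):
    """Return (id, age) for items whose Work Item Age exceeds the threshold."""
    return [(item_id, (today - started).days)
            for item_id, started in items
            if (today - started).days > threshold]

for item_id, age in aging_items(in_progress, today=date(2024, 5, 28)):
    print(f"{item_id} has been in progress for {age} days - needs attention")
```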

Value

The focus with this theme is understanding the impact of what we do and ensuring that we retain alignment with the goals and priorities of the organisation:

In case you haven’t read it, I’ve previously covered on the ASOS Tech Blog how we go about measuring portfolio alignment in teams. It essentially looks at how much of a team’s backlog links from PBI/User Story > Feature > Epic > Portfolio Epic. We visualise this in a line chart, where teams can see the trend as well as flip between viewing their whole backlog vs. just in-flight work:
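Under the hood, that alignment percentage is just a walk up each item’s parent links. A minimal sketch (hypothetical items and fields, not our actual backlog model or Power BI measure):

```python
# Hypothetical backlog: each item points at its parent work item (None = no link).
parents = {
    "Story-1": "Feature-A", "Story-2": "Feature-A", "Story-3": None,
    "Feature-A": "Epic-X", "Epic-X": "PortfolioEpic-1", "PortfolioEpic-1": None,
}
item_type = {
    "Story-1": "Story", "Story-2": "Story", "Story-3": "Story",
    "Feature-A": "Feature", "Epic-X": "Epic", "PortfolioEpic-1": "Portfolio Epic",
}

def is_aligned(item: str) -> bool:
    """True if walking the parent chain from this item reaches a Portfolio Epic."""
    current = item
    while current is not None:
        if item_type[current] == "Portfolio Epic":
            return True
        current = parents.get(current)
    return False

stories = [i for i, t in item_type.items() if t == "Story"]
aligned = sum(is_aligned(s) for s in stories)
print(f"Portfolio alignment: {aligned}/{len(stories)} = {aligned / len(stories):.0%}")
```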

Similar to flow metrics, teams can quickly access the Power BI app to understand where they are for one part of value:

The second part is where many teams currently face a challenge. Understanding the impact (value) of what you deliver is essential for any organisation that truly cares about agility. We’re all familiar with feature factories, so this was a deliberate step change to get our teams away from that thinking. What teams deliver/provide support for varies, from customer-facing apps to internal business unit apps and even tools or components that other teams consume, so having a ‘central location’ for looking at adoption/usage metrics is impossible. This means it can take time, as either data is not readily available to teams or they had not really considered this themselves, most likely due to being a component of a wider deliverable.

Still, we’ve seen good successes here, such as our AI teams who measure the impact of models they build around personalised recommendations, looking at reach and engagement. Obviously for our Web and Apps teams we have customer engagement/usage data, but we also have many teams who serve other teams/internal users, like our Data teams who look at impact in terms of report open rate and viewers/views of reports they build:

Culture

Next we look at the behaviours/interactions a team has around working together and continuously improving:

Ultimately, we’re trying to get away from the idea many have that, when it comes to agility, continuous improvement = having retrospectives. Retrospectives are meaningless if they are not identifying actionable (and measurable) improvements to your way of working, no matter how “fun” a Barbie-themed format is!

We aren’t prescriptive about which team health tool teams use, so long as they are using one. This could be our internal tool, PETALS (more on this here), the well-known Spotify Team/Squad Health Check or even the Team Assessment built into the retrospectives tool in Azure DevOps:

All tools welcome!

The point is that our good teams are regularly tracking this and seeing if it is getting better.

A good example of what we are looking for at ‘green’ level comes from a team who recently moved to pairing (shout out to Doug Idle and his team). They made the switch around eight weeks before this image was taken, and it has not only made them happier as a team, but has clearly had a measurable impact, reducing their 85th percentile cycle time by 73%:

73% reduction (from 99 to 27 days) in 85th percentile Cycle Time as a result of pairing

Combine this with sharing more widely, primarily so teams can learn from each other, and this is what we are after in our strongest teams culturally.

Delivery

The final theme touches on what many agile teams neglect, which is ultimately delivering. By delivery in this context we mean the delivery of Features/Epics (as opposed to PBI/Story level). Specifically, we believe it’s about understanding risk/uncertainty, striving towards predictability, and what this means when using agile principles and practices:

The good teams in this theme understand that, because software development is complex, you need to forecast delivery with a percentage confidence, and do this regularly. This means using data which, for our teams, is available within a few clicks: they can forecast, given a count of items, when those items will be done or, given a time box, what they can deliver.
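For the “given a count of items, when will they be done?” question, a Monte Carlo simulation over historical throughput is the usual approach. Here’s a minimal sketch (invented weekly throughput samples, not our actual data or tooling):

```python
import random
import statistics

# Hypothetical history: Features finished per week over the last 12 weeks.
weekly_throughput = [0, 2, 1, 0, 1, 3, 1, 0, 2, 1, 1, 2]

def simulate_weeks_to_finish(remaining_items, history, trials=10_000):
    """Monte Carlo: replay randomly sampled past weeks until the remaining items are done."""
    outcomes = []
    for _ in range(trials):
        done, weeks = 0, 0
        while done < remaining_items:
            done += random.choice(history)  # sample a past week's throughput
            weeks += 1
        outcomes.append(weeks)
    return sorted(outcomes)

sims = simulate_weeks_to_finish(remaining_items=10, history=weekly_throughput)
p85 = sims[int(0.85 * len(sims)) - 1]  # 85th percentile outcome
print(f"85% confidence: 10 Features done within {p85} weeks")
print(f"Median outcome: {statistics.median(sims)} weeks")
```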

Many teams have multiple Features in their backlog, thus, to get to ‘green’, our teams should leverage a Feature Monte Carlo so that the range of outcomes that could occur across multiple Features is visible:

Note: Feature list is fictional/not from any actual teams

Previously I’ve covered our approach to capacity planning and right-sizing, where teams focus on making Features no bigger than a certain size (batch) and thus can quickly (in seconds) forecast the number of right-sized Features they have capacity for, which again is what we look for in our ‘green’ criteria:

Note: Feature list is fictional/not from any actual teams
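The capacity arithmetic behind right-sizing is deliberately simple: roughly, historical Feature throughput multiplied by the weeks in the planning horizon. A back-of-the-envelope sketch (all numbers invented; in practice teams would use their own data and a Monte Carlo rather than this naive average):

```python
# Back-of-the-envelope capacity check for right-sized Features.
# All numbers are invented for illustration.
weeks_in_planning_horizon = 13
features_finished_last_quarter = 9
weeks_in_last_quarter = 13

throughput_per_week = features_finished_last_quarter / weeks_in_last_quarter
capacity = throughput_per_week * weeks_in_planning_horizon  # expected right-sized Features

planned_right_sized_features = 11
print(f"Capacity for ~{capacity:.0f} right-sized Features; {planned_right_sized_features} planned")
if planned_right_sized_features > capacity:
    print("Plan exceeds capacity - deprioritise or expect some Features to slip")
```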

The best way to do this is to have a regular cadence where you specifically look at delivery and these particular metrics; that way, you’re informed about your progress and any items that may need breaking down/splitting.

Part two…

In this post I share how teams submit their results (and at what cadence) as well as how the results are visualised, what the rollout/adoption has been like, along with our learnings and future direction for the self-assessment…

Mastering flow metrics for Epics and Features

Flow metrics are a great tool for teams to leverage for an objective view in their efforts towards continuous improvement. Why limit them to just teams? 

This post reveals how, at ASOS, we are introducing the same concepts but for Epic and Feature level backlogs…

Flow at all levels

Flow metrics are one of the key tools in the toolbox that we as coaches use with teams. They are used as an objective lens for understanding the flow of work and measuring the impact of efforts towards continuous improvement, as well as understanding predictability.

One of the challenges we face is how we can improve agility at all levels of the tech organisation. Experience tells us that it does not really matter if you have high-performing agile teams if they are surrounded by other levels of backlogs that do not focus on flow:

Source: Jon Smart (via Klaus Leopold — Rethinking Agile)

As coaches, we are firm believers that all levels of the tech (and wider) organisation need to focus on flow if we are to truly get better outcomes through our ways of working.

To help increase this focus on flow, we have recently started experimenting with flow metrics at the Epic/Feature level. This is mainly because the real value for the organisation comes at this level, rather than at an individual story/product backlog item level. We use both Epic AND Feature level as we have an element of flexibility in work item hierarchy/levels (as well as having teams using Jira AND Azure DevOps), yet the same concepts should be applicable. This leaves our work item hierarchy looking something like the below:

Note: most of our teams use Azure DevOps — hence the hierarchy viewed this way

Using flow metrics at this level comprises the typical measures of Throughput, Cycle Time, Work In Progress (WIP) and Work Item Age; however, we provide more direct guidance on the questions to ask and the conversations to have with this information…

Throughput

Throughput is the number of Epics/Features finished per unit of time. This chart shows the count completed per week as well as plotting the trend over time. The viewer of the chart is able to hover over a particular week to get the detail on particular items. It is visualised as a line chart to show the Throughput values over time:

In terms of how to use this chart, some useful prompts are:

What work have we finished recently and what are the outcomes we are seeing from this?

Throughput is more of an output metric, as it is simply a count of completed items. What we should be focusing on is the outcome(s) these items are leading to. When we hover on a given week and see items that are more ‘customer’ focused we should then be discussing the outcomes we are seeing, such as changes in leading indicators on measures like unique visits/bounce rate/average basket value on ASOS.com.

For example, if the Epic around Spotify partnerships (w/ ASOS Premier accounts) finished recently:

We may well be looking at seeing if this is leading to increases in ASOS Premier sign-ups and/or the click-through rate on email campaigns/our main site:

The click-through rate for email/site traffic could be a leading indicator for the outcomes of that Epic

If an item is more technical excellence/tech debt focused, then we may be discussing whether we are seeing improvements in teams’ engineering and operational excellence scores.

What direction is the trend? How consistent are the values?

Whilst Throughput is more output-oriented, it could also be interpreted as a leading indicator for value. If your Throughput is trending up/increasing, then it could suggest that more value is being delivered/likely to be delivered. The opposite would be if it is trending downward.

We also might want to look at the consistency of the values. Generally, Throughput for most teams is ‘predictable’ (more on this in a future post!), however it may be that there are spikes (lots of Epics/Features moving to ‘Done’) or periods of zeros (where no Epics/Features moved to ‘Done’) that an area needs to consider:

Yes, this is a real platform/domain!

Do any of these items provide opportunities for learning/should be the focus of a retrospective?

Hovering on a particular week may prompt conversation about particular challenges had with an item. If we know this then we may choose to do an Epic/Feature-based retrospective. This sometimes happens for items that involved multiple platforms. Running a retrospective on the particular Epic allows for learning and improvements that can then be implemented in our overall tech portfolio, bringing wider improvements in flow at our highest level of work.
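Before moving on, it’s worth noting the underlying Throughput series is simple to compute: count the completed Epics/Features per week. A minimal sketch (hypothetical completion dates, not our Power BI template):

```python
from collections import Counter
from datetime import date

# Hypothetical Epic/Feature completion dates.
completed = [
    date(2024, 4, 2), date(2024, 4, 4), date(2024, 4, 16),
    date(2024, 4, 17), date(2024, 4, 18), date(2024, 5, 1),
]

# Bucket by ISO year/week and count - that weekly count is the Throughput series.
throughput = Counter((d.isocalendar()[0], d.isocalendar()[1]) for d in completed)

for year, week in sorted(throughput):
    print(f"{year}-W{week:02d}: {throughput[(year, week)]} Epics/Features finished")
```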

Cycle Time

Cycle Time is the amount of elapsed time between when an Epic/Feature started and when it finished. Each item is represented by a dot and plotted against its Cycle Time (in calendar days). In addition to this, the 85th and 50th percentile cycle times for items in that selected range are provided. It is visualised as a scatter plot to easily identify patterns in the data:

In terms of how to use this chart, some useful prompts are:

What are the outliers and how can we learn from these?

Here we look at those Epics/Features that are towards the very top of our chart, meaning they took the longest:

These are useful items to deep dive into/run a retrospective on. Finding out why this happened and identifying ways to prevent it from happening again encourages continuous improvement at a higher level and ultimately aids our predictability.

What is our 85th percentile? How big is the gap between that and our 50th percentile?

Speaking of predictability, we generally advise platforms to try to keep Features no greater than two months and Epics no greater than four months. Viewing your 85th percentile allows you to compare your actual Epic/Feature cycle times against the aspiration of the tech organisation. Similarly, we can see where there is a big gap between those percentile values. Aligned with the work of Karl Scotland, too large a gap in those values suggests there may be too much variability in your cycle times.

What are the patterns from the data?

This is the main reason for visualising these items in a scatter plot. It becomes very easy to spot when we are closing off work in batches and have lots of large gaps/white space where nothing is getting done (i.e. no value being delivered):

We can also see where we are closing Epics/Features frequently but have increased our variability/reduced our predictability with regard to Epic/Feature cycle time:
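The percentile lines on the scatter plot are equally simple to derive from completed items’ cycle times. A minimal sketch (made-up cycle times):

```python
import math

# Hypothetical cycle times (calendar days) for completed Epics/Features.
cycle_times = sorted([12, 19, 23, 25, 31, 34, 40, 55, 61, 90])

def percentile(sorted_values, pct):
    """Nearest-rank percentile: smallest value with pct% of items at or below it."""
    rank = math.ceil(pct / 100 * len(sorted_values))
    return sorted_values[rank - 1]

print(f"50th percentile cycle time: {percentile(cycle_times, 50)} days")  # 31
print(f"85th percentile cycle time: {percentile(cycle_times, 85)} days")  # 61
```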

Work In Progress (WIP)

WIP is the number of Epics/Features started but not finished. The chart shows the number of Epics/Features that were ‘in progress’ on a particular day. A trend line shows the general direction WIP is heading. It is visualized as a stepped line chart to better demonstrate changes in WIP values:

In terms of how to use this chart, some useful prompts are:

What direction is it trending?

We want WIP to be level or trending downward, meaning that an area is not working on too many things. An upward trend points to a potential lack of prioritisation, as more work is being started and then remaining ‘in progress’.

Are we limiting WIP? Should we change our WIP limits (or introduce them)?

If we are seeing an upward trend it may well be that we are not actually limiting WIP. Therefore we should be thinking about that and discussing if WIP limits are needed as a means of introducing focus for our area. If we are using them, advanced visuals may show us how often we ‘breach’ our WIP limits:

A red dot represents when a column breached its WIP limit

Hovering on a dot will detail which specific column breached its WIP on the given day and by how much.

What was the cause of any spikes or drops?

Focusing on this chart and where there are sudden spikes/drops can aid improvement efforts. For example, if there was a big drop on a given date (i.e. lots of items moved out of being in progress), why was that? Had we lost sight of work and just done a ‘bulk’ closing of items? How do we prevent that from happening again?

The same goes for spikes in the chart — meaning lots of Epics/Features moving into progress. It certainly is an odd thing to see happen at Epic/Feature level, but trust me, it does happen! You might be wondering when this could happen: in the same way that some teams hold planning at the beginning of a sprint and then (mistakenly) move everything into progress at the start of the sprint, an area may do the same after a semester planning event — something we want to avoid.
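For completeness, the underlying WIP series is just a daily count of items that have started but not finished. A minimal sketch (invented start/finish dates):

```python
from datetime import date, timedelta

# Hypothetical Epics/Features: (started, finished) - None means still in progress.
items = [
    (date(2024, 4, 1), date(2024, 4, 20)),
    (date(2024, 4, 5), None),
    (date(2024, 4, 10), date(2024, 4, 12)),
    (date(2024, 4, 15), None),
]

def wip_on(day):
    """Count items started on or before `day` and not yet finished on that day."""
    return sum(1 for started, finished in items
               if started <= day and (finished is None or finished > day))

day = date(2024, 4, 1)
while day <= date(2024, 4, 22):
    print(f"{day}: WIP = {wip_on(day)}")
    day += timedelta(days=1)
```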

Work Item Age

Work Item Age shows the amount of elapsed time between when an Epic/Feature started and the current time. These items are plotted against their respective status in their workflow on the board. For the selected range, the historical cycle time (85th and 50th percentile) is also plotted. Hovering on a status reveals more detail on what the specific items are and the completed vs. remaining count of their child items. It is visualised as a dot plot to easily see comparison/distribution:

In terms of how to use this chart, some useful prompts are:

What are some of our oldest items? How does this compare to our historical cycle time?

This is the main purpose of this chart: it allows us to see which Epics/Features have been in progress the longest. These really should be the primary focus, as they represent the most risk for our area, having been in flight the longest without feedback. In particular, those items that are above our 85th percentile line are a priority, as these have now been in progress longer than 85% of the Epics/Features we completed in the past:

The items not blurred are our oldest and would be the first focus point

Including the completed vs. remaining child item count provides additional context, so we can also understand how much effort we have put in so far (completed) and what is left (remaining). The combination of these two numbers might also indicate where you should be trying to break items down as, if a lot of work has been undertaken already AND a lot remains, chances are the item hasn’t been sliced very well.

Are there any items that can be closed (Remaining = 0)?

These are items we should be looking at as, with no child items remaining, it looks like these are finished.

The items not blurred are likely items that can move to ‘Done’

Considering this, they really represent ‘quick wins’ that can get an area flowing again — getting stuff ‘done’ (thus getting feedback) and in turn reducing WIP (thus increasing focus). In particular, we’ve found visualizing these items has helped our Platform Leads in focusing on finishing Epics/Features.

Why are some items in progress (Remaining = 0 and Completed = 0)?

For these items, we should be questioning why they are actually in progress.

Items not blurred are likely to be items that should not be in progress

With no child items, these may have been inadvertently marked as ‘in progress’ (one of the few times to advocate for moving items backwards!). It may, in rare instances, be a backlog ‘linking’ issue where someone has linked child items to a different Epic/Feature by mistake. In any case, these items should be moved backwards or removed as it’s clear they aren’t actually being worked on.

What items should we focus on finishing?

Ultimately, this is the main question this chart should be enabling the conversation around. It could be the oldest items, it could be those with nothing remaining, or it could be neither of those and something that has become an urgent priority (although ignoring the previous two ‘types’ is not advised!). Similarly, you should also be using it to proactively manage those items that are getting close to your 85th percentile. If they are close to this value, focusing on what you need to do to finish these items should likely be the main point of discussion.
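As with the other metrics, the age calculation itself is straightforward: elapsed days since an item started, compared against the historical percentiles. A minimal sketch (invented items and an assumed 85th percentile of 61 days):

```python
from datetime import date

TODAY = date(2024, 5, 28)
HISTORICAL_P85 = 61  # days - assumed 85th percentile from past completed items

# Hypothetical in-progress Epics/Features: (id, board status, date started).
in_progress = [
    ("Epic-7",  "In Build",   date(2024, 2, 1)),
    ("Feat-12", "In Build",   date(2024, 5, 10)),
    ("Feat-15", "Validating", date(2024, 4, 2)),
]

for item_id, status, started in in_progress:
    age = (TODAY - started).days  # Work Item Age in calendar days
    flag = " <-- older than our 85th percentile!" if age > HISTORICAL_P85 else ""
    print(f"{item_id:8} {status:12} age {age:3d} days{flag}")
```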

Summary

Hopefully, this post has given some insights into how you can leverage flow metrics at Epic/Feature level. In terms of how frequently you should look at these, I’d recommend doing so at least weekly. Doing it too infrequently means it is likely your teams will be unclear on priorities and/or will lose sight of getting work ‘done’. If you’re curious how we do this, these charts are generated for teams using either Azure DevOps or Jira, using a Power BI template (available in this repo).

Comment below if you find this useful and/or have your own approaches to managing flow of work items at higher levels in your organisation…

Seeking purpose – intrinsic motivation at ASOS

Autonomy, mastery and purpose are the three core components to intrinsic motivation. How do you embed these into your technology function/department? Read on to explore these concepts further and how we go about it at ASOS…

The book Drive by Daniel Pink is an international bestseller and a commonly referenced book around modern management. If you haven’t read it, the book essentially looks at what motivators are for people when it comes to work.

Some may immediately assume this is financial, which, to a certain degree, is true. The research in the book explains that for simple, straightforward work, financial rewards are indeed effective motivators. It also explains how we need to understand these as ‘external’ motivational factors. Motivation from these external factors is classed as extrinsic motivation. These factors only go so far and, in a complex domain such as software development, quickly lose effectiveness once pay is fair.

This is where we look at the second motivational aspect: intrinsic motivation. When pay is fair and work is more complex, this is when a person’s behaviour is motivated by an inner drive that propels them to pursue an activity. Pink explains how intrinsic motivation is made up of three main parts:

  • Autonomy — the desire to direct our own lives

  • Mastery — the desire to continually improve at something that matters

  • Purpose — the desire to do things in service of something larger than ourselves

What drives us: autonomy + mastery + purpose

Source

When people have intrinsic motivation, they do their best work. So how do we try to bring intrinsic motivation to our work in Tech @ ASOS?

Autonomy

Autonomy is core to all our teams here at ASOS. From a technical perspective, teams have aligned autonomy around technologies they can leverage. We do this through things such as our Patterns and Practices group, which looks to improve technical alignment across teams and agree on patterns for solving particular problems. We then communicate these patterns both internally and externally, which makes our software safer to operate and reduces re-learning effort.

As a team of Agile Coaches, we uphold this autonomy principle by not prescribing a single way of working for any of our teams. Instead, we give them the freedom to choose how they want to work, whilst guiding them to ensure their way of working aligns with agile values and principles.

Comic Agilé of a leader telling teams they are self-organising

Not like this!

From books such as Accelerate, we know that enforcing standardisation with working practices upon teams actually reduces learning and experimentation. When your target market is fashion-loving 20-somethings, teams simply must be able to innovate and change without having what Marty Cagan would call ‘process people’ who impose constraints on how teams must work. You cannot inhibit yourselves by mandating one single way of working.

To bring this to life with a simple example, we don’t have any teams that use all elements of Scrum as per the guide. Do we have teams that take inspiration and practices from Scrum? Yes. Can they change/get rid of practices that don’t add value? Of course. Do they also blend practices from other frameworks too? Absolutely! For instance, we have plenty of teams who work in sprints (Scrum), love pairing (eXtreme Programming) and use flow metrics (Kanban) to continuously improve, all whilst retaining a core principle of “you build it, you run it” (DevOps). Autonomy is therefore an essential factor for all our technology teams.

Enough about autonomy… what about mastery?

Mastery

Mastery exists in a few forms for our teams. A core approach to mastery our teams use is our Fundamentals. These are measures we use to drive continuous improvement and operational excellence across our services. Our own Scott Frampton discusses the history and evolution of this in detail in this series. In short, it comprises four pillars:

  1. Monitoring & Observability

  2. Performance & Scalability

  3. Resiliency

  4. Deployability

Teams self-assess and use this as a compass (rather than a GPS) to guide them in their improvement efforts. This means we are aligned in “what good looks like” when engineering and operating complex systems.

Sample view of engineering excellence

The levels of the respective measures are continually assessed and evolve quarter to quarter, in line with industry trends, as well as patterns and practices, so teams never “sit still” or think they have achieved a level of mastery that they will never surpass.

Similarly, mastery is something that is encouraged and celebrated through our internal platforms and initiatives. ASOS Backstage is our take on Spotify Backstage, another tool in our toolbox to better equip our teams in understanding the software landscape at ASOS. We also have our Defenders of the Wheel group — a collection of engineers who work to support the development and growth of new ASOS Core libraries and internal tools.

Screenshot of ASOS Backstage

To encourage mastery, individuals across Tech are able to achieve certifications relevant to their role(s) and/or contributions to these internal platforms/groups:

Backstage badges

This means that there are frequent sources of motivation for individuals in our teams from a mastery perspective.

What about the final aspect of intrinsic motivation, purpose?

Purpose

This is probably the most challenging area for our teams, as often this may be outside of their control. As an organisation, we’re very clear on what our vision and purpose is:

Our vision is to be the world’s number one fashion destination for fashion-loving 20-somethings

Source: ASOS PLC

Similarly, our CEO José recently reminded us all about what makes ASOS the organisation it is, covering our purpose, performance and passion at a recent internal event:

José talking purpose, performance and passion at Town Hall

Source: José’s LinkedIn

The challenge is that in a tech organisation, this doesn’t always easily translate into the specific work an individual and/or team is doing. If a team is working on a User Story for example, it’s not an unfair question for them to be asking “Why am I doing this?” or “What impact will this have?” or even “Where is the value?”. One of our efforts around this has been introducing and improving what we call ‘Semester Planning’, which Paul Taylor will cover in a future post. The other main effort has been around portfolio transparency.

Portfolio transparency, as a concept, is essentially end-to-end linkage of the work anyone in a team is doing, so that they, as an individual, can understand how it aligns with the goals and strategy of the organisation. Books such as Sooner, Safer, Happier by Jonathan Smart bring this concept to life in visuals like so:

Strategic objective diagram

Source: Sooner Safer Happier

The key to this idea is that an individual should be able to understand the value in the work they are doing. This value should be as simple as possible – i.e. not via some Fibonacci voodoo or ambiguous mathematical formula (e.g. SAFe’s version of WSJF). The acid test is whether anyone in the tech organisation can understand how a given item (story, feature, epic) contributes to the goals of the organisation and the value this brings. My own ‘self-imposed’ constraint is that they should be able to do this in fewer than five clicks.

At its core, this really is just about better traceability of work end to end. We have high-performing teams who regularly showcase technical excellence, but how does that fit into the big picture?

With the work we have been doing, a team can now take a User Story that they will be working on and, within five clicks, understand the value this brings and the strategic alignment to the goals of the organisation (note these numbers have been modified for the purpose of this blog):

Sample hierarchy of User Story to Feature to Epic to Portfolio Epic
Sample epic in Azure Devops

*Note — not the actual £ values*

Sample products demo from previous user story

And this is what it looks like to you!

Of course, this is dependent on quality data entry! Not everything (yet!) in our portfolio contains this information, however, this is the first positive step in making visible the purpose and value in our work.

How do you do this in your organisation? Can teams easily see the value in what they are doing? I’d love to hear your thoughts in the comments below…

The many flaws of Flow Efficiency

As organisations try to improve their ways of working, better efficiency is often a cited goal. ‘Increasing flow’ is something else you may hear, with Flow Efficiency being a measure called out as something organisations need to focus on. The problem is that few, if any, share the many flaws of this metric.

Read on to find out just what these pitfalls are and, more importantly, what alternatives you might want to focus on instead to improve your way of working…

Queues

Queues are the enemy of flow in pretty much every context, but especially software development. Dependencies, blockers and org structure(s) are just a few reasons that spring to mind when thinking about why work sits idle. In the world of Lean manufacturing, Taiichi Ohno once stated that in a typical process only around 5% of time is value-adding activity. There is also the measure of overall equipment effectiveness (OEE), with many manufacturing lines only 60% productive.

More recently, the work of Don Reinertsen in the Principles of Product Development Flow has been a frequent inspiration for Agile practitioners, with this quote in particular standing out:

Our greatest waste is not unproductive engineers but work products sitting idle in process queues.

Many thought leaders, coaches and consultants champion the use of a metric known as flow efficiency when coaching teams and organisations about improving their way of working, but what exactly is it?

What is flow efficiency?

Flow efficiency is an adaptation of the lean-world metric of process efficiency. For a particular work item (backlog item, user story, whatever your preferred taxonomy is), we measure the percentage of active time (i.e. time spent actually working on the item) against the total time (active time + waiting time) it took for the item to complete.

For example, if we were to take a software development team’s Kanban board, it may look something like this:

Source: Flow Efficiency: Powering the Current of Your Work

Where flow efficiency would be calculated like so:

Flow efficiency (%) = Active time / Total time x 100%

The industry standard says that anything between 15% and 40% flow efficiency is good.
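The arithmetic itself is trivial, which is part of the metric’s appeal. A minimal sketch (made-up durations), before we get into why the inputs are rarely trustworthy:

```python
# Hypothetical time (in days) a single work item spent in each workflow state.
time_in_state = {
    "In Dev": 3,               # active
    "Waiting for Test": 5,     # waiting
    "In Test": 2,              # active
    "Waiting for Release": 4,  # waiting
}
active_states = {"In Dev", "In Test"}

active_time = sum(days for state, days in time_in_state.items() if state in active_states)
total_time = sum(time_in_state.values())
flow_efficiency = active_time / total_time * 100

print(f"Flow efficiency: {active_time}/{total_time} days = {flow_efficiency:.0f}%")  # ~36%
```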

In terms of visualizing flow efficiency, it typically will look like this:

Flow Efficiency: Powering the Current of Your Work

In this chart, we can see the frequency (number) of work items with a certain percentage flow efficiency, and an aggregated view of what the average flow efficiency looks like.

All makes sense, right? Many practitioners would also advocate this as an important thing to measure.

I disagree. In fact, I would go as far as to say that I believe flow efficiency to be the least practical and overhyped metric in our industry.

So, what exactly are some of the problems with it?

Anecdotal evidence of “typical” flow efficiency

Now I don’t disagree with the ideas of those above regarding queues being an issue with a lot of time spent waiting. I also don’t deny that flow efficiency in most organisations is likely to be poor. My issue comes with those who cite flow efficiency percentages or numbers, quoting ‘industry standards’ and what good looks like without any solid proof. “I’ve seen flow efficiency percentages of n%” may be a common soundbite you hear — #DataOrItDidntHappen needs to be a more frequent hashtag for some claims in our industry. If we take a few examples near the top of a quick Google search:

I finally thought I’d found some hard data with “the average Scrum team Process Efficiency for completing a Product Backlog Item is on the order of 5–10%” which is cited in Process Efficiency — Adapting Flow to the Agile Improvement Effort. That is until we see the full text:

And then the supporting reference link:

Surveying a few people in a classroom is hardly ‘the average Scrum team’.

It amazes me that, with all the years of data collated in various tools, and with our frequent emphasis on empiricism, there is not one single study that validates the claims made around what flow efficiency percentages “typically” are.

Lack of wait states

Now, discounting the lack of a true study, let’s look at how a typical team works. Plenty of teams do not know or have not identified the wait states in their workflow:

In this example (which is not uncommon), all the workflow states are ‘active’ states, so there is no way to calculate when work is waiting, and flow efficiency will always be 100% (and therefore useless!). Plenty of teams are in this situation: they do in fact know what their wait states are, yet have not modelled them appropriately in their workflow.

Impossibility of measuring start/stop time

Let’s say now we’ve identified our wait states and modelled them appropriately in our workflow:

How often (in my experience, fairly regularly!) do we hear updates like the below when reviewing the board:

Expecting near real-time updates (in order to accurately reflect active vs. wait time) is just not practical, and therefore any flow efficiency number is flawed due to this delay in updating items. Furthermore, there are so many nuances in product development that making a binary call as to whether something is active vs. waiting is impossible. Is thinking through something on a walk active time or idle time? What about experimentation? Even more so, think about when we leave work at the end of the day. None of our items are being worked on, so shouldn’t they all be marked as ‘waiting’ until the next day?

Not accounting for blockers

Keeping the same workflow as before, the next scenario to consider is how we handle when work is blocked.

This particular item is highlighted/tagged as blocked because we need feedback before we can move it along in our workflow. Yet it sits in a ‘work’ state even though it cannot be progressed. More often than not, this is not factored into any flow efficiency calculation or literature, such as this example:

Tasktop — Where is the Waste in Your Software Delivery Process?

There is no way an item was “In Dev” for a clear, uninterrupted period, so the picture presented is not realistic in terms of how product development actually happens.

That’s NumberWang!

For those unaware, Numberwang is a well-known sketch from the comedy TV show That Mitchell and Webb Look. It is a fictional gameshow where the two contestants call out random numbers, which the ‘host’ then randomly declares to be “Numberwang!”

https://youtu.be/0obMRztklqU

Why is this relevant? Well, when looking at items that have moved through our process and their respective flow efficiency percentages, all we are doing is playing an Agilist’s version of the same comedy sketch.

Face ID Login has a flow efficiency of 19% but QR code returns only had 9%! OMG :( :( :( So what? It’s just a percentage — it doesn’t mean anything! Also, look at the cycle time for those items: can we definitively say that one item was “more efficient” than the other? Does this tell us anything about how to improve our workflow and where our bottlenecks are? No! It’s just reading out numbers and thinking it means something because it’s “data”.

The Flaw of Averages

Anyone who has read previous posts of mine will know that any sort of use of average with flow metrics is a way to push my buttons. Unfortunately, the visualisation of flow efficiency often comes with an average of the efficiency for a collection of completed work items, like so:

Using averages with any sort of metrics is a dangerous flirtation with misleading information, and we can see that for a series of items this is quite easy to do:

Three of our five completed items have poor flow efficiency, yet aggregating to a single number suggests (if the ‘close to 40% flow efficiency is good’ anecdote is being cited!) that we have a fairly effective process. By aggregating, we lose all the context of those ‘inefficient’ items, and with it the chance to use them as the basis for a conversation around improving our way of working.
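A toy example (invented percentages) makes the point:

```python
from statistics import mean

# Hypothetical flow efficiency (%) of five completed items.
efficiencies = [8, 11, 13, 85, 90]

print(f"Average flow efficiency: {mean(efficiencies):.0f}%")              # ~41% - looks 'good'
print(f"Items below 15% efficiency: {sum(e < 15 for e in efficiencies)} of {len(efficiencies)}")
```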

What should we use instead?

In theory, flow efficiency seems like a good idea; however, for all the reasons above, it is simply not practical for teams and organisations to implement and put to effective use (without at least being clear they are using flawed data). Proceed with caution around anyone advocating it without the caveats mentioned above!

https://twitter.com/danvacanti/status/1321547554136428544
https://twitter.com/DReinertsen/status/1106975020432031744

A better metric/use of your time is looking at blocker data and going after those that occur the most frequently and/or are the most impactful. Troy Magennis has a great tool for this (thank you also to Troy for sharing some thoughts on this piece).

Here are some of the examples we use for some of our teams here at ASOS:

Shout out to Francis Gilbert in our Saved Items team for these!

This can then be used to reduce the frequency of particular blockers occurring and to see where you need to focus next:

Shout out to Gary Sedgewick in our PayTech team for this!

This way you’re actually going after the problems teams face, which will in turn positively impact the flow of work. All this is done without the need of some ‘efficiency’ measure/number.
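If you want to start simply with your own data, the analysis can be as basic as aggregating blocker categories by frequency and total blocked days. A minimal sketch (the categories and numbers are invented; Troy’s tool linked above does this far more thoroughly):

```python
from collections import defaultdict

# Hypothetical blocker log: (blocker category, days the item was blocked).
blockers = [
    ("Waiting on another team", 6),
    ("Environment unavailable", 2),
    ("Waiting on another team", 9),
    ("Unclear requirements", 3),
    ("Environment unavailable", 4),
    ("Waiting on another team", 5),
]

summary = defaultdict(lambda: {"count": 0, "days": 0})
for category, days in blockers:
    summary[category]["count"] += 1
    summary[category]["days"] += days

# Go after the categories that block most often and for the longest.
for category, stats in sorted(summary.items(), key=lambda kv: kv[1]["days"], reverse=True):
    print(f"{category:26} occurrences: {stats['count']}  total blocked days: {stats['days']}")
```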

What are your thoughts? Agree? Disagree? 

I’d love to hear what you think in the comments below…