Cycle Time Correlations

Improving flow is a key goal for nearly all organisations. More often than not, the primary driver for this is speed, commonly referred to as Cycle Time. As organisations try to reduce this and improve their time to (potential) value, what factors correlate with speed? This blog, inspired by the tool DoubleLoop, looks at the correlations Cycle Time has with other flow-based data…

The correlation of metrics

A few months ago I came across a tool called DoubleLoop. It is unique in that it allows you to plot your strategy in terms of your bets, the work breakdown and key metrics, all onto one page. The beauty of it is that you can see the linkage to the work you do with the metrics that matter, as well as how well (or not so well) different measures correlate with each other.

For product-led organisations, this opens up a whole heap of different options around visualising bets and their impact. The ability to see causal relationships between measures is a fantastic invitation to a conversation around measuring outcomes.

Looking at the tool with a flow lens, it also got me curious, what might these correlations look like from a flow perspective? We’re all familiar with things such as Little’s Law but what about the other practices we can adopt or the experiences we have as work flows through our system?

As speed (cycle time) is so often what people care about, what if we could see which measures/practices have the strongest relationship with this? If we want to improve our time to (potential) value, what should we be focusing on?

Speed ≠ value and correlation ≠ causation

Before looking at the data, an acknowledgement of a couple of things some of you reading this may well be pointing out.

The first is that speed does not equate to value, which is a fair point, albeit one I don’t believe to be completely true. We know from the work of others that right-sizing trumps prioritisation frameworks (specifically Cost of Delay Divided by Duration — CD3) when it comes to value delivery.

Given right-sizing is partly influenced by duration (in terms of calendar days), and given the research above, you could easily argue that speed does impact value. That being said, the data analysed in this blog looked at work items at User Story/Product Backlog Item level, where the ‘value’ an individual item brings is difficult to quantify.

A point that is harder to disagree with is that correlation does not equal causation. Just like the biomass power generated in the Philippines correlates with Google searches for ‘avocado toast’, there probably isn’t a link between the two.

However, when using visual management of work, we often make inferences with our teams about things they should be doing. Some of these are undoubtedly linked: how long an item has been in-progress is obviously going to have a strong relationship with how long it took to complete. Others are more up for debate, such as: do we need to regularly update work items? Should we go granular with our board design/workflows? The aim of this blog is to try to challenge some of that thinking, backed by data.

For those curious, a total of 15,421 work items completed by 70 teams since June 1st 2024 were used as input to this research. Even with a dataset of this size, there may be other causal relationships at play (team size, length of time together, etc.) that are not included in this analysis.
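If you wanted to reproduce this kind of analysis on your own data, it boils down to a Pearson correlation of each measure against Cycle Time. A minimal pandas sketch, assuming a CSV export with the (hypothetical) column names shown:

import pandas as pd

# Hypothetical export: one row per completed work item
items = pd.read_csv("completed_work_items.csv")

measures = [
    "work_item_age_days", "lead_time_days", "days_to_start",
    "comment_count", "update_count", "board_columns",
    "flow_efficiency_pct", "times_blocked",
]

# Pearson correlation of each measure against Cycle Time
correlations = items[measures + ["cycle_time_days"]].corr()["cycle_time_days"]
print(correlations.drop("cycle_time_days").sort_values(ascending=False))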

Without further delay, let’s start looking at the different factors that may influence Cycle Time…

Days since an item was started (Work Item Age)

One of the most obvious factors that plays into Cycle Time is how long an item has been in-progress, otherwise known as Work Item Age.

Clearly this has the strongest correlation with Cycle Time: while an item is in-progress its age accrues day by day, and once it moves to ‘done’ there should never be a difference between the two values.

The results reflect this, with a correlation coefficient of 1.000: about as strong a positive correlation as you will ever see. This means that, above everything else, we should always be focusing on Work Item Age if we’re trying to improve speed.

Elapsed time since a work item was created (Lead Time)

The next thing to consider is how long it’s been since an item was created. Often referred to as ‘Lead Time’, this will often be different to Cycle Time as there may be queues before work actually starts on an item.

This is useful to validate our own biases. For example, I have often made the case to teams that anything older than three months on the backlog probably should just be deleted, as YAGNI.

This had a correlation coefficient of 0.713, which is a very strong correlation. This is largely to be expected, as longer cycle times will invariably mean longer lead times, given cycle time (more often than not) makes up a large proportion of that metric.

Time taken to start an item

A closely related metric to this is the time (in days) it took us to start work on an item. There are two schools of thought to challenge here. One is the view of “we just need to get started”; the other is that the longer you leave it, the less likely you are to have that item complete quickly (as you may have forgotten what it is about).

This one surprised me. I expected a somewhat stronger relationship than the 0.166 correlation found here. This shows there is some relationship, but it is weak; how quickly you do (or don’t!) start work on an item is not going to meaningfully impact your cycle time.

The number of comments on a work item

The number of comments made on a work item is the next measure to look at. The idea with this measure is that more comments likely mean items take longer, due to there being ambiguity around the work item, blockers/delays, feedback, etc.

Interestingly, in this dataset there was minimal correlation, with a correlation coefficient of 0.147. This suggests there is a slight tendency for work items with more comments to have a longer cycle time, but we can see that after 12 or so comments this no longer seems to hold. It could be that by this point, clarification is reached or issues are resolved. Of course, past this value there are far fewer items with that many comments.

The number of updates made to a work item

How often a work item is updated is the next measure to consider. The rationale here is that teams are often focused on ensuring work items are ‘up to date’ and trying to avoid them going stale on the board:

An update is any change made to an item, which of course means that automations could be in place to skew the results. With the data used, it was very hard to determine which were automated updates vs. genuine ones, which is a shortcoming of this measure. There were some extreme outliers with more than 120 updates, which were easy to filter out. Beyond that point, however, there was no way to easily determine which updates were automated vs. genuine (and I was not going to do this for all 15,421 work items!).

Interestingly, here we see a somewhat stronger correlation than before, of 0.261, which sits at the weak-to-moderate end of the scale. Of course, this does not mean that simply automating updates to work items will improve flow!

The number of board columns a team has

The next measure to consider is the number of board columns a team has. The reason for looking at this is that there are different schools of thought around how ‘granular’ you should go with your board design. Some argue that To Do | Doing | Done is all that is needed. Others would say viewing by specialism helps see bottlenecks and some would even say more high-level views (e.g. Options | Identifying the problem | Solving the problem | Learning) encourages greater collaboration.

The results show that it really doesn’t matter what you do. The negligible correlation of 0.046 shows that board columns don’t have any real part to play in relation to speed.

Flow Efficiency

Flow efficiency is an adaptation of the lean metric of process efficiency. For a particular work item, we measure the percentage of active time, i.e. time spent actually working on the item, against the total time (active time + waiting time) it took for the item to complete.
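As a formula it is simply active time divided by total elapsed time, expressed as a percentage; a quick illustration:

def flow_efficiency(active_days, waiting_days):
    # Percentage of total elapsed time spent actively working on the item
    return 100 * active_days / (active_days + waiting_days)

print(flow_efficiency(active_days=3, waiting_days=7))  # 30.0, i.e. 3 active days out of 10 elapsed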

This one was probably the most surprising. A correlation coefficient of -0.343 suggests a moderate negative correlation: as Flow Efficiency increases, Cycle Time tends to decrease. Whilst not very strong, the relationship between the two is certainly meaningful.

The number of times a work item was blocked

The final measure was looking at how often a work item was blocked. The thinking with this one would be if work is frequently getting blocked then surely this will increase the cycle time.

It’s worth noting a shortcoming here: we are not measuring how long an item was blocked for, just how often it was blocked. So, for example, if an item was blocked once but remained blocked for nearly all of its cycle time, it would still only register as being blocked once. Similarly, this is obviously dependent on teams flagging work as blocked when it actually is (and/or having a clear definition of blocked).

Here we have the weakest of all correlations, 0.021. This really surprised me as I would have thought the blocker frequency would impact cycle time, but the results of this suggest otherwise.

Summary

So what does this look like when we bring it all together? Copying the same style as DoubleLoop, we can start to see which of our measures have the strongest and weakest relationship with Cycle Time:

What does this mean for you and your teams?

Well, it’s clear that Work Item Age is the key metric to focus on, given just how closely it correlates with Cycle Time. If you’re trying to improve (reduce) Cycle Time without looking at Work Item Age, you really are wasting your efforts.

After that, you want to consider how long something has been on the backlog (i.e. how long it has been since it was created). Keeping work items regularly updated is the next thing you can do to reduce cycle time. Following this, keeping an eye on the time taken to start a work item and on the comment count would be worth considering.

The number of board columns a team has and how often work is marked as blocked seem to have no bearing on cycle time. So don’t worry too much about how simple or complex your kanban board is, or about focusing retros on the items blocked most often. That being said, a shortcoming of this data is that it does not capture how long items were blocked for.

Finally, stop caring so much about flow efficiency! Optimising flow efficiency is more than likely not going to make work flow faster, no matter what your favourite thought leader might say.

Continuously right-sizing your Jira Epics using Power Automate

A guide on how you can automate the continuous right-sizing of your Jira Epics using its REST API and Power Automate…

Context

Right-sizing is a flow management practice that ensures work items remain within a manageable size. Most teams apply this at the User Story level, using historical Cycle Time data to determine the 85th percentile as their right-size, meaning 85% of items are completed within a set number of days. This is often referred to as a Service Level Expectation (SLE).

“85% of the time we complete items in 7 days or less”

In combination with this, teams use Work Item Age, the amount of elapsed time in calendar days since a work item started, to proactively manage the flow of work that is “in-progress”. Previously I have shared how you can automate adding Work Item Age to your Jira issues.

Right-sizing isn’t limited to Stories — it also applies to Epics, which group multiple Stories. For Epics, right-sizing means keeping the child item count below a manageable limit.

To understand what this right-size is, we choose a date range and plot our completed Epics against the date they were completed and the number of child items they had. We can then use percentiles to derive what our ‘right-size’ is (again typically taking the 85th percentile):

Good teams will then use this data to proactively check their current ‘open’ Epics (those in progress/yet to start) and see if those Epics are right-sized:

Right-sizing brings many benefits for teams as it means faster feedback, reduced risk and improved predictability. The challenge is that this data/information will almost always live in a different location to the teams’ work. In order for practices such as right-sizing to become adopted by more teams it needs to be simple and visible every day so that teams are informed around their slicing efforts and growth of Epic size.

Thankfully, we can leverage tools like Power Automate, combined with Jira’s REST API to make this information readily accessible to all teams…

Prerequisites

This guide assumes the following prerequisites:

With all those in place — let’s get started!

Adding a custom field for whether an Epic is right-sized or not

We first need to add a new field to our Epics called ‘Rightsized’. As we are focusing on right-sizing of Epics, for the purpose of simplicity in this blog we will stick to Epic as the issue type we set this up for.

Please note, if you are wanting to do this for multiple issue types you will have to repeat the process of adding this field for each work item type.

  • Click on ‘Project settings’ then choose Epic

  • Choose ‘Text’ and give the field the name of Rightsized

  • Add any more information if you want to do so (optional)

  • Once done, click ‘Save Changes’

We also then need to find out what this new custom field is called, as we will be querying this in the API. To do so, follow this guide that Caidyrn Roder pointed me to previously.

Understanding our Epic right-size

As mentioned earlier in the blog, we plot our completed Epics over a given time period (in this case 12 weeks) against the date they were completed on and the number of child items those Epics had. We can then draw percentiles against our data to understand our ‘right-size’:

If you’re wondering where the tools are to do this, I have a free template for Power BI you can download and connect to/visualise your Jira data.

For the purpose of simplicity, in this blog we’re going to choose our 85th percentile as our right-size value so, for this team, they have a right-size of 14 child items or less.

Automating our right-size check

Start by going to Power Automate and creating a Scheduled cloud flow. Call this whatever you want but we want this to run every day at a time that makes sense (probably before people start work). Once you’re happy with the time click create:

Next we need to click ‘+ new step’ to Initialize variable — this is essentially where we will ‘store’ what our Rightsize is which, to start with, will be an Integer with a value of 0:

We’re going to repeat this step a few more times, as we also need to Initialize variable for ranking Epics (as a ‘float’ type) by their child item count:

Then we will Initialize Variable to flatten our array value, which we’re going to need towards the end of the flow to get our data in the format we need it to be in to do the necessary calculations:

Our final Initialize Variable will be for our Interpolated Value, which is a ‘float’ value we’re going to need when it comes to calculating the percentile for our right-sizing:

Then we’re going to choose a HTTP step to get back all our Epics completed in the last 12 weeks. You’ll need to set the method as ‘GET’ and add in the URL. The URL (replace JIRAINSTANCE and PROJECT with your details) should be:

https://JIRAINSTANCE.atlassian.net/rest/api/3/search?jql=project%20%3D%20PROJECT%20AND%20statuscategory%20%3D%20Complete%20AND%20statuscategorychangeddate%20%3E%3D%20-12w%20AND%20hierarchyLevel%20%3D%201&fields=id&maxResults=100

Click ‘Show advanced options’ to add in your access token details:
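If you want to sanity-check that JQL outside of Power Automate first, the same request can be made with a short script (instance, project and credentials below are placeholders; authentication uses your Atlassian account email and an API token):

import requests

base = "https://JIRAINSTANCE.atlassian.net/rest/api/3/search"
jql = (
    "project = PROJECT AND statuscategory = Complete "
    "AND statuscategorychangeddate >= -12w AND hierarchyLevel = 1"
)
resp = requests.get(
    base,
    params={"jql": jql, "fields": "id", "maxResults": 100},
    auth=("you@example.com", "YOUR_API_TOKEN"),
)
resp.raise_for_status()
print(resp.json()["total"], "Epics completed in the last 12 weeks")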

Then we need to add in a Parse JSON step. This is where we are essentially going to extract the Issue Key from our completed Epics. Choose ‘body’ as the content and add a schema like so:

{
    "type": "object",
    "properties": {
        "expand": {
            "type": "string"
        },
        "startAt": {
            "type": "integer"
        },
        "maxResults": {
            "type": "integer"
        },
        "total": {
            "type": "integer"
        },
        "issues": {
            "type": "array",
            "items": {
                "type": "object",
                "properties": {
                    "expand": {
                        "type": "string"
                    },
                    "id": {
                        "type": "string"
                    },
                    "self": {
                        "type": "string"
                    },
                    "key": {
                        "type": "string"
                    }
                },
                "required": [
                    "expand",
                    "id",
                    "self",
                    "key"
                ]
            }
        }
    }
}

Then we’re going to add an Apply to each step, using the ‘issues’ value from our previous step. Then add a HTTP action where we’re going to get the child count for each Epic. The first part of the URL (replace JIRAINSTANCE with your details) should be:

https://JIRAINSTANCE.atlassian.net/rest/api/3/search?jql=%27Parent%27=

Then the id, and then:

%20AND%20hierarchyLevel=0&maxResults=0

Which should then look like so:

Don’t forget to click ‘Show advanced options’ and add your access token details again. Then we’re going to add a Parse JSON action using Body as the content and the following schema:

{
    "type": "object",
    "properties": {
        "startAt": {
            "type": "integer"
        },
        "maxResults": {
            "type": "integer"
        },
        "total": {
            "type": "integer"
        },
        "issues": {
            "type": "array"
        }
    }
}

Which should look like so:
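A side note on why maxResults=0 is used here: the search API still returns the ‘total’ count even when no issues are included in the response, and that total is exactly the child item count we are after. A quick illustration (hypothetical issue key and credentials):

import requests

url = "https://JIRAINSTANCE.atlassian.net/rest/api/3/search"
params = {"jql": "'Parent' = PROJ-123 AND hierarchyLevel = 0", "maxResults": 0}
resp = requests.get(url, params=params, auth=("you@example.com", "YOUR_API_TOKEN"))
print(resp.json()["total"])  # the number of child items under the Epic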

Next add a Compose action with the total from the previous step:

Next we’re going to use Append to array variable to add the output of this to our ‘FlattenedArray’ variable:

Then we’re going to go outside our Apply to each loop and add a Compose step to sort our child item counts:

sort(variables('FlattenedArray'))

Then we’re going to add a Set Variable step where we’re going to set our Rank variable using the following expression:

float(add(mul(0.85, sub(length(outputs('SortedCounts')), 1)), 1))

Next we’re going to do the part where we work out our 85th percentile. To start with, we first need to figure out the integer part. Add a compose action with the following expression:

int(substring(string(variables('rank')), 0, indexOf(string(variables('rank')), '.')))

Then add another compose part for the fractional part, using the expression of:

sub(float(variables('rank')), int(substring(string(variables('rank')), 0, indexOf(string(variables('rank')), '.'))))

Then we’re going to add a Compose step to format this to one decimal place, which we do using:

formatNumber(outputs('Compose_FractionalPart'), 'N1')

Then we’re going to initialize another variable, which we do simply to “re-sort” our array (I found in testing this was needed). This will have a value of:

sort(variables('FlattenedArray'))

Then we’re going to set our FlattenedArray variable to be the output of this step:

Then we need to calculate the value at our Integer position:

variables('FlattenedArray')[sub(int(outputs('Compose_IntegerPart')), 1)]

Then do the same again for the value at the next integer position:

variables('FlattenedArray')[outputs('Compose_IntegerPart')]

Then add a compose for our interpolated value:

add(
    outputs('Compose_ValueAtIntegerPosition'),
    mul(
        outputs('Compose_FractionalPart'),
        sub(
            outputs('Compose_ValueAtNextIntegerPosition'),
            outputs('Compose_ValueAtIntegerPosition')
        )
    )
)

Remember the variable we created at the beginning for this? This is where we need it again, using the outputs of the previous step to set this as our InterpolatedValue variable:

Then we need to add a Compose step:

if(
    greaterOrEquals(mod(variables('InterpolatedValue'), 1), 0.5),
    formatNumber(variables('InterpolatedValue'), '0'),
    if(
        less(mod(variables('InterpolatedValue'), 1), 0.5),
        if(
            equals(mod(variables('InterpolatedValue'), 1), 0),
            formatNumber(variables('InterpolatedValue'), '0'),
            add(int(first(split(string(variables('InterpolatedValue')), '.'))), 1)
        ),
        first(split(string(variables('InterpolatedValue')), '.'))
    )
)

Then we just need to reformat this to be an integer:

int(outputs('Compose'))
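If the chain of Compose steps is hard to follow, the whole calculation is equivalent to this short Python sketch (purely an illustration, using made-up child item counts):

def interpolated_percentile(child_counts, pct=0.85):
    ordered = sorted(child_counts)                 # the sorted counts Compose step
    rank = pct * (len(ordered) - 1) + 1            # the Rank variable (1-based position)
    integer_part = int(rank)                       # 'Compose_IntegerPart'
    fractional_part = rank - integer_part          # 'Compose_FractionalPart'
    value_at = ordered[integer_part - 1]           # value at the integer position
    # Value at the next position, guarding the edge case where the rank lands on the last element
    value_next = ordered[min(integer_part, len(ordered) - 1)]
    interpolated = value_at + fractional_part * (value_next - value_at)
    return round(interpolated)                     # the final rounding steps, give or take .5 handling

print(interpolated_percentile([3, 5, 6, 8, 9, 11, 12, 14, 15, 20]))  # prints 15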

Then we use the output of this to set our rightsize variable:

Next step is to use HTTP again, this time getting all our open Epics in Jira. It should be a GET with the URL (replace JIRAINSTANCE and PROJECT with your details) of:

https://JIRAINSTANCE.atlassian.net/rest/api/3/search?jql=project%20%3D%20PROJECT%20AND%20statuscategory%20%21%3D%20Done%20AND%20hierarchyLevel%20%3D%201%0AORDER%20BY%20created%20DESC&fields=id&maxResults=100

Again, don’t forget to click ‘Show advanced options’ and add in your access token details.

Next we’re going to add a Parse JSON step with the ‘body’ of the previous step and the following schema:

{
    "type": "object",
    "properties": {
        "expand": {
            "type": "string"
        },
        "startAt": {
            "type": "integer"
        },
        "maxResults": {
            "type": "integer"
        },
        "total": {
            "type": "integer"
        },
        "issues": {
            "type": "array",
            "items": {
                "type": "object",
                "properties": {
                    "expand": {
                        "type": "string"
                    },
                    "id": {
                        "type": "string"
                    },
                    "self": {
                        "type": "string"
                    },
                    "key": {
                        "type": "string"
                    }
                },
                "required": [
                    "expand",
                    "id",
                    "self",
                    "key"
                ]
            }
        }
    }
}

Then you’re going to add in an Apply to each step, using issues from the previous step. Add in a HTTP step, the first part of the URL (replace JIRAINSTANCE with your details) should be:

https://JIRAINSTANCE.atlassian.net/rest/api/3/search?jql=%27Parent%27=

Add in the id field from our Parse JSON step then follow it with:

%20AND%20hierarchyLevel=0&maxResults=0

Which looks like so:

You should know by now what to do with your access token details ;)

Add a Parse JSON with body of the previous step and the following schema:

{
    "type": "object",
    "properties": {
        "startAt": {
            "type": "integer"
        },
        "maxResults": {
            "type": "integer"
        },
        "total": {
            "type": "integer"
        },
        "issues": {
            "type": "array"
        }
    }
}

Then add a Compose step where we’re just going to take the total of the previous step:

Finally, we’re going to add a condition. Here we’ll look at each open Epic and see if the child count is less than or equal to our Rightsize variable:

If yes (the child count is less than or equal to the right-size), we add an Edit Issue (V2) step where we add in our Jira instance, the Issue ID (which we get from a previous step) and, crucially, the custom field for our ‘right-sized’ value (remember at the beginning when we worked out what this was? If not, go back and re-read!), updating it with “Yes”. If no (the child count is greater than the right-size), do the same but update the field with “No”:
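For reference, the Edit Issue (V2) action is doing the equivalent of the REST call below (issue key, custom field id and credentials here are all hypothetical; use the custom field id you looked up earlier):

import requests

url = "https://JIRAINSTANCE.atlassian.net/rest/api/3/issue/PROJ-123"
payload = {"fields": {"customfield_10055": "Yes"}}  # or "No" when the Epic is over its right-size
resp = requests.put(url, json=payload, auth=("you@example.com", "YOUR_API_TOKEN"))
print(resp.status_code)  # 204 indicates the field was updated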

And that’s it — you’re (finally) done!

If you run the automation, then it should successfully update your Epics if they are/are not right-sized:

It’s worth noting that any Epics with 0 child items aren’t updated with yes/no, purely due to this likely being too early on in the process. Saying an Epic with 0 child items is ‘right-sized’ feels wrong to me but you are welcome to amend the flow if you disagree!

By implementing continuous right-sizing in Jira using Power Automate, teams can drive faster feedback loops, reduce delivery risks, and improve predictability. Automating the right-sizing check ensures the data remains visible and actionable, empowering teams to stay focused on maintaining manageable work sizes. With this flow in place, you’re not just optimising Epics — you’re fostering a culture of continuous improvement and efficiency.

Continuously right-sizing your Azure DevOps Features using Power Automate

A guide to continuously right-sizing Azure DevOps (ADO) Features using OData queries and Power Automate…

Context

Right-sizing is a flow management practice that ensures work items remain within a manageable size. Most teams apply this at the User Story or Product Backlog Item (PBI) level, using historical Cycle Time data to determine the 85th percentile as their right-size, meaning 85% of items are completed within a set number of days. This is often referred to as a Service Level Expectation (SLE).

“85% of the time we complete items in 7 days or less”

In combination with this, teams use Work Item Age, the amount of elapsed time in calendar days since a work item started, to proactively manage the flow of work that is “in-progress”. Previously I have shared how you can automate adding the Cycle Time, SLE and Work Item Age to your ADO boards.

Right-sizing isn’t limited to Stories or PBIs — it also applies to Features (and Epics!), which group multiple Stories or PBIs. For Features, right-sizing means keeping the child item count below a manageable limit.

To understand what this right-size is, we choose a date range and plot our completed Features against the date they were completed and the number of child items they had. We can then use percentiles to derive what our ‘right-size’ is (again typically taking the 85th percentile):

Good teams will then use this data to proactively check their current ‘open’ Features (those in progress/yet to start) and see if those Features are right-sized:

Right-sizing brings many benefits for teams as it means faster feedback, reduced risk and improved predictability. The challenge is that this data/information will almost always live in a different location to the teams’ work. In order for practices such as right-sizing to become adopted by more teams it needs to be simple and visible every day so that teams are informed around their slicing efforts and growth of feature size.

Thankfully, we can leverage tools like Power Automate, combined with ADO’s OData queries to make this information readily accessible to all teams…

Prerequisites

This guide assumes the following prerequisites:

With all those in place — let’s get started!

Adding a custom field for whether a Feature is right-sized or not

We first need to add a new field into our process template in ADO called Rightsized. As we are focusing on right-sizing of Features, for the purpose of simplicity in this blog we will stick to Feature as the work item type we set this up for, using an inherited Scrum process template.

Please note, if you are wanting to do this for multiple work item types you will have to repeat the process of adding this field for each work item type.

  • Find the Feature type in your inherited process template work items list

  • Click into it and click ‘new field’

  • Add the Rightsized field — ensuring you specify it as a picklist type field with two options, “Yes” or “No”

Understanding our Feature right-size

As mentioned earlier in the blog, we plot our completed Features over a given time period (in this case 12 weeks) against the date they were completed on and the number of child items those Features had. We can then draw percentiles against our data to understand our ‘right-size’:

If you’re wondering where the tools are to do this, I have a free template for Power BI you can download and connect to/visualise your ADO data.

For the purpose of simplicity, in this blog we’re going to choose our 85th percentile as our right-size value so, for this team, they have a right-size of 14 child items or less.

Automating our right-size check

Start by creating two ADO queries for automation. The first will retrieve completed Features within a specified time range. A 12-week period (roughly a quarter) works well as a baseline but can be adjusted based on your planning cadence. In this example, we’re querying any Features that were ‘Completed’ in the last 12 weeks, that are owned by a particular team (in this example they are under a given Area Path):

Save that query with a memorable name (I went with ‘Rightsizing Part 1’) as we’ll need it later.

Then we’re going to create a second query for all our ‘open’ Features. Here you’re going to do a ‘Not In’ for your completed/removed states and those that are owned by this same team (again here I’ll be using Area Path):

Make sure ‘rightsized’ is added as a column option and save that query with a memorable name (I went with ‘Rightsizing Part 2’) as we’re going to need it later.

Next we go to Power Automate and we create a Scheduled cloud flow. Call this whatever you want but we want this to run every day at a time that makes sense (probably before people start work). Once you’re happy with the time click create:

Next we need to click ‘+ new step’ and add an action to Get query results from the first query we just set up, ensuring that we input the relevant ‘Organization Name’ and ‘Project Name’ where we created the query:

Following this we are going to add a step to Initialize variable — this is essentially where we will ‘store’ what our Rightsize is which, to start with, will be an Integer with a value of 0:

We’re going to repeat this step a few more times, as we also need to Initialize variable for ranking Features (as a ‘float’ type) by their child item count:

Then we will Initialize Variable to flatten our array value, which we’re going to need towards the end of the flow to get our data in the format we need it to be in to do the necessary calculations:

Our final Initialize Variable will be for our Interpolated Value, which is a ‘float’ value we’re going to need when it comes to calculating the percentile for our right-sizing:

Then we are going to add an Apply to each step where we’ll select the ‘value’ from our ‘Get query results’ step as the starting point:

Then we’re going to choose a HTTP step. You’ll need to set the method as ‘GET’ and add in the URL. The first part of the URL (replace ORG and PROJECT with your details) should be:

https://analytics.dev.azure.com/ORG/PROJECT/_odata/v3.0-preview/WorkItems?%20$filter=WorkItemId%20eq%20

Add in the dynamic content of ‘Id’ from our Get work item details step, then add in:

%20&$select=WorkItemID&$expand=Descendants(%20$apply=filter(WorkItemType%20ne%20%27Test%20Case%27%20and%20StateCategory%20eq%20%27Completed%27%20and%20WorkItemType%20ne%20%27Task%27%20and%20WorkItemType%20ne%20%27Test%20Plan%27%20and%20WorkItemType%20ne%20%27Shared%20Parameter%27%20and%20WorkItemType%20ne%20%27Shared%20Steps%27%20and%20WorkItemType%20ne%20%27Test%20Suite%27%20and%20WorkItemType%20ne%20%27Impediment%27%20)%20/groupby((Count),%20aggregate($count%20as%20DescendantCount))%20)

Which should look like:

Click ‘Show advanced options’ and add your PAT details — Set authentication to ‘Basic,’ enter ‘dummy’ as the username, and paste your PAT as the password:

PAT blurred for obvious reasons!
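If you would like to test the call outside of Power Automate, the same Basic auth pattern (any username, PAT as the password) works from a script. A sketch below, with the OData URL abbreviated to the one assembled above:

import requests

# The full Analytics OData URL assembled in the HTTP step above (ORG/PROJECT and the
# work item id substituted in); abbreviated here for readability.
url = "https://analytics.dev.azure.com/ORG/PROJECT/_odata/v3.0-preview/WorkItems?..."
resp = requests.get(url, auth=("dummy", "YOUR_PAT"))  # username can be anything
resp.raise_for_status()

# The completed child count sits under value[0].Descendants[0].DescendantCount
descendants = resp.json()["value"][0]["Descendants"]
child_count = descendants[0]["DescendantCount"] if descendants else 0
print(child_count)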

Then we need to add in a Parse JSON step. This is where we are essentially going to extract our count of child items from our completed Features. Choose ‘body’ as the content and add a schema like so:

{
    "type": "object",
    "properties": {
        "@@odata.context": {
            "type": "string"
        },
        "vsts.warnings@odata.type": {
            "type": "string"
        },
        "@@vsts.warnings": {
            "type": "array",
            "items": {
                "type": "string"
            }
        },
        "value": {
            "type": "array",
            "items": {
                "type": "object",
                "properties": {
                    "WorkItemId": {
                        "type": "integer"
                    },
                    "Descendants": {
                        "type": "array",
                        "items": {
                            "type": "object",
                            "properties": {
                                "@@odata.id": {},
                                "Count": {
                                    "type": "integer"
                                },
                                "DescendantCount": {
                                    "type": "integer"
                                }
                            },
                            "required": [
                                "@@odata.id",
                                "Count",
                                "DescendantCount"
                            ]
                        }
                    }
                },
                "required": [
                    "WorkItemId",
                    "Descendants"
                ]
            }
        }
    }
}

This is where it gets a bit more complicated: we’re going to add an Apply to each step, using the value from our previous step, then ANOTHER Apply to each where we’re going to take the descendant count for each Feature and append it to our FlattenedArray variable:

Then we’re going to go outside our Apply to each loop and add a Compose step to sort our child item counts:

Then we’re going to add a Set Variable step where we’re going to set our Rank variable using the following expression:

float(add(mul(0.85, sub(length(outputs('SortedCounts')), 1)), 1))

Next we’re going to do the part where we work out our 85th percentile. To start with, we first need to figure out the integer part. Add a compose action with the following expression:

int(substring(string(variables('rank')), 0, indexOf(string(variables('rank')), '.')))

Then add another compose part for the fractional part, using the expression of:

sub(float(variables('rank')), int(substring(string(variables('rank')), 0, indexOf(string(variables('rank')), '.'))))

Then we’re going to add a Compose step to format this to one decimal place, which we do using:

formatNumber(outputs('Compose_FractionalPart'), 'N1')

Then we’re going to initialize another variable, which we do simply to “re-sort” our array (I found in testing this was needed). This will have a value of:

sort(variables('FlattenedArray'))

Then we’re going to set our FlattenedArray variable to be the output of this step:

Then we need to calculate the value at our Integer position:

variables('FlattenedArray')[sub(int(outputs('Compose_IntegerPart')), 1)]

Then do the same again for the value at the next integer position:

variables('FlattenedArray')[outputs('Compose_IntegerPart')]

Then add a compose for our interpolated value:

add(
    outputs('Compose_ValueAtIntegerPosition'),
    mul(
        outputs('Compose_FractionalPart'),
        sub(
            outputs('Compose_ValueAtNextIntegerPosition'),
            outputs('Compose_ValueAtIntegerPosition')
        )
    )
)

Remember the variable we created at the beginning for this? This is where we need it again, using the outputs of the previous step to set this as our InterpolatedValue variable:

Then we need to add a Compose step:

if(
    greaterOrEquals(mod(variables('InterpolatedValue'), 1), 0.5),
    formatNumber(variables('InterpolatedValue'), '0'),
    if(
        less(mod(variables('InterpolatedValue'), 1), 0.5),
        if(
            equals(mod(variables('InterpolatedValue'), 1), 0),
            formatNumber(variables('InterpolatedValue'), '0'),
            add(int(first(split(string(variables('InterpolatedValue')), '.'))), 1)
        ),
        first(split(string(variables('InterpolatedValue')), '.'))
    )
)

Then we just need to reformat this to be an integer:

int(outputs('Compose'))

Then we use the output of this to set our rightsize variable:

Next step is to use ADO again, getting the query results of the second query we created for all our ‘open’ Features:

Then we’re going to add an Apply to each step using the value from the previous step. Then a HTTP step with the method as ‘GET’ and add in the URL. This URL is different than the one above! The first part of the URL (replace ORG and PROJECT with your details) should be:

https://analytics.dev.azure.com/ORG/PROJECT/_odata/v3.0-preview/WorkItems?%20$filter=WorkItemId%20eq%20

Add in the dynamic content of ‘Id’ from our Get work item details step, then after the Id, add in:

%20&$select=WorkItemID&$expand=Descendants(%20$apply=filter(WorkItemType%20ne%20%27Test%20Case%27%20and%20State%20ne%20%27Removed%27%20and%20WorkItemType%20ne%20%27Task%27%20and%20WorkItemType%20ne%20%27Test%20Plan%27%20and%20WorkItemType%20ne%20%27Shared%20Parameter%27%20and%20WorkItemType%20ne%20%27Shared%20Steps%27%20and%20WorkItemType%20ne%20%27Test%20Suite%27%20and%20WorkItemType%20ne%20%27Impediment%27%20)%20/groupby((Count),%20aggregate($count%20as%20DescendantCount))%20)

Which should look like:

Make sure again to click ‘Show advanced options’ to add in your PAT details. Then add a Parse JSON step, this is where we are essentially going to extract our count of child items from our open Features. Choose ‘body’ as the content and add a schema like so:

{
    "type": "object",
    "properties": {
        "@@odata.context": {
            "type": "string"
        },
        "vsts.warnings@odata.type": {
            "type": "string"
        },
        "@@vsts.warnings": {
            "type": "array",
            "items": {
                "type": "string"
            }
        },
        "value": {
            "type": "array",
            "items": {
                "type": "object",
                "properties": {
                    "WorkItemId": {
                        "type": "integer"
                    },
                    "Descendants": {
                        "type": "array",
                        "items": {
                            "type": "object",
                            "properties": {
                                "@@odata.id": {},
                                "Count": {
                                    "type": "integer"
                                },
                                "DescendantCount": {
                                    "type": "integer"
                                }
                            },
                            "required": [
                                "@@odata.id",
                                "Count",
                                "DescendantCount"
                            ]
                        }
                    }
                },
                "required": [
                    "WorkItemId",
                    "Descendants"
                ]
            }
        }
    }
}

This is where it gets a bit more complicated: we’re going to add an Apply to each step, using the value from our previous step, then ANOTHER Apply to each where we’re going to take the descendant count for each ‘open’ Feature. Then we’re going to add a condition step, where we’ll look at each open Feature and see if the Descendant count is less than or equal to our Rightsize variable:

If yes, then we add an update work item step and ensure that we are choosing the ID and Work Item Type of the item from our query previously, and setting the Rightsized value to “Yes”. If no, then do the same as above but ensure you’re setting the Rightsized value to “No”:

And that’s it — you’re (finally) done!

If you run the automation, then it should successfully update your Features if they are/are not right-sized:

It’s worth noting that any Features with 0 child items aren’t updated with yes/no, purely due to this likely being too early on in the process. Saying a Feature with 0 child items is ‘right-sized’ feels wrong to me but you are welcome to amend the flow if you disagree!

There may be tweaks to the above you could make to improve flow performance, currently it runs at 1–2 minutes in duration:

By implementing continuous right-sizing in Azure DevOps using Power Automate, teams can drive faster feedback loops, reduce delivery risks, and improve predictability. Automating the right-sizing check ensures the data remains visible and actionable, empowering teams to stay focused on maintaining manageable work sizes. With this flow in place, you’re not just optimising features — you’re fostering a culture of continuous improvement and efficiency.

Stay tuned for my next post, where I’ll explore how to apply the same principles to Epics in Jira!

Signal vs Noise: How Process Behaviour Charts can enable more effective Product Operations

In today’s product world, being data-driven (or data-led) is a common goal, but misinterpreting data can lead to wasted resources and missed opportunities. For Product Operations teams, distinguishing meaningful trends (signal) from random fluctuations (noise) is critical. Process Behaviour Charts (PBCs) provide a powerful tool to focus on what truly matters, enabling smarter decisions and avoiding costly mistakes…

What Product Operations is (and is not)

Effective enablement is the cornerstone of Product Operations. Unfortunately, many teams risk becoming what Marty Cagan calls “process people” or even the reincarnated PMO. Thankfully, Melissa Perri and Denise Tilles provide clear guidance in their book, Product Operations: How successful companies build better products at scale, which outlines how to establish a value-adding Product Operations function.

In the book there are three core pillars to focus on to make Product Operations successful. I won’t spoil the other two, but the one to focus on for this post is the data and insights pillar. This is all about enabling product teams to make informed, evidence-based decisions by ensuring they have access to reliable, actionable data. Typically this means centralising and democratising product metrics and trying to foster a culture of continuous learning through insights. In order to do this we need to visualise data, but how can we make sure we’re enabling in the most effective way when doing this?

Visualising data and separating signal from noise

When it comes to visualising data, another must read book is Understanding Variation: The Key To Managing Chaos by Donald Wheeler. This book highlights so much about the fallacies in organisations that use data to monitor performance improvements. It explains how to effectively interpret data in the context of improvement and decision-making, whilst emphasising the need to understand variation as a critical factor in managing and improving performance. The book does this through the introduction of a Process Behaviour Chart (PBC). A PBC is a type of graph that visualises the variation in a process over time. It consists of a running record of data points, a central line that represents the average value, and upper and lower limits (referred to as Upper Natural Process Limit — UNPL and Lower Natural Process Limit — LNPL) that define the boundaries of routine variation. A PBC can help to distinguish between common causes and exceptional causes of variation, and to assess the predictability and stability of data/a process.
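For those who like to see the maths, the limits of an XmR-style PBC are straightforward to compute. A minimal sketch with made-up daily takings, using the standard 2.66 moving-range constant for individual values charts:

def xmr_limits(values):
    # Centre line is the plain average of the data points
    mean = sum(values) / len(values)
    # Average moving range between consecutive points
    moving_ranges = [abs(b - a) for a, b in zip(values, values[1:])]
    avg_mr = sum(moving_ranges) / len(moving_ranges)
    # 2.66 is the standard scaling constant for charts of individual values
    unpl = mean + 2.66 * avg_mr
    lnpl = mean - 2.66 * avg_mr
    return mean, lnpl, unpl

daily_takings = [412, 390, 455, 401, 430, 398, 415, 388, 940, 405]  # made-up numbers
centre, lower, upper = xmr_limits(daily_takings)
print(f"average={centre:.0f}, LNPL={lower:.0f}, UNPL={upper:.0f}")
# Any point above the UNPL (the 940 here) would be flagged as exceptional variation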

An example of a PBC is the chart below, where the daily takings on the fourth Saturday of the month could be ‘exceptional variation’ compared to normal days:

Deming Alliance — Process Behaviour Charts — An Introduction

If we bring these ideas together, an effective Product Operations department that is focusing on insights and data should be all about distinguishing signal from noise. If you aren’t familiar with the terms, signal is the meaningful information you want to focus on (after all, the clue is in the name!), and noise is all the random variation that interferes with it. If you want to learn more, the book The Signal and The Noise is another great resource to aid your learning around this topic. Unfortunately, too often in organisations people working with data wrongly interpret what is in fact noise as signal. For Product Operations to be adding value, we need to be pointing our Product teams to signals and cutting out the noise in the typical metrics we track.

But what good is theory without practical application?

An example

Let’s take a look at four user/customer related metrics for an eCommerce site from the beginning of December up until Christmas last year:

The use of colour in the table draws the viewer to this information as it is highlighted. What then ends up happening is a supporting narrative something like so, which typically comes from those monitoring the numbers:

The problem here is that noise (expected variation in data) is being mistaken for signal (exceptional variation we need to investigate), particularly as it is influenced through the use of colour (specifically the RAG scale). The metrics of Unique Visitors and Orders contain no use of colour so there’s no way to determine what, if anything, we should be looking at. Finally, our line charts don’t really tell us anything other than if values are above/below average and potentially trending.

A Product Operations team shows value-add in enabling the organisation to be more effective at spotting opportunities and/or significant events that others may not see. If you’re a PM working on a new initiative/feature/experiment, you want to know if there are any shifts in the key metrics you’re looking at. Visualising it in this ‘generic’ way doesn’t allow us to see that, or could in fact be creating a narrative that isn’t true. This is where PBCs can help us. They can highlight where we’re seeing exceptional variation in our data.

Coming back to our original example, let’s redesign our line chart to be a PBC and make better usage of colour to highlight large changes in our metrics:

We can see that we weren’t ‘completely’ wrong, although we have missed out a lot more useful information. We can see that Conversion Rate for the 13th and 20th December was in fact exceptional variation from the norm, so the colour highlighting of this did make sense. However, the narrative around Conversion Rate performing badly at the start of the month (with the red background in the cells in our original table) as well as up to and including Christmas is not true, as this was just routine variation that was within values we expected.

For ABV we can also see that there was no significant event up to and including Christmas, so it neither performed ‘well’ nor ‘not so well’, as the values every day were within our expected variation. What is interesting is that we have dates where we have seen exceptional variation in both our Orders and Unique Visitors, which should prompt further investigation. I say further investigation as these charts, like nearly all data visualisations, don’t give you answers; they just get you asking better questions. It’s worth noting that for certain events (e.g. Black Friday) these may appear as ‘signal’ in your data but in fact it’s pretty obvious as to the cause.

Exceptional variation in terms of identifying those significant events isn’t the only usage for PBCs. We can use these to spot other, more subtle changes in data. Instead of large changes we can also look at moderate changes. These draw attention to patterns inside ‘noisy’ data that you might want to investigate (of course after you’ve checked out those exceptional variation values). For simplicity, this happens when two out of three points in a row are noticeably higher than usual (above a certain threshold not shown in these charts). This can provide new insight that wasn’t seen previously, such as our metrics of Unique Visitors and Orders, which previously had no ‘signal’ to consider:

Now we can see points where there has been a moderate change. We can then start to ask questions such as could this be down to a new feature, a marketing campaign or promotional event? Have we improved our SEO? Were we running an A/B test? Or is it simply just random fluctuation?

Another use of PBCs centres on sustained shifts which, when you’re working in the world of product management, are a valuable data point to have at your disposal. To be effective at building products, we have to focus on outcomes. Outcomes are a measurable change in behaviour. A measurable change in behaviour usually means a sustained (rather than one-off) shift. In PBCs, moderate, sustained shifts indicate a consistent change which, when analysing user behaviour data, means a sustained change in the behaviour of people using/buying our product. This happens when four out of five points in a row are consistently higher than usual, based on a specific threshold (not shown in these charts). We can now see where we’ve had moderate, sustained shifts in our metrics:

Again we don’t know what the cause of this is, but it focuses our attention on what we have been doing around those dates. Particularly for our ABV metric, we might want to reconsider our approach given the sustained change that appears to be on the wrong side of the average.

The final signal type focuses on smaller, sustained changes. This is a run of at least 8 successive data points within the process limits on the same side of the average line (which could be above or below):

For our example here, we’re seeing this for Unique Visitors, which is good as we’re seeing a small, sustained change in the website’s traffic above the average. Even clearer is for ABV, with multiple points above the average indicating a positive (but small) shift in customer purchasing behaviour.
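If you would rather not eyeball these runs, the rule is simple to automate. A minimal sketch of the run-of-eight check described above, using made-up visitor numbers (the moderate-shift rules additionally need sigma-based thresholds that aren't shown here):

def sustained_run_points(values, average, run_length=8):
    # Return the indices where at least `run_length` successive points
    # sit on the same side of the average line
    flagged = []
    run, side = 0, 0
    for i, value in enumerate(values):
        current = 1 if value > average else -1 if value < average else 0
        if current != 0 and current == side:
            run += 1
        else:
            run = 1 if current != 0 else 0
        side = current
        if run >= run_length:
            flagged.append(i)
    return flagged

# Made-up daily Unique Visitors with a small sustained lift in the second half
visitors = [98, 102, 97, 101, 99, 103, 108, 106, 109, 107, 111, 105, 110, 112]
print(sustained_run_points(visitors, average=sum(visitors) / len(visitors)))  # [13]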

Key Takeaways

Hopefully, this blog provides some insight into how PBCs enable Product Operations to support data-driven decisions while avoiding common data pitfalls. By separating signal from noise, organisations can prevent costly errors like unnecessary resource allocation, misaligned strategies, or failing to act on genuine opportunities. In a data-rich world, PBCs are not just a tool for insights — they’re a safeguard against the financial and operational risks of misinterpreting data.

In terms of getting started, consider any of the metrics you look at now (or provide the organisation) as a Product Operations team. Think about how you differentiate signal from noise. What’s the story behind your data? Where should people be drawn to? How do we know when there are exceptional events or subtle shifts in our user behaviour? If you can’t easily tell or have different interpretations, then give PBCs a shot. As you grow more confident, you’ll find PBCs an invaluable tool in making sense of your data and driving product success.

If you’re interested in learning more about them, check out Wheeler’s book (I picked up mine for ~£10 on eBay) or if you’re after a shorter (and free!) way to learn as well as how to set them up with the underlying maths, check out the Deming Alliance as well as this blog from Tom Geraghty on the history of PBCs.

Outcome focused roadmaps and Feature Monte Carlo unite!

Shifting to focusing on outcomes is key for any product operating model to be a success, but how do you manage the traditional view on wanting to see dates for features, all whilst balancing uncertainty? I’ll share how you can get the best of both worlds with a Now/Next/Later X Feature Monte Carlo roadmap…

What is a roadmap?

A roadmap could be defined as one (or many) of the following:

Where do we run into challenges with roadmaps?

Unfortunately, many still view roadmaps as merely a delivery plan to execute. They simply want a list of Features and when they are going to be done by. Now, sometimes this is a perfectly valid ask, for example if efforts around marketing or sales campaigns are dependent on Features in our product and when they will ship. More often than not though, it is a sign of low psychological safety. Teams are forced to give date estimates when they know the least and are then “held to account” for meeting that date that is only formulated once, rather than being reviewed continuously based on new data and learning. Delivery is not a collaborative conversation between stakeholders and product teams, it’s a one-way conversation.

What does ‘good’ look like?

Good roadmaps are continually updated based on new information, helping you solicit feedback and test your thinking​, surface potential dependencies and ultimately achieve the best outcomes with the least amount of risk and work​.

In my experience, the most effective roadmaps out there find the ability to tie the vision/mission for your product to the goals, outcomes and planned features/solutions for the product. A great publicly accessible example is the AsyncAPI roadmap:

A screenshot of the AsyncAPI roadmap

Vision & Roadmap | AsyncAPI Initiative for event-driven APIs

Here we have the whole story of the vision, goals, outcomes and the solutions (features) that will enable this all to be a success.

To be clear, I’m not saying this is the only way to roadmap, as there are tonnes of different ways you can design yours. In my experience, the Now / Next / Later roadmap, created by Janna Bastow, provides a great balance in giving insight into future trajectory whilst not being beholden to dates. There are also great templates from other well known product folk such as Melissa Perri’s one here or Roman Pichler's Go Product Roadmap to name a few. What these all have in common is they are able to tie vision, outcomes (and even measures) as well as features/solutions planned to deliver into one clear, coherent narrative.

Delivery is often the hardest part though, and crucially how do we account for when things go sideways?

The uncertainty around delivery

Software development is inherently complex, requiring probabilistic rather than deterministic thinking about delivery. This means acknowledging that there are a range of outcomes that can occur, not a single one. To make informed decisions around delivery we need to be aware of the probability of that outcome occurring so we can truly quantify the associated “risk”.

I’ve covered in a previous blog how to use a Feature Monte Carlo when working on multiple features at once. This is a technique teams adopt to understand the consequences of working on multiple Features (note: by Feature I mean a logical grouping of User Stories/Product Backlog Items), particularly if you have a date/deadline you are working towards:

An animation of a feature monte carlo chart

Please note: all Feature names are fictional for the purpose of this blog
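If you’ve not come across the technique before, the underlying idea is just repeatedly resampling historical throughput to see how often the remaining work for a Feature would finish by a given date. A minimal sketch with made-up numbers:

import random

weekly_throughput_history = [3, 5, 2, 6, 4, 4, 7, 3, 5, 4]  # completed PBIs per week, from history
remaining_items = 18   # PBIs left in the Feature
weeks_available = 6    # weeks until the date in mind

def feature_finishes_in_time():
    completed = sum(random.choice(weekly_throughput_history) for _ in range(weeks_available))
    return completed >= remaining_items

trials = 10_000
hits = sum(feature_finishes_in_time() for _ in range(trials))
print(f"{hits / trials:.0%} of simulations complete the Feature by the target date")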

Yet this information isn’t always readily accessible to stakeholders and means navigating to multiple sources, making it difficult to tie these Features back to the outcomes we are trying to achieve.

So how can we bring this view on uncertainty to our roadmaps?

The Now/Next/Later X Feature Monte Carlo Roadmap

The problem we’re trying to solve is how can we quickly and (ideally) cheaply create an outcome oriented view of the direction of our product, whilst still giving that insight into delivery stakeholders need, AND balance the uncertainty around the complex domain of software development?

This is where our Now/Next/Later X Feature Monte Carlo Roadmap comes into the picture.

We’ll use Azure DevOps (ADO) as our tool of choice, which has a work item hierarchy of Epic -> Feature -> Product Backlog Item/User Story. With some supporting guidance, we can make it clear what each level should entail:

An example work item hierarchy in Azure DevOps

You can of course rename these levels if you wish (e.g. OKR -> Feature -> Story) however we’re aiming to do this with no customisation so will stick with the “out-the-box” configuration. Understanding and using this setup is important as this will be the data that feeds into our roadmap.

Now let’s take a real scenario and show how this plays out via our roadmap. Let’s say we were working on launching a brand new loyalty system for our online eCommerce site, how might we go about it?

Starting with the outcomes, let’s define these using the Epic work item type in our backlog, and where it sits in our Now/Next/Later roadmap (using ‘tags’). We can also add in how we’ll measure if those outcomes are being achieved:

An example outcome focused Epic in ADO

Note: you don’t have to use the description field, I just did it for simplicity purposes!

Now we can formulate the first part of our roadmap:

A Now, Next, Later roadmap generated from ADO data
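
Behind the scenes, this view is driven by those Now/Next/Later tags. If you wanted to prototype pulling the tagged Epics yourself, outside of the Power BI template, one way is a WIQL query against the ADO REST API. The sketch below is purely illustrative; the organisation, project and personal access token are placeholders you would need to replace:

import requests

ORGANISATION = "your-organisation"  # placeholder - replace with your own
PROJECT = "your-project"            # placeholder - replace with your own
PAT = "your-personal-access-token"  # placeholder - replace with your own

# WIQL query for Epics tagged with the 'Now' horizon
wiql = {
    "query": "SELECT [System.Id] FROM WorkItems "
             "WHERE [System.WorkItemType] = 'Epic' "
             "AND [System.Tags] CONTAINS 'Now'"
}

response = requests.post(
    f"https://dev.azure.com/{ORGANISATION}/{PROJECT}/_apis/wit/wiql?api-version=7.0",
    json=wiql,
    auth=("", PAT),  # basic auth with a personal access token
)
epic_ids = [item["id"] for item in response.json()["workItems"]]
print(epic_ids)  # the Epics that make up the 'Now' column of the roadmap

Swap the tag for ‘Next’ or ‘Later’ to build the other columns.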

For those Epics tagged in the “Now”, we’re going to decompose them (ideally doing this as a team!) into multiple Features and relevant Product Backlog Items (PBIs). This of course should be done ‘just in time’, rather than doing it all up front. Techniques like user story mapping from Jeff Patton are great for this. In order to get some throughput (completed PBIs) data, the team are then going to start working through these and moving items to done. Once we have sufficient data (generally as little as 4 weeks’ worth is enough), we can then start to view our Feature Monte Carlo, playing around with the parameters involved:

A Feature Monte Carlo generated from ADO data

The real value emerges when we combine these two visuals. We can have the outcome-oriented lens in the Now / Next / Later and, if people want to drill down to see where delivery of the Features within an Epic (Outcome) is at, they can:

A now, next, later roadmap being filtered to show the Feature Monte Carlo

They can even play around with the parameters to understand just what would need to happen in order to make that Feature that’s at risk (Red/Amber) a reality (Green) for the date they have in mind:

A now, next, later roadmap being filtered to show the Feature Monte Carlo

It’s worth noting this only works when items in the “Now” have been broken down into Features. For our “Next” and “Later” views, we deliberately stop the dynamic updates as items at these horizons should never be focused on specific dates.

We can also see where we have Features with 0 child items that aren’t included in the Monte Carlo forecast. This could be because they’re yet to be broken down, or because all the child items in them are complete but the Feature hasn’t yet moved to “done” — for example if it is awaiting feedback. It also highlights those Features that may not be linked to a parent Epic (Outcome):

A Feature monte carlo highlighted with Features without parents and/or children.

Using these tools, our roadmap becomes an automated, “living” document generated from our backlog that shows outcomes and the expected dates of the Features that can enable those outcomes to be achieved. Similarly, we can have a collaborative conversation around risk and what factors (date, confidence, scope change, WIP) are at play. In particular, leveraging the power of adjusting WIP means we can finally add proof to that agile soundbite of “stop starting, start finishing”.

Interested in giving this a try? Check out the GitHub repo containing the Power BI template then plug in your ADO data to get started…

Adding Work Item Age to your Jira issues using Power Automate

A guide on how you can automate adding the flow metric of Work Item Age directly into your issues in Jira using Power Automate…

Context

As teams increase their curiosity around their flow of work, making this information as readily available to them as possible is paramount. Flow metrics are the clear go-to as they provide great insights around predictability, responsiveness and just how sustainable a pace a team is working at. There is, however, a challenge with getting teams to use them frequently. Whilst using them in a retrospective (say looking at outliers on a cycle time scatter plot) is a common practice, it is a lot harder trying to embed this into everyday conversations. There is no doubt these charts add great value but plenty of teams forget about them in their daily sync/scrum as they will (more often than not) be focused on sharing their Kanban board. They will focus on discussing the items on the board, rather than using a flow metrics chart or dashboard, when it comes to planning their day. As an Agile Coach, no matter how often I show it and stress its importance, plenty of teams that I work with still forget about the “secret sauce” of Work Item Age in their daily sync/scrum as it sits on a different URL/tool.

Example Work Item Age chart

This got me thinking about how I might overcome this and remove a ‘barrier to entry’ around flow. Thankfully, automation tools can help. We can use tools like Power Automate, combined with Jira’s REST API, to improve the way teams work by making flow data visible…

Prerequisites

There are a few assumptions made in this series of posts:

With all those in place — let’s get started!

Adding a Work Item Age field in Jira

The first thing we need to do is add a custom field for Work Item Age. To do this, navigate to the Jira project you want to add this to. Click on ‘Project settings’ then choose a respective issue type (in this case we’re just choosing Story to keep it simple). Choose ‘Number’, give the field the name Work Item Age (Days) and add a description of what this is:

Once done, click ‘Save Changes’. If you want to add it for any other issue types, be sure to do so.

Finding out the name of this field

This is one of the trickier parts of this setup. When using the Jira REST API, custom fields do not give any indication as to what they refer to in their naming, simply going by ‘CustomField_12345’. So we have to figure out what our custom field for work item age is.

To do so, (edit: after posting, Caidyrn Roder pointed me to this article) populate a dummy item with a unique work item age, like I have done here with the value 1111:

Next, use the API to query that specific item and do a CTRL + F until you find that value. The URL will look similar to the below; just change the parts I’ve indicated you need to replace:

https://[ReplaceWithYourJiraInstanceName].atlassian.net/rest/api/2/search?jql=key%20%3D%20[ReplaceWithTheKeyOfTheItem]

My example is:
https://nickbtest.atlassian.net/rest/api/2/search?jql=key%20%3D%20ZZZSYN-38

Which we can do a quick CTRL + F for this value:

We can see that our Work Item Age field is called — customfield_12259. This will be different for you in your Jira instance! Once you have it, note it down as we’re going to need it later…
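
If you’d rather script the lookup than CTRL + F through the JSON, a small sketch like the one below does the same thing: it fetches the dummy item and searches every field for the value 1111. The instance and item key are the ones from my example above; the email and API token are placeholders:

import requests

JIRA = "https://nickbtest.atlassian.net"  # my example instance
EMAIL = "you@example.com"                 # placeholder - your Jira account email
API_TOKEN = "your-api-token"              # placeholder - your Jira API token

response = requests.get(
    f"{JIRA}/rest/api/2/search",
    params={"jql": "key = ZZZSYN-38"},
    auth=(EMAIL, API_TOKEN),
)
fields = response.json()["issues"][0]["fields"]

# Find whichever custom field holds our dummy value of 1111
print([name for name, value in fields.items() if value == 1111])
# e.g. ['customfield_12259']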

Understanding how Work Item Age is to be calculated

When it comes to Jira, you have statuses that belong to a particular workflow. These will be the statuses items move through, which also map to columns on your board. Status categories are something not many folks are aware of. These are essentially ‘containers’ that statuses sit within. Thankfully, there are only three — To do, In Progress and Done. These are visible when adding statuses in your workflow:

An easier to understand visual representation of it on a kanban board would be:

What’s also helpful is that Jira doesn’t just create a timestamp when an item changes Status; it also does so when it changes Status Category. Therefore we can use this to relatively easily figure out our Work Item Age. Work Item Age is ‘the amount of elapsed time between when a work item started and the current time’. We can think of ‘started’ as being the equivalent of when an item moved to ‘In Progress’ — we can thus use our StatusCategoryChangedDate as our start time, the key thing we need to calculate Work Item Age:
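
For reference, outside of Power Automate the same calculation in plain Python would look something like the sketch below (it assumes Jira’s usual timestamp format for statuscategorychangedate):

from datetime import datetime, timezone

def work_item_age_days(status_category_changed: str) -> int:
    """Elapsed whole days since the item moved into the 'In Progress' category."""
    # Jira timestamps typically look like '2024-06-01T09:15:23.000+0100'
    started = datetime.strptime(status_category_changed, "%Y-%m-%dT%H:%M:%S.%f%z")
    return (datetime.now(timezone.utc) - started).days

print(work_item_age_days("2024-06-01T09:15:23.000+0100"))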

Automating Work Item Age

First we need to setup the schedule for our flow to run. To do this you would navigate to Power Automate and create a ‘Scheduled cloud flow’:

The timing of this is entirely up to you, but my tip would be to run it before the daily stand-up/sync. Once you’re happy with it, give it a name and click ‘Create’:

Following this we are going to add a step to Initialize variable — this is essentially where we will ‘store’ what our Issue Key is in a format that Power Automate needs it to be in (an array) with a value of ‘[]’:

We are then going to add the Initialize variable step again — this time so we can ‘store’ our Work Item Age which, to start with, will be an integer with a value of 0:

After this, we’re going to add a HTTP step. This is where we are going to GET all our ‘In Progress’ issues and the date they first went ‘In Progress’, referred to in the API as the StatusCategoryChangedDate. You’ll also notice here I am filtering on the hierarchy level of story (hierarchy level = 0 in JQL world) — this is just for simplicity reasons and can be removed if you want to do this at multiple backlog hierarchy levels:

https://[ReplaceWithYourJiraInstanceName].atlassian.net/rest/api/2/search?jql=project%20%3D%20[ReplaceWithYourJiraProjectName]%20AND%20statuscategory%20%3D%20%27In%20Progress%27%20AND%20hierarchyLevel%20%3D%20%270%27&fields=statuscategorychangedate,key

My example:
https://n123b.atlassian.net/rest/api/2/search?jql=project%20%3D%20ZZZSYN%20AND%20statuscategory%20%3D%20%27In%20Progress%27%20AND%20hierarchyLevel%20%3D%20%270%27&fields=statuscategorychangedate,key

You will then need to click ‘Show advanced options’ to add in your API token details. Set the authentication to ‘Basic’, add in the username of the email address associated with your Jira instance and paste your API token into the password field:

Then we will add a PARSE JSON step to format the response. Choose ‘body’ as the content and add a schema like so:

{
    "type": "object",
    "properties": {
        "expand": {
            "type": "string"
        },
        "startAt": {
            "type": "integer"
        },
        "maxResults": {
            "type": "integer"
        },
        "total": {
            "type": "integer"
        },
        "issues": {
            "type": "array",
            "items": {
                "type": "object",
                "properties": {
                    "expand": {
                        "type": "string"
                    },
                    "id": {
                        "type": "string"
                    },
                    "self": {
                        "type": "string"
                    },
                    "key": {
                        "type": "string"
                    },
                    "fields": {
                        "type": "object",
                        "properties": {
                            "statuscategorychangedate": {
                                "type": "string"
                            }
                        }
                    }
                },
                "required": [
                    "expand",
                    "id",
                    "self",
                    "key",
                    "fields"
                ]
            }
        }
    }
}

Then we need to add an Apply to each step and select the ‘issues’ value from our previous PARSE JSON step:

Then we’re going to add a Compose action — this is where we’ll calculate the Work Item Age based on today’s date and the StatusCategoryChangedDate, which is done as an expression (ticks() returns the number of 100-nanosecond intervals since a fixed epoch for a given timestamp, so dividing the difference by 864,000,000,000, the number of ticks in one day, gives us whole elapsed days):

div(sub(ticks(utcNow()), ticks(items('Apply_to_each')?['fields']?['statuscategorychangedate'])), 864000000000)

Next we’ll add a Set Variable action to use the dynamic content outputs from the previous step for our ItemAge variable:

Then add an Append to array variable action where we’re going to append ‘key’ to our ‘KeysArray’:

Then we’ll add another Compose action where we’ll include our new KeysArray:

Then we’re going to add another Apply to each step with the outputs of our second compose as the starting point:

Then we’re going to choose the Jira Edit Issue (V2) action which we will populate with our Jira Instance (choose ‘Custom’ then just copy/paste it in), ‘key’ from our apply to each step in the Issue ID or Key field and then finally in our ‘fields’ we will add the following:

Where ItemAge is your variable from the previous steps.

Once that is done, your flow is ready — click ‘Save’ and give it a test. You will be able to see in the flow run history if it has successfully completed.

Then you should have the Work Item Age visible on the issue page:

Depending on your Jira setup, you could then configure the kanban board to also display this:

Finally, we want to make sure that the work item age field is cleared any time an item moves to ‘Done’ (or whatever your done statuses are). To do this, go to Project Settings > Automation, then set up a rule like so:

That way Work Item Age will no longer be populated for completed items, as they are no longer ‘in progress’.

Hopefully this is a useful starting point for increasing awareness of Work Item Age on issues/the Jira board for your team :)

Flow, value, culture, delivery — measuring agility at ASOS (part 2)

The second in a two-part series where we share how at ASOS we are measuring agility across our teams and the wider tech organisation…

Recap on part one

For those who didn’t get a chance to read part one, previously I introduced our holistic approach to measuring agility, covering four themes:

  • Flow — the movement of work through a team’s workflow/board.

  • Value — the outcomes/impact in what we do and alignment with the goals and priorities of the organisation.

  • Culture — the mindset/behaviours we expect around teamwork, learning and continuous improvement.

  • Delivery — the practices we expect that account for the delivery of Epics/Features, considering both uncertainty and complexity.

Now let’s get to explaining how the results are submitted/visualized, what the rollout/adoption has been like along with our learnings and future direction.

Submitting the results

We advise teams to submit a new assessment every six to eight weeks, as experience tells us this gives enough time to see the change in a particular theme. When teams are ready to submit, they go to an online Excel form and add a new row, then add the rating and rationale for each theme:

A screenshot of a spreadsheet showing how users enter submissions

Excel online file for capturing ratings and rationale for each theme

Visualizing the results

Team view

By default, all teams and their current rating/trend, along with the date when the last assessment was run are visible upon opening the report:

A screenshot of all the team self-assessment results on one page

Note: all Team/Platform/Domain names anonymised for the purposes of this blog!

Viewers can then filter to view their team — hovering on the current rating provides the rationale as well as the previous rating and rationale. There is also a history of all the submitted ratings for the chosen theme over time:

An animation showing the user filtering on the report for a specific team

Note: all Team/Platform/Domain names anonymised for the purposes of this blog!

Filters persist across pages, so after filtering you can also then click through to the Notes/Actions page to remind yourself of what your team has identified as the thing to focus on improving:

An animation showing the user who has filtered on a team then viewing the improvement actions a team has documented

Note: all Team/Platform/Domain names anonymised for the purposes of this blog!

Platform/Domain view

Normally, we facilitate a regular discussion at ‘team of teams’ level which, depending on the size of an area, may be a number of teams in a platform or all the teams and platforms in a Domain:

A screenshot showing the self-assessment results for a particular platform

Note: all Team/Platform/Domain names anonymised for the purposes of this blog!

This helps leaders in an area understand where the collective is at, as well as being able to focus on a particular team. It can also highlight where teams in an area can learn from each other, rather than just relying on an Agile Coach to advise. Again, filtering persists, allowing leaders to have a holistic view of improvements across teams:

A screenshot showing the improvement actions for a particular platform

Note: all Team/Platform/Domain names anonymised for the purposes of this blog!

This is key for leaders as it informs how they can support an environment of continuous improvement and agility. For example, if a team is experimenting with WIP limits to improve their rating for Flow, a leader pushing more work their way probably isn’t going to help that theme improve!

Tech Wide View

The Tech Wide View provides an overview of the most recent submissions for all teams across the organisation. We feel this gives us the clearest ‘measurement’ and holistic view of agility in our tech organisation, with the ability to hover on a specific theme to see if ratings are improving:

An animation showing the tech wide view of results for the four themes and the trend

As coaches, this also helps inform us as to what practices/coaching areas we should be focusing on at scale, rather than trying to go after everything and/or focusing on just a specific team.

In turn we can use this data to help inform things like our own Objectives and Key Results (OKRs). We use this data to guide us on what we should be focusing on and, more importantly, if we are having impact:

A screenshot showing the OKRs for the Agile Coaching team and how they use the data from the self-assessment in their key results

Rollout, adoption and impact

In rolling this out, we were keen to stick to our principles and invite teams to complete this, rather than mandating (i.e. inflicting) it across all of our technology organisation. We used various channels (sharing directly with teams, presenting in Engineering All Hands, etc.) to advertise and market it, as well as having clear documentation around the assessment and the time commitment needed. After launching in Jan, this is the rate at which teams have submitted their first assessment:

In addition to this, we use our internal Team Designer app (more on this here) to cross-reference our coverage across domains and platforms. This allows us to see in which areas adoption is good and in which areas we need to remind/encourage folks around trialling it:

A screenshot of the Team Designer tool showing the percentage of teams in a platform/domain that have completed a self-assessment

Note: numbers may not match due to date screenshots were taken!

With any ‘product’, it’s important to consider appropriate product metrics, particularly as we know measurable changes in user behaviour are typically what correlate with value. One of the ways to validate if the self-assessment is adding value for teams is whether they continue to use it. One-off usage may give them some insight but, if it doesn’t add value, particularly with something they are ‘invited’ to use, then it will gradually die and we won’t see teams continue to complete it. Thankfully, this wasn’t the case here, as we can see that 89% of teams (70 out of 79) have submitted more than one self-assessment:

The main thing we are concerned with in demonstrating the impact/value of this approach, though, is whether teams are actually improving. You could still have plenty of teams adopt the self-assessment yet stay the same for every rating and never actually improve. Here we visualise each time a team has seen an improvement between assessments (note: teams are only counted the first time they improve, not counted again if they improve further):

Overall we can see the underlying story is that the vast majority of teams are improving, specifically that 83% of teams (58 out of 70) who have submitted >1 assessment have improved in one (or more) theme.

Learnings along the way

Invitation over infliction

In order to change anything around ways of working in an organisation, teams have to want to change or “opt-in”. Producing an approach without accepting, going in, that you may be completely wrong, and without being prepared to ditch it, leads to the sunk cost fallacy. It is therefore important with something like this that teams can “opt-out” at any time.

Keep it lightweight yet clear

There are many agility assessments that we have seen in different organisations/the industry over the years and almost always these fall foul of not being lightweight. You do not need to ask a team 20 questions to “find out” about their agility. Having said this, lightweight is not an excuse for lack of clarity, therefore supporting documentation on how people can find out where they are or what the themes mean is a necessity. We used a Confluence page with some fast-click links to specific content to allow people to quickly get to what they needed to get to:

A screenshot showing a Confluence wiki with quick links

Shared sessions and cadence

Another way to increase adoption is to have teams review the results together, rather than just getting them to submit and then that’s it. In many areas we, as coaches, would facilitate a regular self-assessment review for a platform or domain. In this each team would talk through their rationale for a rating whilst the others can listen in and ask questions/give ideas on how to improve. There have been times for example when ratings have been upgraded due to teams feeling they were being too harsh (which surprisingly I also agreed with!) but the majority of time there are suggestions they can make to each other. In terms of continuous improvement and learning this is way more impactful than hearing it from just an Agile Coach.

Set a high bar

One of the observations we made when rolling this out was how little ‘green’ there was in particular themes. This does not automatically equate to teams being ‘bad’, more that they are not yet at where we think ‘good’ is from an agility perspective, relative to our experience and industry trends.

One of the hard parts with this is not compromising in your view of what good looks like, even though it may not be a message that people particularly like. We leaned heavily on the experience of Scott Frampton and his work at ASOS to stay true to this, even if at times it made for uncomfortable viewing.

Make improvements visible

Initially, the spreadsheet did not contain the column about what teams are going to do differently as a result of the assessment; it was only after a brief chat with Scott, and learning from his experience, that we implemented this. Whilst it does rely on teams adding in the detail about what they are going to do differently, it helps us see that teams are identifying clear actions to take, based on the results of the assessment.

Culture trumps all when it comes to improvement

This is one of the most important things when it comes to this type of approach. One idea I discussed with Dan Dickinson from our team was around a ‘most improved’ team, where a team had improved the most from their initial assessment to what they are now. In doing this one team was a clear standout, yet they remained at a value of ‘red’ for culture. This isn’t the type of team we should be celebrating, even if all the other factors have improved. Speedy delivery of valuable work with good rigour around delivery practices is ultimately pointless if people hate being in that team. All factors to the assessment are important but ultimately, you should never neglect culture.

Measure its impact

Finally, focus on impact. You can have lots of teams regularly assessing but ultimately, if it isn’t improving the way they work it is wasted effort. Always consider how you will validate that something like an assessment can demonstrate tangible improvements to the organisation.

What does the future hold?

As a coaching team we have a quarterly cadence of reviewing and tweaking the themes and their levels, sharing this with teams when any changes are made:

A screenshot of a Confluence page showing change history for the self-assessment

Currently, we feel that we have the right balance between the number of themes and the lightweight nature of the self-assessment. We have metrics/tools that could bring in other factors, such as predictability and/or quality:

A screenshot showing a Process Behaviour Chart (PBC) for work in progress and a chart showing the rate of bug completion for a team

Left — a process behaviour chart highlighting where WIP has become unpredictable | Right — a chart showing the % of Throughput which are Bugs which could be a proxy for ‘quality’

Right now we’ll continue the small tweaks each quarter with an aim to improve as many teams, platforms and domains as we can over the next 12 months…watch this space!

Flow, value, culture, delivery — measuring agility at ASOS.com (part 1)

The first in a two-part series where we share how at ASOS we are measuring agility across our teams and the wider tech organisation. This part covers the problems we were looking to solve, what the themes are, as well as exploring the four themes in detail…

Context and purpose

As a team of three coaches with ~100 teams, understanding where to spend our efforts is essential to be effective in our role. Similarly, one of our main reasons to exist in ASOS is to help understand the agility of the organisation and clearly define ‘what good looks like’.

Measuring the agility of a team (and then doing this at scale) is a difficult task and in lots of organisations this is done through the usage of a maturity model. Whilst maturity models can provide a structured framework, they often fall short in addressing the unique dynamics of each organisation, amongst many other reasons. The rigidity of these models can lead to a checklist mentality, where the focus is on ticking boxes (i.e. ‘agile compliance’) rather than fostering genuine agility. Similarly they assume everyone follows the same path where we know context differs.

Such as this…

Unfortunately, we also found we had teams at ASOS that were focusing on the wrong things when it comes to agility such as:

  • Planned vs. actual items per sprint (say-do ratio)

  • How many items ‘spill over’ to the next sprint

  • How many story points they complete / what’s our velocity / what is an “8-point story” etc.

  • Do we follow all the agile ceremonies/events correctly

When it comes to agility, these things do not matter.

We therefore set about developing something that would allow our teams to self-assess, which focuses on the outcomes agility should lead to. With the main problems to solve being:

  • Aligning ASOS Tech on a common understanding of what agility is

  • Giving teams a lightweight approach to self-assess, rather than relying on Agile Coaches to observe and “tell them” how agile they are

  • Having an approach that is more up to date with industry trends, rather than how people were taught Scrum/Kanban 5, 10 or 15+ years ago

  • Having an approach to self-assessment that is framework agnostic yet considers our ASOS context

  • Allowing senior leaders to be more informed about where their teams are agility wise

Our overarching principle being that this is a tool to inform our collective efforts towards continuous improvement, and not a tool to compare teams, nor to be used as a stick to beat them with.

The Four Themes

In the spirit of being lightweight, we restricted ourselves to just four themes, these being the things we believe matter most for teams when it comes to the outcomes your way of working should lead to.

  • Flow — the movement of work through a team’s workflow/board.

  • Value — focusing on the outcomes/impact of what we do and alignment with the goals and priorities of the organisation.

  • Culture — the mindset/behaviours we expect around teamwork, learning and continuous improvement.

  • Delivery — the practices we expect that account for the delivery of features and epics, considering both uncertainty and complexity.

Each theme has three levels, which are rated on a Red/Amber/Green scale — mainly due to this being an existing scale of self-assessment in other tools our engineering teams have at their disposal.

Flow

The focus here is around flow metrics, specifically Cycle Time and Work Item Age, with the three levels being:

Teams already have flow metrics available to them via a Power BI app, so are able to quickly navigate and understand where they are:

The goal with this is to make teams aware of just how long items are taking, as well as how long the “in-flight” items have actually been in progress. The teams that are Green are naturally just very good at breaking work down, and/or have already embedded looking at this on a regular basis into their way of working (say in retrospectives). Those teams that are at Red/Amber have since adopted techniques such as automating the age of items on the kanban board to highlight aging items which need attention:

Value

The focus with this theme is understanding the impact of what we do and ensuring that we retain alignment with the goals and priorities of the organisation:

In case you haven’t read it, in the ASOS Tech blog previously I’ve covered how we go about measuring portfolio alignment in teams. It essentially is looking at how much of a team backlog goes from PBI/User Story > Feature > Epic > Portfolio Epic. We visualise this in a line chart, where teams can see the trend as well as flipping between viewing their whole backlog vs. just in-flight work:

Similar to flow metrics, teams can quickly access the Power BI app to understand where they are for one part of value:

The second part is where many teams currently face a challenge. Understanding the impact (value) of what you deliver is essential for any organisation that truly cares about agility. We’re all familiar with feature factories, so this was a deliberate step change to get our teams away from that thinking. What teams deliver/provide support for varies, from customer facing apps to internal business unit apps and even tools or components that other teams consume, so having a ‘central location’ for looking at adoption/usage metrics is impossible. This means it can take time, as either the data is not readily available to teams or they have not actually considered this themselves, most likely because they are a component part of a wider deliverable.

Still, we’ve seen good successes here, such as our AI teams who measure the impact of models they build around personalised recommendations, looking at reach and engagement. Obviously for our Web and Apps teams we have customer engagement/usage data, but we also have many teams who serve other teams/internal users, like our Data teams who look at impact in terms of report open rate and viewers/views of reports they build:

Culture

Next we look at the behaviours/interactions a team has around working together and continuously improving:

Ultimately, we’re trying to get away from the idea that many have around agility that continuous improvement = having retrospectives. These are meaningless if they are not identifying actionable (and measurable) improvements to your way of working, no matter how “fun” it is to do a barbie themed format!

We aren’t prescriptive in what team health tool teams use, so long as they are doing it. This could be our internal tool, PETALS (more on this here), the well known Spotify Team/Squad Health Check or even the Team Assessment built into the retrospectives tool in Azure DevOps:

All tools welcome!

The point is that our good teams are regularly tracking this and seeing if it is getting better.

A good example of what we are looking for at ‘green’ level is from this team who recently moved to pairing (shout out to Doug Idle and his team). Around 8 weeks before this image was taken they moved to pairing, which has not only made them happier as a team, but has clearly had a measurable impact in reducing their 85th percentile cycle time by 73%:

73% reduction (from 99 to 27 days) in 85th percentile Cycle Time as a result of pairing

Combine this with then sharing more widely, primarily so teams can learn from each other, then this is what we are after in our strongest teams culturally.

Delivery

The final theme touches on what many agile teams neglect, which is ultimately delivering. By delivery in this context we mean delivery of Features/Epics (as opposed to PBI/Story level). Specifically, we believe it’s about understanding risk/uncertainty, striving towards predictability and knowing what this means when using agile principles and practices:

The good teams in this theme understand that, due to software development being complex, you need to forecast delivery with a percentage confidence, and do this regularly. This means using data which, for our teams, is available within a few clicks. Here they can forecast, given a count of items, when they will be done or, given a time box, what they can deliver.
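
To make the first of those questions concrete, a minimal sketch of a “given a count of items, when will they be done?” simulation could look like the below, sampling from hypothetical daily throughput data (our teams get this from the Power BI app rather than writing it themselves):

import random

def when_will_n_items_be_done(daily_throughput_samples, n_items, trials=10_000):
    """Monte Carlo: distribution of how many days until n_items are completed."""
    results = []
    for _ in range(trials):
        done, days = 0, 0
        while done < n_items:
            done += random.choice(daily_throughput_samples)  # sample a past day
            days += 1
        results.append(days)
    results.sort()
    return {p: results[int(p / 100 * trials) - 1] for p in (50, 85, 95)}

# Hypothetical example: last 20 working days of throughput, 25 items remaining
samples = [0, 1, 2, 0, 1, 3, 1, 0, 2, 1, 1, 0, 2, 1, 0, 1, 2, 1, 0, 3]
print(when_will_n_items_be_done(samples, n_items=25))
# a number of days for each percentage confidence level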

Many teams have multiple features in their backlog, thus to get to ‘green’ our teams should leverage a Feature Monte Carlo so the range of outcomes that could occur across multiple Features is visible:

Note: Feature list is fictional/not from any actual teams

Previously I’ve covered our approach to capacity planning and right-sizing, where teams focus on making Features no bigger than a certain size (batch) and thus can quickly (in seconds) forecast the amount of right-sized features they have capacity for, which again is what we look for in our ‘green’ criteria:

Note: Feature list is fictional/not from any actual teams

The best way to do this is to have a regular cadence where you specifically look at delivery and these particular metrics; that way you’re informed about your progress and any items that may need breaking down/splitting.

Part two…

In part two I share how teams submit their results (and at what cadence), how the results are visualised, what the rollout/adoption has been like, along with our learnings and future direction for the self-assessment…

Measuring value through portfolio alignment

Understanding and prioritising based on (potential) value is key to the success of any agile team. However, not all teams have value measures in place and often are just a component part of a delivery mechanism in an organisation. Here’s how we’re enabling our teams at ASOS Tech to better understand the value in the work they do…

Context

We’ve previously shared how we want people in our tech teams to understand the purpose in their work, rather than just blindly building feature after feature. In terms of our work breakdown structure, we use a four-level hierarchy of work items, with some flexibility around how that may look in terms of ‘standards’:

To bring this to life with an example, take our launch of ASOS Drops where the portfolio epic would represent the whole proposition/idea, with the child epic(s) representing the different domains/platforms involved:

Please note: more platforms were involved in this, this is just simplified for understanding purposes :)

We also want our teams to have a healthy balance of Feature work vs. that which is Hygiene/BAU/Tech Debt/Experimentation, with our current guidance around capacity being:

Team feature capacity will, in most instances, be work related to portfolio epics, as these are the highest priorities our organisation is asking tech to deliver. If someone can trace the work they are doing on a daily basis (at User Story/Product Backlog Item level) all the way to the portfolio, they should be able to see the outcomes we are striving for, the value they are delivering and ultimately how they are contributing towards ASOS’ strategy (which was consistent feedback in Vibe surveys as something our people in technology want). Such traceability is therefore a good proxy measure for (potential) value, helping teams understand just how much of their backlog complements the priorities for ASOS. This is where portfolio alignment comes into play.

Understanding portfolio alignment

Portfolio alignment, simply put, is how much of a team backlog traces all the way up to the priorities for delivery the organisation desires from its technology department.

To calculate it, we start with a team backlog at user story/product backlog item (PBI) level. We look at every item at this level to see if it has a parent Feature. We then look at those Features to see if they have a parent Epic. Finally, we look at those Epics to see if they have a parent Portfolio Epic.

To show a simplified example, imagine a backlog of 10 PBI’s that had the following linkage:

This would have an alignment score of 10%, as 1/10 PBI’s in the team backlog link all the way to the portfolio.

Even if a team backlog has good linkage at Feature and/or Epic level, it would still only receive a ‘count’ if it linked all the way. For example if this team improved their linkage like so:

This would still only result in an alignment of 10%, as only 1/10 PBI’s link all the way to the top.

As we’re looking at this on a consistent basis across teams, platforms and domains, we look purely at count, as it would simply be impossible to do any sort of single calculation around complexity.
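
As a minimal sketch of the calculation, using a flattened view of each PBI’s resolved parent chain and numbers consistent with the simplified example above:

def portfolio_alignment(pbis):
    """% of PBIs whose chain PBI -> Feature -> Epic -> Portfolio Epic is complete."""
    aligned = sum(
        1 for pbi in pbis
        if pbi["feature"] and pbi["epic"] and pbi["portfolio_epic"]
    )
    return 100 * aligned / len(pbis)

# One PBI links all the way to the portfolio, nine do not
backlog = (
    [{"feature": "F1", "epic": "E1", "portfolio_epic": "PE1"}]
    + [{"feature": "F2", "epic": None, "portfolio_epic": None}] * 4
    + [{"feature": None, "epic": None, "portfolio_epic": None}] * 5
)
print(portfolio_alignment(backlog))  # 10.0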

Visualising portfolio alignment

The alignment starts at a team backlog at User Story/PBI level. Teams get two views. The top view is a line chart which shows, on a daily basis, what their backlog (any items that are yet to start and those that have started) alignment was on a particular date. The value on the far right shows their current alignment, with a summary number showing the average alignment over the selected period, as well as a trend line to see if it is improving in the selected period:

Teams also have the ability to filter between seeing this view for the whole backlog or just those “in flight” (items that are in progress):

Finally, there is the table underneath this chart which details the full backlog and all those relevant parent-child links, with every item clickable to open it up in Azure DevOps.

We also have aggregated views for both a domain (a logical grouping of teams) and ASOS tech wide view:

Rollout across tech

Rolling this out initially was a hard sell in some areas. This was mainly due to how people can immediately react to viewing ‘their’ data. Careful messaging is needed around it not being a stick/tool to beat teams with, but a method to understand (and improve) alignment. Similarly, we were very clear in that it should never be at 100% and that there isn’t a single number to hit, as context varies. This way we are accounting for any type of Goodhart's Law behaviour we may see.

Similarly, to help team leads and leaders of areas understand where they could improve, as coaches we advised on what people might want to consider to improve their alignment. Predominantly this wasn’t through improving your linking, but through deleting old stuff you’re never going to do!

At times this was met with resistance, which was surprising as I always find deleting backlog items to be quite cathartic! However, showing teams a large number of items that had not been touched in months, or were added many months (and sometimes years!) ago, did prompt some real reflection as to whether those items were truly needed.

Impact and outcomes

As a team, we set quarterly Objectives and Key Results (OKRs) to measure the impact we’re having across different areas in the tech organisation, ideally demonstrating a behavioural change. This was one of our focuses, particularly around demonstrating where there has been significant improvements and behavioural change from teams:

With anything agility related, it’s important to recognise those innovators and early adopters, so those that had seen a double digit improvement were informed/celebrated, with positive feedback from leaders in different areas around this being the right thing to be doing:

Portfolio alignment also now helps our teams in self-assessing their agility, as one of our four themes (more to come on this in a future post!):

This way, even our teams that struggle to measure the value in their work at least have a starting point to inform them how they are contributing to organisational priorities and strategy.

Objectively measuring “predictability”

Predictability is often one of the goals organisations seek with their agile teams but, in the complex domain, how do you move past say-do as a measurement for predictability? This post details how our teams at ASOS Tech can objectively look at this whilst accounting for variation and complexity in their work…

Predictability is often the panacea that many delivery teams and organisations seek. To be clear, I believe predictability to be one of a balanced set of themes (along with Value, Flow, Delivery and Culture), that teams and organisations should care about when it comes to agility.

Recently, I was having a conversation around this topic with one of our Lead Software Engineers about his team. He explained how the team he leads had big variation in their weekly Throughput and therefore were not predictable, with the chart looking like so:

Upon first glance, my view was the same. The big drops and spikes suggested too much variation for this to be useful from a forecasting perspective (in spite of the positive sign of an upward trend!) and that this team was not predictable.

The challenge, as practitioners, is how do we validate this perspective? 

Is there a way that we can objectively measure predictability?

What predictability is not

Some of you may be reading and saying that planned vs. actual is how we can/should measure predictability. Often referred to as “say-do ratio”, this once was a fixture in the agile world with the notion of “committed” items/story points for a sprint. Sadly, many still believe this is a measure to look at, when in fact the idea of a committed number of items/points left the Scrum Guide more than 10 years ago. Measuring this has multiple negative impacts on a team, which this fantastic blog from Ez Balci explains.

Planned vs. Actual / Committed vs. Delivered / Say-Do are all measurement relics of the past we need to move on from. They are appropriate when the work is clear: for example, when I go to the supermarket and my wife gives me a list of things we need, did I get what we needed? Did I do what I said I was going to do? Software development is complex; we are creating something (features, functionality, etc.) from nothing through writing lines of code.

About — Cynefin Framework — The Cynefin Co

Thinking about predictability as something that is ‘black and white’ like those approaches encourage simply does not work, therefore we need a better means of looking at predictability that considers this.

What we can use instead

Karl Scotland explored similar ideas around predictability in a blog post, specifically looking at the difference in percentiles of cycle time data, for example whether there is a significant difference between your 50th percentile and your 85th percentile. This is something that as a Coach I also look at, but more to understand variation than predictability. Karl himself, after exploring the ideas from the blog further, shared in a talk that this was not a useful measure of predictability.

Which brings us on to how we can do it, using a Process Behaviour Chart (PBC). A PBC is a type of graph that visualises the variation in a process over time. It consists of a running record of data points, a central line that represents the average value, and upper and lower limits (referred to as Upper Natural Process Limit — UNPL and Lower Natural Process Limit — LNPL) that define the boundaries of routine variation. A PBC can help to distinguish between common causes and exceptional causes of variation, and to assess the predictability and stability of a process.
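
For those who like to see the mechanics, a minimal sketch of the limit calculation is below. It uses the classic XmR-chart approach (the average plus or minus 2.66 times the average moving range); Dan’s book covers the reasoning behind the constant and the additional detection rules, so treat this as illustrative only:

def process_behaviour_chart(values, zero_bound=True):
    """Natural process limits for an XmR-style Process Behaviour Chart."""
    average = sum(values) / len(values)
    moving_ranges = [abs(b - a) for a, b in zip(values, values[1:])]
    average_mr = sum(moving_ranges) / len(moving_ranges)
    unpl = average + 2.66 * average_mr
    lnpl = average - 2.66 * average_mr
    if zero_bound:
        lnpl = max(0, lnpl)  # e.g. Throughput can never be negative
    signals = [v for v in values if v > unpl or v < lnpl]
    return {"average": average, "UNPL": unpl, "LNPL": lnpl, "signals": signals}

# Hypothetical example: 14 weeks of Throughput
print(process_behaviour_chart([6, 3, 9, 4, 7, 5, 8, 2, 6, 7, 4, 5, 9, 3]))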

I first gained exposure to this chart through watching the Lies, damned lies, and teens who smoke talk from Dan Vacanti, as well as learning more through one of my regular chats with a fellow coach, Matt Milton. Whilst I will try my best not to spoil the talk, Dan looks at Wilt Chamberlain’s points scoring over the 1962 season in a PBC and, in particular, whether the 100-point game should be attributed to what some say it was.

Dan Vacanti — Lies, Damned Lies, and Teens Who Smoke

In his new book, Actionable Agile Metrics Volume II: Advanced Topics in Predictability, Dan goes to great lengths in explaining the underlying concepts behind variation and how to calculate/visualise PBCs for all four flow metrics of Throughput, Cycle Time, Work In Progress (WIP) and Work Item Age.

With software development being complex, we have to accept that variation is inevitable. It is about understanding how much variation is too much. PBCs can highlight to us when a team's process is predictable (within our UNPL and LNPL lines) or unpredictable (outside our UNPL and LNPL lines). It therefore can (and should) be used as an objective measurement of predictability.

Applying to our data

If we take our Throughput data shown at the beginning and put it into a PBC, we can now get a sense for if this team is predictable or not:

We can see that in fact, this team is predictable. Despite us seemingly having lots of up and down values in our Throughput, all those values are within our expected range. It is worth noting that Throughput is the type of data that is zero bound as it is impossible for us to have a negative Throughput. So, by default, our LNPL is considered to be 0.

Another benefit of these values being predictable is that it also means that we can confidently use this data as input for forecasting delivery of multiple items using Monte Carlo simulation.

What about the other flow metrics?

We can also look at the same chart for our Cycle Time, Work In Progress (WIP) and Work Item Age flow metrics. Generally, 10–20 data points is the sweet spot for the baseline data in a PBC (read the book to understand why), so we can’t quite use the same time duration as our Throughput chart (as this is aggregated weekly over the last 14 weeks).

If we were to look at the most recent completed items in that same range and their Cycle Time, putting it in a PBC gives us some indication as to what we should be focusing on:

The highlighted item would be the one to look at if you were wanting to use cycle time as an improvement point for a team. Something happened with this particular item that made it significantly different than all the others in that period. This is important as, quite often, a team might look at anything above their 85th percentile, which for the same dataset looks like so:

That’s potentially four additional data points that a team might spend time looking at which were in fact just routine variation in their process. This is where the PBC helps, separating signal from noise.

With a PBC for Work In Progress (WIP), we can get a better understanding around where our WIP has increased to the point of making us unpredictable:

We would often look to see if we are within our WIP limits when in fact there is also the possibility (as shown in this chart) of having too low WIP, as well as too high. There may be good reasons for this, for example keeping WIP low as we approach Black Friday (or, as we refer to it internally, Peak) so there is capacity if teams need to work on urgent items.

Work Item Age is where it gets the most interesting. As explained in the book, looking at this in a PBC is tricky. Considering we look at individual items and their status, how can we possibly put this in a chart that allows us to look at predictability? This is where tracking Total Work Item Age (which Dan credits to Prateek Singh) helps us:

Total Work Item Age is simply the sum of the Ages of all items that are in progress for a given time period (most likely per day). For example, let’s say you have 4 items currently in progress. The first item’s Age is 12 days. The second item’s Age is 2 days. The third’s is 6 days, and the fourth’s is 1 day. The Total Age for your process would be 12 + 2 + 6 + 1 = 21 days…using the Total Age metric a team could see how its investment is changing over time and analyse if that investment is getting out of control or not.
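
A minimal sketch of that calculation, using the same four items as in the example:

from datetime import date

def total_work_item_age(in_progress_start_dates, today=None):
    """Sum of the ages (in days) of every item currently in progress."""
    today = today or date.today()
    return sum((today - started).days for started in in_progress_start_dates)

# Four in-progress items aged 12, 2, 6 and 1 days respectively
started = [date(2024, 6, 19), date(2024, 6, 29), date(2024, 6, 25), date(2024, 6, 30)]
print(total_work_item_age(started, today=date(2024, 7, 1)))  # 21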

Plotting this gives new insight, as a team may well be keeping their WIP within limits, yet the age of those items is a cause for concern:

Interestingly, when discussing this in the ProKanban Slack, Prateek made the claim that he believes Total Work Item Age is the new “one metric to rule them all”, stating that if you keep this within limits, the other flow metrics will follow…and I think he might be onto something:

Summary

So, what does this all mean?

Well, for our team mentioned at the very beginning, the Lead Software Engineer can be pleased. Whilst it might not look it on first glance, we can objectively say that from a Throughput perspective their team is in fact predictable. When looking at the other flow metrics for this team, we can see that we still have some work to be done to understand what is causing the variation in our process.

As Coaches, we (and our teams) have another tool in the toolbox that allows us to quickly, and objectively, validate how ‘predictable’ a team is. Moving to something like this allows for an objective lens on predictability, rather than relying on the differing opinions of people interpreting data in different ways. To be clear, predictability is of course not the only thing that matters (but it is one of the many that do). If you’d like to try the same for your teams, check out the template in this GitHub repo (shout out to Benjamin Huser-Berta for collaborating on this as well — works for both Jira and Azure DevOps).

Framework agnostic capacity planning at scale

How can you consistently plan for capacity across teams without mandating a single way of working? In this blog I’ll share how we are tackling this in ASOS Tech…

What do we mean by capacity planning?

Capacity planning is an exercise undertaken by teams to plan how much work they can complete (in terms of a number of items) for a given sprint/iteration/time duration. Sadly, many teams go into incredible detail with this, getting into specifics of the number of hours available per individual per day, the number of days’ holiday and, even worse, using story points:

When planning on a broader scale and at longer term horizons, say for a quarter and looking across teams, the Scaled Agile Framework (SAFe) and its Program Increment (PI) planning appears to be the most popular approach. However, with its use of normalised story points, it is quite rightly criticised for abusing the intent of story points (whatever your views on them may be) and, crucially, for offering teams zero flexibility in choosing how they work.

At ASOS, we pride ourselves as being a technology organisation that allows teams autonomy in how they work. As Coaches, we do not mandate a single framework/way of working as we know that enforcing standardisation upon teams reduces learning and experimentation.

The problem we as Coaches are trying to solve is aligning on a consistent understanding of, and way to calculate, capacity across teams, whilst avoiding mandating a single way of working and staying aligned with agile principles. Our current work on this has led us down the path of taking inspiration from the work of folks like Prateek Singh in scaling flow through right-sizing and probabilistic forecasting.

Scaling Simplified: A Practitioner’s Guide to Scaling Flow eBook : Singh, Prateek: Amazon.co.uk: Books

How we are doing it

Right-sizing

Right-sizing is a practice where we acknowledge and accept that there will be variability in sizes of work items at all levels. What we focus on is, depending on backlog level, understanding what our “right-size” is. The most common type of right-sizing a team will do is taking the 85th percentile of their cycle time for items at story level and using this as their “right-size”: 85% of items take n days or less. They then proactively manage items through Work Item Age, compared to their right-size:

Adapted from

John Coleman — Just use rightsizing, for goodness sake
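
As a minimal illustration of that story-level check (all numbers below are hypothetical):

import math

def percentile(values, pct):
    """Smallest value such that at least pct% of the data is at or below it."""
    ordered = sorted(values)
    return ordered[max(1, math.ceil(pct / 100 * len(ordered))) - 1]

# Hypothetical cycle times (days) of recently completed stories
cycle_times = [1, 2, 2, 3, 3, 4, 5, 5, 6, 8, 9, 12, 13, 16, 21]
right_size = percentile(cycle_times, 85)  # 85% of items took this many days or less

# Flag in-progress items whose Work Item Age has reached the right-size
work_item_ages = {"STORY-101": 3, "STORY-102": 14, "STORY-103": 9}
print(right_size, {k: v for k, v in work_item_ages.items() if v >= right_size})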

However, as we are looking at planning for Features (since this is what business stakeholders care about), we need to do something different. Please note, when I say “Features”, what I really mean here is the backlog hierarchy level above User Story/Product Backlog Item. You may call this something different in your context (e.g. Epic), but for simplicity in this blog I will use the term “Feature” throughout.

I first learnt about this method from Prateek’s “How many bottles of whiskey will I drink in 4 months?” talk from Lean Agile Global 2021. We visualise the Features completed by the team in the last n weeks, plotting them on a scatter plot with the count of completed child items (at story level) and the date the Features were completed. We then add in percentiles to show the 50th/85th/95th percentiles for size (in terms of child item count), typically taking the 85th percentile for our right-size:

What we also do is visualise the current Features in the backlog and how they compare to the right-size value (out of the box we use the 85th percentile as the right-size). This way a team can quickly understand, of their current Features, which may be sized correctly (i.e. have a child item count lower than our right-size), which might be ones to watch (i.e. are the same size as our right-size) and which need breaking down (i.e. are bigger than our right-size):

Please note: all Feature names are fictional for the purpose of this blog

Note that the title of the Feature is also a hyperlink for a team to open the item in their respective backlog tool (Azure DevOps or Jira), allowing them to directly take action for any changes they wish to make.
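
A rough sketch of the classification described above, using hypothetical Feature sizes:

import math

# Hypothetical child item counts of Features completed in the last n weeks
completed_feature_sizes = [3, 5, 6, 6, 8, 9, 10, 12, 14, 19]
right_size = sorted(completed_feature_sizes)[
    math.ceil(0.85 * len(completed_feature_sizes)) - 1
]  # 85th percentile of child item count

def classify(child_count):
    if child_count < right_size:
        return "right-sized"
    if child_count == right_size:
        return "one to watch"
    return "needs breaking down"

backlog_features = {"Feature A": 6, "Feature B": 14, "Feature C": 23}
print({name: classify(size) for name, size in backlog_features.items()})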

What will we get?

Now we know what our right-size for Features is, we need to figure out how many backlog items/stories we have capacity for. To do this, we are going to run a Monte Carlo simulation to forecast how many items we will complete. I am not planning to go into detail on this approach and why it is more effective than other methods such as story points, mainly because I (and countless others!) have covered this in detail previously. We will use this to allow a team to forecast, to a percentage likelihood, the number of items the team is likely to complete in the forecasted period (in this instance 12 weeks):

It is important to note here that the historical data used as input to the forecast should contain the same mix of conditions as the future you are trying to predict. As well as this, you need to understand the variability in your system and whether it is the right amount or too much — check out Dan Vacanti's latest book if you want more information around this. Given nearly all our teams are stable and dedicated to an application/service/part of the journey, this is generally a fair assumption for us to make.
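
A minimal sketch of this ‘What will we get?’ style simulation, sampling from hypothetical weekly Throughput data:

import random

def how_many_items(weekly_throughput_samples, weeks, confidence=0.85, trials=10_000):
    """Monte Carlo: an item count we would meet or beat in `confidence` of futures."""
    totals = sorted(
        sum(random.choice(weekly_throughput_samples) for _ in range(weeks))
        for _ in range(trials)
    )
    # for 85% confidence, 85% of simulated futures completed at least this many items
    return totals[int((1 - confidence) * trials)]

# Hypothetical example: last 14 weeks of Throughput, forecasting the next 12 weeks
samples = [6, 3, 9, 4, 7, 5, 8, 2, 6, 7, 4, 5, 9, 3]
print(how_many_items(samples, weeks=12))  # items we'd expect to complete or beat at 85% confidence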

How many Features?

Now that we have our forecast for how many items, as well as our right-size for our Features, we can calculate how many Features we have capacity for. Assuming we are using our 85th percentile, we would do this via the following steps (there’s a worked example after the list):

  1. Taking the 85th percentile value in our ‘What will we get?’ forecast

  2. Divide this by our 85th percentile ‘right-size’ value

  3. If necessary, round this number (down)

  4. This gives us the number of ‘right-sized’ features we have capacity for
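
As a worked example, with hypothetical numbers for both inputs:

# Hypothetical values: 58 items forecast at the 85th percentile,
# and a Feature 'right-size' of 9 child items at the 85th percentile
items_at_85th_percentile = 58
feature_right_size = 9

feature_capacity = items_at_85th_percentile // feature_right_size  # round down
print(feature_capacity)  # 6 'right-sized' Features we have capacity for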

The beauty of this approach is that, unlike other methods which just provide a single capacity value with no understanding of the risk involved in that calculation, it allows teams to play around with their risk appetite. Currently this is set to 85% but what if we were feeling more risky? For example, if we’ve paid down some tech debt recently that enables us to be more effective in delivery, then maybe 70% is better to select. Know of new joiners and leavers in your team in the coming weeks and therefore need to be more risk averse? Then maybe we should be more conservative with 95%…

Tracking Feature progress

When using data for planning purposes, it is also important that we are transparent around progress with existing Features and when they are expected to complete. Another part to the template teams can use is running a Monte Carlo simulation on their current Features. We visualise Features in their priority order in the backlog along with their remaining child count, with the team able to select a target date, percentile likelihood and, crucially, how many Features they work on in parallel. For a full explanation on this I recommend checking out Prateek Singh’s Feature Monte Carlo blog which, combined with Troy Magennis’ multiple feature forecaster, was the basis for this chart. The Feature Monte Carlo then shows, depending on the percentage confidence chosen, which Features are likely to complete on or before the selected date, which will finish up to one week after the selected date, and which will finish more than one week after the selected date:

Please note: all Feature names are fictional for the purpose of this blog
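
Under the hood, a simplified sketch of this kind of simulation might look like the below. It assumes a simple model where each day’s Throughput is shared across the top wip_limit Features in priority order; the actual template follows Prateek’s and Troy’s approach, so treat this purely as an illustration with hypothetical inputs:

import random

def feature_monte_carlo(features, daily_throughput_samples, target_days,
                        wip_limit, trials=10_000):
    """Per Feature, the share of trials where it completes within target_days."""
    on_time = {name: 0 for name, _ in features}
    for _ in range(trials):
        remaining = [[name, count] for name, count in features]  # priority order
        for _day in range(target_days):
            if not remaining:
                break
            completed = random.choice(daily_throughput_samples)
            while completed > 0 and remaining:
                for feature in remaining[:wip_limit]:  # only 'in flight' Features burn down
                    if completed == 0:
                        break
                    feature[1] -= 1
                    completed -= 1
                for finished in [f for f in remaining if f[1] <= 0]:
                    on_time[finished[0]] += 1
                remaining = [f for f in remaining if f[1] > 0]
    return {name: on_time[name] / trials for name, _ in features}

# Hypothetical example: three Features with remaining child items, 60 days to go
print(feature_monte_carlo(
    features=[("Feature A", 12), ("Feature B", 20), ("Feature C", 9)],
    daily_throughput_samples=[0, 1, 0, 2, 1, 3, 0, 1, 2, 0, 1, 1],
    target_days=60, wip_limit=2))

Lowering wip_limit in the sketch has the same effect you see in the chart: the top-priority Features become far more likely to land on or before the date.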

Again, the team is able to play around with the different parameters here to understand which is the determining factor; in almost all cases, it is limiting your work in progress (WIP): stop starting and start finishing!

Please note: all Feature names are fictional for the purpose of this blog
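For those curious about the mechanics, below is a rough, simplified sketch of the Feature Monte Carlo idea (based on my reading of Prateek Singh’s and Troy Magennis’ approaches, not our actual Power BI template). The Feature names, remaining counts and throughput samples are all made up:

import random

# Illustrative inputs: Features in priority order with their remaining child item counts
features = {"Feature A": 5, "Feature B": 9, "Feature C": 14, "Feature D": 20}
weekly_throughput = [4, 7, 3, 6, 5, 8, 2, 6]  # historical samples (made up)

def simulate_finish_weeks(features, wip_limit=2, horizon_weeks=12):
    """One trial: each week, sample a throughput value and share it round-robin
    across the top `wip_limit` unfinished Features in priority order."""
    remaining = dict(features)
    finished = {}
    for week in range(1, horizon_weeks + 1):
        budget = random.choice(weekly_throughput)
        while budget > 0:
            in_flight = [f for f in remaining if remaining[f] > 0][:wip_limit]
            if not in_flight:
                break
            for f in in_flight:
                if budget == 0:
                    break
                remaining[f] -= 1
                budget -= 1
                if remaining[f] == 0:
                    finished[f] = week
    return finished

def on_time_likelihood(features, target_week=12, trials=5_000):
    """Proportion of trials in which each Feature finished on or before the target week."""
    hits = {name: 0 for name in features}
    for _ in range(trials):
        finished = simulate_finish_weeks(features)
        for name, week in finished.items():
            if week <= target_week:
                hits[name] += 1
    return {name: round(hits[name] / trials, 2) for name in features}

print(on_time_likelihood(features))

Changing the wip_limit (or the target week) and re-running is exactly the kind of ‘playing with the parameters’ described above.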

Aggregating information across teams

As the ASOS Tech blog has shared previously, we try to gather our teams at a cadence for our own take on quarterly planning (titled Semester Planning). We can use these techniques above to make clear what capacity a team has and, based on their current features, what may continue into another quarter and/or have scope for reprioritisation:

Capacity for 8 ‘right-sized’ features with four features that are being carried over (with their projected completion dates highlighted)

Within our technology organisation we work with a Team > Platform (multiple teams) > Domain (multiple platforms) model. A platform can therefore leverage the same information across multiple teams (in a different section of a Miro board) to present its view of capacity across those teams, as well as leveraging Delivery Plans to show when (in terms of dates) that capacity may become available:

Please note: all Feature names are fictional for the purpose of this blog

Domains are then also able to leverage the same information, rolling this info up one level further for a view across their Platform(s):

Please note: all Feature names are fictional for the purpose of this blog

One noticeable addition at this level is the portfolio alignment value. 

This is where we look at what percentage of a Domain’s work is linked to our overall Portfolio Epics. These portfolio items ultimately represent the highest priorities for ASOS Tech and in turn directly align to strategic priorities, something which I have covered previously in this blog. It is therefore very important that we are aware of, and strike, the right balance between feature delivery, the needs of our platforms and tech debt/hygiene.

These techniques allow us to present a data-informed, aligned view of capacity across our technology organisation whilst still allowing our teams the freedom in choosing their own way of working (aligned to agile principles).

Conclusion

Whilst we do not mandate a single way of working, there are some practices that need to be in place for teams/platforms to leverage this, these being:

  • Teams and platforms regularly review and move work items (User Stories, PBIs, Features, Epics, etc.) to in progress (when started) and done (once complete)

  • Teams regularly monitor the size (in terms of number of child work items) of Features

  • At all levels we always try to break work down to thin, vertical slices

  • Features are ‘owned’ by a single team (i.e. not shared across multiple teams)

All teams and platforms, regardless of Scrum, Kanban, XP, DevOps, blended methods, etc. should be doing these things already if they care about agility in their way of working.

Hopefully this blog has given some insight into how you can do capacity planning, at scale, whilst allowing your teams the freedom to choose their own way of working. If you are wondering what tool(s) we use for this, we have a Power BI template that teams can download, connect to their Jira/Azure DevOps project and get the info. If you want, you can give this a go with your team(s) via the GitHub repo here (don’t forget to check against the pre-requisites!).

Let me know in the comments if you have any other approaches for capacity planning that allow teams freedom in their way of working…

The time we went to SEAcon

What is the latest thinking being shared at Agile conferences? The Agile Coaches at ASOS took a day out at SEAcon — The Study of Enterprise Agility conference to find out more…

A joint article by myself, Dan and Esha, fellow Agile Coaches here at ASOS Tech.

What is SEAcon?

According to the website, it first started in 2017, with the organisers creating the first-ever enterprise agility related meetup (SEAm) and conference (SEAcon) as they wanted to support organisations’ desires to be driven by an authentic purpose. Having attended the 2020 version of the conference and hearing some great talks, we knew it was a good one to go check out, our first time doing this as a team of coaches at ASOS.

This year, the conference had over 30 speakers across three different tracks — Enterprise, Start-up/Scale-up and Agile Leadership. This meant that there was going to be a good range of both established and new speakers covering a variety of topics. A particular highlight was the Agile Leadership track which had a number of speakers outside of software development but with plenty of applicable learnings, which was refreshing.

The conference itself

Morning

The conference was hosted at the Oval Cricket Ground, which was a fantastic venue!

We started the day by attending Steve Williams’ How to win an Olympic Gold Medal: The Agile rowing boat session. This talk was so good it set a high bar for all the sessions to follow that day. What was great about it was not only Steve's passion but the fact that there were so many parallels with what teams and organisations are trying to do when employing agility. One of our favourites was the concept of a ‘hot wash up’. Here the rowers would debrief after an exercise on how it went and share feedback amongst each other, all overseen/facilitated by the coach. Not just any coach mind you, this was Jürgen Gröbler, one of the most successful Olympic coaches ever with 13 consecutive Olympic Gold medals.

Interestingly, Steve shared that Jürgen did not have the build of a rower, nor was he ever a rower himself, which, when you consider the seemingly endless debate around ‘should Agile Coaches be technical?’, offers an alternative thought in our world. Another great snippet was that in a rowing team of four, work is always split evenly; shared effort and no heroes. There is also no expectation to operate at 100% effort all the time as you can’t be ‘sprinting’ constantly (looking at you Scrum 👀).

Late morning and lunch

After the morning break, we reconvened and chose to check out Andrew Husak (Emergn) and his session on It’s not the people, it’s the system. We found this session very closely aligned with what we are looking at currently. It effectively covered how to measure the impact the work you are doing is having on the organisation, with Andrew offering a nice blend of historical references (Drucker, Deming, Goldratt, etc.) and metrics (cycle time, lead time, work in progress, etc.) to share how Emergn focus on the three themes of Value, Flow and Quality in the engagements they have.

A key point here was about measuring end-to-end flow (idea -> in the hands of users) rather than just the delivery cycle (code started -> deployed to production), albeit you may have to start with the delivery cycle first, gathering the evidence/data on improving it, before going after the whole ‘system’.

We ended late morning/early lunch by going to Stephen Wilcock (Bloom & Wild) on Pivot to Profitability. Now, it wasn’t all relevant to us and where we work currently (not the fault of the speaker!), however there were still useful learnings, like mapping business outcomes to teams (rather than Project or Feature teams) and the importance of feature prioritisation by CD3 (Cost of Delay Divided by Duration), although there are sources that argue CD3 may not be the most effective way to prioritise.

We took a break after that, chatting to other attendees and digesting the morning, then before we knew it, it was time for lunch. Conference lunches can be really hit or miss and thankfully a great selection was available and, unlike other conferences, there were multiple stations to get your food from so “hangry-ness” was averted.

Afternoon

After lunch we were all really looking forward to the Leadership styles in the Royal Navy talk; however, once we sat down we realised we were actually in The Alignment Advantage session by Richard Nugent. One small criticism of the conference is that it was really difficult to find a printed schedule (none were given out), so schedule changes like this caught people out.

Thankfully, this talk was totally worth it. At the minute, we are reviewing and tweaking our Agile Leadership training and this gave us tonnes of new thinking/material we could be leveraging around strategy and the importance of alignment in achieving this. In the talk, Richard posed a series of questions for us all to note down our take on (either within our team or our organisation), such as:

  • What is strategy?

  • What is your key strategic objective?

  • What is your definition of culture?

  • On a scale of 1–6, to what degree does your current culture support the delivery of a strategic objective?

  • What is the distinction between service and experience?

  • What is your x?

What was great was that, rather than leave these ambiguous for us to answer, Richard validated our understanding by giving his view on the answer to each of the above. After the session, we were all very energised about how we could use this approach with the leaders we work with in ASOS and bake it into our leadership training.

After Richard it was time for Jon Smart and Thriving over surviving in a changing world. As big fans of his book Sooner, Safer, Happier, we were all particularly excited about this talk, and we were not disappointed. Jon highlighted that organisational culture is at the core of thriving, however, culture cannot be created, it has to be nurtured.

Leaders need to provide clear behavioural guardrails that are contextual and need to be repeatedly communicated to enable teams and leaders to hold each other to account.

Jon went on to explain the three key factors for success:

  • Leaders go first, become a role model

  • Psychological safety

  • Emergent mindset; the future is unknown so experiment and optimise for that

At ASOS, one of our focuses is on flow, so when Jon spoke about intelligent flow by optimising for outcomes, we were all naturally intrigued. By having high alignment (through OKRs) with minimum viable guardrails, teams are empowered to experiment on how to best achieve their north star. However, something that is always forgotten about is limiting WIP at all levels to create a pull not push system, where organisations stop starting and start finishing.

As we all had to leave early, the last session of the day we went to was Ben Sawyer’s Leadership Lessons from bomb disposal operations in Iraq and Afghanistan. Sharing stories, pictures and diagrams from his several tours, Ben provided a detailed explanation of how the British Army use Mission Command to create a shared understanding of goals, while enabling teams to decide and own their tactics and approach.

This decentralised approach to leadership echoed many of the other talks throughout the day and reiterated the importance of trust to reach success. Ben also referred to Steve Williams’ approach of using a ‘hot wash up’ to reflect on recent activities and consider improvements for next time. To round off, it was interesting to hear that despite so many contextual differences, similarities in approaches have led to success in many different industries.

Key learnings

So what were our main takeaways?

One of the standouts has to be how the concepts around agility traverse multiple industries and domains, not just software development. It’s a great reminder as Coaches about the importance of language and how, when it comes to agility, people are likely already applying aspects of it in their day to day but calling it something else, and that is ok. Having more anecdotes from different industries that mirror what our teams are doing is great.

Secondly, the importance of focusing on outcomes and measuring impact when it comes to ways of working. As Coaches we’re talking more and more about moving teams away from measuring things like agile compliance (stand-up attendance, contribution to refinement) to the things that truly matter (speed, quality, value delivery).

Finally, the recurring theme that being outcome oriented and setting direction, while allowing individuals and teams to choose their own path in how they get there, is the most effective way to work. Rather than fixating on the how (e.g. methods/frameworks), it’s clear that whether you’re an agilist or not, alignment in strategy and direction is paramount for success.

For its price, SEAcon is probably the best value for money agile conference you’ll get the chance to attend. Good talks, networking and food make it one to watch out for when tickets go on sale — hopefully we’ll be back there in 2024!

Mastering flow metrics for Epics and Features

Flow metrics are a great tool for teams to leverage for an objective view in their efforts towards continuous improvement. Why limit them to just teams? 

This post reveals how, at ASOS, we are introducing the same concepts but for Epic and Feature level backlogs…

Flow at all levels

Flow metrics are one of the key tools in the toolbox that we as coaches use with teams. They are used as an objective lens for understanding the flow of work and measuring the impact of efforts towards continuous improvement, as well as understanding predictability.

One of the challenges we face is how we can improve agility at all levels of the tech organisation. Experience tells us that it does not really matter if you have high-performing agile teams if they are surrounded by other levels of backlogs that do not focus on flow:

Source: Jon Smart (via Klaus Leopold — Rethinking Agile)

As coaches, we are firm believers that all levels of the tech (and wider) organisation need to focus on flow if we are to truly get better outcomes through our ways of working.

To help increase this focus on flow, we have recently started experimenting with flow metrics at the Epic/Feature level. This is mainly because the real value for the organisation comes at this level, rather than at an individual story/product backlog item level. We use both Epic AND Feature level as we have an element of flexibility in work item hierarchy/levels (as well as having teams using Jira AND Azure DevOps), yet the same concepts should be applicable. Leaving our work item hierarchy looking something like the below:

Note: most of our teams use Azure DevOps — hence the hierarchy viewed this way

Using flow metrics at this level comprises the typical measures around Throughput, Cycle Time, Work In Progress (WIP) and Work Item Age; however, we provide more direct guidance around the questions to ask and the conversations to be having with this information…

Throughput

Throughput is the number of Epics/Features finished per unit of time. This chart shows the count completed per week as well as plotting the trend over time. The viewer of the chart is able to hover over a particular week to get the detail on particular items. It is visualised as a line chart to show the Throughput values over time:

In terms of how to use this chart, some useful prompts are:

What work have we finished recently and what are the outcomes we are seeing from this?

Throughput is more of an output metric, as it is simply a count of completed items. What we should be focusing on is the outcome(s) these items are leading to. When we hover on a given week and see items that are more ‘customer’ focused we should then be discussing the outcomes we are seeing, such as changes in leading indicators on measures like unique visits/bounce rate/average basket value on ASOS.com.

For example, if the Epic around Spotify partnerships (w/ ASOS Premier accounts) finished recently:

We may well be looking at seeing if this is leading to increases in ASOS Premier sign-ups and/or the click-through rate on email campaigns/our main site:

The click-through rate for email/site traffic could be a leading indicator for the outcomes of that Epic

If an item is more technical excellence/tech debt focused then we may be discussing if we are seeing improvements in our engineering and operational excellence scores of teams.

What direction is the trend? How consistent are the values?

Whilst Throughput is more output-oriented, it could also be interpreted as a leading indicator for value. If your Throughput is trending up/increasing, then it could suggest that more value is being delivered/likely to be delivered. The opposite would be if it is trending downward.

We also might want to look at the consistency of the values. Generally, Throughput for most teams is ‘predictable’ (more on this in a future post!), however it may be that there are spikes (lots of Epics/Features moving to ‘Done’) or periods of zeros (where no Epics/Features moved to ‘Done’) that an area needs to consider:

Yes, this is a real platform/domain!

Do any of these items provide opportunities for learning/should be the focus of a retrospective?

Hovering on a particular week may prompt conversation about particular challenges had with an item. If we know this then we may choose to do an Epic/Feature-based retrospective. This sometimes happens for items that involved multiple platforms. Running a retrospective on the particular Epic allows for learning and improvements that can then be implemented in our overall tech portfolio, bringing wider improvements in flow at our highest level of work.

Cycle Time

Cycle Time is the amount of elapsed time between when an Epic/Feature started and when it finished. Each item is represented by a dot and plotted against its Cycle Time (in calendar days). In addition to this, the 85th and 50th percentile cycle times for items in that selected range are provided. It is visualised as a scatter plot to easily identify patterns in the data:

In terms of how to use this chart, some useful prompts are:

What are the outliers and how can we learn from these?

Here we look at those Epics/Features that are towards the very top of our chart, meaning they took the longest:

These are useful items to deep dive into/run a retrospective on. Finding out why this happened and identifying ways to prevent it from happening again encourages continuous improvement at a higher level and ultimately aids our predictability.

What is our 85th percentile? How big is the gap between that and our 50th percentile?

Speaking of predictability, generally we advise platforms to try to keep Features to no greater than two months and Epics to no greater than four months. Viewing your 85th percentile allows you to compare the actual duration of your Epics/Features against that aspiration of the tech organisation. Similarly, we can see where there is a big gap between those percentile values. Aligned with the work of Karl Scotland, too large a gap in those values suggests there may be too much variability in your cycle times.
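As a rough illustration of the percentile calculation itself (illustrative dates, and simplified to plain elapsed days):

import math
from datetime import date

# Illustrative completed Features: (start date, finish date)
completed = [
    (date(2023, 1, 4), date(2023, 2, 20)),
    (date(2023, 1, 10), date(2023, 3, 1)),
    (date(2023, 2, 1), date(2023, 2, 14)),
    (date(2023, 2, 6), date(2023, 4, 28)),
    (date(2023, 3, 13), date(2023, 4, 11)),
]

# Cycle time in elapsed calendar days (simplified)
cycle_times = sorted((finish - start).days for start, finish in completed)

def percentile(values, p):
    """Nearest-rank percentile: p% of completed items finished in this many days or fewer."""
    rank = math.ceil(p / 100 * len(values))
    return values[rank - 1]

p50, p85 = percentile(cycle_times, 50), percentile(cycle_times, 85)
print(f"50th: {p50} days, 85th: {p85} days, gap: {p85 - p50} days")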

What are the patterns from the data?

This is the main reason for visualising these items in a scatter plot. It becomes very easy to spot when we are closing off work in batches and have lots of large gaps/white space where nothing is getting done (i.e. no value being delivered):

We can also see maybe where we are closing Epics/Features frequently but have increased our variability/reduced our predictability with regards to Epic/Feature cycle time:

Work In Progress (WIP)

WIP is the number of Epics/Features started but not finished. The chart shows the number of Epics/Features that were ‘in progress’ on a particular day. A trend line shows the general direction WIP is heading. It is visualized as a stepped line chart to better demonstrate changes in WIP values:
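Under the hood, the daily WIP series is just a count per day. A minimal sketch, using made-up start/finish dates:

from datetime import date, timedelta

# Illustrative Epics/Features: (started, finished) where finished=None means still in progress
items = [
    (date(2023, 3, 1), date(2023, 3, 20)),
    (date(2023, 3, 6), None),
    (date(2023, 3, 10), date(2023, 4, 2)),
    (date(2023, 3, 28), None),
]

def daily_wip(items, start, end):
    """For each day in the range, count items that had started but not yet finished."""
    series, day = [], start
    while day <= end:
        wip = sum(1 for s, f in items if s <= day and (f is None or f > day))
        series.append((day, wip))
        day += timedelta(days=1)
    return series

for day, wip in daily_wip(items, date(2023, 3, 1), date(2023, 3, 12)):
    print(day, wip)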

In terms of how to use this chart, some useful prompts are:

What direction is it trending?

We want WIP to be level or trending downward, meaning that an area is not working on too many things. An upward trend points to a potential lack of prioritisation, as more work is being started and then remaining ‘in progress’.

Are we limiting WIP? Should we change our WIP limits (or introduce them)?

If we are seeing an upward trend it may well be that we are not actually limiting WIP. Therefore we should be thinking about that and discussing if WIP limits are needed as a means of introducing focus for our area. If we are using them, advanced visuals may show us how often we ‘breach’ our WIP limits:

A red dot represents when a column breached its WIP limit

Hovering on a dot will detail which specific column breached its WIP on the given day and by how much.

What was the cause of any spikes or drops?

Focusing on this chart and where there are sudden spikes/drops can aid improvement efforts. For example, if there was a big drop on a given date (i.e. lots of items moved out of being in progress), why was that? Had we lost sight of work and just did a ‘bulk’ closing of items? How do we prevent that from happening again?

The same goes for spikes in the chart, meaning lots of Epics/Features moved to in progress at once. It is certainly an odd thing to see at Epic/Feature level, but trust me, it does happen! In the same way that some teams hold planning at the beginning of a sprint and then (mistakenly) move everything to in progress on day one, an area may do the same after a semester planning event; something we want to avoid.

Work Item Age

Work Item Age shows the amount of elapsed time between when an Epic/Feature started and the current time. These items are plotted against their respective status in their workflow on the board. For the selected range, the historical cycle time (85th and 50th percentile) is also plotted. Hovering on a status reveals more detail on what the specific items are and the completed vs. remaining count of their child items. It is visualised as a dot plot to easily see comparison/distribution:
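Again, as a rough sketch of the underlying calculation (made-up items and percentile values):

from datetime import date

# Illustrative in-progress Epics/Features: name -> date work started
in_progress = {
    "Feature A": date(2023, 4, 3),
    "Feature B": date(2023, 5, 15),
    "Feature C": date(2023, 6, 1),
}

today = date(2023, 6, 12)
p50, p85 = 28, 61  # historical cycle time percentiles in days (illustrative)

for name, started in in_progress.items():
    age = (today - started).days
    flag = "over the 85th percentile!" if age > p85 else ("over the 50th" if age > p50 else "")
    print(f"{name}: {age} days in progress {flag}".strip())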

In terms of how to use this chart, some useful prompts are:

What are some of our oldest items? How does this compare to our historical cycle time?

This is the main purpose of this chart: it allows us to see which Epics/Features have been in progress the longest. These really should be the primary focus as they represent the most risk for our area, having been in flight the longest without feedback. In particular, those items that are above our 85th percentile line are a priority, as these have now taken longer than 85% of the Epics/Features we completed in the past:

The items not blurred are our oldest and would be the first focus point

Including the completed vs. remaining child item count provides additional context, so we can also understand how much effort we have put in so far (completed) and what is left (remaining). The combination of these two numbers might also indicate where you should be trying to break items down: if a lot of work has been undertaken already AND a lot remains, chances are the item hasn’t been sliced very well.

Are there any items that can be closed (Remaining = 0)?

These are items we should be looking at as, with no child items remaining, it looks like these are finished.

The items not blurred are likely items that can move to ‘Done’

Considering this, they really represent ‘quick wins’ that can get an area flowing again — getting stuff ‘done’ (thus getting feedback) and in turn reducing WIP (thus increasing focus). In particular, we’ve found visualizing these items has helped our Platform Leads in focusing on finishing Epics/Features.

Why are some items in progress (Remaining = 0 and Completed = 0)?

These are items we should be questioning why they are actually in progress.

Items not blurred are likely to be items that should not be in progress

With no child items, these may have been inadvertently marked as ‘in progress’ (one of the few times to advocate for moving items backwards!). It may, in rare instances, be a backlog ‘linking’ issue where someone has linked child items to a different Epic/Feature by mistake. In any case, these items should be moved backwards or removed as it’s clear they aren’t actually being worked on.

What items should we focus on finishing?

Ultimately, this is the main question this chart should be enabling the conversation around. It could be the oldest items, it could be those with nothing remaining, it could be neither of those and something that has become an urgent priority (although ignoring the previous two ‘types’ is not advised!). Similarly, you should also be using it in proactively managing those items that are getting close to your 85th percentile. If they are close to this value, it’s likely focusing on what you need to do in order to finish these items should be the main point of discussion.

Summary

Hopefully, this post has given some insights into how you can leverage flow metrics at Epic/Feature level. In terms of how frequently you should look at these, at a minimum I’d recommend weekly. Doing it too infrequently means it is likely your teams will be unclear on priorities and/or will lose sight of getting work ‘done’. If you’re curious how we do this, these charts are generated for teams using either Azure DevOps or Jira, using a Power BI template (available in this repo).

Comment below if you find this useful and/or have your own approaches to managing flow of work items at higher levels in your organisation…

Our survey says…uncovering the real numbers behind flow efficiency

Flow Efficiency is a metric that is lauded as being a true measure of agility yet it has never had any clear data supporting it, until now. This blog looks at over 60 teams here in ASOS and what numbers they see from a flow efficiency perspective…

A mockup of a family fortunes scoreboard for flow efficiency

Not too long ago, I risked the wrath of the lean-agile world by documenting the many flaws of flow efficiency. It certainly got plenty of engagement — as it currently is my second most read article on Medium. On the whole it seemed to get mostly positive engagement (minus a few rude replies on LinkedIn!) which does make me question why it still gets traction through things like the flow framework and Scaled Agile Framework (SAFe). I put it down to something akin to this:

A comic mocking a consultancy changing its mind on selling SAFe

Source: No more SAFe

For those who missed the last post and are wondering what it is, flow efficiency is an adaptation of a metric from the lean world known as process efficiency. This is where, for a particular work item, we measure active time (i.e., time spent actually working on the item) as a percentage of the total time (active time + waiting time) it took for the item to complete.

For example, if we were to take a team’s Kanban board, it may look something like this:

An example kanban board

Source: Flow Efficiency: Powering the Current of Your Work

Flow efficiency is therefore calculated like so:

An explanation of the flow efficiency calculation of active time divided by total time multiplied by 100
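In code, the calculation is trivial; a minimal sketch:

def flow_efficiency(active_days, waiting_days):
    """Active time as a percentage of total elapsed time for a work item."""
    total = active_days + waiting_days
    return round(active_days / total * 100, 1)

# e.g. an item that spent 4 days in 'work' states and 10 days sat in queues
print(flow_efficiency(4, 10))  # 28.6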

One of my main issues with Flow Efficiency is the way it is lauded as the ‘thing’ to measure. There are plenty of anecdotal references to it, yet zero evidence and/or data to back up the claims. Here’s some of the top results on Google:

None of these statements have any data to support the claims they make. Rather than bemoan this further, I thought, in line with the previous blog on Monte Carlo forecasting accuracy and with the abundance of teams we have in ASOS Tech, let’s actually look at what the data says…

Gathering data

At ASOS, I am very fortunate as a coach that those before me such as Helen Meek, Duncan Walker and Ian Davies, all invested time in educating teams about agility and flow. When randomly selecting teams for this blog, I found that none of the teams were missing “queues” in their workflow, which is often the initial stumbling block for measuring flow efficiency.

Flow Efficiency is one of the many metrics we have available to our ASOS Tech teams, to use when it comes to measuring their flow of work and as an objective lens in their efforts towards continuous improvement. Teams choose a given time period and select which steps in their workflow are their ‘work’ states (i.e. when work on an item is actually taking place):

An animation showing how teams configure their flow efficiency metric

Once they have done this, a chart will then plot the Flow Efficiency for each respective item against the completed date, as well as showing the average for all the completed items in that period. The chart uses a coloured scale to highlight those items with the lowest flow efficiency (in orange) through to those with the highest (in blue):

An example of a flow efficiency chart showing 29% for a completed item

For this article, I did this process for 63 teams, playing around with the date slicer for periods over the last twelve months and finding the lowest and highest average flow efficiency values for a given period:

An animation showing the date range showing different flow efficiency calculations

This was then recorded in my dataset like so:

A table showing the minimum and maximum flow efficiency measures for a team

All teams use a blend of different practices and frameworks — some use Scrum/ScrumBut, others use more Kanban/continuous flow or blend these with things like eXtreme Programming. Some teams work on things you will see on the front-end/customer-facing parts of the journey (e.g. Saved Items), others on back-end systems (e.g. the tools we use for stock management and fulfilment).

Now that I’ve explained the data capture process — let’s look at the results!

Our survey says…

With so many teams to visualize, it’s difficult to find a way to show this that satisfies all. I went with a dumbbell chart as this allows us to show the variance in lowest-highest flow efficiency value per team and the average across all teams:

A summary dumbbell chart of the results showing a range of 9–68%

Some key findings being:

  • We actually now have concrete data around flow efficiency! We can see that with this study flow efficiency values from 9–68% have been recorded

  • The average flow efficiency is 35%

  • Flow efficiency has variability — any time you hear someone say they have seen flow efficiency values typically of n% (i.e. a single number), treat this with caution/scepticism as it should always be communicated as a range. We can see that all teams had variation in their values, with 38% of the teams in this data (25 of 63 teams) actually having a flow efficiency difference of >10%

  • If we were to take some of those original quotes and categorise our flow efficiency based on what is ‘typically’ seen into groups of Low (<5%), Medium (5–15%), High (15–40%) and Very High (>40%), then the teams would look something like this:

A chart showing the distribution of results

The problem with this is, whilst we do everything we can to reduce dependencies on teams and leverage microservices, there is no way nearly all teams have either high or very high flow efficiency.

I believe a better categorisation that we should move to would be — Low (<20%), Medium (20–40%), High (40–60%) and Very High (>60%), as this would look like:

A chart showing an updated distribution of the results

But, but…these numbers are too high?!

Perhaps you are reading this and have your own experiences and/or share the same views as those sources referenced at the beginning of the blog. There is no way I can disagree with your lived experience, but I do think that when talking about numbers people need to be prepared to bring more to the conversation than anecdotal reference points. Are these numbers a reflection of the “true” flow efficiency of those items? Definitely not! There are all the nuances of work items not being updated in real time, work being in active states on evenings/weekends when it clearly isn’t being worked on (I hope!), work actually being blocked but not marked as blocked, etc. — all of which I explained in the previous article.

Let’s take a second and look at what it would mean practically if you wanted to get a more ‘accurate’ flow efficiency number. Assume a team has a workflow like so:

An example kanban board

Take, for example, 30 work items completed by this team in a one-month period. We assume that 60% of those items went through all columns (7 transitions) and the remaining 40% skip some columns (2 to 6 transitions), allowing for some variability in how items move through our workflow:

An example table of data showing column transitions

Then we assume that, like all teams at ASOS (or teams using tools like Azure DevOps or Jira), they comment on items. This could be to let people know something has progressed, clarify part of the acceptance criteria, ask a question, etc. Let’s say that this can happen anywhere from 0–4 times per item:

An example table of data showing column transitions and comment count

Not only that but we also mark an item when it is blocked and also then unblocked:

An example table of data showing column transitions, comment count, times blocked and unblocked

Note: randomised using Excel

Now, if we just look at that alone, that means 348 updates to work items in a one month period. If we then wanted to add in when work is waiting, we would need to account for a few (2) or many (6 — conservatively!) times an item is waiting, as well as adding a comment sometimes (but not all the time) so that people know the reason why it is waiting:

An example table of data showing column transitions, comment count, times blocked and unblocked, wait time and waiting reason

Random values again calculated via Excel :)

We can already see that with these conservative guesses we’re adding nearly 200 more updates to just 30 work items. Once you start to do this at scale, whether that be for more items and/or more teams, as well as over a longer period, you can see just how much additional work this means for teams in interacting with their tool of choice. Combine this with the costs of context switching (i.e. stepping out of being ‘in the work’ to log that you are ‘waiting’ in the tool) and you can see why tracking flow efficiency to greater accuracy is a fool’s errand.
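If you want to play with the arithmetic above yourself, here is a rough sketch that mirrors those assumptions (the blog’s figures came from Excel, so exact totals will differ, and the range used for blocking events is my own assumption):

import random

random.seed(1)
TOTAL_ITEMS = 30
total_updates = 0

for _ in range(TOTAL_ITEMS):
    # 60% of items pass through every column (7 transitions), the rest skip some (2-6)
    transitions = 7 if random.random() < 0.6 else random.randint(2, 6)
    comments = random.randint(0, 4)            # general comments on the item
    block_events = random.randint(0, 2) * 2    # each block needs a matching unblock
    waits = random.randint(2, 6)               # times the item sat waiting
    wait_comments = random.randint(0, waits)   # a reason is sometimes noted
    total_updates += transitions + comments + block_events + waits + wait_comments

print(total_updates)  # a total in the hundreds for just 30 items in a month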

Summary

My hope is that we can now start to have a proper conversation around what ‘typical’ flow efficiency numbers we see. Whether it be what ‘typical’ values you see or what a ‘high’ flow efficiency looks like. To my knowledge this is the first attempt at something like this in our industry… and it should not be the last! In addition to this, I wanted to demonstrate what it would truly mean if you wanted a team to place more focus on getting an ‘accurate’ flow efficiency value.

For those wondering if it has changed my opinion on flow efficiency, as you can probably tell it has not. What I would say is that I have cooled a little on flaw #1 of teams not modelling queues in their workflow, given nearly all these teams had multiple queue states modelled (after good training/coaching). I still stand by the other flaws that come with it and now I hope folks who are familiarising themselves with it have this information as a common reference point when determining its appropriateness for them and their context.

What are the alternatives? Certainly Work Item Age would be a good starting point, as items ageing at a rate greater than your historical cycle time might allude to an inefficient way of working. Another approach could be looking at blocked work metrics and the insights those bring.

What are your thoughts on the findings? Surprised? Sceptical? 

Let me know in the replies what you think…

Adding a Service Level Expectation to your Azure DevOps board using Power Automate

The final part in this series of three blogs covering how you can add flow metrics directly into your kanban board in Azure DevOps. This part covers how to add in your Service Level Expectation (SLE). Check out part one if you want to add Work Item Age or part two for adding Cycle Time

What is a Service Level Expectation (SLE)?

As per the Kanban Guide:

The SLE is a forecast of how long it should take a single work item to flow from started to finished. The SLE itself has two parts: a period of elapsed time and a probability associated with that period (e.g., “85% of work items will be finished in eight days or less”). The SLE should be based on historical cycle time, and once calculated, should be visualized on the Kanban board. If historical cycle time data does not exist, a best guess will do until there is enough historical data for a proper SLE calculation.

I’ve highlighted the part that is most relevant for this blog, which is about making this visible on the kanban board. Again, like the previous posts, some assumptions are made as part of this:

  • Within Azure DevOps (ADO) you have an inherited process template and access to edit this

  • You know how to generate a PAT in ADO and have already done so with full or analytics read access

  • States and state categories in your process template are configured correctly

  • You have access to Power Automate (Microsoft’s automation tool)

  • We are using data (not a best guess) to calculate our SLE and that we have enough data for it

  • We are calculating the SLE for all items on the kanban board (i.e. not breaking this down by work item type)

  • We are going to use the 85th percentile cycle time for all items flowing through our board as our SLE

Deciding where our SLE will go

The kanban guide is not explicit about where the SLE should go, simply that it should be visualized on the board. Given we are working with ADO, it limits our options in where we can make this visible. For the purpose of this blog, we will focus on how we can add it to the work item form, although the wrap up at the end of this blog will show another way that it can be done…

Adding custom fields for Cycle Time percentiles

Consistent with the previous blogs on Work Item Age and Cycle Time, we will add a couple of custom fields to our work item form. Keeping with the same theme of focusing on a single work item type, we are again going to use Product Backlog Item as our chosen type.

We are going to add two custom fields to the form. ‘Cycle Time 50th’ will be for the 50th percentile of Cycle Time and ‘SLE’ will be the field for the 85th percentile cycle time (our chosen percentile for our SLE). Again make sure both of these are configured as integer type fields:

Now, an optional step here is to hide these fields on the work item form. We can still populate these fields (and make them visible on the board and/or query them) but it just means less ‘distraction’ from an end user perspective:

Now those are done, we can move onto the automation!

Adding the SLE to the work item

We will start by making a slight tweak to our query we created when configuring our Work Item Age automation. If you go back to that query, you want to add in to your column options ‘Cycle Time 50th’ and ‘SLE’:

After this we are going to go to Power Automate. There are two options here for the type of automation we choose, depending on how up to date you want your SLE to be. One way is to take the same approach we did for our Cycle Time automation and set up an automated cloud flow, which would then have the SLE update as and when an item moves to ‘Closed’.

The other way (and the way this blog will cover how to do) is to use a scheduled cloud flow like we did for our Work Item Age automation:

However, what we are going to do is set this up to run more than once a day. Mainly because multiple items will (hopefully!) be moving to done during the day and we want our SLE to be as close to real-time as possible. I’ve gone with the following schedule of running every four hours:

Our next step is the same as our Work Item Age one, where we will get our query results:

Again, ensure that you input the relevant Organization Name and Project Name where you have created the query:

Following this we will add a step to Initialize variable (called ‘DateRange’). This is where we are going to dynamically look at the last 12 weeks’ worth of cycle time to calculate our percentiles. The reason why we use 12 weeks is so that we have a good amount of samples in our dataset — feel free to use less or more if you prefer. Our variable is going to be called DateRange of type String, with the following expression:

formatDateTime(subtractFromTime(utcNow(), 12, 'Week'), 'yyyy-MM-ddTHH:mm:ssZ')

The next part is where we are going to do something different than previous. Rather than add a step, we are going to ‘Add a parallel branch’:

The reason is that we are populating both our 50th percentile AND our SLE (85th percentile) on the work item form, therefore we want them to run in parallel.

Under each branch, you are going to add a step to initialize a variable. One should be called CT85 (for the 85th percentile), the other CT50 (for the 50th percentile). Both should be of type ‘Float’:

Next we are going to add an Apply to each step under each branch, populating it with the value from our ‘Get query results’ step:

After this we are going to add a step under each branch to Get work item details. Here we want to make sure our Organization Name and Project Name match what we entered at the beginning and we are going to populate our ‘Work Item Type’ and ‘Work Item Id’ fields as dynamic content from our Get query results step:

Next we are going to add a HTTP step under each branch. This is where we are going to get our cycle time percentile data. Same as before, the method should be ‘GET’ and our URL should consist of the first part (the same for both branches):

https://analytics.dev.azure.com/ORG/PROJECT/_odata/V3.0-preview/WorkItemBoardSnapshot?%20$apply=filter(%20Team/TeamName%20eq%20%27TEAM%20NAME%27%20and%20BoardCategoryReferenceName%20eq%20%27Microsoft.RequirementCategory%27%20and%20DateValue%20ge%20

Please note: it is essential here that the ORG | PROJECT | TEAM NAME values match your own ADO project, otherwise the call will fail.

Next it’s the dynamic content of the DateRange variable:

Then we do something slightly different. On the branch that is the 85th percentile you need to add the following:

%20)%20/compute(%20percentile_cont(CycleTimeDays,%200.85)%20as%20CT85)%20/groupby(%20(CT85))

For the branch that is the 50th percentile you need to add the following:

%20)%20/compute(%20percentile_cont(CycleTimeDays,%200.5)%20as%20CT50)%20/groupby(%20(CT50))

Which should then look like so:

Then click ‘Show advanced options’ for both branches and add in your PAT:

Next we are going to add in a Parse JSON step. Similar to before, this is where we are going to extract the computed percentile value (CT85 or CT50 respectively). For both, choose ‘body’ from your previous HTTP step.

For your 85th percentile branch your schema should be:

{
    "type": "object",
    "properties": {
        "@@odata.context": {
            "type": "string"
        },
        "value": {
            "type": "array",
            "items": {
                "type": "object",
                "properties": {
                    "@@odata.id": {},
                    "CT85": {
                        "type": "number"
                    }
                },
                "required": [
                    "@@odata.id",
                    "CT85"
                ]
            }
        }
    }
}

For your 50th percentile it should be:

{
    "type": "object",
    "properties": {
        "@@odata.context": {
            "type": "string"
        },
        "value": {
            "type": "array",
            "items": {
                "type": "object",
                "properties": {
                    "@@odata.id": {},
                    "CT50": {
                        "type": "number"
                    }
                },
                "required": [
                    "@@odata.id",
                    "CT50"
                ]
            }
        }
    }
}

For our next step we are going to add an Apply to each for each branch. Before adding a step we need to add a concurrency control, which we do via clicking the three dots next to ‘Apply to each’ and then ‘Settings’:

Then we want to turn Concurrency Control to ‘On’ and set our degree of parallelism to 1:

Please make sure you do this for both branches!

After this we can select ‘value’ from our Parse JSON step for our ‘Select an output from previous steps’ field:

Then we are going to add a step to Set Variable which, for our respective branches, we are using dynamic content to populate the value field:

Next is a Compose step where we will use an expression. For our 85th percentile this should be:

if(greaterOrEquals(mod(variables('CT85'),1),0.5),formatNumber(variables('CT85'),'0'),if(less(mod(variables('CT85'),1),0.5),if(equals(mod(variables('CT85'),1),0),formatNumber(variables('CT85'),'0'),add(int(first(split(string(variables('CT85')),'.'))),1)),first(split(string(variables('CT85')),'.'))))

For our 50th percentile it should be:

if(greaterOrEquals(mod(variables('CT50'),1),0.5),formatNumber(variables('CT50'),'0'),if(less(mod(variables('CT50'),1),0.5),if(equals(mod(variables('CT50'),1),0),formatNumber(variables('CT50'),'0'),add(int(first(split(string(variables('CT50')),'.'))),1)),first(split(string(variables('CT50')),'.'))))
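If the nested expression above is hard to parse: as far as I can tell, it rounds any fractional cycle time up to the next whole day and leaves whole numbers untouched. A rough Python equivalent of that intent (not part of the flow, just for readability):

import math

def round_cycle_time(cycle_time_days: float) -> int:
    # Fractional cycle times round up to the next whole day; whole numbers stay as-is
    return math.ceil(cycle_time_days)

print(round_cycle_time(6.2))  # 7
print(round_cycle_time(6.0))  # 6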

Final step for each branch is to Update a work item. This is where we are going to be adding the percentiles to the respective work item. Here we need to make sure the organization and project match what we entered previously. Our ‘Id’ and ‘Work item type’ should be dynamic content from our previous steps. Finally our respective fields for the SLE or 50th percentile should match the names we gave them at the very beginning and the values should be the ‘outputs’ of our previous Compose steps.

That’s the automation complete! Make sure all your step names match the above images and hit ‘Test’ to give it a test run:

Making this visible on the kanban board

The final step is to make these fields visible on the kanban board. To do this we need to go into our board settings and find the respective work item type. Under ‘Additional fields’ you’ll want to add Cycle Time 50th and SLE:

Now we can see our Work Item Age and compare it to our 50th percentile cycle time as well as, more importantly, our SLE:

Taking this further…and alternative ways to display SLE

Unfortunately we cannot configure styles within the Kanban board if one field is greater than another. For example, we ideally want a rule whereby if Work Item Age > Cycle Time 50th then turn that item yellow and/or if Work Item Age > SLE then turn that item orange. ADO (currently) doesn’t let you do that, instead allowing just the entry of a value:

The best we can do here for now is just to use that 50th percentile and SLE information to add styling rules for at risk and/or breached items:

I mentioned previously an alternative approach to making your SLE visible on your board. Another way, which may be a more controversial approach (as it directly impacts the design of the team’s workflow), is to have a placeholder item on the board that always displays the current SLE. To do this, create any work item type for your board and give it any name you like (don’t worry, we are going to overwrite this). Configure a Swimlane for the top of your board called Service Level Expectation (SLE) and place this item in one of your in progress columns. Here is an example:

Following slightly different steps (not detailed in this post but can be shared if it’s of interest) we can do something like the following:

With the result being an item on the board that looks like so:

Similar to previous posts, having the SLE as a field on the card allows you to better highlight those items that may be close to exceeding this value.

Of course you could leverage the same styling rules approach as previously shown:

You can also take this further and define SLEs for different work item types. For example, if I wanted this to be dynamic for different work item types, I would adjust my HTTP action like so:

Hopefully this series of blogs has been helpful in making the information around the measures defined in the Kanban Guide more accessible for you and your teams. Don’t forget to add a comment below with any feedback :)

Adding Cycle Time to your Azure DevOps board using Power Automate

The second in a series of three blogs covering how you can add flow metrics directly into your kanban board in Azure DevOps. Part one covered how to add Work Item Age. Part two (this blog) will cover adding Cycle Time and part three will show how to add in your Service Level Expectation…

What do we mean by Cycle Time?

As per the Kanban Guide:

Cycle Time — the amount of elapsed time between when a work item started and when a work item finished.

The challenge we have in Azure DevOps is that, despite information around cycle time being available to populate widgets such as the cycle time chart, it again requires moving away from the board to view it on a separate page.

Analytics widgets — Azure DevOps | Microsoft Learn

What would be even better for teams would be getting that information real time, ideally as soon as possible after an item moves to done. Here’s how this can be made possible…

Prerequisites

Similar to the last post, here are some assumptions made in this guidance:

With all those in place — let’s get started!

Adding a ‘Cycle Time’ field to ADO

We need to add a new field into our process template in ADO called Cycle Time. You also need to know the respective work item type(s) you want to do this for. Again, for the purpose of simplicity in this blog we will stick to Product Backlog Item (PBI) as the work item type and use an inherited Scrum process template. Please note, if you are wanting to do this for multiple work item types you will have to repeat this process.

  • Find the PBI work item type in your inherited process work items list

  • Click into it and click ‘New field’

  • Add the Cycle Time field — ensure you specify it as an ‘integer’

If you have followed the previous post and have added a custom Work Item Age field, you’ll want to also implement a work item rule here. This is so that when items that were in progress move to done, we clear the Work Item Age field. You can do this like so:

Now, before automating, let’s briefly recap on how cycle time is calculated…

Understanding how Cycle Time is calculated

From Microsoft’s own documentation, we can see that Cycle Time is calculated from when an item first enters an ‘In Progress’ state category to entering a ‘Completed’ state category.

Source:

Cycle Time and Lead Time control charts — Azure DevOps Services | Microsoft Learn

Fortunately for us, when this happens, Microsoft auto-calculates this cycle time, storing it in a column in the database/analytics views known as CycleTimeDays. As mentioned previously, it is not the intent of this blog to get into the debate about adding +1 day to an item, on the basis that there are no instances where an item has taken 0 days to complete.

Ultimately, calculating cycle time this way still aligns with the definition as set out in the kanban guide as it is still “the amount of elapsed time between when a work item started and when a work item finished.”

Time to move on to automation…

Automating the input of Cycle Time on items

Our automation looks slightly different this time as, rather than a scheduled automation, we want this to run any time an item moves to done. Therefore we need to pick an Automated cloud flow in Power Automate:

We are going to call it Cycle Time and our trigger is going to be When a work item is closed:

We will add in our ‘Organization Name’, ‘Project Name’ and ‘Type’. Again, for this instance we are going to be consistent and just use the Product Backlog Item (PBI) type.

Please note, if you are wanting to do this for multiple work item types you will have to repeat the process of adding this field for each work item type.

The closed state field should be auto-populated:

Next we need to add a step for a Delay:

The reason for this is that Microsoft’s calculation of cycle time can sometimes be a little slow. All we are going to add in here is a 30 second delay to give enough time for the CycleTimeDays column to be populated:

Following this, we are going to add a Get work item details step:

Here we want to make sure our organization, project and work item type are consistent with our first step. We also want to add in the ‘Id’ field from our first action when a work item is closed:

After this, we want to add in a HTTP step which is where we will pull in the CycleTimeDays for the completed item:

You’ll need to set the method as ‘GET’ and add in the URL. The first part of the URL (replace ORG and PROJECT with your details) should be:

https://analytics.dev.azure.com/ORG/PROJECT/_odata/v3.0-preview/WorkItems?$filter=WorkItemId%20eq%20

Add in the dynamic content of ‘Id’ from our Get work item details step:

After the Id, add in:

&$select=CycleTimeDays

Which should then look like:

Again, ensure you have added your PAT details in the advanced options:

PAT blurred for obvious reasons!

Next we are going to add a Parse JSON step, where we are going to extract the CycleTimeDays value:

For content you’ll want to choose ‘Body’ and add a schema like so:

{
    "type": "object",
    "properties": {
        "@@odata.context": {
            "type": "string"
        },
        "value": {
            "type": "array",
            "items": {
                "type": "object",
                "properties": {
                    "CycleTimeDays": {
                        "type": "number"
                    }
                },
                "required": [
                    "CycleTimeDays"
                ]
            }
        }
    }
}

Next we need to add an Initialize variable step:

This will serve the purpose of temporarily storing our cycle time before we write it back to the respective work item. Add in the following:

Apply to each is the next step to add in:

Here we will use the ‘value’ from our Parse JSON action as our output from previous steps:

Then we’ll need to add in a Set variable step which is essentially where we are going to pass through our CycleTimeDays in the value field:

Then we need to add a Compose step for rounding our Cycle Time:

Here we need to set an expression of:

if(greaterOrEquals(mod(variables('CycleTimeDays'),1),0.5),formatNumber(variables('CycleTimeDays'),'0'),if(less(mod(variables('CycleTimeDays'),1),0.5),if(equals(mod(variables('CycleTimeDays'),1),0),formatNumber(variables('CycleTimeDays'),'0'),add(int(first(split(string(variables('CycleTimeDays')),'.'))),1)),first(split(string(variables('CycleTimeDays')),'.'))))

The final action is to write this back to Azure DevOps. Here we want to add a step to Update a work item:

Ensure the organization, project and work item type all match our previous steps. We want to choose ‘Id’ from our Get Work Item Details action previously. Click Advanced Options and ensure that Cycle Time is populated with the outputs of our previous step:

Then hit save and the flow is created!

To test if it works, you will need to create a dummy item and move it to your closed state on your board to see if the flow works. With a successful run looking like so:

Making this visible on the kanban board

The final step is to make this visible on the kanban board. To do this we need to go into our board settings and find the respective work item type. Under ‘Additional fields’ you’ll want to add Cycle Time:

Then, when an item moves to done and the flow runs, you will then see the cycle time for your completed items:

Please note — this will not retrospectively update for items that were in a Closed/Done state before this automation was setup. You could however combine the logic of this and the previous blog post to do that :)

Ways you could take it further

Now, teams who are using the board as part of their daily sync/scrum also have the cycle time for completed items visible on the cards themselves. This provides insights around how long items actually took. You could then take this further, for example adding styling rules for any items that took longer than an agreed duration (or a Service Level Expectation — SLE).

This team may choose to do that for anything that took over 10 days:

Which could be the basis for discussing these orange items in a retrospective.

Now you have two of the four flow metrics visible for all in progress and completed items on your board, which is hopefully a useful next step for increasing awareness around the flow of work :)

Check out part three which covers how to automate the adding of a Service Level Expectation (SLE) to the kanban board…

The Full Monte

Probabilistic forecasting is becoming an increasingly common practice amongst Agile teams, particularly due to the great work of people such as Larry Maccherone, Troy Magennis, Julia Wester, Dan Vacanti and Prateek Singh. One question that isn’t particularly well documented is: how accurate is it? Here we look at 25 ASOS teams’ data to find out just how right (or wrong!) it really is…

Whatever your views on the relevance in 2023 of the Agile Manifesto, no practitioner should ignore the very first line of “uncovering better ways”. I’ve always tried to hold myself and the peers I work with true to that statement, with one of my biggest learning/unlearning moments being around velocity and story points. Moving away from those approaches towards techniques such as probabilistic forecasting and Monte Carlo simulation (I have Bazil Arden to thank for introducing me to it many years ago) is better aligned with modern, more complex environments. I don’t intend to cover the many pitfalls of story points and/or velocity, mainly because I (and many others) have covered this in great detail previously.

The challenge we face with getting people to adopt approaches such as probabilistic forecasting is that sceptics will often default to asking, “well, how accurate is it?”, which can leave advocates stumped. “Erm…not sure” or “well, it’s better than what you do currently” are answers that unfortunately don’t quite cut it for those wanting to learn about it and potentially adopt it.

Whilst those familiar with these techniques will be aware that all models are wrong, we can’t begrudge those who need to see evidence before adopting a new way of working. After all, this is how the diffusion of innovations works, with those in the early majority and late majority motivated by seeing social proof, aka seeing it working (ideally in their context):

Source: BVSSH

Yet social proof in the context of probabilistic forecasting is hard to come by. Many champion it as an approach, but very few share just how successful these forecasts are, making it very difficult for this idea to “cross the chasm”.

Why validating forecasts is important

The accuracy of forecasts is not only important for those wanting to see social proof of them working; it should in fact matter to anyone practicing forecasting. As Nate Silver says in The Signal and the Noise:

One of the most important tests of a forecast — I would argue that it is the single most important one — is called calibration. Out of all the times you said there was a 40 percent chance of rain, how often did rain actually occur? If, over the long run, it really did rain about 40 percent of the time, that means your forecasts were well calibrated. If it wound up raining just 20 percent of the time instead, or 60 percent of the time, they weren’t.

A quick sense check for anyone using these approaches is how frequently they validate what was forecast against the actual outcome. Just as people remember when it was forecast to be sunny and a rain shower occurred, they don’t forget significantly wrong forecasts — just ask Michael Fish!

[embed]https://www.youtube.com/watch?v=NnxjZ-aFkjs[/embed]

Therefore, it’s essential when using these probabilistic approaches that we regularly validate the difference in what we forecast vs. what occurred, using that as learning to tweak our forecasting model.
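
As a minimal illustration of that calibration idea (a sketch with made-up numbers, not the tooling we actually use), you could record each forecast alongside the actual outcome and check the hit rate against the percentile used:

# Each tuple: (items forecast at the 85th percentile, items actually completed); made-up data
forecasts_vs_actuals = [(18, 21), (25, 24), (30, 36), (12, 15), (20, 20)]

hits = sum(actual >= forecast for forecast, actual in forecasts_vs_actuals)
hit_rate = hits / len(forecasts_vs_actuals)

# For a well-calibrated "85% likely to complete X items or more" forecast,
# we would expect this to be roughly 85% over the long run.
print(f"Hit rate: {hit_rate:.0%}")  # 80% in this made-up example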

How we forecast

Coming back to the matter at hand, it’s worth noting that there is no single approach to Monte Carlo simulation. The simplest (and the one we coach our teams to use) is random sampling — taking a random number from a given distribution. There are other approaches (for example Markov chain), but comparing them is beyond the scope of this blog. If you would like to know more, I’d highly recommend Prateek Singh’s blog comparing the effectiveness of each approach.

For our teams here at ASOS, we use random sampling of historical weekly throughput:

This then feeds into our forecasts on “when will it be done?” or “what will we get?” — the two questions most commonly asked of our teams.

Each forecast contains 10,000 simulations, with the outcome distribution viewed as a histogram. Colour coding shows a percentile likelihood for an outcome — for example, in the image shown we can see that for When Will It Be Done we are 85% likely (furthest left ‘green bar’) to take 20 weeks or less to complete 100 items. For What Will We Get we are 85% likely (furthest right ‘green bar’) to complete 27 items or more in the next six weeks.
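
To make the mechanics concrete, here is a minimal sketch of this kind of random-sampling simulation in Python (illustrative only; the throughput samples are made up and this is not the code of the tooling we use):

import random

# Hypothetical weekly throughput samples (items completed per week)
history = [3, 5, 2, 7, 4, 6, 3, 5, 4, 6, 2, 5]

def what_will_we_get(weeks_ahead: int, simulations: int = 10_000) -> dict:
    # Each simulation randomly samples one historical week per future week
    # and sums them to give one possible number of items completed.
    outcomes = sorted(
        sum(random.choice(history) for _ in range(weeks_ahead))
        for _ in range(simulations)
    )
    # "85% likely to complete X items or more" is the value 15% of the way up
    # the sorted outcomes: 85% of the simulated outcomes sit at or above it.
    return {p: outcomes[int((1 - p) * simulations)] for p in (0.50, 0.70, 0.85)}

print(what_will_we_get(weeks_ahead=6))
# e.g. {0.5: 27, 0.7: 25, 0.85: 23} (the higher the confidence, the fewer items committed to)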

There is also a note on the x-axis of the stability of the input data.

This shows the stability between two random groups of the samples we are using.

Forecast factors

In terms of what I set out to achieve with this, there were four main things I wanted to be more informed about:

  1. Just how wrong are the forecasts?

  2. What percentile (50th / 70th / 85th) is ‘best’ to use?

  3. How big a factor is the amount of historical data that you use?

  4. How different are the results in short term (2–4 weeks) and long term (8–12 weeks) forecasts?

In terms of the forecasting approach, the focus was on the ‘what will we get?’ forecast, mainly because this is easier to do at scale and very few of our teams have strict, imposed delivery date deadlines. Historical data of 6, 8, 10 and 12 weeks was used to forecast the number of items a team would complete in a given period (in this example, the next 2 weeks).

This was then captured for each team, with forecasts for the next 2, 4, 8 and 12 weeks using 6–12 weeks’ historical data. The 50th, 70th and 85th percentile outcomes were used as the forecasts to compare.

A snapshot of the forecast table looks like so:

In total I used 25 teams, with 48 forecasts per team (4 forecast horizons × 4 historical data windows × 3 percentiles), meaning there were 1200 forecasts to compare.

Anyone who has used these approaches in the past will know how important it is that your historical data is a fair reflection of the work you will be doing in the future. Across 25 teams this is somewhat hard to do, so I settled on choosing a time of year for the historical data that could (at best) reflect the forecast period in terms of UK bank holidays. With the forecast being done on 25th April 2022, the historical data incorporated two previous bank holidays (15th and 18th April 2022). The 2–4 week forecast periods contained one bank holiday (2nd May 2022) and the 8–12 week periods contained three (2nd May, 2nd June and 3rd June 2022).

Validating forecast accuracy

After a brief DM exchange with Prateek, he told me about an approach he had taken in the past using the Brier score. This is a way to verify the accuracy of a probability forecast.

Whilst this is completely valid as an approach, for an audience that can take a while to grasp the concept of Monte Carlo simulation, I decided it was best not to add another data science element! Similarly, people are more interested in knowing, if you forecast say 40 items, how far above or below that the team actually ended up. Therefore, a better answer really is to know how wrong we were. Due to this I chose to go with something far simpler, with two visualizations showing (a small sketch of both follows the list below):

  • How often forecasts were right/wrong

  • How far out (in terms of % error) each forecast was
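
As a rough sketch of how each forecast could be classified (my own simplification; the exact error calculation behind the visuals may differ):

def classify(forecast: int, actual: int) -> tuple:
    # 'Correct' here means the team completed the forecast number of items or more
    verdict = "correct" if actual >= forecast else "incorrect"
    # Percentage error relative to the actual outcome
    pct_error = (forecast - actual) / actual * 100
    return verdict, round(pct_error, 1)

print(classify(forecast=18, actual=36))  # ('correct', -50.0): correct, but a long way off
print(classify(forecast=40, actual=32))  # ('incorrect', 25.0): over-forecast by 25%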

The results

As it’s a lot of data for someone to quickly view and understand, my initial results were simply visualised in a table like so:

Any time a cell is green, this means that the forecast was correct (i.e. the team completed the forecast number of items or more).

Any time a cell is red, this means that the forecast was incorrect (i.e. the team completed fewer items than the number forecast).

Some observations with this were:

  • Using the 85th percentile, this was ‘correct’ in 361 out of 400 (90%) of forecasts. This compares with 336 out of 400 (84%) for the 70th percentile and 270 out of 400 (68%) for the 50th percentile

  • Forecasts that were longer term (8 or 12 weeks) were ‘incorrect’ 25% (150 out of 600) of the time compared to 16% (93 out of 600) of the time for short term (2 or 4 weeks) forecasts

  • The difference in terms of how much historical data to use and the forecast outcome was minimal. 6 weeks’ historical data was ‘incorrect’ 19% (56 out of 300) of the time, 8 weeks’ was 20% (60 out of 300), 10 weeks’ was 23% (68 out of 300) and 12 weeks’ was 20% (59 out of 300)

  • Teams 8 and 9 are standouts with just how many forecasts were incorrect (red boxes). Whilst it’s not in scope to provide an ‘answer’ to this — it would be worth investigating why this may have happened (e.g. a significant change to team size, a change in tech, a new domain focus, etc.)

If you have that age-old mantra of “under promise, over deliver”, then completing more items than forecast is great. However, if you forecast 10 items and the team completed 30 items, then chances are that’s also not particularly helpful for your stakeholders from a planning perspective! Therefore, the other way we need to look at the results is in terms of margin of error. This is where the notion of ‘how wrong’ we were comes into play. For example, if we forecast 18 items or more (85th percentile) and 29 items or more (50th percentile) and we completed 36 items, then the 50th percentile forecast was closer to what actually occurred. Using the previous language around ‘correct’ or ‘incorrect’, we can use a scale of:

The results look like so:

Again, some interesting findings being:

  • 281 of the 1200 forecasts (23%) were within +/- 10% (dark green or pink shade) of the actual result

  • Short term forecasts (2 or 4 weeks) tend to ‘play it safe’ with 297/700 (42%) being ‘correct’ but more than 25% from the actual outcome (light green shade)

  • Whilst forecasts that were long term (8 or 12 weeks) were ‘incorrect’ more often than short term (2 or 4 weeks) forecasts, when the short-term forecasts were incorrect they were incorrect by a larger margin than the long-term ones (shown by darker red boxes to the left of the visual)

  • 85th percentile forecasts were rarely significantly incorrect, in fact just 9 of 400 (just over 2%) were more than 25% from the actual outcome

Coming back to the initial questions

In terms of what I set out to achieve with this, there were four main things I wanted to be more informed about:

Just how wrong are the forecasts?

In order to answer this, you need to define ‘wrong’. To keep this simple I went with wrong = incorrect = forecasting more than what the team actually delivered. Using this definition and looking at our first visual, we can see that forecasts were wrong 20% of the time (243 out of 1200 forecasts).

What percentile (50th / 70th / 85th) is ‘best’ to use?

This really is all about how far out you’d like to forecast. 

For short term (2–4 weeks) forecasts, you’re more likely to get closer ‘accuracy’ with the 50th percentile; however, this also means more risk, as it had a higher frequency of over-forecasting.

The 85th percentile, whilst often correct, was still some way off the actual outcome. Therefore, for short term forecasts, the 70th percentile is your best bet for the best balance of accuracy vs risk of being wrong.

For long term forecasts, the 85th percentile is definitely the way to go — with very few significantly incorrect forecasts.

How big a factor is the amount of historical data that you use?

It isn’t immediately obvious when we compare the visuals what the answer to this is.

When looking at how often they were incorrect, this ranged from 19–23% of the time. The same applies when looking at accuracy, with only a 3% variance in forecasts within 10% of the actual number of items. Therefore, based on this data, we can say that the amount of historical data (when choosing between 6–12 weeks) does not play a significant factor in forecast accuracy.

How different are the results in short term (2–4 weeks) and long term (8–12 weeks) forecasts?

This one was the most surprising finding — generally it’s an accepted principle that the further out your forecast is, the more uncertain it is likely to be. This is because there is so much uncertainty about what the future holds, both in terms of what the team may be working on and things such as the size of the team, what may go wrong in production, etc.

When looking at the short term vs long term forecasts, we see a much higher degree of accuracy (darker green boxes) for the longer term forecasts, rather than those that are short term.

Conclusion

The main reason for this study was to start to get some better information out there around Monte Carlo simulation in software development and the “accuracy” of these approaches. Hopefully the above provides some better insight whether you’re new to or experienced in using these approaches. Please remember, this study is based on the tools we use at ASOS — there may well be other tools out there that use different approaches (for example Actionable Agile uses daily throughput samples rather than weekly, and I’d love to see a comparison). It is not the intent of this article to compare which tool is better!

As stated at the beginning, “all models are wrong” — the hope is these findings give some insight into just how wrong they are and, if you’re considering these approaches but need to see proof, here is some evidence to inform your decision.

One final point to close, never forget:

It is forecasting’s original sin to put politics, personal glory, or economic benefit before the truth of the forecast. Sometimes it is done with good intentions, but it always makes the forecast worse

(Nate Silver — The Signal & The Noise)

Adding Work Item Age to your Azure DevOps board using Power Automate

The first in a series of three blogs covering how you can add flow metrics directly into your kanban board in Azure DevOps. Part one (this blog) will cover adding Work Item Age. Part two covers adding Cycle Time and part three will show how to add in your Service Level Expectation (SLE)…

Context

As teams increase their curiosity around their flow of work, making this information as readily available to them as possible is paramount. Flow metrics are the clear go-to as they provide great insights around predictability, responsiveness and just how sustainable a pace a team is working at. There is, however, a challenge with getting teams to use them frequently. Whilst using them in a retrospective (say, looking at outliers on a cycle time scatter plot) is a common practice, it is a lot harder trying to embed them into everyday conversations. There is no doubt these charts add great value, but plenty of teams forget about them in their daily sync/scrums as they will (more often than not) be focused on sharing their Kanban board. They will focus on discussing the items on the board, rather than using a flow metrics chart or dashboard, when it comes to planning for their day. As an Agile Coach, no matter how often I show it and stress its importance, plenty of teams that I work with still forget about the “secret sauce” of Work Item Age in their daily sync/scrum as it sits on a different URL/tool.

Example Work Item Age chart

This got me thinking about how we might overcome this and remove a ‘barrier to entry’ around flow. Thankfully, automation tools can help. We can use tools like Power Automate, combined with other sources, to help improve the way teams work through making flow data visible…

Prerequisites

There are a few assumptions made in this series of posts:

With all those in place — let’s get started!

Adding a ‘Work Item Age’ field to ADO

We first need to add a new field called Work Item Age into our process template in ADO. You also need to know the respective work item type(s) you want to do this for. For simplicity, in this blog we will stick to Product Backlog Item (PBI) as the work item type we set this up for, using an inherited Scrum process template.

Please note, if you want to do this for multiple work item types, you will have to repeat the process of adding this field for each one.

  • Find the Product Backlog Item type in your inherited process template work items list

  • Click into it and click ‘new field’

  • Add the Work Item Age field — ensuring you specify it as an ‘integer’ type

That’s the easy part done, now let’s tackle the trickier bits…

Understanding how Work Item Age is to be calculated

From Microsoft's own documentation, we can see that in ADO their Cycle Time calculation is from when an item first enters an ‘In Progress’ state category to entering a ‘Completed’ state category:

Source: Cycle Time and Lead Time control charts — Azure DevOps Services | Microsoft Learn

Therefore, we can determine that for any items that have been started but not completed, the Work Item Age is calculated as the difference, in calendar days, between the current date and the time when an item first entered the ‘In Progress’ state category, also known as the InProgressDate.

It is not the intent of this blog to get into the debate about adding +1 day on the basis that no item ever takes 0 days to complete — for that we have Drunk Agile ;)

Ultimately, calculating Work Item Age this way still aligns with the definition as set out in the kanban guide as it is still “the amount of elapsed time between when a work item started and the current time.”

Now let’s jump into the automation…

Automating Work Item Age

We start by creating a query in ADO of all our current ‘in progress’ items. The complexity of this will of course vary depending on your ADO setup. For this we are keeping it simple — any PBIs in our single ‘In Progress’ state of Committed:

Please ensure that Work Item Age is added as one of your columns in your query. It needs to be saved as a shared query and with a memorable title (sometimes I like to add DO NOT EDIT in the title).

Next we go to Power Automate and we create a Scheduled cloud flow:

We are going to call this ‘Work Item Age’ and we will want it to run every day at a time that is before a team’s daily sync/scrum (e.g. 8am).

Once you’re happy with the time click create:

Next we need to click ‘+ new step’ and add an action to Get query results from the query we just set up:

Please ensure that you input the relevant ‘Organization Name’ and ‘Project Name’ where you have created the query:

Following this we are going to add a step to Initialize variable — this is essentially where we will ‘store’ what our Work Item Age is which, to start with, will be an integer with a value of 0:

Then we are going to add an Apply to each step:

We’ll select the ‘value’ from our ‘Get query results’ step as the starting point:

Then we’ll add a Get work item details step. Here we need to make sure the ‘Organization’ and ‘Project’ match what we set out at the beginning.

For Work Item Type we need to choose ‘Enter Custom Value’:

We can then choose ‘Work Item Type’ and ‘ID’ as dynamic content from our ‘Get query results’ step previously:

With the end result being:

Next we need to add a HTTP step. This is essentially where we are going to get the InProgressDate for our items:

You’ll need to set the method as ‘GET’ and add in the URL. The first part of the URL (replace ORG and PROJECT with your details) should be:

https://analytics.dev.azure.com/ORG/PROJECT/_odata/v3.0-preview/WorkItems?$filter=WorkItemId%20eq%20

Add in the dynamic content of ‘Id’ from our Get work item details step:

After the Id, add in:

&$select=InProgressDate

Which should look like:
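
For example, with ORG and PROJECT replaced and a (hypothetical) work item Id of 123, the assembled URL would be:

https://analytics.dev.azure.com/ORG/PROJECT/_odata/v3.0-preview/WorkItems?$filter=WorkItemId%20eq%20123&$select=InProgressDate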

You’ll then need to click ‘Show advanced options’ to add in your PAT details. Set the authentication to ‘Basic’, add in a username of ‘dummy’ and paste your PAT into the password field:

PAT blurred for obvious reasons!

Then we need to add in a Parse JSON step:

This is where we are essentially going to extract our InProgressDate.

Choose ‘body’ as the content and add a schema like so:

{
    "type": "object",
    "properties": {
        "@@odata.context": {
            "type": "string"
        },
        "value": {
            "type": "array",
            "items": {
                "type": "object",
                "properties": {
                    "InProgressDate": {
                        "type": "string"
                    }
                },
                "required": [
                    "InProgressDate"
                ]
            }
        }
    }
}

Then we need to format this how we want so it’s easier to do the date difference calculation. Add a Compose step:

Rename this to Formatted InProgressDate and add the following as an expression:

formatDateTime(body('Parse_JSON')?['value'][0]['InProgressDate'], 'yyyy-MM-dd')

Then add another Compose step, this time to get the Current Date, which should be an expression like so:

formatDateTime(utcNow(), 'yyyy-MM-ddTHH:mm:ssZ')

Then we will add one more Compose step to calculate the Date Difference, which is the following expression:

div(sub(ticks(outputs('Current_Date')), ticks(outputs('Formatted_InProgressDate'))), 864000000000)

This is essentially doing a (rather long-winded!) date difference calculation. This appears to be the only way to do this type of calculation in Power Automate.
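
For anyone who finds those expressions hard to parse, this is roughly what the three Compose steps work out to, sketched in Python (an illustration of the logic, not part of the flow; the date shown is just an example value):

from datetime import datetime, timezone

in_progress_date = "2024-06-03T09:15:00Z"   # example of what the Analytics API returns

# Formatted InProgressDate: keep just the date part (yyyy-MM-dd), i.e. midnight that day
started = datetime.strptime(in_progress_date[:10], "%Y-%m-%d").replace(tzinfo=timezone.utc)

# Current Date: utcNow()
now = datetime.now(timezone.utc)

# Date Difference: ticks are 100-nanosecond intervals and 864,000,000,000 ticks = 1 day,
# so dividing the tick difference by that constant gives whole elapsed calendar days
ticks_elapsed = (now - started).total_seconds() * 10_000_000
work_item_age = int(ticks_elapsed // 864_000_000_000)
print(work_item_age)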

Then we need to add a step to Set variable, which was something we established earlier on to store the work item age:

Here we just need to choose the same variable name (ItemAge) and use the ‘outputs’ from the previous step (Date Difference) as the dynamic content:

Final step is to populate this on the respective work item in ADO. To do this, search for an Update a work item step:

Then you will want to populate it with the ‘organization name’ and ‘project’ you’ve been using throughout. You also need to ensure you add in ‘Id’ for the Work Item Id and ‘Work Item Type’ from your previous steps:

Then you need to click ‘Show advanced options’ and add ‘Work Item Age’ into other fields and choose ‘ItemAge’ as the value:

Then hit save and the flow is created (ensure all your step names match the below):

Then it’s best to do a test run, which you can do by clicking Test, selecting ‘Manually’ and then clicking Test > Run Flow:

Clicking ‘Done’ will take you to a page that will show you if steps have been successful and which are in progress:

You can also view the ‘Run history’ to see if it is successful. Please note the amount of in progress items will impact how long the flow takes:

Once run, if you check back into Azure DevOps and your query, you should now see your Work Item Age field populated:

Making this visible on the Kanban board

The final step is to make this visible on the Kanban board. To do this we need to go into our board settings and find the respective work item type. Under ‘Additional fields’ you’ll want to add Work Item Age:

Then, when your board refreshes, you will see the Work Item Age for your ‘In Progress’ items:

Ways you could take it further

Now, teams who are using the board as part of their daily sync/scrum also have the age of work items visible on the cards themselves. This allows them to make informed decisions about their plan for the day, without having to flip between different tools/links to view any charts.

You can then take this further, for example adding styling rules for when items go past a particular age:

Similarly, you could also leverage the recently released swimlane rules functionality and create a swimlane for items a team should be considering swarming on. This could be where items are close to exceeding a team’s forecast cycle time or Service Level Expectation (SLE):

Hopefully this is a useful starting point for increasing awareness of Work Item Age on the board for a team.

Check out part two which details how to automate the adding of Cycle Time to the work item form for completed items…

Seeking purpose – intrinsic motivation at ASOS

Autonomy, mastery and purpose are the three core components to intrinsic motivation. How do you embed these into your technology function/department? Read on to explore these concepts further and how we go about it at ASOS…

The book Drive by Daniel Pink is an international bestseller and a commonly referenced book around modern management. If you haven’t read it, the book essentially looks at what motivators are for people when it comes to work.

Some may immediately assume this is financial which, to a certain degree, is true. The research in the book explains that for simple, straightforward work, financial rewards are indeed effective motivators. It also explains that we need to understand these as ‘external’ motivational factors; motivation from these external factors is classed as extrinsic motivation. These factors only go so far and, in a complex domain such as software development, quickly lose effectiveness once pay is fair.

This is where we look at the second motivational aspect: intrinsic motivation. When pay is fair and work is more complex, behaviour is motivated by an inner drive that propels a person to pursue an activity. Pink explains how intrinsic motivation is made up of three main parts:

  • Autonomy — the desire to direct our own lives

  • Mastery — the desire to continually improve at something that matters

  • Purpose — the desire to do things in service of something larger than ourselves

What drives us: autonomy + mastery + purpose

Source

When people are intrinsically motivated, they do their best work. So how do we try to bring intrinsic motivation to our work in Tech @ ASOS?

Autonomy

Autonomy is core to all our teams here at ASOS. From a technical perspective, teams have aligned autonomy around technologies they can leverage. We do this through things such as our Patterns and Practices group, which looks to improve technical alignment across teams and agree on patterns for solving particular problems. We then communicate these patterns both internally and externally, which makes our software safer to operate and reduces re-learning effort.

As a team of Agile Coaches, we uphold this autonomy principle by not prescribing a single way of working for any of our teams. Instead, we give them the freedom to choose how they want to work, whilst guiding them to ensure their way of working aligns with agile values and principles.

Comic Agilé of a leader telling teams they are self-organising

Not like this!

From books such as Accelerate, we know that enforcing standardisation with working practices upon teams actually reduces learning and experimentation. When your target market is fashion-loving 20-somethings, teams simply must be able to innovate and change without having what Marty Cagan would call ‘process people’ who impose constraints on how teams must work. You cannot inhibit yourselves by mandating one single way of working.

To bring this to life with a simple example, we don’t have any teams that use all elements of Scrum as per the guide. Do we have teams that take inspiration and practices from Scrum? Yes. Can they change/get rid of practices that don’t add value? Of course. Do they also blend practices from other frameworks too? Absolutely! For instance, we have plenty of teams who work in sprints (Scrum), love pairing (eXtreme Programming) and use flow metrics (Kanban) to continuously improve, all whilst retaining a core principle of “you build it, you run it” (DevOps). Autonomy is therefore an essential factor for all our technology teams.

Enough about autonomy… what about mastery?

Mastery

Mastery exists in a few forms for our teams. A core approach to mastery our teams use is our Fundamentals. These are measures we use to drive continuous improvement and operational excellence across our services. Our own Scott Frampton discusses the history and evolution of this in detail in this series. In short, it comprises four pillars:

  1. Monitoring & Observability

  2. Performance & Scalability

  3. Resiliency

  4. Deployability

Teams self-assess and use this as a compass (rather than a GPS) to guide them in their improvement efforts. This means we are aligned in “what good looks like” when engineering and operating complex systems.

Sample view of engineering excellence

The levels of the respective measures are continually assessed and evolve quarter to quarter, in line with industry trends as well as patterns and practices, so teams never “sit still” or feel they have reached a level of mastery that cannot be surpassed.

Similarly, mastery is something that is encouraged and celebrated through our internal platforms and initiatives. ASOS Backstage is our take on Spotify Backstage, another tool in our toolbox to better equip our teams in understanding the software landscape at ASOS. We also have our Defenders of the Wheel group — a collection of engineers who work to support the development and growth of new ASOS Core libraries and internal tools.

Screenshot of ASOS Backstage

To encourage mastery, individuals across Tech are able to achieve certifications relevant to their role(s) and/or their contributions to these internal platforms/groups:

Backstage badges

This means that there are frequent sources of motivation for individuals in our teams from a mastery perspective.

What about the final aspect of intrinsic motivation, purpose?

Purpose

This is probably the most challenging area for our teams, as often this may be outside of their control. As an organisation, we’re very clear on what our vision and purpose are:

Our vision is to be the world’s number one fashion destination for fashion-loving 20-somethings

Source: ASOS PLC

Similarly, our CEO José reminded us all of what makes ASOS the organisation it is, covering our purpose, performance and passion at a recent internal Town Hall event:

José talking purpose, performance and passion at Town Hall

Source: José’s LinkedIn

The challenge is that in a tech organisation, this doesn’t always easily translate into the specific work an individual and/or team is doing. If a team is working on a User Story for example, it’s not an unfair question for them to be asking “Why am I doing this?” or “What impact will this have?” or even “Where is the value?”. One of our efforts around this has been introducing and improving what we call ‘Semester Planning’, which Paul Taylor will cover in a future post. The other main effort has been around portfolio transparency.

Portfolio transparency, as a concept, is essentially end-to-end linkage in the work anyone in a team is doing so that they, as an individual, can understand how this aligns with the goals and strategy of the organisation. Books such as Sooner, Safer, Happier by Jonathan Smart bring this concept to life in visuals like so:

Strategic objective diagram

Source: Sooner Safer Happier

The key to this idea is that an individual should be able to understand the value in the work they are doing. This value should be as simple as possible to express – i.e. not via some Fibonacci voodoo or ambiguous mathematical formula (e.g. SAFe’s version of WSJF). The acid test is whether anyone in the tech organisation can understand how a given item (story, feature, epic) contributes to the goals of the organisation and the value this brings. My own self-imposed constraint is that they should be able to do this in fewer than five clicks.

At its core, this really is just about better traceability of work end to end. We have high-performing teams who regularly showcase technical excellence, but how does that fit into the big picture?

With the work we have been doing, a team can now take a User Story that they will be working on and, within five clicks, understand the value this brings and the strategic alignment to the goals of the organisation (note these numbers have been modified for the purpose of this blog):

Sample hierarchy of User Story to Feature to Epic to Portfolio Epic
Sample epic in Azure Devops

*Note — not the actual £ values*

Sample products demo from previous user story

And this is what it looks like to you!

Of course, this is dependent on quality data entry! Not everything (yet!) in our portfolio contains this information; however, this is a first positive step in making the purpose and value of our work visible.

How do you do this in your organisation? Can teams easily see the value in what they are doing? I’d love to hear your thoughts in the comments below…

The many flaws of Flow Efficiency

As organisations try to improve their ways of working, better efficiency is often a cited goal. ‘Increasing flow’ is something else you may hear, with Flow Efficiency being a measure called out as something organisations need to focus on. The problem is that few, if any, share the many flaws of this metric.

Read on to find out just what these pitfalls are and, more importantly, what alternatives you might want to focus on instead to improve your way of working…

Queues

Queues are the enemy of flow in pretty much every context, but especially software development. Dependencies, blockers and org structure(s) are just a few reasons that spring to mind when thinking about why work sits idle. In the world of Lean manufacturing, Taiichi Ohno once stated that in a typical process only around 5% of time is value-adding activity. There is also the measure of overall equipment effectiveness (OEE), with many manufacturing lines only around 60% productive.

More recently, the work of Don Reinertsen in the Principles of Product Development Flow has been a frequent inspiration for Agile practitioners, with this quote in particular standing out:

Our greatest waste is not unproductive engineers but work products sitting idle in process queues.

Many thought leaders, coaches and consultants champion the use of a metric known as flow efficiency when coaching teams and organisations about improving their way of working, but what exactly is it?

What is flow efficiency?

Flow efficiency is an adaptation of the lean world metric of process efficiency. This is where, for a particular work item (backlog item, user story, whatever your preferred taxonomy is), we measure the percentage of active time — i.e., time spent actually working on the item — against the total time (active time + waiting time) that it took for the item to complete.

For example, if we were to take a software development team’s Kanban board, it may look something like this:

Source: Flow Efficiency: Powering the Current of Your Work

Where flow efficiency would be calculated like so:

Flow efficiency (%) = Active time / Total time x 100%
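
As a worked example (with made-up numbers), an item that was actively worked on for 3 of the 15 calendar days it was in progress has a flow efficiency of 20%:

active_days = 3      # time spent actually working on the item (made-up)
waiting_days = 12    # time the item sat idle in queues (made-up)

flow_efficiency = active_days / (active_days + waiting_days) * 100
print(f"{flow_efficiency:.0f}%")  # 20%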

The oft-quoted ‘industry standard’ is that anything between 15% and 40% flow efficiency is good.

In terms of visualizing flow efficiency, it typically will look like this:

Flow Efficiency: Powering the Current of Your Work

In this chart, we can see the frequency (number) of work items with a certain percentage flow efficiency, and an aggregated view of what the average flow efficiency looks like.

All makes sense, right? Many practitioners would also advocate this as an important thing to measure.

I disagree. In fact, I would go as far as to say that I believe flow efficiency to be the least practical and overhyped metric in our industry.

So, what exactly are some of the problems with it?

Anecdotal evidence of “typical” flow efficiency

Now I don’t disagree with the ideas of those above regarding queues being an issue with a lot of time spent waiting. I also don’t deny that flow efficiency in most organisations is likely to be poor. My issue comes with those who cite flow efficiency percentages or numbers, quoting ‘industry standards’ and what good looks like without any solid proof. “I’ve seen flow efficiency percentages of n%” may be a common soundbite you hear — #DataOrItDidntHappen needs to be a more frequent hashtag for some claims in our industry. If we take a few examples near the top of a quick Google search:

I finally thought I’d found some hard data with “the average Scrum team Process Efficiency for completing a Product Backlog Item is on the order of 5–10%” which is cited in Process Efficiency — Adapting Flow to the Agile Improvement Effort. That is until we see the full text:

And then the supporting reference link:

Surveying a few people in a classroom ≠ ‘the average Scrum team’.

It amazes me that, in all the years of data collated in various tools, and with our frequent emphasis on empiricism, there is not one single study that validates the claims made around what flow efficiency percentages “typically” are.

Lack of wait states

Now, discounting the lack of a true study, let’s look at how a typical team works. Plenty of teams do not know or have not identified the wait states in their workflow:

In this example (which is not uncommon) — all the workflow states are ‘active’ states, therefore there is no way to calculate when work is waiting, and flow efficiency will always be 100% (and therefore useless!). Plenty of teams are in this situation: they do in fact know what their wait states are, yet have not modelled them appropriately in their workflow.

Impossibility of measuring start/stop time

Let’s say now we’ve identified our wait states and modelled them appropriately in our workflow:

How often (in my experience, fairly regularly!) do we hear updates like the below when reviewing the board:

Expecting near real-time updates (in order to accurately reflect active vs. wait time) is just not practical, and therefore any flow efficiency number is flawed due to this delay in updating items. Furthermore, there are so many nuances in product development that making a binary call as to whether something is active vs. waiting is impossible. Is thinking through something on a walk active time or idle time? What about experimentation? Even more so, think about when we leave work at the end of the day. None of our items are being worked on, so shouldn’t they all be marked as ‘waiting’ until the next day?

Not accounting for blockers

Keeping the same workflow as before, the next scenario to consider is how we handle when work is blocked.

This particular item is highlighted/tagged as blocked because we need feedback before we can move it along in our workflow. Yet it sits in a ‘work’ state even though it cannot be progressed. More often than not, this is not factored into any flow efficiency calculation or literature, such as this example:

Tasktop — Where is the Waste in Your Software Delivery Process?

There is no way an item was “In Dev” for a clear, uninterrupted period, and therefore the picture presented is not a realistic reflection of how product development actually happens.

That’s NumberWang!

For those unaware, Numberwang is a well-known sketch from the comedy TV show That Mitchell and Webb Look. It is a fictional gameshow where the two contestants call out random numbers, which the ‘host’ then arbitrarily declares to be “Numberwang!”

[embed]https://youtu.be/0obMRztklqU[/embed]

Why is this relevant? Well, when looking at items that have moved through our process and their respective flow efficiency percentage, all we are doing is playing an Agilists version of the same comedy sketch.

Face ID Login has a flow efficiency of 19% but QR code returns only had 9%! OMG :( :( :( So what? It’s just a percentage — it doesn’t mean anything! Also, look at the cycle time for those items: can we definitively say that one item was “more efficient” than the other? Does this tell us anything about how to improve our workflow and where our bottlenecks are? No! It’s just reading out numbers and thinking it means something because it’s “data”.

The Flaw of Averages

Anyone who has read previous posts of mine will know that any sort of use of average with flow metrics is a way to push my buttons. Unfortunately, the visualisation of flow efficiency often comes with an average of the efficiency for a collection of completed work items, like so:

Using averages with any sort of metrics is a dangerous flirtation with misleading information, and we can see that for a series of items this is quite easy to do:

Three of our five completed items have poor flow efficiency, yet aggregating to a single number suggests (if the ‘close to 40% flow efficiency is good’ anecdote is being cited!) that we have a fairly effective process. By aggregating we lose all the context of those ‘inefficient’ items and the opportunity to use them as the basis for a conversation around improving our way of working.
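
To illustrate with made-up numbers (not the actual values from the chart above), three items flowing poorly and two flowing well can still average out to a respectable-looking figure:

# Hypothetical flow efficiencies (%) for five completed items
flow_efficiencies = [10, 15, 20, 80, 90]

average = sum(flow_efficiencies) / len(flow_efficiencies)
print(f"Average flow efficiency: {average:.0f}%")  # 43%, yet three of the five items were at or below 20%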

What should we use instead?

In theory, flow efficiency seems like a good idea. However, when you look at all the reasons above, it is simply not practical for teams and organisations to implement and put to effective use (without at least being clear they are using flawed data). Proceed with caution when anyone advocates it without mentioning those caveats!

https://twitter.com/danvacanti/status/1321547554136428544
https://twitter.com/DReinertsen/status/1106975020432031744

A better metric/use of your time is looking at blocker data and going after those that occur the most frequently and/or are the most impactful. Troy Magennis has a great tool for this (thank you also to Troy for sharing some thoughts on this piece).

Here are some of the examples we use for some of our teams here at ASOS:

Shout out to Francis Gilbert in our Saved Items team for these!

Which can then be used to reduce the frequency of particular blockers occurring and to see where you need to focus next:

Shout out to Gary Sedgewick in our PayTech team for this!

This way you’re actually going after the problems teams face, which will in turn positively impact the flow of work. All this is done without the need for some ‘efficiency’ measure/number.

What are your thoughts? Agree? Disagree? 

I’d love to hear what you think in the comments below…