I have a new project and a new team, and I’m asked to create a financial estimate for a data technology project. How do I come up with a reasonable forecast?

While conducting a workshop, a colleague asked how I would go about creating a reliable financial estimate for a net new IT data platform project while establishing a new team.  It’s a great question, given that many organizations see wide variance in their estimates, which causes a lot of frustration among stakeholders. This case is a bit more challenging because there isn’t an established team with historical velocity metrics or a defined scope.  Let’s get started.

1. First things first

Let’s apply some structure and assumptions to address the challenge.  We begin by asking the question: what am I estimating for?  It seems simple, but the answer often gets convoluted among the many stakeholders.  Let’s make a few assumptions for this example:

  • You are asked to define the entire new technology / IT project (cost and time; in other words, answer the question: how much will this cost and how long will it take?)
  • The product vision is created.  Let’s say it’s a data platform with analytical use cases defined.  The business value is defined, but nothing further
  • The end users of the platform (data scientists, analysts, etc.) are already part of the organization, and therefore are not counted in the estimate
  • The final output will be a costing model over time

2. Ok, this might not seem like a lot of information, but does it provide a good starting point?

Next, let’s apply a framework to organize the estimate, and identify the variables to solve for. I like to think of the life cycle in terms of building blocks:

Estimation building blocks

A key point is that these topics are iterative, not sequential.  It’s intuitive to think of each topic as a series of steps; however, there tends to be constant evolution through iterations.  For example, new technologies to handle data volume, new use cases that require the teams to reevaluate the architecture, or staff turnover can each force a revisit of earlier blocks.  This is a shift from a project mindset to a product mindset, which we can cover another time.

Great, now that we have a framework, I can just plug in the numbers, right?  Unfortunately, this isn’t the case.  Let’s dive deeper into product planning.

3.  The next step is to frame the scope in a way that allows the estimates to be financially modeled.

We know the first phase of the work is to create a data platform and activate three use cases.  In this example, I prefer to take a top-down approach (as opposed to a bottom-up approach; we can discuss the merits another time).

The first step is to deconstruct the use cases into groupings of data elements (I call them domains, or epics).  These domains serve as the key output for the technology team.  The product team will need to conduct a data requirements exercise, and the technology team will need to map the data back to the source systems (this is part of the architecture and infrastructure building block).  Let’s say four domains were identified (Customer, Finance, Supplies, Purchase history).  This type of activity is fairly straightforward, and I would time-box two people (a BA and an architect) for 4 weeks for the domain definition and data mapping to source.

Example: Defining the epics for t-shirt estimation
Focusing the estimation from domain/epics to data source

4. Often in parallel to the domain definition and data mapping, the architecture and infrastructure building blocks can be defined.

With security at the forefront, I like to take a DevSecOps mindset to the estimation.  Since we are dealing with net new technology, I focus on defining an end-state architecture and tooling set.  Though this can be a time-consuming exercise with stakeholder alignment and governance processes, the resources need to be focused on getting off the ground rather than architecting for every scenario.  This is another exercise I prefer to time-box with two people (the same BA and architect) over a 6-week cycle, plus a part-time security person for 2 weeks to provide input and audit. In addition to the scope definition exercise, the team will gather and compile subscription costs for hosting and tooling.

5. Now that the product, architecture, infrastructure, and security have some clarity, we can estimate the build. 

Start with a few assumptions:

  • I’m assuming the estimation model focuses on allocating full-time employees (FTEs) or contract resources; the work is therefore not outsourced for a fixed fee to a third-party firm.
  • Start with one pod of 7 plus or minus 2 people: a product owner, scrum master, tester, and four engineers.
  • Assume there will be flex resources cycling in and out of the program, such as an architect, data modeler and business analyst.
  • Since this is a net new team with limited data points on velocity, assume one engineer has a throughput of 8 story points per week (I’m using the Fibonacci system as the standard story pointing method).
  • Estimate the build from domain back to the source data. I prefer to focus work from an end-to-end perspective, rather than chop the work into sequential or waterfall blocks. This minimizes switching costs.
  • Data consumption from stakeholders and end-users (for example reporting and analytics) will need to be a separate estimation exercise.
  • Source data will not change, and endpoints are exposed for data acquisition.

With these assumptions, we can t-shirt size the epics.  Since this team has four engineers, the top velocity of the pod is 64 pts (128 pts per sprint).  Therefore, I think of the t-shirt denominations as: small = 0-63 pts, medium = 64-128 pts, large = 129-257 pts, and extra-large, which needs to be broken down into smaller parts. (It is reasonable to think in terms of sprints; however, I prefer story points because they align with the engineering team’s agile cadence of story point estimating.)  The next step is to apply the t-shirt size denominations to the domains.  Below is an example of a high-level t-shirt sizing, as well as a legend.

Example: T-shirt estimation
Example: T-shirt estimation legend
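
As a rough sketch, the t-shirt denominations described above could be written as a small lookup. The function name and the sample point values are hypothetical and exist only to illustrate the bands; they are not taken from the estimation table.

```python
def tshirt_size(points: int) -> str:
    """Map an epic's story-point estimate to a t-shirt size.

    Bands follow the denominations in the text:
    small = 0-63, medium = 64-128, large = 129-257, anything above is extra-large.
    """
    if points < 0:
        raise ValueError("story points cannot be negative")
    if points <= 63:
        return "S"
    if points <= 128:
        return "M"
    if points <= 257:
        return "L"
    # Extra-large epics are not estimated directly; they get split first.
    return "XL (break down into smaller epics)"


# Hypothetical point estimates per domain, for illustration only.
sample_estimates = {
    "Customer": 90,
    "Finance": 40,
    "Supplies": 110,
    "Purchase history": 200,
}

for domain, points in sample_estimates.items():
    print(f"{domain}: {points} pts -> {tshirt_size(points)}")
```

Keeping the bands in one place makes it easy to adjust them once the team has real velocity data.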

For a project of this size, I add 2 sprints for defect remediation, deployment and knowledge transition to the support team, which brings the total to a range of 7 to 11 sprints. The range allows for some cone of uncertainty, iterations and flexibility.
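
One way to see how a sprint range falls out of t-shirt sizes is the minimal sketch below. It sums the low and high ends of each size band, divides by the pod's velocity, and adds the 2-sprint buffer. The domain sizes are hypothetical (the original sizing table is not reproduced here), and the usage line assumes a velocity of 64 points per sprint (four engineers at 8 points per week over a two-week sprint), which is an assumption about sprint length rather than a confirmed figure.

```python
import math

# Point bands per t-shirt size, taken from the denominations in the text.
SIZE_BANDS = {"S": (0, 63), "M": (64, 128), "L": (129, 257)}


def sprint_range(sizes, velocity_per_sprint, buffer_sprints=2):
    """Return (low, high) total sprints for a list of t-shirt sizes."""
    sizes = list(sizes)
    low_points = sum(SIZE_BANDS[size][0] for size in sizes)
    high_points = sum(SIZE_BANDS[size][1] for size in sizes)
    # Round up: a partially used sprint still has to be funded.
    low = math.ceil(low_points / velocity_per_sprint) + buffer_sprints
    high = math.ceil(high_points / velocity_per_sprint) + buffer_sprints
    return low, high


# Hypothetical sizing of the four domains, for illustration only.
domain_sizes = {
    "Customer": "M",
    "Finance": "S",
    "Supplies": "M",
    "Purchase history": "L",
}

print(sprint_range(domain_sizes.values(), velocity_per_sprint=64))  # (7, 11)
```

With these illustrative inputs the build needs 5 to 9 sprints, and the buffer lifts that to 7 to 11. Different sizes or a different velocity will move the range.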

6. At this point, we have a fairly clear understanding of the product planning, design, and build building blocks.

As we dive deeper into build & execution, we have an idea of the t-shirt size for duration estimation based on a single pod.  What if the executives want to speed up the process? The logical approach would be to add another pod.  The premise would be that the build & execution duration can be cut in half; however, I’ve seen this to be false.  Although the build duration will be shorter, the relationship isn’t linear.  Rather, the total duration is reduced by roughly 10-20%, and a slight increase in overhead is required to coordinate the teams.  If required, we can create a scenario analysis and compare the pros and cons of different approaches. To keep things simple for this example, let’s assume a one-pod structure.
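
A scenario comparison of this kind can be sketched in a few lines. The sketch below treats the 10-20% figure as a reduction in build duration and models coordination as a flat percentage uplift on cost. The 5% overhead, the 9-sprint baseline and the cost per sprint are all placeholder assumptions, not figures from this example.

```python
def two_pod_scenario(one_pod_sprints, pod_cost_per_sprint, reduction,
                     coordination_overhead=0.05):
    """Estimate duration and cost when a second pod is added.

    reduction: fraction by which the build duration shrinks (0.10 to 0.20 here).
    coordination_overhead: assumed cost uplift for coordinating two pods.
    """
    sprints = one_pod_sprints * (1 - reduction)
    cost = sprints * 2 * pod_cost_per_sprint * (1 + coordination_overhead)
    return sprints, cost


# Placeholder inputs for illustration only.
baseline_sprints = 9
pod_cost = 56_000  # hypothetical cost of one pod for one sprint

one_pod_cost = baseline_sprints * pod_cost
print(f"1 pod: {baseline_sprints} sprints, ${one_pod_cost:,.0f}")

for reduction in (0.10, 0.20):
    sprints, cost = two_pod_scenario(baseline_sprints, pod_cost, reduction)
    print(f"2 pods ({reduction:.0%} faster): {sprints:.1f} sprints, ${cost:,.0f}")
```

Under these assumptions the second pod buys a modest schedule gain at a much higher total cost, which is the trade-off the scenario analysis is meant to expose.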

7. Next we will want to calibrate the team. 

The pod (core team) consists of 7 people: a product owner, scrum master, tester, and four engineers.  However, we know that over the course of the 7 to 11 sprints, we will need other resources to support the effort (for example, troubleshooting the data sources).  From an estimation lens, we can allocate a percentage of funds for these types of roles: 25% for a data modeler, and 50% for a business analyst.  If this were a resource planning exercise, we could allocate the resources to specific sprints. When building the cost model, my preference is to create a sprint plan and specify the sprint cycles for flex resourcing. This gives me higher confidence in the financials than the peanut butter spread approach.
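
To make the allocation concrete, here is a minimal sketch of the labor cost per sprint for the core pod plus the flex roles, using the simpler percentage allocation rather than a sprint-by-sprint plan. It borrows the $100 blended hourly rate and 8-hour day used later in this article, and it assumes a two-week sprint with 10 working days, which the article does not state.

```python
HOURLY_RATE = 100      # blended rate used later in this article
HOURS_PER_DAY = 8
DAYS_PER_SPRINT = 10   # assumption: two-week sprint, five working days per week

CORE_POD_FTE = 7.0     # product owner, scrum master, tester, four engineers
FLEX_FTE = {"data modeler": 0.25, "business analyst": 0.50}


def labor_cost_per_sprint(fte):
    """Cost of a given number of FTEs for one sprint."""
    return fte * HOURLY_RATE * HOURS_PER_DAY * DAYS_PER_SPRINT


total_fte = CORE_POD_FTE + sum(FLEX_FTE.values())   # 7.75 FTE
per_sprint = labor_cost_per_sprint(total_fte)

print(f"Per sprint: ${per_sprint:,.0f}")             # Per sprint: $62,000
for sprints in (7, 11):
    print(f"{sprints} sprints: ${per_sprint * sprints:,.0f}")
# 7 sprints: $434,000
# 11 sprints: $682,000
```

A sprint plan would replace the flat `FLEX_FTE` percentages with a per-sprint schedule, so the flex cost lands in the sprints where those roles are actually needed.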

A question often asked is: What happens to the build & execution team after the 7 to 11 sprints? From an estimation point of view, the funds will have been exhausted, and the effort will have been transitioned to the enhancements and support teams. This essentially means there will be no further development of new features or functionality. However, in many cases the effort moves from a project-based mindset to a product-based mindset. This means the business and technology stakeholders plan a roadmap of the future investments required to activate additional value from the product. In that case, the build and execution team would continue focusing on new features and functionality under a new funding source.

8. There is another component to estimation that often gets overlooked.

It’s common for stakeholders to ask the question: How much will this cost and how long will it take? However, another part of the question should be: How much will it cost to support the operations once the system is built?

The estimate will need to include funds to maintain the system with small enhancements into perpetuity. Typically, these are different personnel who focus on lightweight development activities (UI changes, consuming updated services, or adjusting minor calculations). My approach is to allocate 15-20% of the monthly build cost for small enhancements on an annual basis.  Some organizations use a pool of funds to support multiple applications; others allocate specific head count to provide support. In addition, the estimate will need to include production and troubleshooting support.  Similar to small enhancements, I allocate 15-20% of the monthly build cost per month for production support on a multi-year basis. These are the first responders when there is an issue for end-users; in addition, they are typically part of a separate group within the organization. If the system is deemed mission-critical to the organization, the estimate will be at the higher end of the range, with 24×7 support and dedicated enhancement teams.
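
The run-cost arithmetic can be sketched as follows. This reads both allocations as a monthly percentage of the monthly build cost, which is one interpretation of the guidance above, and the monthly build cost shown is a hypothetical placeholder.

```python
def monthly_run_cost(monthly_build_cost, enhancement_pct, support_pct):
    """Monthly cost of small enhancements plus production support.

    Percentages are whole numbers (15 means 15%) of the monthly build cost.
    """
    return monthly_build_cost * (enhancement_pct + support_pct) / 100


# Hypothetical monthly build cost, for illustration only.
monthly_build_cost = 124_000

low = monthly_run_cost(monthly_build_cost, 15, 15)
high = monthly_run_cost(monthly_build_cost, 20, 20)

print(f"Monthly run cost: ${low:,.0f} to ${high:,.0f}")
print(f"Annual run cost:  ${low * 12:,.0f} to ${high * 12:,.0f}")
```

A mission-critical system would sit at or above the high end, since 24×7 coverage adds cost beyond a simple percentage.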

Another part to estimate is the non-labor costs. These are typically subscription and licensing costs; for example, hosting (AWS, Azure, Google, etc.), computing costs, collaboration tools (Jira and Rally), databases (Oracle, etc.), hardware, and travel and expenses. For this example, we will assume non-labor costs are all native to the hosting provider, and development and collaboration tools are funded separately. Most hosting providers offer a cost estimator; I increase its output by a factor of 2 as a starting point, followed by a growth rate.

9. To complete the calculation, labor rates and non-labor rates need to be factored in.

For this exercise, I will use a blended rate of $100 per hour on an 8-hour daily basis (check with your local market on rates), and a 20% annual growth rate on subscriptions for the first 3 years (4% growth rate thereafter).
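
The non-labor projection can be sketched like this. It doubles the hosting provider's estimate as the year 1 baseline, then applies 20% growth going into years 2 and 3 and 4% from year 4 onward. That reading of "the first 3 years" is an assumption, and the provider estimate is a hypothetical placeholder.

```python
def subscription_costs(provider_estimate, years, multiplier=2,
                       early_growth=0.20, later_growth=0.04, early_years=3):
    """Project annual subscription costs from a provider's cost estimate."""
    costs = []
    cost = provider_estimate * multiplier  # year 1 baseline
    for year in range(1, years + 1):
        if year > 1:
            # Assumed reading: higher growth through year 3, lower afterwards.
            growth = early_growth if year <= early_years else later_growth
            cost *= 1 + growth
        costs.append(round(cost, 2))
    return costs


# Hypothetical annual estimate from the hosting provider's calculator.
provider_estimate = 50_000

for year, cost in enumerate(subscription_costs(provider_estimate, years=5), start=1):
    print(f"Year {year}: ${cost:,.0f}")
```

Keeping the multiplier and growth rates as parameters makes it straightforward to rerun the projection when the provider estimate or the growth assumptions change.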

10. With all the estimate building blocks created, it is time to compile the information and create reports. 

My preference is to create the model in Microsoft Excel; however, others will use different tools.

Example: labor cost estimation model
Example: Distilled cost estimation model – Labor only
Example:  Estimation model totals for labor costs
Example: Total estimation costs – Labor and non-labor

Most executives will want to see the data from different perspectives: cost by function, duration, department etc.  The estimate needs to be created in a robust way, and allow for deep analysis (for example comparing different scenarios of the estimate).  Some of the typical reports I like to include are:

  1. Total cost (labor and non-labor) over 3-5 years
  2. Total annual cost
  3. Cost by department: Allocations for development, testing, product, infrastructure, security, support and operations
  4. Cost by function: Product planning, architecture and infrastructure, build, enhancements, support on a quarterly or annual basis
  5. Cost by time: Annual, quarterly, monthly and sprint burn rates
  6. Resource analysis:  Total resources needed, segmented resource needs by function, evaluation of the contractor vs FTE ratio, and perhaps rate comparisons/evaluations
  7. ….

Let’s wrap this up.

There are several pieces to navigate across the six building blocks of IT estimation (product planning, architecture & infrastructure, security, build & execution, enhancements and support & maintenance). The process often involves more than a dozen people for analysis and development, followed by additional input and iterations from leadership teams. Keeping up with all the permutations can be overwhelming. Leaning on experts across the SDLC will help to anchor the inputs and assumptions. In addition, keep a checklist of specific topics across the building blocks, and version your estimate with good notes.

In summary, it’s important to understand the scope of the estimation (what are we estimating, what are we not estimating), apply a framework with assumptions, and develop a flexible model for iterations.