Reference Class Forecasting & Optimism Bias Explained

Why Projects Overrun

Optimism Bias & Strategic Misrepresentation

Two distinct forces push estimates below outturn — one psychological, one political. Reference class forecasting was devised to correct for both.

The empirical record on large infrastructure is unforgiving: across road, rail, tunnel and fixed-link projects worldwide, the typical estimate is materially below the cost the project actually incurs, and the error runs in one direction. This is not noise that averages out — it is a systematic underestimation. The work of Bent Flyvbjerg and colleagues, building on the behavioural research of Daniel Kahneman and Amos Tversky, traced this to two reinforcing causes.

Optimism bias is the cognitive tendency, well documented since Kahneman and Tversky, to take an “inside view” of one’s own plan — to assume the most favourable scope, the smoothest delivery, the best-case productivity and pricing — and to underweight the things that routinely go wrong. Strategic misrepresentation is the deliberate, incentive-driven shading of numbers: to win funding, secure approval or beat a competing business case, proponents have an interest in a low estimate and an optimistic schedule. The two are hard to separate in practice, and both bend the base estimate toward a best case rather than an expected case.

Systematic, not random

Overruns cluster on one side of the estimate. A random error would scatter; optimism bias does not — which means it can be measured and corrected, not just hoped away.

The inside view is the trap

Estimating from the project’s own plan and assumptions feels rigorous but inherits every optimistic assumption baked into that plan. Detail is not the same as accuracy.

Incentives compound it

Where funding hinges on a low number, strategic misrepresentation pushes the same direction as optimism bias — so the correction has to be evidence-based, not self-reported.

Note on language: the RES Contingency Guideline and Infrastructure Australia / Treasury policy use the term optimism bias directly. The Commonwealth DITRDCA Guidance Notes prefer the framing of estimate bias and calibration rather than the phrase “optimism bias” — but the concern is the same: an estimate that is not calibrated against real outcomes is an estimate that will tend to be exceeded.

The Outside View

What Reference Class Forecasting Is

Reference class forecasting (RCF) sets aside the project’s own optimistic plan and instead forecasts from the actual cost-overrun distribution of a class of genuinely comparable past projects.

The method has three steps. First, identify a reference class — a set of completed projects similar enough to the one in front of you (same type, scale and delivery context) that their outcomes are informative. Second, establish the distribution of cost overruns across that class: by how much, as a percentage, did those projects exceed their original sanctioned estimate, and how is that overrun spread out? Third, read off the uplift required to bring your estimate to the confidence level you need — for a high-confidence budget, you select the uplift at the corresponding percentile of the reference distribution.
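The three steps fit in a few lines of Python. This is a minimal sketch, not the RES or Cenex procedure. It assumes the reference class is a list of (sanctioned estimate, actual outturn) pairs. The figures are hypothetical, and the percentile is read by linear interpolation using the standard library's `statistics.quantiles`. The helper names (`overrun_ratios`, `uplift_at_confidence`, `rcf_budget`) are illustrative only.

```python
"""Minimal sketch of the three reference class forecasting steps.

Assumptions (illustrative only): each reference project is a pair of
(sanctioned estimate, actual outturn), and percentiles are read with the
standard library's linear interpolation ("inclusive" method).
All data below are hypothetical.
"""
from statistics import quantiles


def overrun_ratios(reference_class):
    """Step 2: each project's overrun as a fraction of its sanctioned estimate."""
    return [(outturn - estimate) / estimate for estimate, outturn in reference_class]


def uplift_at_confidence(ratios, confidence):
    """Step 3: read the uplift at the required percentile (e.g. 0.9 for P90)."""
    # quantiles(n=100) returns the 99 cut points P1..P99
    cut_points = quantiles(ratios, n=100, method="inclusive")
    percentile = round(confidence * 100)
    if not 1 <= percentile <= 99:
        raise ValueError("confidence must be between 0.01 and 0.99")
    return cut_points[percentile - 1]


def rcf_budget(base_estimate, reference_class, confidence):
    """Apply the reference-class uplift to this project's base estimate."""
    uplift = uplift_at_confidence(overrun_ratios(reference_class), confidence)
    return base_estimate * (1 + uplift), uplift


# Step 1 (hypothetical data): comparable completed projects, in $m.
reference_class = [
    (100, 112), (80, 95), (250, 310), (60, 63), (150, 198),
    (40, 47), (120, 150), (90, 99), (200, 236), (70, 91),
]

budget, uplift = rcf_budget(base_estimate=300, reference_class=reference_class, confidence=0.8)
print(f"P80 uplift {uplift:.1%}, budget {budget:.1f}")
```

With this hypothetical class, the P80 uplift comes out at 26%, which lifts a base estimate of 300 to 378. A real class would need to be far larger and screened for comparability before any percentile drawn from it meant much.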

The intellectual lineage is explicit: Kahneman & Tversky on the planning fallacy; Lovallo & Kahneman on the inside view versus the outside view; and Flyvbjerg & COWI (2004), whose study underpinned the UK Department for Transport’s adoption of mandatory optimism-bias uplifts. The core insight is deceptively simple — the single best predictor of how your project will perform is how similar projects actually performed, not how confident you feel about your own plan.

Inside view (Monte Carlo)

A bottom-up model of this project — ranged quantities, priced risk events, correlation. Precise and project-specific, but it can inherit the plan’s own optimism if assumptions go unchallenged.

Outside view (reference class)

The historical overrun distribution of similar completed projects. Immune to the plan’s optimism — but blind to what makes this project different. It answers “how do projects like this usually go?”

Convergence = a checked estimate

When the modelled result and the reference-class benchmark agree, confidence is high. When they diverge, that gap is the signal to investigate — not to silently average the two.

RCF Done Right · per RES

A Benchmark & Governance Check — Not a Method

The RES Contingency Guideline (3rd Edition, 2025) is unambiguous about where RCF belongs. This is the positioning Cenex follows.

RES lists reference class forecasting among the probabilistic non-simulation methods (Appendix D), attributing it to Kahneman & Tversky, Lovallo & Kahneman, and Flyvbjerg & COWI (2004), and noting it is also called “optimism-bias uplifts.” But RES is explicit on its standing: RCF is “more properly considered a validation or benchmarking practice for quality management and governance, NOT a contingency determination method.” RES recommends it be used to complement rather than replace First Principles Risk Analysis (FPRA) supported by quantitative schedule risk analysis.

Where RCF Belongs

As a governance and validation cross-check: does the modelled P50/P90 sit sensibly against what similar projects actually cost? It is one of the bias-control measures RES lists — benchmarking against historical data — alongside independent review and P-range inputs.

Best used for: sanity-checking a bottom-up result, exposing an estimate that looks optimistic against history, and giving a board or funder an independent reference point.

Where RCF Does Not Belong

As the primary contingency determination on a major project. It cannot tell you which risks drive the number, cannot be interrogated risk-by-risk in a tornado, and cannot be re-run when scope or a single assumption changes. A funder challenging the contingency needs a model, not a percentile from someone else’s history.

Not a substitute for: bottom-up Monte Carlo / FPRA, which RES prefers at key decision points such as Final Business Case and FID.

The framework consensus

All three Australian frameworks treat RCF the same way. RES calls it a benchmark/governance check, not a method. TMR cites Flyvbjerg in the project-delay category rather than as a contingency engine. The Commonwealth DITRDCA Project Cost Breakdown template carries a built-in “Reference Class” dropdown — signalling RCF is part of the governance furniture of a submission, sitting beside the mandatory Monte Carlo result, not replacing it. See them lined up in Frameworks Compared.

Know the Edges

The Seven Limitations RES Lists

RES is candid that reference class forecasting is a blunt instrument if leant on too hard. These seven limitations are why it stays a cross-check, not the primary tool.

1. Assumes the future mirrors the past

RCF projects yesterday’s overruns onto tomorrow’s project. Where methods, materials or delivery models have genuinely changed, the historical class may no longer be representative.

2. Discourages improvement

Baking in the historical overrun can entrench it — if you always uplift to the past average, you remove the pressure to deliver better than the past.

3. Most organisations lack the data

A credible reference class needs a clean, comparable, well-populated history of completed projects. Few organisations hold one — and a thin class produces a fragile forecast.

4. Needs a genuinely similar class

The reference projects must be truly comparable in type, scale and context. A loosely assembled class imports irrelevant outcomes and flatters or punishes the estimate arbitrarily.

5. Ignores project-specific events

A class-wide uplift cannot capture the discrete contingent risks unique to this project — a particular latent-condition exposure, a specific interface, a known approval risk.

6. Cost only: ignores other objectives

RCF speaks to cost (and time) overrun. It is silent on safety, quality, environment and reputation — objectives a full risk process must still address.

7. Not transparent at the driver level

An uplift read off a distribution is a single aggregate number. It cannot be decomposed into the specific risks driving it, cannot be tested in a tornado diagram, and cannot be re-run when one assumption changes — which is exactly what an assurance reviewer or funding gate asks of the primary contingency.
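The contrast with a model can be made concrete. The sketch below runs a one-at-a-time tornado on a hypothetical three-item estimate. Each item is swung from its low to its high value while the others stay at their most likely value, and the bars are ranked by swing. This deterministic sensitivity is only one simple way to build a tornado, assumed here for clarity. Monte Carlo tools often rank drivers by their correlation with the total instead. Nothing comparable can be extracted from a single reference-class uplift.

```python
"""Illustrative one-at-a-time tornado on a hypothetical three-item estimate ($m).

Each item swings from low to high while the others stay at most likely;
bars are ranked by the width of that swing. Data and names are hypothetical.
"""
ITEMS = {  # name: (low, most likely, high)
    "earthworks": (18.0, 20.0, 26.0),
    "structures": (45.0, 50.0, 62.0),
    "pavement": (28.0, 30.0, 35.0),
}


def tornado(items: dict) -> list:
    """Return (name, total at item low, total at item high), widest swing first."""
    base = sum(ml for _, ml, _ in items.values())
    bars = []
    for name, (lo, ml, hi) in items.items():
        # replace this item's most likely value with its low / high value
        bars.append((name, base - ml + lo, base - ml + hi))
    # widest swing first, as drawn at the top of a tornado chart
    return sorted(bars, key=lambda bar: bar[2] - bar[1], reverse=True)


for name, low_total, high_total in tornado(ITEMS):
    print(f"{name:<12} {low_total:6.1f} .. {high_total:6.1f}")
```

Here "structures" tops the chart because its range moves the total by 17, against 8 for earthworks and 7 for pavement. That is the driver-level answer an assurance reviewer asks for.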

The Fallback Table

Optimism-Bias Uplifts — the UK Green Book Reference

Where an organisation has no reference class of its own, RES reproduces the UK Supplementary Green Book optimism-bias uplift table as a fallback — a published, project-type-banded set of uplifts to apply to an early estimate.

The UK approach, which Flyvbjerg & COWI's 2004 work for the Department for Transport later refined for transport schemes, mandates an optimism-bias uplift on capital costs at early business-case stages, decaying as the estimate matures and design firms up. The uplift is highest for the project types history shows are hardest to estimate (equipment and development projects, then non-standard civil engineering and non-standard buildings) and lowest for routine, well-understood works such as standard buildings. The figures below are illustrative upper-bound bands in the spirit of the Green Book supplementary guidance; they are a starting point to be calibrated, not a substitute for analysis.

Project Type	Capital Cost — Upper-Bound Uplift	Why this band
Standard buildings	~24%	Well-understood, repeatable scope; lowest historical overrun.
Non-standard buildings	~51%	Bespoke design, novel systems; higher and more variable outturn.
Standard civil engineering	~44%	Roads, utilities — exposed to ground risk and scope growth.
Non-standard civil engineering	~66%	Complex structures, first-of-kind works; high uncertainty.
Equipment / development	~200%+	Technology and development projects show the largest optimism gap.

Read these uplifts for what they are: a fallback for the data-poor case and a benchmark to test a modelled result against — not a contingency for a major project. A 50%+ headline uplift on an early estimate is also a powerful illustration of just how far optimism bias can sit below outturn when nothing corrects for it. The uplift should decay as design matures and a project-specific Monte Carlo model takes over the contingency role.
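Applying the fallback table is simple arithmetic, sketched below. The bands copy the illustrative upper bounds in the table, with "~200%+" taken as 2.00. The `maturity` parameter is a hypothetical 0 to 1 measure of design and risk-management progress, used to decay the uplift linearly. That decay rule is an assumption made for illustration, not a published formula.

```python
"""Hypothetical fallback: apply a tabulated upper-bound uplift to an early estimate.

Bands mirror the illustrative table; `maturity` (0..1) is an ASSUMED measure of
how far the estimate has matured, used to decay the uplift linearly.
"""
UPPER_BOUND_UPLIFT = {
    "standard_buildings": 0.24,
    "non_standard_buildings": 0.51,
    "standard_civil_engineering": 0.44,
    "non_standard_civil_engineering": 0.66,
    "equipment_development": 2.00,  # table shows "~200%+"; 200% used here
}


def fallback_uplift(project_type: str, maturity: float) -> float:
    """Upper-bound uplift reduced linearly as the estimate matures (assumption)."""
    if not 0.0 <= maturity <= 1.0:
        raise ValueError("maturity must be between 0 and 1")
    return UPPER_BOUND_UPLIFT[project_type] * (1.0 - maturity)


def uplifted_estimate(base: float, project_type: str, maturity: float) -> float:
    """Base estimate lifted by the (decayed) fallback uplift."""
    return base * (1.0 + fallback_uplift(project_type, maturity))


# A $100m early-stage estimate for non-standard civil works, no maturity credit yet
print(uplifted_estimate(100.0, "non_standard_civil_engineering", 0.0))
```

At zero maturity, the $100m estimate rises to roughly $166m. As a bottom-up model takes over, the uplift should shrink toward a benchmark role rather than remain the contingency.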

Inside + Outside Together

How RCF Complements Monte Carlo

The inside view and the outside view are not rivals. The most defensible estimate uses both — the model to derive the number, the reference class to validate it.

Monte Carlo Derives the Number

A bottom-up FPRA / QRA model ranges base line items, prices discrete contingent risks as probability × impact, and models correlation — producing the S-curve and the P50/P90 contingency, fully traceable and re-runnable. This is the primary contingency, the one a funder can interrogate. See the Monte Carlo method and P50 vs P90 explained.
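The mechanics of such a model can be sketched with the Python standard library. The sketch makes several assumptions. Line items carry triangular ranges for inherent risk. Each contingent risk occurs with a fixed probability and has a triangular impact. Correlation is deliberately not modelled, although a real FPRA/QRA model would add it. All items, risks and figures are hypothetical.

```python
"""Minimal Monte Carlo sketch of a bottom-up cost model (hypothetical data, $m).

Assumptions: line items ranged with triangular distributions (inherent risk);
contingent risks occur independently with a fixed probability and a triangular
impact; correlation is NOT modelled, which a real FPRA/QRA model would add.
"""
import random
from statistics import quantiles

LINE_ITEMS = {  # name: (low, most likely, high)
    "earthworks": (18.0, 20.0, 26.0),
    "structures": (45.0, 50.0, 62.0),
    "pavement": (28.0, 30.0, 35.0),
}
CONTINGENT_RISKS = {  # name: (probability, (low, most likely, high) impact)
    "latent ground conditions": (0.30, (3.0, 6.0, 12.0)),
    "approval delay": (0.15, (2.0, 4.0, 8.0)),
}


def simulate_total(rng: random.Random) -> float:
    """One iteration: sample every line item, then each risk's occurrence and impact."""
    # random.triangular takes (low, high, mode)
    total = sum(rng.triangular(lo, hi, ml) for lo, ml, hi in LINE_ITEMS.values())
    for probability, (lo, ml, hi) in CONTINGENT_RISKS.values():
        if rng.random() < probability:  # does the risk event occur this iteration?
            total += rng.triangular(lo, hi, ml)
    return total


def p_values(iterations: int = 20_000, seed: int = 1) -> dict:
    """Run the simulation and read P10/P50/P90 off the sorted outcomes."""
    rng = random.Random(seed)
    totals = [simulate_total(rng) for _ in range(iterations)]
    cuts = quantiles(totals, n=10)  # deciles: P10 ... P90
    return {"P10": cuts[0], "P50": cuts[4], "P90": cuts[8]}


print(p_values())
```

Fixing the seed makes a run reproducible. Changing a single range or probability and re-running shows its effect on P50 and P90. A single reference-class uplift cannot offer that traceability.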

RCF Validates the Number

The reference class then asks the harder question the model can’t answer about itself: does this result look credible against what projects like this actually cost? If the modelled P90 sits below the typical outturn of the reference class, the model is probably carrying the very optimism bias it was meant to remove — a flag to revisit ranges, assumptions and missing risks.
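The validation step reduces to comparing two numbers and treating a gap as a finding. In this hypothetical sketch, the benchmark is the base estimate lifted by the reference-class uplift at P90. The 10% `tolerance` is an assumed threshold for "similar", since no figure is set here. The names `cross_check` and `BenchmarkCheck` are illustrative.

```python
"""Hypothetical cross-check of a modelled P90 against a reference-class benchmark.

`tolerance` (10% by default) is an ASSUMED threshold for "similar". A gap is
reported as a finding to investigate; the two numbers are never averaged.
"""
from dataclasses import dataclass


@dataclass
class BenchmarkCheck:
    modelled_p90: float
    benchmark_p90: float  # base estimate x (1 + reference-class uplift at P90)
    gap: float            # (modelled - benchmark) / benchmark
    finding: str


def cross_check(base_estimate: float, modelled_p90: float,
                reference_uplift_p90: float, tolerance: float = 0.10) -> BenchmarkCheck:
    benchmark = base_estimate * (1.0 + reference_uplift_p90)
    gap = (modelled_p90 - benchmark) / benchmark
    if gap < -tolerance:
        finding = "model below reference class: revisit ranges, missing risks, assumptions"
    elif gap > tolerance:
        finding = "model above reference class: investigate the gap"
    else:
        finding = "converged: budget supported from both directions"
    return BenchmarkCheck(modelled_p90, benchmark, gap, finding)


print(cross_check(base_estimate=100.0, modelled_p90=112.0, reference_uplift_p90=0.30))
```

Consider a base of 100 with a reference-class P90 uplift of 30%. The benchmark is 130. A modelled P90 of 112 sits about 14% below it and is flagged for investigation.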

Original estimate (bars)

The sanctioned figure at approval — the inside-view number, normalised here so every project’s estimate sits at the same level for comparison.

Actual outturn (line)

Every point lands above the estimate level. The overrun varies in size but rarely in direction — the systematic lean RCF measures and corrects for.

The lean is the signal

If your project’s estimate looks like the bars but ignores the line, it is likely to overrun like the class. RCF turns that lean into an uplift; Monte Carlo turns the project’s own risks into one.

How Cenex Delivers

RCF as a Sanity-Check on the Modelled P50/P90

Cenex uses reference class forecasting exactly as RES intends — an independent benchmark over the top of a bottom-up model, never a shortcut around it.

The base is a best case

Left to itself, an uncorrected base estimate sits low — the inside-view, optimism-prone figure. It needs lifting to a funded confidence level.

The uplift reaches the funded level

The reference-class uplift bracket (outside view) lifts the base to the confidence level the funder requires — an independent route to roughly the same height as the modelled P90.

Agreement is the goal

When the outside-view uplift and the inside-view P90 land at a similar level, the budget is defended from two independent directions. A wide gap is a finding to chase down before the gate.

Model First

A bottom-up First Principles Risk Analysis with Monte Carlo — inherent risk ranged on line items, contingent risks priced as probability × impact, correlation modelled, reported as S-curve, histogram and tornado at P10/P50/P90.

Benchmark Second

The modelled P50/P90 is then cross-checked against a reference class of comparable completed projects — or, where no clean class exists, against the published optimism-bias uplift bands as a fallback benchmark.

Investigate Divergence

If the model sits well below the reference class, that is a finding — revisit ranges, hunt for missing contingent risks, and stress the optimistic assumptions, rather than quietly averaging the two numbers.

Independent Sign-Off

With no downstream delivery interest, Cenex challenges rather than inflates — bias-controlled per the RES list, benchmarked against history, and signed off by a Chartered Engineer with re-runnable model files for the funder.

Continue through the hub

See how the inside view is built in the Monte Carlo method and what the confidence levels mean in P50 vs P90 explained. Read the framework pages — TMR, RES and DITRDCA — or see them compared side by side. Start fresh from the introduction or the hub overview. For the engagement view, see our Risk Modelling & Management service.

Is Your Estimate Optimism-Checked?

Cenex builds the bottom-up Monte Carlo model your framework requires — and benchmarks the result against the outside view, so the contingency you take to a funding gate stands up from both directions.

