What is survivorship bias in a backtest?

Survivorship bias is testing a strategy only on the stocks that still exist today, silently dropping the ones that delisted, went bankrupt, or were acquired. Because the survivors did better on average, the backtest looks stronger than the strategy would have been in real time.

How do you avoid survivorship bias?

By using a point-in-time, survivorship-free universe: at every historical date the backtest sees exactly the names that were tradeable then, including ones that later delisted, and a name's bars stop when it stops trading. No filtering on today's survivors.
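A minimal sketch of that rule, assuming each symbol carries a first and last trading date. The `Listing` record, the `universe_on` function, and the tickers are hypothetical illustrations, not names from the Shishin pipeline; `delisted_on` is taken here to mean the last session on which the name actually traded.

```python
from dataclasses import dataclass
from datetime import date
from typing import Optional


@dataclass(frozen=True)
class Listing:
    symbol: str
    listed_on: date                      # first session the name traded
    delisted_on: Optional[date] = None   # last session it traded; None if still trading


def universe_on(listings: list[Listing], as_of: date) -> set[str]:
    """Symbols tradeable on `as_of`, whether or not they survive to today."""
    return {
        item.symbol
        for item in listings
        if item.listed_on <= as_of
        and (item.delisted_on is None or as_of <= item.delisted_on)
    }


# Hypothetical tickers, for illustration only.
listings = [
    Listing("AAA", date(2019, 1, 2)),                     # survivor
    Listing("BBB", date(2019, 1, 2), date(2024, 3, 15)),  # delisted in 2024
    Listing("CCC", date(2023, 6, 1)),                     # listed in 2023
]

print(sorted(universe_on(listings, date(2021, 6, 30))))   # ['AAA', 'BBB']
print(sorted(universe_on(listings, date(2023, 12, 29))))  # ['AAA', 'BBB', 'CCC']
print(sorted(universe_on(listings, date(2024, 6, 28))))   # ['AAA', 'CCC']
```

Filtering on "still listed today" would drop BBB from every date, including the years in which it was a live candidate for the strategy.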

Why do most public backtests have survivorship bias?

Because clean, delisting-inclusive historical data is harder to source and store than a current ticker list. Many backtests simply pull today's index members and run them backwards, which quietly removes every failure from the sample.

Survivorship bias: what most backtests quietly leave out

The countermeasures described below are properties of the Shishin backtest pipeline. They are not a guarantee that any forward result is free of bias of any kind; every backtest carries the assumptions of its construction. The point of disclosing the construction is to let the reader judge.

Survivorship bias is the silent killer of backtest credibility. Most contaminated backtests are not built with intent to deceive. They are built on the cleanest dataset their author could find, which happens to be the dataset of names that still exist today.

The trap

A reasonable engineer sits down to backtest a long-bias equity strategy. They pull the current constituents of a major US index. They reach back five years for price history on each name. They run the strategy. The numbers are excellent.

The numbers are excellent because the universe under test is, by construction, the universe of names that did not die. Companies that delisted, went private, were acquired out of existence, or were reverse-split into oblivion are not present in the current index, and therefore are not present in the backtest. Every name in the test by definition survived the period being tested. The strategy has not been measured against the actual opportunity set of the period. It has been measured against the opportunity set of the survivors.

Why the distortion is not small

In an average five-year US equity window, somewhere between five and fifteen percent of mid- and small-cap names listed at the start of the window are no longer tradable at the end. They were acquired, taken private, or wound down. A backtest that does not see them sees a cleaner version of the period than any operator lived through. The cleaner version produces a higher CAGR, a higher Sharpe, a shallower drawdown. None of these improvements come from the strategy.
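The size of the effect can be illustrated with toy arithmetic. Every number below is a hypothetical input chosen for the example, not a measured result: ten percent of names disappear (inside the range quoted above), the survivors return +60% over five years, and the names that disappear return -70%. Real exits are not always losses, since acquisitions often happen at a premium, so the sign and size of the gap depend on the mix.

```python
def annualized(total_return: float, years: float) -> float:
    """Convert a cumulative return over `years` into a compound annual rate."""
    return (1.0 + total_return) ** (1.0 / years) - 1.0


def equal_weight_return(groups: list[tuple[float, float]]) -> float:
    """Share-weighted mean of cumulative returns; each group is (share, return)."""
    total_share = sum(share for share, _ in groups)
    return sum(share * ret for share, ret in groups) / total_share


YEARS = 5
DEAD_SHARE = 0.10        # hypothetical: share of names gone by the end of the window
SURVIVOR_RETURN = 0.60   # hypothetical five-year return of the survivors
DEAD_RETURN = -0.70      # hypothetical five-year return of the names that exited

survivors_only = equal_weight_return([(1.0, SURVIVOR_RETURN)])
full_universe = equal_weight_return(
    [(1.0 - DEAD_SHARE, SURVIVOR_RETURN), (DEAD_SHARE, DEAD_RETURN)]
)

print(f"survivors only: {annualized(survivors_only, YEARS):.2%} a year")
print(f"full universe:  {annualized(full_universe, YEARS):.2%} a year")
```

Under these assumptions the survivor-only figure comes out at roughly 9.9% a year against roughly 8.0% for the full universe, a gap of nearly two points a year that owes nothing to the strategy.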

The distortion compounds. The strategy might be a perfectly defensible momentum or quality screen, and yet the live version of it will under-perform the backtest by a meaningful margin for years before anyone realises the gap is structural rather than circumstantial.

What Shishin does

Five mechanisms inside the pipeline are designed to keep the backtest universe honest at each historical date, rather than letting it be quietly re-painted by what is still listed today.

Point-in-time universe. On any given historical date, the universe under test is the set of symbols that were tradable on that date, with the OHLC and volume that printed on that date. A name that listed in 2023 does not appear in the 2021 universe. A name that delisted in 2024 does appear in the 2023 universe, with the prices it actually traded at, and stops appearing on the date it actually stopped trading.
Delistings are recorded, not erased. When a symbol stops fetching new bars, it is marked deactivated, not removed. Its historical bars remain in the database. The backtest can therefore reconstruct the universe of any past date without losing the names that died between then and now.
Deactivation grace period. A symbol that fails to fetch new bars for a single session is not immediately written off. There is a grace window. Real delistings are confirmed across multiple sessions; a single missing bar is more likely to be a vendor or calendar artefact than a death. Without this grace period the universe would lose names to administrative noise rather than to actual delisting.
False-positive reactivation. A symbol that was incorrectly deactivated is automatically reactivated when it next prints a bar. This matters because asymmetric error handling (too quick to deactivate, too slow to revive) produces a backtest universe that is steadily, silently smaller than the real one. The reactivation logic is the partner of the grace period.
Breadth-based macro classifier. The regime call that gates which of the four engines fires is built on the percentage of the universe trading above its 50- and 200-day moving averages, not on a price-weighted index. A price-weighted index can be held up by three mega-caps while the other ninety percent of names die. A breadth reading cannot. The regime call sees the death of the median name, not the theatre of the indices.
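The grace period, the reactivation rule, and the breadth reading can be sketched in a few lines. This is an illustration under stated assumptions, not Shishin's implementation: `GRACE_SESSIONS`, `SymbolState`, `update`, and `breadth_above_ma` are hypothetical names, the three-session threshold is invented (the text does not state the real window), and the demo uses a 3-session simple average in place of the 50- and 200-day ones to keep the data short.

```python
from dataclasses import dataclass

GRACE_SESSIONS = 3  # hypothetical: consecutive missed sessions before deactivation


@dataclass(frozen=True)
class SymbolState:
    active: bool = True
    missed_sessions: int = 0


def update(state: SymbolState, bar_received: bool) -> SymbolState:
    """Advance one session. Stored bars are never deleted; only the flag changes."""
    if bar_received:
        # Any printed bar clears the counter and revives a deactivated symbol.
        return SymbolState(active=True, missed_sessions=0)
    missed = state.missed_sessions + 1
    # Stay active inside the grace window; deactivate once it is used up.
    return SymbolState(active=state.active and missed < GRACE_SESSIONS,
                       missed_sessions=missed)


def breadth_above_ma(closes_by_symbol: dict[str, list[float]], window: int) -> float:
    """Fraction of symbols whose latest close is above their `window`-session average."""
    eligible = [c for c in closes_by_symbol.values() if len(c) >= window]
    if not eligible:
        raise ValueError("no symbol has enough history for this window")
    above = sum(1 for c in eligible if c[-1] > sum(c[-window:]) / window)
    return above / len(eligible)


# One missed session is tolerated; the third consecutive miss deactivates.
state = SymbolState()
for received in (False, False, False):
    state = update(state, received)
print(state.active)                  # False
print(update(state, True).active)    # True: reactivated on the next bar

# One rising large name, three falling names (hypothetical prices).
closes = {
    "MEGA": [10.0, 11.0, 12.0],
    "S1": [10.0, 9.0, 8.0],
    "S2": [5.0, 5.0, 4.0],
    "S3": [7.0, 6.0, 6.0],
}
print(breadth_above_ma(closes, window=3))  # 0.25
```

The breadth figure counts every name once, so a single rising large name cannot offset three falling ones the way it could in a weighted index level.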

What this costs us

An honest universe produces a lower headline number than a survivor-screened one. The backtest CAGR on the coordinator stack would be higher if we ran it on the survivors. We do not. The cost is paid in marketing appeal and recovered in the durability of the live track: a strategy whose backtest is honest is a strategy whose live performance is less likely to surprise its operators on the downside.

What we don't do

We do not filter the historical universe by “currently listed.” We do not backfill missing prices with values the company could not actually have traded at. We do not re-derive a delisted company’s exit by walking forward from the last clean bar; the trade is closed at the last actual print, with the gap noted. We do not allow the macro classifier to look at a price-weighted index as a regime input, because a price-weighted index is by construction the most survivor-biased indicator on the screen.
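The exit rule for a name that stops trading mid-position can be sketched as follows. This is one reading of the rule, under assumptions: `Bar`, `Exit`, and `close_position` are hypothetical names, the strategy is assumed to exit after a fixed number of sessions, and "the gap" is recorded as the number of intended holding sessions that never traded. The prices are invented.

```python
from dataclasses import dataclass
from datetime import date


@dataclass(frozen=True)
class Bar:
    day: date
    close: float


@dataclass(frozen=True)
class Exit:
    day: date
    price: float
    forced_by_delisting: bool
    sessions_short: int  # intended holding sessions that never traded


def close_position(bars: list[Bar], entry_index: int, hold_sessions: int) -> Exit:
    """Exit after `hold_sessions` bars, or at the final real bar if history ends first."""
    target = entry_index + hold_sessions
    last = len(bars) - 1
    if target <= last:
        bar = bars[target]
        return Exit(bar.day, bar.close, False, 0)
    # History ran out: use the last actual print, never a synthetic price.
    bar = bars[last]
    return Exit(bar.day, bar.close, True, target - last)


# Hypothetical final three sessions of a name that then stopped trading.
bars = [
    Bar(date(2024, 3, 13), 4.10),
    Bar(date(2024, 3, 14), 3.20),
    Bar(date(2024, 3, 15), 2.75),
]
print(close_position(bars, entry_index=0, hold_sessions=5))
```

Here the position is closed at 2.75 on the final session, flagged as a forced exit three sessions short of plan, rather than being extended with prices that never printed.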

How a reader can check

The honest way to detect survivorship contamination in a published backtest is to ask the author three questions. How many names are in the universe at the start of the window and how many at the end? What is the procedure for a name that delisted halfway through? Is the regime signal computed from an index or from breadth?

The tell is not the size of the universe but its turnover. A survivor-screened universe built from today’s constituents usually still grows toward the present because recent listings have no early history, so a rising count proves nothing on its own. A clean, point-in-time universe instead contains names that were tradable early and then delisted before the end: dead tickers are present, with a documented exit procedure. A contaminated universe has none of them. The names that died are not there. They were never there. The strategy was only ever tested against the names that lived.
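A reader with access to a backtest's universe file could run this check directly. The sketch below assumes the universe is available as first and last trading dates per symbol; `turnover_report`, the `Lifespan` alias, and the tickers are hypothetical.

```python
from datetime import date
from typing import Optional

# (first session, last session or None if still trading)
Lifespan = tuple[date, Optional[date]]


def turnover_report(universe: dict[str, Lifespan], start: date, end: date) -> dict[str, int]:
    """Count names at each end of the window and names that exited inside it."""

    def tradable(span: Lifespan, day: date) -> bool:
        first, last = span
        return first <= day and (last is None or day <= last)

    exited = [
        symbol
        for symbol, (_, last) in universe.items()
        if last is not None and start <= last < end
    ]
    return {
        "names_at_start": sum(tradable(span, start) for span in universe.values()),
        "names_at_end": sum(tradable(span, end) for span in universe.values()),
        "exited_in_window": len(exited),
    }


clean = {
    "AAA": (date(2018, 1, 2), None),
    "BBB": (date(2018, 1, 2), date(2022, 5, 13)),  # died inside the window
    "CCC": (date(2023, 6, 1), None),
}
# The same universe as seen through today's constituent list.
survivor_screened = {s: span for s, span in clean.items() if span[1] is None}

window = (date(2020, 1, 2), date(2024, 12, 31))
print(turnover_report(clean, *window))
# {'names_at_start': 2, 'names_at_end': 2, 'exited_in_window': 1}
print(turnover_report(survivor_screened, *window))
# {'names_at_start': 1, 'names_at_end': 2, 'exited_in_window': 0}
```

The screened universe shows a rising count and zero exits, which is the pattern described above; the clean one shows at least one name that was present early and gone by the end.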

Sources & further reading

Brown, S. J., Goetzmann, W. N., Ibbotson, R. G. & Ross, S. A. (1992). “Survivorship Bias in Performance Studies.” Review of Financial Studies, 5(4), 553–580.

Shishin Research