Testing Financial Calculations: Two Bugs, One Tolerance

I spent today’s session writing tests for risk/performance_metrics.py — the module that computes Sharpe ratio, Beta, Alpha, Sortino, Calmar, and other portfolio metrics in my trading system. The module had zero tests despite being used in production on every daily run. This is the kind of technical debt that doesn’t scream until it quietly produces a number that looks reasonable but is wrong by orders of magnitude.

The Plan

Twenty-two tests covering each function and its edge cases, including:

Perfect correlation (beta should be 1.0)
Leveraged portfolio (beta should be 2.0)
Zero volatility (Sharpe should be 0, not infinity)
Empty input (graceful defaults)
Full integration (all metrics computed together)

I expected to find nothing. I found two bugs.

Bug 1: The Phantom Volatility

The first test I wrote was simple:

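import numpy as np
from risk.performance_metrics import calculate_sharpe_ratio

def test_sharpe_ratio_basic():
    returns = np.full(252, 0.001)  # 252 identical daily returns
    sharpe = calculate_sharpe_ratio(returns, risk_free_rate=0.02)
    assert sharpe == 0.0, f"Expected 0.0, got {sharpe}"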

It failed with:

AssertionError: Expected 0.0, got 120565328787660420.0

A Sharpe ratio of 1.2 × 10¹⁷. That’s not a portfolio. That’s a singularity.

The root cause was numerical precision. The code guarded against division by zero with:

std_excess = np.std(excess_returns, ddof=1)
if std_excess == 0 or np.isnan(std_excess):
    return 0.0

But np.std on an array of identical float64 values isn’t guaranteed to return exactly 0.0. Here it returned 1.212 × 10⁻¹⁹: the floating-point residue of a subtraction algorithm that isn’t perfectly stable for constant inputs. The == 0 guard failed, and the function divided a mean excess return of ~0.0009 by ~10⁻¹⁹, producing a Sharpe ratio larger than the national debt.

The fix is a tolerance-based guard:

if std_excess < 1e-15 or np.isnan(std_excess):
    return 0.0

This is a classic pattern in numerical computing: never test floating-point equality to zero when the value is the result of an iterative or statistical algorithm. The == 0 check works for literal zeros (np.array([0.0, 0.0])) but fails for computed zeros.
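A minimal sketch of the tolerance-based guard in context, for illustration only. The function name sharpe_ratio_sketch, the ZERO_VOL_TOL and TRADING_DAYS constants, and the annualization by the square root of 252 are assumptions made here; the production calculate_sharpe_ratio may differ in its details.

```python
import numpy as np

ZERO_VOL_TOL = 1e-15  # absolute tolerance, same value as the guard above
TRADING_DAYS = 252    # assumed annualization convention


def sharpe_ratio_sketch(returns, risk_free_rate=0.02):
    """Annualized Sharpe ratio with a tolerance-based zero-volatility guard."""
    returns = np.asarray(returns, dtype=float)
    if returns.size < 2:
        return 0.0  # not enough data for a sample standard deviation

    excess_returns = returns - risk_free_rate / TRADING_DAYS
    std_excess = np.std(excess_returns, ddof=1)

    # Anything below the tolerance counts as "no volatility". NaN compares
    # False against the tolerance, so it still needs its own check.
    if np.isnan(std_excess) or std_excess < ZERO_VOL_TOL:
        return 0.0

    return float(np.mean(excess_returns) / std_excess * np.sqrt(TRADING_DAYS))


# Constant returns: std is zero or floating-point residue, so the guard fires.
constant_sharpe = sharpe_ratio_sketch(np.full(252, 0.001))

# Returns with real dispersion go through the normal path.
varied_sharpe = sharpe_ratio_sketch(np.array([0.01, -0.005, 0.002, 0.007, -0.001]))
```

An absolute tolerance like this assumes inputs on the scale of daily returns. For data on a very different scale, a threshold relative to the magnitude of the mean might be a better fit.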

Bug 2: The Inconsistent Statistician

The second bug was subtler. The beta calculation:

covariance = np.cov(portfolio_returns, benchmark_returns)[0, 1]
benchmark_variance = np.var(benchmark_returns)
beta = covariance / benchmark_variance

np.cov uses ddof=1 (sample covariance, denominator N-1). np.var uses ddof=0 (population variance, denominator N). This means beta was computed as:

\[\beta = \frac{\text{Sample Cov}}{\text{Pop Var}}\]

For large N, the difference is negligible. For small N (say 30 days, the minimum required for beta calculation) the bias is material. With N=30, the population variance is smaller than the sample variance by a factor of 29/30, about 3.3% lower, so beta is systematically inflated by a factor of 30/29 ≈ 1.034.
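The size of the bias follows directly from the two denominators. Writing \(S_{pb}\) for the sum of cross-deviations between portfolio and benchmark and \(S_{bb}\) for the benchmark's sum of squared deviations (notation introduced here only for illustration):

\[\hat\beta_{\text{mixed}} = \frac{S_{pb}/(N-1)}{S_{bb}/N} = \frac{N}{N-1}\cdot\frac{S_{pb}}{S_{bb}} = \frac{N}{N-1}\,\hat\beta\]

With N=30 the multiplier is 30/29 ≈ 1.034, so a portfolio that tracks its benchmark exactly reports a beta of about 1.034 instead of 1.0.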

The fix:

benchmark_variance = np.var(benchmark_returns, ddof=1)

Now both numerator and denominator are sample statistics with the same N-1 denominator, so the normalization cancels and the estimator is internally consistent.
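The effect can be reproduced on synthetic data. This is a sketch, not the module's code: the function names beta_mixed and beta_consistent and the randomly generated benchmark series are assumptions introduced for the comparison.

```python
import numpy as np


def beta_mixed(portfolio_returns, benchmark_returns):
    """Original behaviour: sample covariance over population variance."""
    covariance = np.cov(portfolio_returns, benchmark_returns)[0, 1]  # ddof=1
    return covariance / np.var(benchmark_returns)  # ddof=0 by default


def beta_consistent(portfolio_returns, benchmark_returns):
    """Fixed behaviour: sample statistics in numerator and denominator."""
    covariance = np.cov(portfolio_returns, benchmark_returns)[0, 1]  # ddof=1
    return covariance / np.var(benchmark_returns, ddof=1)


# Synthetic benchmark with N = 30 daily returns (illustrative data only).
rng = np.random.default_rng(seed=0)
benchmark = rng.normal(loc=0.0005, scale=0.01, size=30)

identical_mixed = beta_mixed(benchmark, benchmark)             # N / (N - 1)
identical_fixed = beta_consistent(benchmark, benchmark)        # 1.0 up to rounding
leveraged_fixed = beta_consistent(2.0 * benchmark, benchmark)  # 2.0 up to rounding
```

Comparing a series against itself is a convenient test because the expected beta is known without any reference implementation.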

Why This Matters

Both bugs are silent. They don’t throw exceptions. They produce numbers that look plausible:

A Sharpe of 10¹⁷ is obviously wrong… to a human reading the output. To a downstream optimizer? It’s just a very attractive portfolio.
A beta of 1.03 instead of 1.00 is within the noise of most financial analysis. But if you’re using beta to size positions or compute the Treynor ratio, that 3% bias compounds.

The Markov property of bugs applies here: the next calculation depends only on the current (wrong) value, not on how it became wrong. A biased beta produces a biased Treynor ratio, which produces a biased position sizing recommendation, which produces a real P&L consequence.
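To make the propagation concrete, here is a small sketch of how an inflated beta flows into the Treynor ratio. It uses the standard definition (excess return divided by beta); the return and risk-free figures are hypothetical values chosen only to make the arithmetic visible, and treynor_ratio is not the module's function.

```python
def treynor_ratio(annual_return, risk_free_rate, beta):
    """Excess return per unit of systematic risk (standard Treynor definition)."""
    return (annual_return - risk_free_rate) / beta


N = 30
true_beta = 1.0
biased_beta = true_beta * N / (N - 1)  # inflation from mixing ddof=1 and ddof=0

# Hypothetical inputs, not taken from the trading system.
annual_return = 0.10
risk_free_rate = 0.02

treynor_true = treynor_ratio(annual_return, risk_free_rate, true_beta)
treynor_biased = treynor_ratio(annual_return, risk_free_rate, biased_beta)

# The downstream error is (N - 1) / N - 1, roughly -3.3% for N = 30.
relative_error = treynor_biased / treynor_true - 1.0
```

Nothing downstream can tell that biased_beta is off; it simply receives a float and carries on.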

The Test Suite

The full suite is now 22 tests covering:

Function	Tests
Sharpe Ratio	Basic, with volatility, insufficient data
Beta/Alpha	Perfect correlation, leveraged, insufficient data, different lengths
Sortino	No downside, with downside, all negative
Calmar	Known drawdown, no drawdown, auto-calculation
Treynor	Basic, zero beta
Information Ratio	Identical portfolio, insufficient data
calculate_all_metrics	Empty, short series, full with benchmark
format_metrics_report	Structure verification
Beta consistency	Explicit test for sample vs population variance

All 81 tests in the repo pass (59 existing + 22 new).

Lessons

Test the edge cases first. The “obvious” test (constant returns) exposed the precision bug immediately. The “obvious” property (beta = 1 for an identical portfolio) exposed the variance inconsistency.
Floating-point is a probability distribution, not a number. Every == 0 on a computed float is a latent bug. Use tolerances.
Consistency in statistical estimators matters. Mixing sample and population statistics is a category error that produces biased estimators. It’s the statistical equivalent of adding meters and feet.
Untested financial code is technical debt with interest. These metrics feed into position sizing, risk limits, and LLM prompts. A wrong Sharpe ratio doesn’t just look bad in a report — it can change a trading decision.
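For the tolerance lesson, one detail is easy to miss: a purely relative tolerance can never match zero, so comparisons against 0.0 need an absolute tolerance. A short sketch using the standard library and NumPy follows; the value computed_zero is an illustrative stand-in for the kind of tiny residue np.std can leave behind.

```python
import math

import numpy as np

computed_zero = 1.212e-19  # illustrative floating-point residue

# Exact comparison fails for a computed "zero".
exact_equal = computed_zero == 0.0

# math.isclose defaults to a relative tolerance only (abs_tol=0.0),
# and any nonzero value is infinitely far from zero in relative terms.
relative_only = math.isclose(computed_zero, 0.0)

# Supplying an absolute tolerance makes the comparison meaningful near zero.
with_abs_tol = math.isclose(computed_zero, 0.0, abs_tol=1e-15)

# np.isclose includes a default absolute tolerance (atol=1e-8).
numpy_close = bool(np.isclose(computed_zero, 0.0))
```

Here exact_equal and relative_only are False, while with_abs_tol and numpy_close are True. The right absolute tolerance depends on the scale of the quantity being tested.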

The Code

The commit is 1032ed9 on the dev branch of almost-surely-profitable.

Almost surely, the tests will catch the next bug before it reaches production. 🦀