Fixing the Sharpe Ratio: A Machine Learning Approach | Portfolio for the Future

The Sharpe ratio has long served as a simple but important item in the due diligence tool kit.

Formulated by William F. Sharpe in 1966 and first called the “reward to variability” ratio, the number is an investment’s excess return (its rate of return minus the risk-free rate) divided by the standard deviation of that excess return. Standard deviation serves as a proxy for risk, so the ratio should tell an investor how much excess return he is getting in return for each unit of risk he has accepted.
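To make the definition concrete, here is a minimal sketch of the calculation in Python. The function name, the sample returns and the risk-free rate are all hypothetical, chosen only for illustration; the sketch assumes a constant per-period risk-free rate and uses the sample standard deviation, and it does not annualize the result.

```python
from statistics import mean, stdev


def sharpe_ratio(returns, risk_free_rate=0.0):
    """Per-period Sharpe ratio: mean excess return over its sample standard deviation.

    Assumes a constant per-period risk-free rate (a simplification).
    """
    excess = [r - risk_free_rate for r in returns]
    return mean(excess) / stdev(excess)


# Hypothetical monthly returns and a hypothetical monthly risk-free rate.
monthly_returns = [0.02, -0.01, 0.03, 0.015, -0.005, 0.01]
print(sharpe_ratio(monthly_returns, risk_free_rate=0.002))
```

Because the risk-free rate is treated as constant here, subtracting it shifts the mean but leaves the standard deviation unchanged.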

This is both simple and rational, two features which help explain its importance for a period of more than half a century now. But … the Sharpe ratio is far from noncontroversial. In general, it presumes a normal distribution of returns. Further, the fact that it uses the risk-free return in the numerator (rather than the required return) may yield misleading results.

When the ratio is applied specifically to hedge funds, selection bias, backtesting and performance inflation plague its use. It is here that the rise of big data, machine learning and the like may prove germane.

Bailey and de Prado

It is worth looking at these issues again because David H. Bailey and Marcos Lopez de Prado have recently posted a revised version of their 2014 article on the Sharpe ratio. The article is called “The Deflated Sharpe Ratio,” and it develops the revised ratio, the DSR for short, as a way of mitigating dangers that have arisen out of the advent of “large financial data sets, machine learning and high-performance computing.”

As the authors say, the amount of data used by most quant teams circa 2014 is “comparable to the memory stored by Netflix to support its video-streaming business nationwide.” There are unintended consequences of this vast increase in computational ability. Backtesting is an important example. A backtest is “a historical simulation of how a particular investment strategy would have performed” over a defined period in the past. It is difficult for analysts to resist the tendency to backtest, and 21st century technologies allow for a lot of it, very quickly.

The problem, though, is that any random sample extracted from a population will show patterns. Suppose I have 20 nickels in my pocket and throw them in the air within a small room. They will all fall to the ground. They won’t arrange themselves in an even distribution across the floor. There will be clusters.

A backtest is a bit like looking at the clusters of one such experiment and drawing inferences, then betting on the occurrence of similar clusters for a future toss. “Oh, I see several nickels are over near the radiator. There must be something about that radiator that draws the coins in that direction. I’ll bet on that!” That’s a bad idea.
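The same point can be sketched in a few lines of Python. The simulation below is an illustration, not anything taken from the article: it generates a number of “strategies” whose returns are pure noise with zero mean, computes each one’s Sharpe ratio (risk-free rate taken as zero), and reports only the best. The function name, the 252-period length, the 1% volatility and the seed are all hypothetical choices.

```python
import random
from statistics import mean, stdev


def best_backtest_sharpe(n_trials, n_periods=252, seed=0):
    """Highest per-period Sharpe ratio among n_trials skill-less strategies.

    Every strategy is zero-mean Gaussian noise, so any positive Sharpe
    ratio it shows is a fluke of the sample, like a cluster of nickels.
    """
    rng = random.Random(seed)
    best = float("-inf")
    for _ in range(n_trials):
        returns = [rng.gauss(0.0, 0.01) for _ in range(n_periods)]
        best = max(best, mean(returns) / stdev(returns))
    return best


# The more backtests we run, the better the best one tends to look.
for n in (1, 10, 100, 1000):
    print(n, round(best_backtest_sharpe(n), 3))
```

With a fixed seed, each larger run contains the smaller runs as its first trials, so the reported maximum can only stay the same or rise as the number of trials grows. None of these strategies has any skill; the winner simply benefits from having been selected.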

Weeding out Statistical Flukes

Of course, if I try to experiment a few more times I may notice that there is no consistency to the clustering. There is no coin-attractive power in the radiator. Or, perhaps, I do in fact have a magnetic radiator and further experiments will confirm that. Either way, as Bailey and de Prado rightly say, the most important piece of information about any backtested strategy is the number of trials attempted, yet this parameter is absent from “virtually all backtests published in academic journals and investment offerings.”

In backtesting strategies, the article indicates, “false positives” are more of a worry to most investors than “false negatives.” A false positive could lead investors to adopt a losing strategy. A false negative could cause them to miss an opportunity, to refrain from adopting what would have been a winning strategy. Given a reasonable level of risk aversion, one will accept some false negatives as part of a trade-off in order to stay clear of false positives.

The DSR, then, is a revision of the Sharpe ratio to correct for two leading sources of false positives (also known as performance inflation): selection bias and non-normally distributed returns. The deflation “helps separate legitimate empirical findings from statistical flukes.”
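For readers who want to see roughly how such a deflation might work, here is a hedged Python sketch. The formulas are not quoted in this article; they are a recollection of the ones in Bailey and Lopez de Prado’s paper and should be checked against it before any real use. The benchmark is the Sharpe ratio one would expect from the best of N skill-less trials, and the observed ratio is then tested against that benchmark with an adjustment for skewness and kurtosis. All function and parameter names are hypothetical, all Sharpe ratios are assumed to be per-period (not annualized), and the number of trials must be at least 2.

```python
from math import e, sqrt
from statistics import NormalDist

EULER_GAMMA = 0.5772156649015329  # Euler-Mascheroni constant


def expected_max_sharpe(n_trials, sr_variance):
    """Approximate expected best Sharpe ratio among n_trials skill-less trials.

    sr_variance is the variance of the Sharpe ratios across the trials.
    Requires n_trials >= 2.
    """
    z = NormalDist()
    return sqrt(sr_variance) * (
        (1 - EULER_GAMMA) * z.inv_cdf(1 - 1 / n_trials)
        + EULER_GAMMA * z.inv_cdf(1 - 1 / (n_trials * e))
    )


def deflated_sharpe_ratio(sr, n_obs, skew, kurt, n_trials, sr_variance):
    """Probability that the observed Sharpe ratio beats the selection benchmark.

    skew and kurt describe the return distribution (kurt = 3 for a normal).
    """
    sr0 = expected_max_sharpe(n_trials, sr_variance)
    # Non-normal returns widen the uncertainty around the estimated ratio.
    denom = sqrt(1 - skew * sr + (kurt - 1) / 4 * sr ** 2)
    return NormalDist().cdf((sr - sr0) * sqrt(n_obs - 1) / denom)


# Hypothetical inputs: the same observed ratio, found after 10 or 1,000 trials.
for n in (10, 1000):
    print(n, deflated_sharpe_ratio(0.1, 1000, 0.0, 3.0, n, 0.001))
```

In this sketch, the same observed Sharpe ratio earns a lower score when it was the winner of many trials than when it was the winner of a few, which is why the number of trials matters so much.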