If you recently got together with a few friends from high school, you might have noticed that almost everybody seems to be at least familiar with trading some asset (most likely cryptocurrencies).
There have been considerable advances in democratizing access to the markets: you can now open a brokerage account on Alpaca or Robinhood and trade with no commissions (leaving aside payment for order flow, or PFOF). The downside is that many people have faced considerable losses, many of which could have been avoided.
Although concepts such as backtesting and forward-testing are standard practice for industry professionals and even for some savvy retail investors, they remain underused by the vast majority. This article tackles one question: is backtesting accurate?
Although not always precise, backtesting is one of the most valuable tools available to traders who want to take a scientific approach to the financial markets. Only through backtesting can practitioners start evaluating the performance of a strategy before shipping it to a live environment. The reliability of a backtest depends on the type of strategy, the assumptions being made, the quality of the available data, and the trader’s experience in avoiding common mistakes.
Required Features for Reliable Backtesting
As many of us have found out the hard way, there are always a few surprises when moving a strategy from the backtesting phase into live trading. This surprise factor, often in the form of lower returns, can be mitigated by incorporating as many real-world features as possible into the backtesting framework.
Some of the components below make the results more realistic, while others are qualities worth seeking when choosing a backtester:
- Account for fees: it is essential to use the same fee structure you would face in a real-life scenario. This is obvious even to less sophisticated active traders, yet ignoring fees is a widespread mistake in academia.
- Account for slippage: slippage occurs when the price of an asset changes between the time you submit an order and the time it gets filled. It depends on your order type, execution algorithm (single order, multiple orders, time-weighted average price, etc.), order volume, and the underlying asset’s liquidity, among other factors.
- Reliable data source: not using a reliable data provider is one of the most frustrating and costly mistakes you can make. If you are just getting started, there is no shame in using Yahoo Finance’s historical data. If you mainly trade well-known, liquid assets, you can expect the data to be of reasonably good quality almost all the time. If, on the other hand, you want to trade low-volume micro-cap biotechnology companies from another country, choose your data vendor carefully. If your broker offers a data feed, it is good practice to use it.
- Fast processing: when backtesting a strategy with multiple assets and intra-minute bars across several years of data, you will notice the importance of a backtesting framework optimized for speed. In technical terms, this usually means that the backtester’s engine processes data in a vectorized fashion rather than event by event, as an event-based backtester does.
- Rigorously tested: trading strategies come in all flavors, and even with a very simple set of trading rules, you want to make sure that the backtester has been tested and improved over several years by a good number of users.
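To make the first two points concrete, here is a minimal sketch of how a backtester might apply fees and slippage to each simulated fill. It assumes a proportional fee on notional and a fixed slippage in basis points that always works against you; `CostModel`, `round_trip_pnl`, and both default values are hypothetical placeholders to be replaced with your broker’s actual fee schedule and your own slippage estimate.

```python
from dataclasses import dataclass


@dataclass
class CostModel:
    """Hypothetical cost model: a proportional fee plus a fixed adverse slippage."""
    fee_rate: float = 0.001     # 0.1% of notional per order (assumed value)
    slippage_bps: float = 5.0   # 5 basis points against us on every fill (assumed value)

    def fill_price(self, quoted_price: float, side: str) -> float:
        """Move the quoted price against the trader: buys pay more, sells receive less."""
        adj = self.slippage_bps / 10_000
        if side == "buy":
            return quoted_price * (1 + adj)
        if side == "sell":
            return quoted_price * (1 - adj)
        raise ValueError(f"unknown side: {side!r}")

    def fee(self, fill_price: float, quantity: float) -> float:
        """Commission charged on the notional value of a fill."""
        return abs(fill_price * quantity) * self.fee_rate


def round_trip_pnl(entry: float, exit_price: float, quantity: float, costs: CostModel) -> float:
    """Net P&L of buying at `entry` and selling at `exit_price`, after slippage and fees."""
    buy_px = costs.fill_price(entry, "buy")
    sell_px = costs.fill_price(exit_price, "sell")
    gross = (sell_px - buy_px) * quantity
    return gross - costs.fee(buy_px, quantity) - costs.fee(sell_px, quantity)


frictionless = CostModel(fee_rate=0.0, slippage_bps=0.0)
realistic = CostModel()
print(round_trip_pnl(100.0, 100.5, 100, frictionless))  # 50.0
print(round_trip_pnl(100.0, 100.5, 100, realistic))     # ≈ 19.93
```

Under these assumed costs, the same 0.5% price move that earns 50 in a frictionless simulation nets less than half of that.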
Advantages and Disadvantages of Backtesting
Although I recommend backtesting every single strategy you come up with, the methodology has clear disadvantages. Whatever its advantages, do not fool yourself: backtests are imperfect!
Advantages of backtesting
It helps to avoid mistakes
Even when a set of trading rules is very simple, it is usually very difficult to estimate (or guesstimate) the performance of a strategy under different scenarios and market regimes. You might even uncover a bug in the strategy, such as forgetting to check whether you already hold a position in a given asset, which results in placing an unlimited number of orders.
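A toy illustration of that kind of bug, assuming a hypothetical boolean signal with one value per bar: the naive version fires a new buy on every bar where the signal is on, while the guarded version tracks whether a position is already open. Even a five-bar backtest surfaces the difference.

```python
def naive_orders(signals):
    """Buggy: emits a buy order on every bar where the signal is on."""
    return [(t, "buy") for t, signal_on in enumerate(signals) if signal_on]


def guarded_orders(signals):
    """Fixed: buys only when flat and sells when the signal turns off while long."""
    orders = []
    in_position = False
    for t, signal_on in enumerate(signals):
        if signal_on and not in_position:
            orders.append((t, "buy"))
            in_position = True
        elif not signal_on and in_position:
            orders.append((t, "sell"))
            in_position = False
    return orders


signals = [True, True, True, False, True]
print(len(naive_orders(signals)))  # 4
print(guarded_orders(signals))     # [(0, 'buy'), (3, 'sell'), (4, 'buy')]
```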
Improves the understanding of a strategy
It is not enough to know what a trading strategy is supposed to do. Suppose your strategy consists of buying at the open, placing Take-Profit (TP) and Stop-Loss (SL) orders, and making sure that you sell before the closing bell. The logic could hardly be simpler. Yet a backtest will not only help you estimate your future results but also show how often a TP or SL gets triggered versus how often you have to sell before the market closes. It also lets you compare all of this with the results you would have obtained by holding the assets overnight.
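A minimal sketch of that bookkeeping, assuming daily OHLC bars and hypothetical TP/SL levels of +2% and -1% from the open. Daily bars cannot tell you whether the high or the low came first, so when both levels are touched this sketch conservatively assumes the stop-loss was hit first; intraday data would remove that ambiguity.

```python
from collections import Counter


def exit_reason(open_, high, low, close, tp=0.02, sl=0.01):
    """Classify how a buy-at-the-open trade exits within one daily bar.

    Returns (reason, return). TP/SL are percentages of the open price (assumed
    values). If both levels are touched, the stop-loss is assumed to fill first.
    """
    tp_price = open_ * (1 + tp)
    sl_price = open_ * (1 - sl)
    if low <= sl_price:
        return "stop_loss", sl_price / open_ - 1
    if high >= tp_price:
        return "take_profit", tp_price / open_ - 1
    return "close", close / open_ - 1


bars = [  # (open, high, low, close): made-up bars for illustration only
    (100.0, 103.0, 99.5, 101.0),
    (100.0, 100.5, 98.0, 99.0),
    (100.0, 101.0, 99.5, 100.8),
    (100.0, 102.5, 98.5, 101.5),
]
reasons = Counter(exit_reason(*bar)[0] for bar in bars)
print(reasons)
```

Counting exit reasons over a real backtest tells you which part of the rule set is actually driving the results.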
Provides Important Performance Metrics
Without backtesting, there is no way of knowing the key metrics a trader should know by heart. Here is a brief, incomplete list of performance metrics that a backtest should provide:
- Average Return (%) / Best Trade (%) / Worst Trade (%)
- Number of trades (#)
- Performance against a Benchmark
- Max. Drawdown (%) / Max. Drawdown (days)
- Sharpe Ratio / Calmar Ratio / Sortino Ratio
- Exposure Time (%)
- Win Rate (%)
- Volatility (%)
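Several of these metrics are simple to compute from a return series. The sketch below covers a subset of the list using only the standard library, under stated assumptions: daily returns, 252 trading periods per year, a zero risk-free rate by default, and the common square-root annualization. Definitions vary slightly between libraries (for example, how downside deviation is normalized), so treat these as one reasonable convention rather than the canonical one.

```python
import math
import statistics

PERIODS_PER_YEAR = 252  # assumption: daily returns, roughly 252 trading sessions per year


def max_drawdown(returns):
    """Largest peak-to-trough loss of the compounded equity curve, as a positive fraction."""
    equity = peak = 1.0
    worst = 0.0
    for r in returns:
        equity *= 1 + r
        peak = max(peak, equity)
        worst = max(worst, 1 - equity / peak)
    return worst


def annual_volatility(returns):
    """Sample standard deviation of periodic returns, annualized with the square-root rule."""
    return statistics.stdev(returns) * math.sqrt(PERIODS_PER_YEAR)


def sharpe_ratio(returns, rf=0.0):
    """Annualized Sharpe ratio; `rf` is a per-period risk-free rate (assumed 0 here)."""
    excess = [r - rf for r in returns]
    return statistics.mean(excess) / statistics.stdev(excess) * math.sqrt(PERIODS_PER_YEAR)


def sortino_ratio(returns, target=0.0):
    """Annualized Sortino ratio: only returns below `target` count as risk."""
    downside_dev = math.sqrt(sum(min(0.0, r - target) ** 2 for r in returns) / len(returns))
    if downside_dev == 0:
        return math.inf
    return (statistics.mean(returns) - target) / downside_dev * math.sqrt(PERIODS_PER_YEAR)


def calmar_ratio(returns):
    """Compound annual growth rate divided by maximum drawdown."""
    growth = math.prod(1 + r for r in returns)
    cagr = growth ** (PERIODS_PER_YEAR / len(returns)) - 1
    mdd = max_drawdown(returns)
    return math.inf if mdd == 0 else cagr / mdd


def win_rate(trade_returns):
    """Fraction of closed trades with a positive return."""
    return sum(r > 0 for r in trade_returns) / len(trade_returns)


daily = [0.01, -0.02, 0.015, -0.005, 0.01]  # made-up returns, for illustration only
print(f"max drawdown: {max_drawdown(daily):.2%}")  # 2.00%
print(f"win rate: {win_rate(daily):.0%}")          # 60% (each value treated as one trade)
```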
Disadvantages of backtesting
Limited Representation of Reality
As with any simulation, you have to limit the level of complexity and choose a finite set of features to model.
In an extreme and infeasible simulation, you would include every single agent participating in the stock market and assign each one its actual behavior function. At the other end of the spectrum, you would backtest your strategy on only a handful of daily closing prices. The first approach is useless because it is intractable, whereas the second yields unreliable and statistically meaningless results.
More practically, you almost always have to make an assumption about slippage unless you have access to Level III quotes. And even if you do have data at that level of granularity, you have to make a judgment call on whether your trading strategy requires that extra layer of complexity.
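When order-book depth is available, slippage can be estimated by walking the book instead of being assumed. A minimal sketch for a market buy, assuming a hypothetical static snapshot of the ask side given as (price, size) levels; it deliberately ignores hidden orders, queue dynamics, and other participants reacting to your order, which is exactly the extra complexity you have to decide whether to model.

```python
def market_buy_avg_price(asks, quantity):
    """Average fill price of a market buy that walks the ask side of the book.

    `asks` is a list of (price, size) levels sorted from the best (lowest) price
    upward, treated as a static snapshot.
    """
    remaining, cost = quantity, 0.0
    for price, size in asks:
        take = min(remaining, size)
        cost += take * price
        remaining -= take
        if remaining <= 0:
            return cost / quantity
    raise ValueError("order is larger than the visible liquidity")


asks = [(100.00, 200), (100.05, 300), (100.10, 500)]  # made-up snapshot
avg = market_buy_avg_price(asks, 600)
slippage_bps = (avg / asks[0][0] - 1) * 10_000
print(f"avg fill {avg:.4f}, slippage vs. best ask {slippage_bps:.2f} bps")
# avg fill 100.0417, slippage vs. best ask 4.17 bps
```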
Assumes no market changes
Backtesting relies on the assumption that past data is somewhat indicative of the future. The further back we go, the less realistic this assumption becomes.
Most of the arbitrage opportunities available to high-frequency traders in the early 2000s no longer exist, so backtesting high-frequency strategies on that decade will not tell us anything about their present performance. Likewise, a few years ago there were plenty of market-making opportunities on cryptocurrency exchanges that are no longer viable today.
Depending on the specific industry or asset, it might not be a good idea to use data prior to the start of the COVID-19 pandemic.
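One way to check whether older data still tells you something is to compute the same metric separately on each side of a suspected regime change and see whether the two agree. The sketch below assumes hypothetical (date, return) pairs and uses an illustrative cutoff of March 2020; the right boundary depends on your asset.

```python
import math
import statistics
from datetime import date


def annualized_sharpe(returns, periods_per_year=252):
    """Annualized Sharpe ratio with a zero risk-free rate; NaN if it cannot be computed."""
    if len(returns) < 2 or statistics.stdev(returns) == 0:
        return math.nan
    return statistics.mean(returns) / statistics.stdev(returns) * math.sqrt(periods_per_year)


def split_by_date(dated_returns, cutoff):
    """Split (date, return) pairs into the periods before and from `cutoff` onward."""
    before = [r for d, r in dated_returns if d < cutoff]
    after = [r for d, r in dated_returns if d >= cutoff]
    return before, after


CUTOFF = date(2020, 3, 1)  # illustrative regime boundary, not a recommendation


def compare_regimes(dated_returns, cutoff=CUTOFF):
    """Sharpe ratio before and after the cutoff; a large gap suggests older data may mislead."""
    before, after = split_by_date(dated_returns, cutoff)
    return annualized_sharpe(before), annualized_sharpe(after)
```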
Common Mistakes to Avoid in Backtesting
“An unsophisticated forecaster uses statistics as a drunken man uses lamp-posts – for support rather than for illumination.” (Andrew Lang)
Introduce Lookahead Bias
The most straightforward case of lookahead bias is when your set of rules uses data that would not yet be available during live trading. In an absurd case, you could use tomorrow’s closing price to decide whether to buy an asset today. In live trading, you are, by definition, unable to do so.
A backtesting simulation can be as bad a representation of reality as you want it to be.
There are less extreme cases in which lookahead bias creeps into a backtest. A subtle and frequent mistake is using the most recent closing price both as an input to a technical indicator and as the execution price: by the time that close is known, you can no longer trade at it. A standard solution is to assume that orders get filled at the next open.
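The next-open convention can be sketched as follows, assuming hypothetical lists of daily opens and closes and a simple moving-average entry signal (position tracking is left out to keep the focus on timing). The signal is computed from the close of bar t, and the fill happens at the open of bar t + 1.

```python
def sma(values, window):
    """Simple moving average; None until `window` values are available."""
    return [
        None if i + 1 < window else sum(values[i + 1 - window:i + 1]) / window
        for i in range(len(values))
    ]


def entries_next_open(opens, closes, window=3):
    """Entry signal from the close of bar t, filled at the open of bar t + 1.

    Filling at closes[t] would reuse the very price that produced the signal,
    which a live trader could not act on in time.
    """
    ma = sma(closes, window)
    fills = []
    for t in range(len(closes) - 1):  # the last bar has no next open to trade at
        if ma[t] is not None and closes[t] > ma[t]:
            fills.append((t + 1, opens[t + 1]))
    return fills


opens = [10.0, 10.5, 11.5, 12.5, 13.2]   # made-up prices
closes = [10.0, 11.0, 12.0, 13.0, 12.0]
print(entries_next_open(opens, closes))  # [(3, 12.5), (4, 13.2)]
```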
Survivorship Bias
Survivorship bias consists of choosing a set of assets that are available today and backtesting them over the past. Especially with longer timeframes, you are selecting assets that are guaranteed not to have disappeared. In other words, if you do not incorporate delisted stocks into your backtests, you are assuming that it was impossible for you to have bought into Enron or Lehman Brothers.
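A minimal sketch of a point-in-time universe that keeps delisted stocks, assuming a hypothetical security master that records each ticker’s listing and delisting dates (the tickers and dates below are invented). On any backtest date, the universe is whatever was actually listed then, not whatever survives today.

```python
from datetime import date

# Hypothetical security master: ticker -> (listing date, delisting date or None if still listed).
SECURITY_MASTER = {
    "AAA": (date(1995, 1, 3), None),
    "DEAD": (date(1990, 1, 2), date(2008, 9, 15)),  # delisted later; still tradable in 2000
    "NEW": (date(2005, 6, 1), None),                # did not exist yet in 2000
}


def universe_on(day, master=SECURITY_MASTER):
    """Tickers that were actually listed on `day`, including ones delisted later."""
    return sorted(
        ticker
        for ticker, (listed, delisted) in master.items()
        if listed <= day and (delisted is None or day < delisted)
    )


def survivors_only(master=SECURITY_MASTER):
    """The biased alternative: whatever is still listed today, regardless of the date."""
    return sorted(t for t, (_, delisted) in master.items() if delisted is None)


print(universe_on(date(2000, 1, 3)))  # ['AAA', 'DEAD']
print(survivors_only())               # ['AAA', 'NEW']
```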
Sampling Bias
If you do not correct for sampling bias, your selection criteria may leave you choosing only from a set of over- or underperforming assets.
For example, if you backtest on the 100 biggest US companies by today’s market cap, you are introducing a sampling mistake. There is no way around it: a company can only be among today’s biggest if it has outperformed in the past, which means you have chosen mainly winners.
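A sketch of point-in-time selection: on each rebalance date, rank the universe using only the market caps known at that time. It assumes a hypothetical per-ticker history of market caps; in the invented data below, ranking by the latest caps would have put Y into a 2010 portfolio purely because of its later growth.

```python
from datetime import date


def top_n_by_market_cap(market_caps, as_of, n):
    """Pick the n largest companies using only market caps known on `as_of`.

    `market_caps` maps ticker -> {date: market cap}. For each ticker, the latest
    value on or before `as_of` is used, so the ranking never peeks at later data.
    """
    snapshot = {}
    for ticker, history in market_caps.items():
        known_dates = [d for d in history if d <= as_of]
        if known_dates:
            snapshot[ticker] = history[max(known_dates)]
    return sorted(snapshot, key=snapshot.get, reverse=True)[:n]


# Invented market caps (in billions) for three hypothetical companies.
caps = {
    "X": {date(2010, 1, 1): 500, date(2020, 1, 1): 50},
    "Y": {date(2010, 1, 1): 100, date(2020, 1, 1): 900},
    "Z": {date(2010, 1, 1): 300, date(2020, 1, 1): 200},
}
print(top_n_by_market_cap(caps, date(2010, 6, 30), 2))  # ['X', 'Z']
print(top_n_by_market_cap(caps, date(2020, 6, 30), 2))  # ['Y', 'Z']
```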
Data Snooping
Also known in the statistical community as “p-hacking” or “significance chasing,” data snooping refers to the practice of looking at the results of a test and changing the parameters of the model to improve them.
More often than not, you will end up with a trading strategy that perfectly explains past noise in the data but is utterly useless at detecting patterns that will also appear in future prices.
Data snooping is closely related to the concept of lookahead bias. If you change the parameters of your strategy based on the results you obtained, you are using information that would otherwise not have been available at the time of decision-making.
Overfitting
In a statistical sense, overfitting occurs when the practitioner creates a set of rules or functions that is almost perfectly aligned with past data, noise included.
For example, consider a simple strategy in which you buy a stock whenever its current price is above its N-day moving average and sell it when the opposite occurs. You might be tempted to find the moving-average window that maximizes the strategy’s return over the backtest period. To do so, you write a function that iterates over windows from 2 to 100, compares all the results, and returns the best one. You are now guilty of overfitting a strategy.
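The sweep described above can be sketched in a few lines. This version runs on a synthetic random walk (so, by construction, there is no edge to find) and, as an assumed extra step, holds back the last 30% of the data so the chosen window can be scored once on prices the optimizer never saw. The names and the 70/30 split are hypothetical choices for illustration.

```python
import random


def ma_strategy_return(prices, window):
    """Compounded return of: hold while price > N-day SMA, otherwise stay flat.

    The position decided at the close of day t earns the move from t to t + 1,
    so the signal never uses future prices.
    """
    equity = 1.0
    for t in range(window - 1, len(prices) - 1):
        sma = sum(prices[t + 1 - window:t + 1]) / window
        if prices[t] > sma:
            equity *= prices[t + 1] / prices[t]
    return equity - 1


def optimize_window(prices, windows=range(2, 101)):
    """The overfitting step: keep whichever window scored best on these prices."""
    return max(windows, key=lambda n: ma_strategy_return(prices, n))


random.seed(42)
prices = [100.0]
for _ in range(1000):  # synthetic random walk with 1% daily volatility
    prices.append(prices[-1] * (1 + random.gauss(0, 0.01)))

in_sample, out_of_sample = prices[:700], prices[700:]
best = optimize_window(in_sample)
print(best, ma_strategy_return(in_sample, best), ma_strategy_return(out_of_sample, best))
```

The in-sample figure is the best of 99 attempts by construction; the out-of-sample figure is the one that says something about how the chosen window might behave going forward.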
Let’s think about it for a few seconds and combine two relatively simple assumptions:
- We know that most trading ideas have no real edge: whether they outperform their benchmark is a matter of chance.
- On the other hand, if we optimize any trading idea, we can be confident that we will always arrive at some set of parameters that yields the best returns.
Thus, the optimized set of parameters probably still has only a random chance of outperforming its benchmark: we have picked the luckiest variant, not found an edge.
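This argument is easy to check with a small simulation. The sketch below uses no real data: it draws many hypothetical strategies whose daily returns are pure noise, keeps the one with the best in-sample mean, and then looks at that same strategy on a fresh batch of noise.

```python
import random
import statistics


def best_of_random_strategies(n_strategies, n_days, seed=0):
    """Pick the best of many zero-edge strategies in-sample and re-test it out-of-sample.

    Each strategy is a pair of independent noise series (in-sample, out-of-sample)
    with mean 0 and 1% daily volatility. Returns the chosen strategy's mean daily
    return in each period.
    """
    rng = random.Random(seed)

    def noise():
        return [rng.gauss(0.0, 0.01) for _ in range(n_days)]

    strategies = [(noise(), noise()) for _ in range(n_strategies)]
    best_in, best_out = max(strategies, key=lambda s: statistics.mean(s[0]))
    return statistics.mean(best_in), statistics.mean(best_out)


in_mean, out_mean = best_of_random_strategies(n_strategies=200, n_days=250)
print(f"in-sample mean: {in_mean:.4%}, out-of-sample mean: {out_mean:.4%}")
```

The winner’s in-sample mean is positive almost by construction, while its out-of-sample mean is just another draw centered on zero.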
Failing to Emulate Real Conditions
This mistake encapsulates all the remaining frequent ones. If our broker charges 0.1% of the notional but we backtest without any fees, we are doing a poor job of emulating the actual conditions we face. If our trades have an average return of 0.05% and our fees stay at 0.1%, each trade loses about 0.05% net, and we will learn what asymptotically converging to 0 means.
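The arithmetic behind that last sentence, as a minimal sketch: it assumes the whole account is traded every time and approximates each trade’s net return as the gross return minus a fee charged once per trade, as in the example. Charging the fee on both legs of a round trip would double the drag.

```python
def equity_after(n_trades, avg_gross_return=0.0005, fee_rate=0.001, start=1.0):
    """Equity after n trades, approximating each trade's net return as gross minus fee."""
    net_per_trade = avg_gross_return - fee_rate  # -0.05% with the default values
    return start * (1 + net_per_trade) ** n_trades


print(round(equity_after(1000), 3))                # 0.606
print(round(equity_after(1000, fee_rate=0.0), 3))  # the same trades without fees
```

With these numbers, a strategy with a positive gross edge loses roughly 40% of the account over 1,000 trades, and the equity curve keeps shrinking toward zero as trading continues.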
As you might already have guessed, running extensive backtests and obtaining promising results is by no means a guarantee of a trading strategy’s future performance. We have covered most of what every practitioner worth their salt has to consider to extract as much information as possible from this technique.
Algorithmic trading is both a science and an art form that deals with finding a probabilistic edge. A strategy that looks great on backtests is not a guarantee of future success, but not backtesting a strategy is almost a certain guarantee of future losses.