Source: How to Build a Backtesting Engine in Python Using Pandas | by Jachowski | Medium
Jachowski
Apr 22, 2022
A simple way to build an easily scalable backtesting engine to test your Trading Systems in Python using only Pandas and Numpy
Backtesting is a crucial step in designing your Trading Systems. I would say that it is the crucial step, given that it assesses the viability of your strategies.
Just imagine: Earth, 2050. The first flying car ever is released on the market but it’s never been tested. Would you buy it? I think (hope) not.
This simple analogy is meant to highlight the importance of backtesting: before investing through any algorithmic model, test it, again and again, even if your favourite financial guru on YouTube says that a certain strategy will provide a 100% return in less than a year.
Believe in what you see, not in what they tell you to see.
In this sense, it’s not the best idea to rely on a pre-built backtesting engine from libraries such as Backtrader, for two reasons: you can neither properly see what is going on inside it nor modify it as much as you want.
Remember, the second principle of the Zen of Python states that “Explicit is better than implicit”. If you can build explicit functions on your own instead of using black-box pre-built ones, go for it.
Oh, and the third principle says that “Simple is better than complex”. Let’s see how easily you can backtest your strategies with Pandas.
The Idea
This is what we’re going to do:
- Import the libraries
- Import stock data
- Define a trading strategy
- Define a market position function
- Define a backtesting function
Let’s get into code stuff!
1. Import the Libraries
Let’s import the three libraries we need. Said and done:
import numpy as np
import pandas as pd
import yfinance as yf
2. Import Stock Data
Let’s download 20 years of Amazon (ticker AMZN) stock data.
amzn = yf.download('AMZN', '2000-01-01', '2020-01-01')
3. Define a Trading Strategy
In this case, we’re going to test one of the most popular strategies: the Double Moving Averages Crossover.
First of all, we have to define two Simple Moving Averages. That’s how:
def SMA(array, period):
    return array.rolling(period).mean()
That is, this function has two arguments:
- array is the series we will apply the function on (Close prices) and
- period is the length of our moving averages (e.g. 14 and 200 days).
The function slides a window (.rolling()) of the desired length (period) over a series (array) and computes the arithmetic mean (.mean()) of each window.
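To see what the rolling window does, here is a minimal sketch on a toy series. The prices are made up for illustration and are not AMZN data. Note that the first period - 1 values are NaN, because the window is not yet full.

```python
import pandas as pd

def SMA(array, period):
    # Rolling arithmetic mean over the last `period` observations.
    return array.rolling(period).mean()

# Toy prices, invented for this example (not AMZN data).
prices = pd.Series([10.0, 11.0, 12.0, 13.0, 14.0])
sma3 = SMA(prices, 3)

# The first two values are NaN: a 3-bar window needs 3 observations.
print(sma3.tolist())  # [nan, nan, 11.0, 12.0, 13.0]
```

The same thing happens on the real data: sma200 only has values from the 200th trading day onwards.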
Let’s define the two moving averages we will use. The first is the shorter-term (14 days), while the second is the longer-term (200 days):
sma14 = SMA(amzn['Close'], 14)
sma200 = SMA(amzn['Close'], 200)
This is what we get:
Now, we need to define the entry rules and exit rules of our strategy, which are the crossover and the crossunder, respectively.
In other words, we get an:
- entry (buy) signal when the shorter-term moving average (14 days) crosses above the longer-term moving average (200 days)
- exit (sell) signal when the shorter-term moving average (14 days) crosses below the longer-term (200 days).
def crossover(array1, array2):
    return array1 > array2

def crossunder(array1, array2):
    return array1 < array2
And after that we assign crossover to the enter rules and crossunder to the exit rules:
enter_rules = crossover(sma14, sma200)
exit_rules = crossunder(sma14, sma200)
Basically, we obtain two boolean series (True or False):
- enter_rules is True whenever sma14 > sma200, while
- exit_rules is True whenever sma14 < sma200.
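Note that these series describe a state (one average is above the other), not the single bar on which the lines cross. That is fine here, because the switch in section 4 only reacts to the first True after being off. If you want to isolate the crossing bars themselves, one possible sketch is below. The helper cross_events is hypothetical, not part of the original code, and the numbers are toy values.

```python
import pandas as pd

def crossover(array1, array2):
    return array1 > array2

def cross_events(condition):
    # Hypothetical helper: True only on bars where `condition`
    # flips from False to True, i.e. the actual crossing bar.
    previous = condition.shift(1, fill_value=False)
    return condition & ~previous

# Toy "fast" and "slow" averages, invented for this example.
fast = pd.Series([1.0, 2.0, 4.0, 5.0, 3.0, 6.0])
slow = pd.Series([3.0, 3.0, 3.0, 4.0, 4.0, 4.0])

state = crossover(fast, slow)
events = cross_events(state)
print(state.tolist())   # [False, False, True, True, False, True]
print(events.tolist())  # [False, False, True, False, False, True]
```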
Hence, looking at the images above of the series sma14 and sma200, we expect to find False in enter_rules on the 13th of October, 2000, since 33.5714 < 51.9385, i.e. sma14 < sma200.
Let’s check for it:
check = enter_rules[enter_rules.index == '2000-10-13']
print(check)
This is the starting point.
Now we fly. But not with that never tested flying car.
4. Define a Market Position Function
Here, we’re going to create a function that defines the ongoing trades: to achieve this, we will create a switch that:
- turns on if enter_rules is True and exit_rules is False and
- turns off if exit_rules is True.
Here is the function:
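def marketposition_generator(dataset, enter_rules, exit_rules):
    dataset['enter_rules'] = enter_rules
    dataset['exit_rules'] = exit_rules
    status = 0  # the switch: 0 = flat, 1 = in a trade
    mp = []
    for (i, j) in zip(enter_rules, exit_rules):
        if status == 0:
            # Flat: switch on with an entry signal and no exit signal
            if i == 1 and j != -1:
                status = 1
        else:
            # In a trade: switch off on an exit signal (encoded as -1)
            if j == -1:
                status = 0
        mp.append(status)
    dataset['mp'] = mp
    dataset['mp'] = dataset['mp'].shift(1)
    # Replace the NaN left by the shift; look the column up by name,
    # because position 2 is not 'mp' in the downloaded price data
    dataset.iloc[0, dataset.columns.get_loc('mp')] = 0
    return dataset['mp']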
It takes three arguments:
- dataset is the dataframe that contains the stock data we previously imported (AMZN stock data),
- enter_rules is the boolean series containing the entry signals and
- exit_rules is the boolean series containing the exit signals.
In the first two lines we copy the entry and exit rules into our dataset. status is the switch and mp is an empty list that will be populated with the resulting values of status.
At this point, we create a for loop with zip that works like a… yes, a zipper, enabling us to iterate over enter_rules and exit_rules in parallel: it returns a single iterator of paired values, and at each step the current value of status is appended to mp, which will be:
- mp = 1 (on) from the moment enter_rules is True and exit_rules is False, until an exit signal arrives, and
- mp = 0 (off) whenever exit_rules is True.
Note: in Python, True corresponds to 1, but here, in the if j == -1 statement related to exit_rules, True is -1. The reason for that will become clear later on.
In the last three lines before the return statement, we add mp to our dataset, we shift its values forward by one period so that the trade starts the day after we received the signal, and we replace the NaN value left by the shift operation with 0. The function returns the mp series.
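To make the switch concrete, here is a small self-contained sketch of the same logic run on toy signals. The function name position_switch and the input values are assumptions made for this example. Entries are encoded as 1/0 and exits as -1/0, as the note above anticipates.

```python
import pandas as pd

def position_switch(enter_rules, exit_rules):
    # Same on/off logic as marketposition_generator, without the dataset.
    status, mp = 0, []
    for i, j in zip(enter_rules, exit_rules):
        if status == 0:
            if i == 1 and j != -1:
                status = 1
        elif j == -1:
            status = 0
        mp.append(status)
    # Shift by one bar so the position starts the day after the signal,
    # then fill the NaN created on the first bar with 0.
    return pd.Series(mp, index=enter_rules.index).shift(1).fillna(0).astype(int)

# Toy signals: entry on bars 1, 2 and 5; exit on bars 3 and 4.
enter = pd.Series([0, 1, 1, 0, 0, 1])
exit_ = pd.Series([0, 0, 0, -1, -1, 0])

print(position_switch(enter, exit_).tolist())  # [0, 0, 1, 1, 0, 0]
```

The entry signal on bar 1 produces a position from bar 2, and the exit signal on bar 3 closes it from bar 4. The new entry signal on the last bar would only show up on the following bar.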
5. Define a Backtesting Function
Last step. We’re close to the end, hang on!
First of all, we have to define some parameters such as:
- COSTS: fixed costs per trade (i.e. transaction fees)
- INSTRUMENT: type of instrument (1 for stocks, 2 for futures, etc.)
- OPERATION_MONEY: initial investment
- DIRECTION: long or short
- ORDER_TYPE: type of order (market, limit, stop, etc.)
- ENTER_LEVEL: entry price
COSTS = 0.50
INSTRUMENT = 1
OPERATION_MONEY = 10000
DIRECTION = "long"
ORDER_TYPE = "market"
ENTER_LEVEL = amzn['Open']
We’re assuming that:
- COSTS: every operation will cost us 50 cents, 25 to buy and 25 to sell
- INSTRUMENT: the system will be tested on a stock (AMZN)
- OPERATION_MONEY: the initial capital is 10k dollars
- DIRECTION: the strategy will be tested for long trades
- ORDER_TYPE: the strategy will process market orders
- ENTER_LEVEL: the entry price corresponds to the open price
And here is the best part:
Let’s analyze the function line by line.
From line 3 to line 5 we add the two boolean series and the market position function to the dataset.
Note: In the previous note, I told you that everything would become clear: in the lambda function of exit_rules, all True values are mapped to -1 while False values are mapped to 0. Thanks to that, marketposition_generator runs wonderfully.
From line 7 to line 12 we define market orders for stocks:
- In lines 7–9 we define the entry_price: if the previous value of mp was zero and the present value is one, i.e. we received a signal, we open a trade at the open price of the next day;
- In lines 10–12 we define number_of_stocks, that is the number of shares we buy, as the ratio between the initial capital (10k) and the entry_price;
In line 14 we forward propagate the value of the entry_price;
In lines 16–17 we round number_of_stocks to the nearest integer and forward propagate its value as well;
In line 20 we assign the label 'entry' to 'events_in' every time mp moves from 0 to 1;
From line 22 to line 27 we define the long trades:
- In line 24 we compute open_operations, i.e. the profit;
- In line 25 we adjust the previous computation of open_operations whenever we exit the trade: whenever we receive an exit signal, the trade is closed the day after at the open price. Here, round turn costs are included;
From line 28 to line 33 we replicate for short trades what was said for long trades: to test short trades you just have to set DIRECTION = 'short';
In line 35 we set open_operations equal to 0 whenever there is no trade in progress;
In line 36 we assign the label 'exit' to 'events_out' every time mp moves from 1 to 0, i.e. we receive an exit signal;
In lines 37–38 we assign the value of open_operations to operations only when we’re exiting a trade, otherwise nan: by doing so, it will be very easy to aggregate data;
In line 39 we define the equity_line for closed operations and in line 40 we define the equity_line for open operations;
In line 42 we save the resulting dataset in a csv file.
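As a rough illustration of the steps described above, here is a simplified long-only sketch run on toy data. It is not the apply_trading_system function itself: the name backtest_long_sketch, the exact timing conventions, the choice to treat costs as the total for one round turn, the decision to start closed_equity at the initial capital, and the handling of a trade still open on the last bar are all assumptions made for this example.

```python
import numpy as np
import pandas as pd

def backtest_long_sketch(data, mp, money=10_000, costs=0.50):
    # `data` needs 'Open' and 'Close' columns; `mp` is the 0/1 market position.
    df = data[['Open', 'Close']].copy()
    df['mp'] = mp
    prev_mp = df['mp'].shift(1, fill_value=0)
    next_mp = df['mp'].shift(-1, fill_value=0)
    entering = (prev_mp == 0) & (df['mp'] == 1)   # first bar in position
    exiting = (df['mp'] == 1) & (next_mp == 0)    # last bar in position

    # Entry at the open of the first bar in position, carried forward.
    df['entry_price'] = df['Open'].where(entering).ffill()
    df['number_of_stocks'] = (money / df['Open']).round().where(entering).ffill()

    # Mark-to-market profit while the trade is open.
    df['open_operations'] = (df['Close'] - df['entry_price']) * df['number_of_stocks']
    # On the last bar in position, settle at the next bar's open, net of costs.
    # Assumption: a trade still open on the final bar settles at that bar's close.
    next_open = df['Open'].shift(-1).fillna(df['Close'])
    settled = (next_open - df['entry_price']) * df['number_of_stocks'] - costs
    df['open_operations'] = np.where(exiting, settled, df['open_operations'])
    df['open_operations'] = np.where(df['mp'] == 1, df['open_operations'], 0.0)

    # Closed trades only, so that they are easy to aggregate.
    df['operations'] = np.where(exiting, df['open_operations'], np.nan)
    closed = df['operations'].fillna(0)
    df['closed_equity'] = money + closed.cumsum()
    df['open_equity'] = df['closed_equity'] + df['open_operations'] - closed
    return df

# Toy prices and positions, invented for this example.
toy = pd.DataFrame({'Open':  [10.0, 10.0, 20.0, 25.0, 30.0, 28.0],
                    'Close': [10.0, 19.0, 24.0, 29.0, 29.0, 27.0]})
result = backtest_long_sketch(toy, [0, 0, 1, 1, 0, 0])
print(result['closed_equity'].iloc[-1] - 10_000)  # 4999.5
```

In this toy run, 500 shares are bought at 20 and sold at 30, which gives 5,000 minus 0.50 of costs.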
Let’s call the function and inspect the results.
COSTS = 0.50
INSTRUMENT = 1
OPERATION_MONEY = 10000
DIRECTION = "long"
ORDER_TYPE = "market"
ENTER_LEVEL = amzn['Open']
trading_system = apply_trading_system(amzn, DIRECTION, ORDER_TYPE, ENTER_LEVEL, enter_rules, exit_rules)
These are two long trades registered by the Trading System:
To check whether the Trading Strategy (the Double Moving Averages Crossover) produced profitable long trades on that stock over the period considered, you can just type:
net_profit = trading_system['closed_equity'].iloc[-1] - OPERATION_MONEY
print(round(net_profit, 2))
A return of almost 500% in 20 years. Not surprising, considering that Amazon stock increased by 2400% in those 20 years and we used a trend-following strategy.
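To put a total return like this in perspective, it can help to convert it into a yearly rate. The sketch below uses round, illustrative figures (10,000 growing to 60,000, i.e. exactly 500%), not the actual output of the backtest, and the two helper functions are hypothetical.

```python
def total_return_pct(final_equity, initial_capital):
    # Total percentage gain over the whole period.
    return (final_equity / initial_capital - 1) * 100

def annualized_return_pct(final_equity, initial_capital, years):
    # Constant yearly growth rate that compounds to the same final equity.
    return ((final_equity / initial_capital) ** (1 / years) - 1) * 100

# Illustrative figures only: 10,000 growing to 60,000 is a 500% total return.
print(total_return_pct(60_000, 10_000))  # 500.0
print(round(annualized_return_pct(60_000, 10_000, 20), 2))  # about 9.37
```

Compounded over 20 years, a 500% total return corresponds to a bit more than 9% per year.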
That’s all for this article. I hope you’ll find it helpful.
Let me know if you would be interested in seeing extensions of this backtesting engine, for example how to implement limit orders.
In case you need clarification or have any advice, feel free to contact me on Telegram: