Source: Pandas OHLC: Convert Tick by Tick Data to OHLC Data | IBKR Quant
Trade data in its raw form is tick by tick data, where each tick represents a trade. It is very useful, but it has far too much noise. In this blog, we convert this tick by tick (TBT) data into an OHLC (Open, High, Low, and Close) format using the resample function of the Pandas library.
We cover:
- What is Tick by Tick Data?
- What is OHLC Data?
- Converting tick by tick data to OHLC data
- Step 1 – Import pandas package
- Step 2 – Load the data
- Step 3 – Resample the data
What is Tick by Tick Data?
Before we go any further let’s understand what tick by tick data is. A single transaction between a buyer and a seller is represented by a tick.
In other words, when a buyer and a seller do the transection of the number of stocks on an agreed-upon price, it is represented by a tick. Multiple transactions of this type can happen under a second and each of them would be represented by a tick.
The tick by tick data looks like this.
This data was downloaded from First Rate Data
The first column of the data is the date and time at which the trade occurred. The second column is the last traded price (LTP) and the third column is the number of bitcoins traded in that particular transaction that is the last traded quantity (LTQ).
And the last column is the exchange code. For this tutorial, we will use the bitcoin data of the fourth of February 2021. This data was downloaded from FirstRate Data.
We can add a section of a couple of suggested reads about the existing blogs on data, preprocessing, etc.
We can also add one paragraph about Pandas here and link it to our blog.
What is OHLC Data?
As we now know, the tick by tick data is very high-frequency data. What if we want to quickly check the moment of the price in 1 min, 10 mins or 1 day?
We would have to check each tick manually to check the price moment. This sounds burdensome, but it can actually be done very quickly if we summarise the data into open, high, low, and close prices.
Now we will walk through the whole process of converting the tick by tick data into OHLC format using the resample function from the pandas library.
Converting tick by tick data to OHLC data
One can convert tick by tick data to OHLC data using the following steps:
Step 1 – Import pandas package
Pandas is a popular Python package that is most widely used to handle tabular data. Pandas is used for important functions such as data wrangling, data manipulation, data analyses etc.
import pandas as pd
import_pandas.py hosted with ❤ by GitHub
Step 2 – Load the data
Data is stored in my working directory with the name ‘BTC_2021-02-04.csv’. We are setting the Date time column as the index. As we saw earlier, the data is without a header. Hence we would add the header to the data while importing it. Thus importing and adding header takes place in the same line of code.
# Reading the data
data = pd.read_csv('BTC_2021-02-04.csv', names=['Date_Time', 'LTP', 'LTQ', 'Exchage Code'], index_col=0)
# Convert the index to datetime
data.index = pd.to_datetime(data.index, format='%Y-%m-%d %H:%M:%S:%f')
# Print the last 5 rows
data.head()
read_tick_data_csv_file.py hosted with ❤ by GitHub
This is what the data frame looks like:-
Step 3 – Resample the data
We will now use the resample method of the pandas library. The resample method allows us to convert tick by tick data into OHLC format. We shall resample the data every 15 minutes and then divide it into OHLC format.
If you want to resample the data into smaller timeframes (milliseconds/microseconds/seconds), use L for milliseconds, U for microseconds, and S for seconds.
# Resample LTP column to 15 mins bars using resample function from pandas
resample_LTP = data['LTP'].resample('15Min').ohlc(_method='ohlc')
# Resample LTQ column to 15 mins bars using resample function from pandas
resample_LTQ = data['LTQ'].resample('15Min').sum()
resample_LTP.py hosted with ❤ by GitHub
A snapshot of tick-by-tick data converted into OHLC format can be viewed with the following line of code:-
# Print the last 5 rows of resampled LTP
resample_LTP.head()
LTP_head.py hosted with ❤ by GitHub
# Print the last 5 rows of resampled LTQ
resample_LTQ.head()
LTQ_head.py hosted with ❤ by GitHub
You may concatenate ask and bid price to have a combined data frame.
# Concatenate resampled data
resample_data = pd.concat([resample_LTP, resample_LTQ], axis=1,)
# Print the last 5 rows
resample_data.head()
resample_concat.py hosted with ❤ by GitHub
Conclusion
This blog described a quick way of computing the OHLC using TBT data. This can be applied across different assets and one can devise different strategies based on the OHLC data.
We can also plot charts based on OHLC and generate trade signals. Some other ways the data can be used are to build technical indicators in python or compute risk-adjusted returns.
Want to learn about algorithmic trading? Check out Quantra’s learning track on Algorithmic Trading for Everyone which is a set of 7 courses and covers a wide variety of topics such as Day Trading, Machine Learning, etc. Be sure to check it out!
For additional insight on this topic, see the full article here: https://blog.quantinsti.com/tick-tick-ohlc-data-pandas-tutorial/.