## Introduction

First of all, I need to make clear that no trading strategy is free of risk and none can guarantee a profit. However, it makes sense to apply strategies whose expected return is positive, in other words, strategies that make money on average. In this post, we will walk through an example of a trading strategy based on pairs trading concepts.

## The Assumption

We believe that in the market there exist some cryptocurrencies that follow others with a lag equal to `k`. For this analysis, we will examine `k` equal to **1 hour**, but feel free to try different values of `k`. So, the idea is that if there is a “Leader and Follower” pair of cryptos, then we may predict the movement of the “Follower” based on the movement of the “Leader”: if the “Leader” goes up at time `t=0`, then we expect the “Follower” to go up in the next `k` interval, and vice versa.
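To make the assumption concrete, here is a minimal sketch on synthetic data (not the real crypto prices). The series, the 0.8 coefficient, the noise levels and the helper `lagged_correlation` are all hypothetical and only serve to illustrate the idea: if a “Follower” reacts to the “Leader” with a lag of `k`, the correlation between their returns should peak at that lag.

```python
import numpy as np

def lagged_correlation(leader, follower, lag):
    """Correlation between leader[t - lag] and follower[t]."""
    if lag == 0:
        return float(np.corrcoef(leader, follower)[0, 1])
    return float(np.corrcoef(leader[:-lag], follower[lag:])[0, 1])

rng = np.random.default_rng(0)
k = 1      # true lag of the synthetic follower, in bars
n = 2000   # number of synthetic 1-hour returns

# Hypothetical returns: the follower repeats 80% of the leader's move k bars later
leader_returns = rng.normal(0.0, 0.01, n)
noise = rng.normal(0.0, 0.005, n)
follower_returns = np.empty(n)
follower_returns[:k] = noise[:k]
follower_returns[k:] = 0.8 * leader_returns[:-k] + noise[k:]

# Scan a few candidate lags and keep the one with the highest correlation
correlations = {lag: lagged_correlation(leader_returns, follower_returns, lag)
                for lag in range(4)}
best_lag = max(correlations, key=correlations.get)
print(correlations, best_lag)
```

On real data the peak will be far less clean than in this toy example, so treat it only as an illustration of what “follows with a lag” means.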

## Analysis

We will get the 1-hour close prices of all cryptocurrencies for the last 3.5 months. For all pairs (X, Y), we will test whether `(X_t, Y_{t-1})` or `(X_{t-1}, Y_t)` is co-integrated. For this analysis, we work with the log prices. Once we identify the leaders and the followers, we expect that when the leader X increases, the follower Y will follow, so the strategy is to buy Y when X diverges significantly from its equilibrium. Of course, there are many risks: X may never converge back to its equilibrium, or the equilibrium may be restored only because X corrected while Y never followed. You can find the **cleaned data** here and you can follow along with the tutorial. For this analysis, we will work in Python.

## Coding

Let’s get our hands dirty. Because many cryptos are worth less than 1 dollar, which makes their log price negative, we will multiply all prices by 100, that is, we express prices in cents instead of dollars. Notice that the time is a UNIX timestamp.
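A quick sketch of why the scaling helps, using a hypothetical price of 0.25 dollars: the log price in dollars is negative, while the log price in cents is positive. The scaling only adds the constant log(100) to every log price, so a coin worth less than 1 cent would still have a negative log price.

```python
import math

price_usd = 0.25                  # hypothetical price, in dollars
price_cents = price_usd * 100     # the same price, in cents

log_usd = math.log(price_usd)     # negative, because the price is below 1
log_cents = math.log(price_cents) # positive, because the price is above 1 cent

# The two log prices differ by the constant log(100)
shift = log_cents - log_usd
print(log_usd, log_cents, shift)
```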

    # http://web.pdx.edu/~crkl/ceR/Python/example14_3.py
    import numpy as np
    import pandas as pd

    df = pd.read_csv("clean_data.csv")
    df.set_index('open_time', inplace=True)
    # express the prices in cents instead of dollars
    df = df*100

    # get the log prices and remove the first not NA row
    # in order to have the same period of data with the leader (lag(1))
    log_prices = np.log(df).dropna(how='any')
    log_prices = log_prices.iloc[1:]

    # get the log prices of the leader, where by leaders we refer to the lag 1
    leaders_log_prices = np.log(df.shift(1)).dropna(how='any')

    # remove the USDT suffix from the column names
    log_prices.columns = [c[:-4] if c.endswith('USDT') else c for c in log_prices.columns]
    leaders_log_prices.columns = [c[:-4] if c.endswith('USDT') else c for c in leaders_log_prices.columns]

    log_prices

We try to find co-integrated pairs by considering only the pairs that are highly correlated (>0.95). For these pairs, we apply the co-integration test and run a linear regression without intercept, i.e. log(leader) = beta x log(follower).
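For a regression without intercept on a single regressor, the least-squares slope has a simple closed form: beta = sum(x·y) / sum(x·x). The sketch below checks this against `np.linalg.lstsq` on synthetic log prices; the true slope of 1.2 and the noise level are assumptions chosen only for this example.

```python
import numpy as np

rng = np.random.default_rng(1)

# Synthetic log prices (assumption): log(leader) = 1.2 x log(follower) + noise
follower_log = rng.uniform(4.0, 6.0, 500)
leader_log = 1.2 * follower_log + rng.normal(0.0, 0.05, 500)

# Closed-form slope of the regression through the origin
beta_closed_form = float(np.dot(follower_log, leader_log)
                         / np.dot(follower_log, follower_log))

# Same slope from the least-squares solver with a single column and no intercept
beta_lstsq = float(np.linalg.lstsq(follower_log[:, np.newaxis],
                                   leader_log, rcond=None)[0][0])
print(beta_closed_form, beta_lstsq)
```

Both values agree up to floating-point error, so the `lstsq` call is just a convenient way to obtain this slope for each pair.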

    import statsmodels.tsa.stattools as ts

    my_list = []
    Leader = []
    Follower = []
    Cointegration = []
    Beta = []

    for i in log_prices.columns:
        for j in log_prices.columns:
            # keep only the pairs where follower i is highly correlated with the lagged leader j
            if (i!=j and log_prices[i].corr(leaders_log_prices[j])>0.95):
                Leader.append(j)
                Follower.append(i)
                my_list.append(log_prices[i].corr(leaders_log_prices[j]))
                # p-value of the co-integration test
                Cointegration.append(ts.coint(leaders_log_prices[j], log_prices[i])[1])
                # slope of the regression without intercept: log(leader) = beta x log(follower)
                Beta.append(np.linalg.lstsq(log_prices[i].values[:,np.newaxis], leaders_log_prices[j].values, rcond=None)[0][0])

    output = pd.DataFrame({'Leader':Leader, 'Follower':Follower, 'Value':my_list, 'Cointegration':Cointegration, 'Beta':Beta})

    # keep only the cointegrated pairs (p-value below 0.01)
    output = output.loc[output.Cointegration<0.01]

    # remove the negative Betas
    output = output.loc[output.Beta>0]

We end up with 277 co-integrated pairs that satisfy the following conditions:

- Co-integration test p-value<0.01
- Correlation>0.95
- Beta>0

We can plot all the spreads, which have the form **Spread = Leader - Beta x Follower**, where Leader and Follower are the log prices.
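One possible way to turn the spread into a signal is a z-score: standardize the spread and flag the moments when it is unusually high, meaning the leader has moved away from the follower. This is only a sketch under stated assumptions. The function `entry_signal`, the threshold of 2 and the toy data are hypothetical, and the mean and standard deviation are computed over the whole sample, which a live strategy could not do.

```python
import numpy as np

def entry_signal(leader_log, follower_log, beta, threshold=2.0):
    """Flag the bars where Spread = Leader - Beta x Follower is unusually high."""
    spread = leader_log - beta * follower_log
    z_score = (spread - spread.mean()) / spread.std()
    return z_score > threshold

# Toy data (assumption): the pair sits at equilibrium except for the last bar,
# where the leader jumps and the follower has not reacted yet
follower_log = np.linspace(4.0, 5.0, 10)
beta = 1.5
leader_log = beta * follower_log
leader_log[-1] += 1.0

signal = entry_signal(leader_log, follower_log, beta)
print(signal)
```

Only the last bar is flagged, which is where we would consider buying the follower in the hope that it catches up with the leader.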

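    import matplotlib.pyplot as plt

    # `sample` is not defined in the snippets above; as an assumption, we take the first few pairs
    sample = output.head(5)

    for i in range(sample.shape[0]):
        plt.figure(figsize=(12,6))
        # spread = log(leader) - beta x log(follower)
        plt.plot(leaders_log_prices[sample.iloc[i]['Leader']] - log_prices[sample.iloc[i]['Follower']]*sample.iloc[i]['Beta'])
        plt.axhline(y=0.0, color='r', linestyle='-')
        plt.title(sample.iloc[i]['Leader'] + "-" + str(sample.iloc[i]['Beta']) + " X " + sample.iloc[i]['Follower'])
        plt.show()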

Example of some co-integrated pairs

## The Takeaway

Now you can apply a strategy based on statistical arbitrage. You can even trade in two directions, by going long one asset and short the other. Generally speaking, you can try many different strategies based on the Leaders and Followers assumption. In the real world, when you trade at high frequency you have to pay transaction fees, which most of the time are higher than the expected returns. So, this strategy may work, BUT only assuming that there are no fees 🙂
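To see how quickly fees eat the edge, here is a small sketch with hypothetical numbers: an expected gross return of 0.1% per trade and a fee of 0.1% on both the entry and the exit. The function `net_return` and the fee model (a proportional fee per side) are assumptions for illustration, not the fee schedule of any specific exchange.

```python
def net_return(gross_return, fee_per_side):
    """Return of a round trip after a proportional fee on entry and on exit."""
    return (1.0 + gross_return) * (1.0 - fee_per_side) ** 2 - 1.0

gross = 0.001  # hypothetical expected gross return per trade (0.1%)
fee = 0.001    # hypothetical fee per side (0.1%)

net = net_return(gross, fee)

# Gross return needed just to break even after paying both fees
break_even_gross = 1.0 / (1.0 - fee) ** 2 - 1.0
print(net, break_even_gross)
```

With these numbers the trade loses about 0.1% after fees, and the gross return has to exceed roughly 0.2% per trade before the strategy makes any money.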

Happy trading!