CEU eTD Collection (2022); Ligeti, Mate: A Quantitative Approach - Analysis and Enhancement of a Simple Trading Strategy based on K-Means Clustering (Capstone Summary)

CEU Electronic Theses and Dissertations, 2022
Author Ligeti, Mate
Title A Quantitative Approach - Analysis and Enhancement of a Simple Trading Strategy based on K-Means Clustering (Capstone Summary)
Summary The Client provided the Project with a clear task description:
• Find a quantitative strategy idea on public ‘quant’ forums or websites (e.g. Quant News, Quant Net, QuantConnect, Wilmott, NuclearPhynance)
• Build the strategy in Python
• Analyze (backtest) it on historical data for a set of major FX pairs
• Evaluate its performance and potential for further use (including enhancement of the strategy with different configurations and ML tools)
The article we chose was ‘K-Means Clustering and Creating a Simple Trading Rule for Smoother Returns’ (Bergstrom 2018), published on QuantNews. It describes a simple strategy that takes the volume and volatility data of an asset (S&P 500 E-mini futures in the article) and uses K-Means to cluster the trading days into volatility/volume groups. The assumption is that the middle volatility/volume clusters indicate superior returns for the following day. The strategy always invests $1, holds the position for exactly one day, then closes it. It enters Long whenever the selected cluster gives a signal.
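The clustering step can be sketched as follows. This is a minimal illustration, not the article's or the Project's code: it assumes each trading day is a (volatility, volume) pair already scaled to comparable units, and the function names, the deterministic initialization and the toy data are all hypothetical. A real implementation would more likely use a library K-Means.

```python
import math

def kmeans(points, k, n_iter=100):
    """Plain Lloyd's algorithm on a list of equal-length tuples.

    Returns (labels, centroids). Initial centroids are the first k
    distinct points, which keeps this sketch deterministic.
    """
    centroids = []
    for p in points:
        if p not in centroids:
            centroids.append(p)
        if len(centroids) == k:
            break
    if len(centroids) < k:
        raise ValueError("need at least k distinct points")

    labels = [0] * len(points)
    for _ in range(n_iter):
        # Assignment step: nearest centroid by Euclidean distance.
        new_labels = [
            min(range(k), key=lambda c: math.dist(p, centroids[c]))
            for p in points
        ]
        # Update step: each centroid moves to the mean of its members.
        new_centroids = []
        for c in range(k):
            members = [p for p, lab in zip(points, new_labels) if lab == c]
            if members:
                new_centroids.append(
                    tuple(sum(col) / len(members) for col in zip(*members))
                )
            else:
                new_centroids.append(centroids[c])  # empty cluster stays put
        if new_labels == labels and new_centroids == centroids:
            break  # converged
        labels, centroids = new_labels, new_centroids
    return labels, centroids

def mean_next_day_return(labels, returns):
    """Average return of the day following each cluster's days."""
    sums, counts = {}, {}
    for lab, nxt in zip(labels[:-1], returns[1:]):
        sums[lab] = sums.get(lab, 0.0) + nxt
        counts[lab] = counts.get(lab, 0) + 1
    return {lab: sums[lab] / counts[lab] for lab in sums}

# Toy (volatility, volume) pairs and daily returns, invented for illustration.
days = [(0.1, 0.2), (0.5, 0.5), (0.9, 1.0),
        (0.15, 0.25), (0.55, 0.45), (0.95, 0.9)]
daily_returns = [0.001, -0.002, 0.004, 0.003, -0.001, 0.002]

labels, centroids = kmeans(days, k=3)
cluster_means = mean_next_day_return(labels, daily_returns)
```

The cluster with the best historical next-day mean would then be the one whose appearance triggers a Long entry. Note that clustering the whole sample at once, as done here for brevity, uses information that would not be available in live trading.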
As a first step, we replicated the strategy to validate the published results and to see whether the strategy still holds for FX instruments.
We saw that for certain instruments this ‘goldilocks’ volatility cluster may present itself, but the results were inconsistent across assets – we needed further research.
Our initial approaches involved refining the clusters and comparing the automated K-Means clusters with manually set statistical quartile clusters (later referred to as the ‘expanding’ method). In this method we classified all historical trading days into 4 volume and 4 volatility groups by comparing each day's metrics to expanding quartile limits computed from prior trading days.
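The ‘expanding’ classification can be illustrated with a short sketch. The function name, the `min_history` warm-up parameter and the choice of the ‘inclusive’ quantile method are assumptions made for this example. The summary does not specify how the quartile limits were computed.

```python
import statistics

def expanding_quartile_groups(values, min_history=4):
    """Assign each observation to a group 1..4 using quartile limits
    computed only from prior observations (no look-ahead).

    Observations with fewer than `min_history` prior values get None.
    """
    groups = []
    for i, v in enumerate(values):
        history = values[:i]  # strictly earlier days only
        if len(history) < min_history:
            groups.append(None)
            continue
        q1, q2, q3 = statistics.quantiles(history, n=4, method="inclusive")
        if v <= q1:
            groups.append(1)
        elif v <= q2:
            groups.append(2)
        elif v <= q3:
            groups.append(3)
        else:
            groups.append(4)
    return groups

# Toy daily volatility readings, invented for illustration.
volatility = [1.0, 2.0, 3.0, 4.0, 5.0, 0.5, 2.6]
vol_groups = expanding_quartile_groups(volatility)
```

Applying the same function to the volume series and pairing the two labels per day would give the combined volume/volatility cluster of each day.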
The same results occurred: we had well-performing clusters, but not consistently in the same volatility range across assets. Noticing that the original strategy always went Long on a signal, we tried to refine the model further by adding sign (+/-) predictions. We now entered the market only when we were in the selected volatility/volume range and also received either a positive (enter Long) or a negative (enter Short) prediction. The results were again ambiguous: for Long positions the filter limited the best-performing cluster, but it also seemed to improve the performance of more clusters in similar volume/volatility ranges.
However, it did not work the other way around. Short signals produced their best results in unexpected clusters and large losses in others.
Together with the Client we decided to:
1. Examine the methods (manual clustering based on expanding quartiles, K-Means, random forest, linear regression and boosting) separately and not in combinations (as in the ML +/- filtering of manual clusters)
2. Instead of looking at results at the cluster level, use all methods for prediction and let them go Long/Short based on their predictions
3. Use a single PnL metric to compare the methods instead of looking at cluster performances: because we use the clusters' historical performance to decide on a Long/Short entry, this single PnL metric should reflect all the clusters' individual performances
For our final analysis we built 5 different models: the original K-Means, manual clustering (‘expanding’), and 3 Machine Learning models (Elastic Net, Random Forest and Boosting).
To produce predictions for the non-ML/half-ML methods (K-Means was used only for clustering, not to predict returns), we used each cluster's historical performance as the expected value of the return, which gave us the predictions. The PnLs were then calculated as:
$$\text{mathematical sign } (+/-) \text{ of our predictions} \times \text{actual price change} = \text{PnL}$$
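The PnL rule above, together with the use of cluster history as a prediction, can be sketched as follows. This is a sketch under stated assumptions: the prediction for a day is taken to be the mean of the next-day returns previously realised by the same cluster, and a cluster with no history yields a prediction of zero (no position). The function names and toy data are hypothetical.

```python
def sign(x):
    """Return +1, -1 or 0 according to the sign of x."""
    return (x > 0) - (x < 0)

def cluster_predictions(labels, next_returns):
    """Predict each day's next-day return as the mean of the next-day
    returns previously realised by the same cluster.

    Only outcomes of earlier days are used, so a cluster seen for the
    first time gets a prediction of 0.0 (assumed to mean no position).
    """
    sums, counts, preds = {}, {}, []
    for lab, realised in zip(labels, next_returns):
        n = counts.get(lab, 0)
        preds.append(sums.get(lab, 0.0) / n if n else 0.0)
        # Update the cluster's history only after predicting.
        sums[lab] = sums.get(lab, 0.0) + realised
        counts[lab] = n + 1
    return preds

def pnl_series(preds, next_returns):
    """Daily PnL = sign(prediction) * actual price change."""
    return [sign(p) * r for p, r in zip(preds, next_returns)]

# Toy cluster labels and realised next-day price changes (invented).
labels = ["A", "B", "A", "B", "A"]
next_returns = [0.01, -0.02, 0.03, -0.01, -0.005]

preds = cluster_predictions(labels, next_returns)
pnl = pnl_series(preds, next_returns)
total_pnl = sum(pnl)
```

For the ML models the same `pnl_series` step would apply directly to the model's predicted returns, so every method is scored on one comparable PnL series.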
We compared the results on equity (PnL) curves and measured model performance with 3 PnL metrics:
1. Simple (non-adjusted) annualized returns
2. Risk-adjusted returns (Sharpe ratio)
3. Drawdown-adjusted returns (Martin ratio or Ulcer Performance Index)
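The three metrics can be computed from a daily PnL series roughly as shown below. The conventions are assumptions, since the summary does not state them: 252 trading days per year, a zero risk-free rate, an additive equity curve starting at 1 (consistent with a fixed $1 stake per day), and drawdowns expressed as fractions of the running peak.

```python
import math
import statistics

TRADING_DAYS = 252  # assumed number of trading days per year

def annualized_return(daily_pnl):
    """Simple (non-compounded) annualized return of a $1-per-day strategy."""
    return statistics.fmean(daily_pnl) * TRADING_DAYS

def sharpe_ratio(daily_pnl):
    """Annualized Sharpe ratio, assuming a zero risk-free rate."""
    sd = statistics.stdev(daily_pnl)  # sample standard deviation
    return statistics.fmean(daily_pnl) / sd * math.sqrt(TRADING_DAYS)

def ulcer_index(daily_pnl, start_equity=1.0):
    """Root-mean-square fractional drawdown of the additive equity curve."""
    equity, peak, squared = start_equity, start_equity, []
    for pnl in daily_pnl:
        equity += pnl
        peak = max(peak, equity)
        squared.append(((equity - peak) / peak) ** 2)
    return math.sqrt(statistics.fmean(squared))

def martin_ratio(daily_pnl):
    """Annualized return divided by the Ulcer Index (zero risk-free rate)."""
    return annualized_return(daily_pnl) / ulcer_index(daily_pnl)

# Toy daily PnL series, invented for illustration.
daily_pnl = [0.01, -0.02, 0.01, 0.02]
metrics = {
    "annualized_return": annualized_return(daily_pnl),
    "sharpe": sharpe_ratio(daily_pnl),
    "martin": martin_ratio(daily_pnl),
}
```

A series with zero volatility or no drawdown at all would cause a division by zero here. Production code would need to handle those cases explicitly.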
After the enhancements so far (increasing the number of clusters, adding squares and a cross product to the features, implementing 5 different models including 3 Machine Learning tools, and differentiating Long/Short entries), we attempted to improve results further by engineering more volatility features (see the ‘boost2’ model).
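The ‘squares and cross product’ enhancement mentioned above amounts to a simple feature expansion. The sketch below assumes the two base features are a day's volatility and volume. The function name and the ordering of the output are illustrative only, and the additional volatility features of the ‘boost2’ model are not reproduced because the summary does not describe them.

```python
def expand_features(volatility, volume):
    """Return the base features plus their squares and cross product:
    [volatility, volume, volatility^2, volume^2, volatility * volume].
    """
    return [
        volatility,
        volume,
        volatility ** 2,
        volume ** 2,
        volatility * volume,
    ]

# One toy trading day (values invented for illustration).
row = expand_features(2.0, 3.0)
```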
After these analyses were automated, we ran the simulation across 7 major US dollar pairs to see whether we could recognize patterns across assets.
Supervisor Schindele, Ibolya
Department Economics MSc
Full text https://www.etd.ceu.edu/2022/ligeti_mate.pdf
