AI and Machine Learning in Quant Trading… Is it all hype? Is it a revolution? What are some issues?

Henry Booth
May 13, 2021

Here I describe how AI and Machine Learning are used in quantitative trading and how they are likely to affect it over the next 10 years.

First, there is a difference between Artificial Intelligence and Machine Learning. AI is the broad concept of a computer being able to think like a human. Machine Learning is a subset of AI: the ability of a machine to learn from data without being explicitly programmed.

AI and Machine Learning are hot topics in quant trading and can feel like new areas. But while some perceive them as magic, both are rooted in mathematics. Machine Learning techniques are statistically driven, and quants have been using them for a long time.

Advances in computer processing power, the availability of big data, and media attention have created hype. Some say we’re right at the “Peak of Inflated Expectations” on the Gartner Hype Cycle.

Machine Learning is most effective at improving parts of the trade life-cycle process, such as data processing & modelling, forecasting & signal research, risk management and execution.

Data processing & modelling have benefitted from Machine Learning, which has made the accumulation and exploration of data far easier. ML allows a quant to look at far more data in a shorter period.

Alternative data will grow over the next ten years, especially when you consider the quantity of data we create. The World Economic Forum estimates we will create 463 exabytes per day by 2025[1]! In 2012, the internet created only one exabyte a day[2]. An exabyte is 10^18 bytes, a 1 followed by 18 zeros! One poll found that 69% of funds already use alternative data.[3] Machine Learning is used with alternative data to find new signals or enhance existing ones.
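To put those figures in perspective, here is a quick back-of-the-envelope calculation. It treats the 2012 internet figure and the 2025 WEF projection as directly comparable. That is an assumption, because the two estimates do not measure exactly the same thing.

```python
# Back-of-the-envelope check on the data-volume figures quoted above.
# Assumption: the 2012 internet figure and the 2025 WEF projection are
# compared directly, although they do not measure exactly the same thing.

EXABYTE_BYTES = 10 ** 18  # 1 exabyte = a 1 followed by 18 zeros, in bytes

daily_2012_eb = 1      # exabytes per day created on the internet in 2012 [2]
daily_2025_eb = 463    # projected exabytes per day by 2025 [1]
years = 2025 - 2012

growth_multiple = daily_2025_eb / daily_2012_eb
# Compound annual growth rate implied by the two endpoints
cagr = growth_multiple ** (1 / years) - 1

print(f"2025 daily volume: {daily_2025_eb * EXABYTE_BYTES:.3e} bytes")
print(f"growth multiple: {growth_multiple:.0f}x")
print(f"implied annual growth: {cagr:.1%}")
```

Taken at face value, the two figures imply data volumes growing by roughly 60% a year.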

There are many examples of how alternative data is used. In one well-known example, a fund used flight-tracking data to predict a merger[4]. In others, satellite imagery is used to assess crop yields for commodity trading, and credit card and footfall data are used in equities. Sentiment analysis also appears to be a productive predictor.
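Here is a rough illustration of how sentiment analysis turns text into a number: a minimal lexicon-based scorer. The word lists are hypothetical and far too small for real use. Production pipelines typically rely on curated financial lexicons or trained language models.

```python
# Toy lexicon-based sentiment scorer. The word lists are hypothetical and tiny.
import re

POSITIVE = {"beat", "beats", "growth", "upgrade", "record", "strong"}
NEGATIVE = {"miss", "misses", "downgrade", "loss", "weak", "lawsuit"}

def sentiment_score(text: str) -> float:
    """Return (positive - negative) / matched words, in [-1, 1]; 0.0 if nothing matches."""
    words = re.findall(r"[a-z]+", text.lower())
    pos = sum(w in POSITIVE for w in words)
    neg = sum(w in NEGATIVE for w in words)
    matched = pos + neg
    return 0.0 if matched == 0 else (pos - neg) / matched

headlines = [
    "Company X beats estimates on strong growth",
    "Analyst downgrade after weak quarter and lawsuit",
    "Company Y to hold annual meeting",
]
for h in headlines:
    print(f"{sentiment_score(h):+.2f}  {h}")
```

Scores like these could then be aggregated per ticker and per day and tested as a signal. The data-quality caveats discussed next still apply.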

Alternative data is attractive, but for ML to be effective, data sets need to be very large and have a long history. Any ML algo is only as good as the data we feed it, so the data must be high quality. Many big data sets are only a couple of years old and can be incomplete or inaccurate, so they may provide little predictive value.

Because of this, some question whether the insights are valuable, and the low signal-to-noise ratio makes it difficult to build a model. For example, credit card data will not show whether increased spending was caused by a sale, in which case it is unlikely to lead to an uptick in profits.
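To make “low signal-to-noise ratio” concrete, the sketch below simulates a hypothetical signal that explains only a tiny share of return variance. It then measures the signal’s information coefficient (IC), which is the correlation between signal and returns. The loading of 0.05 is an assumption chosen for illustration.

```python
import numpy as np

rng = np.random.default_rng(seed=0)
n = 5_000
signal = rng.standard_normal(n)
noise = rng.standard_normal(n)
# Assumption: returns load only weakly on the signal (coefficient 0.05)
returns = 0.05 * signal + noise

# Information coefficient: correlation between signal and returns
ic = np.corrcoef(signal, returns)[0, 1]
# Ratio of signal variance to noise variance in the returns
snr = np.var(0.05 * signal) / np.var(noise)

# Re-estimate the IC on 50 non-overlapping windows of 100 observations each
window_ics = np.array([
    np.corrcoef(s, r)[0, 1]
    for s, r in zip(np.split(signal, 50), np.split(returns, 50))
])
share_negative = np.mean(window_ics < 0)

print(f"full-sample IC: {ic:.3f}")
print(f"signal-to-noise variance ratio: {snr:.4f}")
print(f"windows with negative IC: {share_negative:.0%}")
```

Over the full sample the IC is small but positive. On short windows, a sizeable fraction of the estimates come out negative. With only a couple of years of patchy history, such a signal is hard to tell apart from noise.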

Another concern is privacy: how the data was gathered and who has the rights to it. This theme has been growing over the years and has become even more prominent with Apple’s latest update.

Machine Learning has had, and will continue to have, an impact on forecasting and finding new patterns. It could discover unknown factors. However, any new factor would likely still need to be grounded in an underlying economic rationale, and those rationales are already well known, so the chances here are slim. That said, Machine Learning increases the scale at which a quant can work, both in the amount of data they consider, as mentioned, and in the amount of research they do. For example, Machine Learning could be better at combining non-linear signals or pooling many weak predictors.
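Pooling weak predictors can be sketched with simulated data. In this example, 50 hypothetical signals each carry a tiny amount of information. A plain least-squares fit stands in for the ML model; in practice one might use ridge regression, gradient boosting or a neural network, especially for non-linear combinations. The combination is fitted on a training set and evaluated out of sample.

```python
import numpy as np

rng = np.random.default_rng(seed=1)
n_train, n_test, k = 20_000, 20_000, 50
beta = np.full(k, 0.02)  # assumption: 50 independent signals, each weakly predictive

def simulate(n):
    X = rng.standard_normal((n, k))
    y = X @ beta + rng.standard_normal(n)
    return X, y

X_train, y_train = simulate(n_train)
X_test, y_test = simulate(n_test)

# Out-of-sample correlation of each individual signal with the target
single_ics = np.array([np.corrcoef(X_test[:, j], y_test)[0, 1] for j in range(k)])

# Pool all signals with a least-squares fit on the training set only
beta_hat, *_ = np.linalg.lstsq(X_train, y_train, rcond=None)
combined_ic = np.corrcoef(X_test @ beta_hat, y_test)[0, 1]

print(f"mean single-signal IC: {single_ics.mean():.3f}")
print(f"best single-signal IC (picked with hindsight): {single_ics.max():.3f}")
print(f"combined IC: {combined_ic:.3f}")
```

With these assumed parameters, the pooled signal’s IC comes out several times larger than that of a typical individual signal. That is the basic appeal of combining many weak predictors.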

Within forecasting, deep learning, a form of Machine Learning, is having a big effect because of its excellent predictive power. But it is hard to understand how this predictive power arises, which can be a problem for internal analysis. Being able to interpret and explain the model is key for compliance, investor confidence, and risk analysis.
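One common, model-agnostic way to probe a black-box model is permutation importance. It shuffles one input at a time and measures how much the prediction error grows. The sketch below uses a hypothetical function as a stand-in for a trained deep network. Permutation importance shows which inputs matter, not how the model combines them, so it only partly addresses the interpretability problem.

```python
import numpy as np

def black_box_model(X: np.ndarray) -> np.ndarray:
    # Hypothetical stand-in for a trained deep network: we only call it, never inspect it.
    return np.tanh(2.0 * X[:, 0]) + 0.5 * X[:, 1] ** 2

def permutation_importance(model, X, y, n_repeats=10, seed=0):
    """Mean increase in MSE when each column is shuffled, breaking its link to y."""
    rng = np.random.default_rng(seed)
    base_mse = np.mean((model(X) - y) ** 2)
    importances = np.zeros(X.shape[1])
    for j in range(X.shape[1]):
        increases = []
        for _ in range(n_repeats):
            X_perm = X.copy()
            X_perm[:, j] = rng.permutation(X_perm[:, j])
            increases.append(np.mean((model(X_perm) - y) ** 2) - base_mse)
        importances[j] = np.mean(increases)
    return importances

rng = np.random.default_rng(42)
X = rng.standard_normal((5_000, 3))  # feature 2 is pure noise
y = black_box_model(X) + 0.1 * rng.standard_normal(5_000)

for j, imp in enumerate(permutation_importance(black_box_model, X, y)):
    print(f"feature {j}: importance {imp:.4f}")
```

The noise feature scores exactly zero here because the stand-in model ignores it. With a real network, irrelevant inputs would typically score close to zero but not exactly zero.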

In the future, we’ll see more Machine Learning algos taking actions, particularly in trade execution. Reinforcement learning, another type of ML, is being used to model a multi-agent approach to trade execution at the microstructure level by analysing the limit order book. In fact, reinforcement learning is now trending in many aspects of quant trading, including portfolio construction & optimisation, as well as various clustering and prediction problems.
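The basic reinforcement learning loop can be shown on a deliberately tiny execution problem. This is a single-agent toy, not the multi-agent limit order book setting described above. The agent must sell a hypothetical 8-lot order over 4 intervals, and each sale of x lots incurs an assumed temporary market-impact cost of x². Tabular Q-learning then learns the schedule that minimises total cost.

```python
import numpy as np

T = 4           # number of trading intervals (hypothetical)
Q0 = 8          # parent order size in lots (hypothetical)
IMPACT = 1.0    # assumed temporary-impact coefficient: selling x lots costs IMPACT * x**2

def actions(t, q):
    """Lots the agent may sell at interval t with q lots left; the last interval must clear the order."""
    return [q] if t == T - 1 else list(range(q + 1))

def train(episodes=20_000, seed=0):
    rng = np.random.default_rng(seed)
    # Q[t, q, a]: learned cost-to-go of selling a lots at interval t with q lots left
    Q = np.zeros((T, Q0 + 1, Q0 + 1))
    for _ in range(episodes):
        t, q = 0, Q0
        while t < T:
            acts = actions(t, q)
            a = acts[rng.integers(len(acts))]  # explore uniformly; Q-learning is off-policy
            cost = IMPACT * a ** 2
            q_next = q - a
            if t == T - 1:
                target = cost
            else:
                target = cost + min(Q[t + 1, q_next, b] for b in actions(t + 1, q_next))
            # Learning rate 1 is enough here because the toy environment is deterministic
            Q[t, q, a] = target
            t, q = t + 1, q_next
    return Q

def greedy_schedule(Q):
    """Follow the lowest-cost action from the learned table."""
    schedule, q = [], Q0
    for t in range(T):
        a = min(actions(t, q), key=lambda b: Q[t, q, b])
        schedule.append(a)
        q -= a
    return schedule

Q_table = train()
schedule = greedy_schedule(Q_table)
print("learned schedule:", schedule)
print("total impact cost:", sum(IMPACT * a ** 2 for a in schedule))
```

With purely quadratic impact and no penalty for holding inventory, the optimum is to split the order evenly, and the learned schedule should recover [2, 2, 2, 2]. Real execution problems begin once you add a risk penalty on unsold inventory, a richer order-book state, and other agents.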

A major hurdle for ML is the complexity and scale of financial markets. Financial markets are a highly complex multi-agent system with billions of interactions between humans and algos, and so far ML models have difficulty going beyond a dozen agents. This is the biggest reason why we don’t have a fully autonomous ML strategy. Add in the non-stationary nature of markets, and it becomes almost impossible.
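The problem with non-stationary markets can be shown with a simulated regime change. The sketch uses a hypothetical predictor whose effect on returns flips sign halfway through the sample. A model fitted on the first regime looks good in-sample and then works against you after the change. Real regime shifts are rarely this clean, so treat it as a caricature.

```python
import numpy as np

rng = np.random.default_rng(seed=7)
n = 10_000
x = rng.standard_normal(2 * n)
noise = rng.standard_normal(2 * n)
# Assumed regime change: the predictor's effect flips sign halfway through the sample
slope = np.where(np.arange(2 * n) < n, 0.3, -0.3)
y = slope * x + noise

x_old, y_old, x_new, y_new = x[:n], y[:n], x[n:], y[n:]
fitted_slope = np.polyfit(x_old, y_old, deg=1)[0]  # fit on the first regime only

in_sample_ic = np.corrcoef(fitted_slope * x_old, y_old)[0, 1]
out_of_sample_ic = np.corrcoef(fitted_slope * x_new, y_new)[0, 1]
print(f"fitted slope: {fitted_slope:.2f}")
print(f"in-sample IC: {in_sample_ic:.2f}, after regime change: {out_of_sample_ic:.2f}")
```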

Building good ML models for non-trivial problems in quant trading is hard, given ever-changing market dynamics. You need large amounts of high-quality data (which may not even exist, given the changes in market dynamics from new financial products, new regulation, and new algos), a “good” model, and well-matched hyperparameters. It is very easy to get wrong and very hard to get right.
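One safeguard against getting it wrong is to choose hyperparameters with walk-forward validation, where every test window lies strictly after the data used for training. The sketch below selects a ridge-regression penalty this way on simulated data. The data, the penalty grid and the fold sizes are all assumptions for illustration.

```python
import numpy as np

def walk_forward_splits(n_samples, n_folds, min_train):
    """Yield (train_idx, test_idx) with an expanding training window that always precedes the test window."""
    fold_size = (n_samples - min_train) // n_folds
    for i in range(n_folds):
        train_end = min_train + i * fold_size
        yield np.arange(train_end), np.arange(train_end, train_end + fold_size)

def ridge_fit(X, y, lam):
    # Closed-form ridge regression: (X'X + lam*I)^-1 X'y
    return np.linalg.solve(X.T @ X + lam * np.eye(X.shape[1]), X.T @ y)

rng = np.random.default_rng(seed=3)
n, k = 3_000, 20
X = rng.standard_normal((n, k))
y = X[:, :3] @ np.array([0.1, -0.05, 0.05]) + rng.standard_normal(n)  # hypothetical weak signals

lambdas = [0.1, 1.0, 10.0, 100.0, 1000.0]
avg_mse = {}
for lam in lambdas:
    fold_mse = []
    for train_idx, test_idx in walk_forward_splits(n, n_folds=5, min_train=1_000):
        beta = ridge_fit(X[train_idx], y[train_idx], lam)
        fold_mse.append(np.mean((X[test_idx] @ beta - y[test_idx]) ** 2))
    avg_mse[lam] = np.mean(fold_mse)

best_lam = min(avg_mse, key=avg_mse.get)
print({lam: round(float(m), 4) for lam, m in avg_mse.items()})
print("selected lambda:", best_lam)
```

Shuffled k-fold cross-validation would let the model train on data from after the test period, leaking future information into the choice. scikit-learn’s TimeSeriesSplit provides ready-made expanding-window splits along these lines.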

So, we’re a long way from having a fully automated, ML-based quant strategy that can run the entire investment process hands-off. If someone suggests they’ve done it, it’s likely too good to be true!

Nonetheless, it’s an exciting time for Machine Learning, and it will continue to make a tremendous impact on quantitative trading over the next 10 years, particularly on individual parts of the investment process such as forecasting, modelling and execution.

How do you think Machine Learning will impact quant trading over the coming years?

[1] https://www.weforum.org/agenda/2019/04/how-much-data-is-generated-each-day-cf4bddf29f/

[2] https://www.zmescience.com/science/how-big-data-can-get/

[3] https://www.hedgeweek.com/2020/05/04/285283/hedge-funds-use-alternative-data-tipped-surge-new-industry-study-finds

[4] https://www.marketwatch.com/story/the-explosion-of-alternative-data-gives-regular-investors-access-to-tools-previously-employed-only-by-hedge-funds-2019-09-05
