The Evolution of Statistical Arbitrage: Rise of Alternative Data and Shorter Holding Periods

Henry Booth
Published in DataDrivenInvestor · 16 min read · Jul 5, 2023


Quantitative trading has long relied on statistical arbitrage, which uses complex mathematical models to spot and exploit fleeting price differences between related financial assets. Over the last decade, as a headhunter specialising in this field, I’ve observed significant shifts in the stat-arb trading landscape, with two notable trends coming to the forefront: the growing use of alternative data and the trend towards shorter holding periods.

In this LinkedIn blog post, we explore the evolution of statistical arbitrage, from its inception featuring straightforward pairs trading and mean reversion strategies primarily based on technical data to today’s sophisticated methodologies. Our discussion will focus on the rising significance of alternative data sources and the transition from holding periods of one to two weeks to predominantly intraday operations in the context of stat-arb strategies.

The Early Days of Statistical Arbitrage

  • What is Statistical Arbitrage?

Statistical arbitrage, commonly abbreviated as Stat Arb, is a quantitative investment approach typically utilised by hedge funds. The strategy uses intricate mathematical models to detect trading opportunities arising from market inefficiencies. It operates on the concept that when the prices of related securities stray from their usual statistical relationship, measured by something like correlation or cointegration, there’s a high probability they will revert to it over time. Mispricing, market sentiment, or temporary supply-demand imbalances can all cause this divergence.

Statistical arbitrage trading techniques primarily focus on the evaluation of technical data, including historical price and trading volume information. Quantitative models sift through massive historical data to spot patterns and relationships, primarily identifying short-term mispricings and inefficiencies, which are subsequently leveraged for profit.

Traders practising statistical arbitrage adopt a long position in the undervalued security and a short one in the overvalued counterpart, betting on the convergence of prices. This technique results in a market-neutral strategy that depends less on overall market movements and more on the relative price fluctuations of the involved securities.

Statistical arbitrage comes in various forms and timeframes, from high-frequency trading (HFT), where positions are held for exceedingly short periods, to medium-frequency trading (MFT), where positions may be held for days or weeks. The strategy is most commonly used in MFT, where patterns are considered more reliable than in HFT, whose signals are more prone to noise. HFT technology, in turn, assists MFT by enabling the swift execution of trades, usually within milliseconds or microseconds. This is crucial because the targeted price discrepancies are typically small and fleeting.

Statistical arbitrage, while theoretically a low-risk strategy due to its market-neutral aspect, is not without risks. These can stem from model overfitting, where patterns fitted to historical data fail to hold in the future, and from major market shocks that disturb statistical relationships. Company-specific news hitting the market, such as earnings reports, special dividends, or legal proceedings, can likewise disrupt short-term statistical correlations.

The strategy requires three critical factors: predictability, volatility, and dispersion. Predictability, because success is unlikely without the ability to foresee price movements. Volatility, because statistical arbitrage needs price movement to succeed and is ineffective in low-volatility environments such as the one observed in 2016. Dispersion, both of opinion and of price movements: the strategy thrives on differing views, needing some participants to believe that prices will continue their trend while others expect them to revert. That means prices must move differently relative to one another. They can’t all move up or all move down. Stat-arb requires one up and one down!
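To make the dispersion requirement concrete, here is a minimal Python sketch. It assumes dispersion is measured as the cross-sectional standard deviation of one day’s returns, which is one common choice rather than a definition taken from the three factors above. The return figures are invented purely for illustration.

```python
from statistics import pstdev


def cross_sectional_dispersion(returns):
    """Population standard deviation of one period's returns across assets."""
    return pstdev(returns)


# Hypothetical one-day returns for four stocks (illustrative numbers only).
all_up_together = [0.010, 0.010, 0.010, 0.010]
mixed_moves = [0.020, -0.015, 0.005, -0.010]

# Every stock moved identically, so there is no relative move to trade.
print(cross_sectional_dispersion(all_up_together))
# Some up, some down: dispersion is positive and relative trades are possible.
print(cross_sectional_dispersion(mixed_moves))
```

When every name moves by the same amount the measure is zero, which is the “they can’t all move up or all move down” problem in numerical form.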

  • How quick is quick?

The definition of HFT or MFT and their respective typical holding periods is not set in stone. Over a decade ago, HFT was associated with intraday or quicker holding periods: seconds, minutes, and sub-second durations. In contrast, medium frequency was anything from days to weeks, typically with an average holding period of one or two weeks. This categorisation has evolved and now includes HFT, Intraday, MFT, and Low-Frequency Trading (LFT).

HFT is ultra-fast trading measured in seconds, milliseconds and microseconds, with the majority being sub-second. Intraday is anything from a minute upwards: 15 minutes, an hour, or six to eight hours through to the end of the day. Nothing is held overnight, as the name suggests. MFT is days to weeks but can and does include intraday; some strategies hold for minutes and hours out to a few days. The pure traditional stat-arb average holding period is one to three weeks. LFT is, for me, anything held longer than a month.
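The buckets above can be sketched as a small classifier. This is only an illustration: the exact cut-offs (one minute, eight hours, thirty days) are my assumptions for turning loose descriptions into hard boundaries, and, as noted, real desks overlap these categories.

```python
from datetime import timedelta


def classify_holding_period(holding: timedelta) -> str:
    """Map an average holding period to a rough frequency bucket.

    The boundaries are illustrative assumptions, not industry standards.
    """
    if holding < timedelta(minutes=1):
        return "HFT"
    if holding <= timedelta(hours=8):
        return "Intraday"
    if holding <= timedelta(days=30):
        return "MFT"
    return "LFT"


for sample in (timedelta(milliseconds=5), timedelta(minutes=15),
               timedelta(days=10), timedelta(days=45)):
    print(sample, "->", classify_holding_period(sample))
```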

  • Capacity

Speed can’t be discussed without also discussing capacity, because the two are intertwined. Trading strategies differ in their capacities based on their holding periods and execution speed.

Capacity in trading is the maximum volume of stocks, securities, or commodities a system can effectively handle without notably impacting the market price. This is affected by the market’s liquidity, the size of a trader’s orders, their risk tolerance, and capital base.

HFT and intraday strategies, operating at high speeds with short holding periods, typically have constrained capacities. This is due to their immediate impact on market prices. If you were to order $1bn worth of Apple stock suddenly, the market would move against you so fast that any alpha you’d predicted would disappear with the slippage and execution costs. HFT is more light-footed, in and out quickly in small amounts.

Conversely, MFT and LFT strategies have higher capacities. MFT allows larger orders to be gradually executed, reducing immediate market impact. LFT strategies, spanning a month or longer, can accommodate substantial order sizes as trades are distributed over longer periods, thus reducing market impact and increasing capacity even further.
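A rough way to see why slower execution raises capacity is the square-root market impact rule of thumb, in which impact scales with volatility times the square root of the order’s share of daily volume. The sketch below assumes that model, a coefficient `y` of 1.0, and hypothetical volume and volatility figures. None of these come from the discussion above, and real impact models are considerably more involved.

```python
import math


def sqrt_impact_bps(order_value, adv_value, daily_vol_bps, y=1.0):
    """Estimated impact in basis points of trading order_value within one day.

    Uses the square-root rule of thumb: y * volatility * sqrt(order / ADV).
    """
    participation = order_value / adv_value
    return y * daily_vol_bps * math.sqrt(participation)


def sliced_impact_bps(order_value, adv_value, daily_vol_bps, days, y=1.0):
    """Per-day impact when the order is split evenly over `days` sessions.

    Simplification: each day's impact is treated as independent of the others.
    """
    return sqrt_impact_bps(order_value / days, adv_value, daily_vol_bps, y)


# Hypothetical inputs: a $1bn order, $10bn average daily volume, 150 bps daily vol.
one_shot = sqrt_impact_bps(1e9, 10e9, 150.0)
over_ten_days = sliced_impact_bps(1e9, 10e9, 150.0, days=10)
print(f"one day: {one_shot:.1f} bps, over ten days: {over_ten_days:.1f} bps per day")
```

Under these assumptions the same order costs far less per day when worked slowly, which is the intuition behind MFT and LFT carrying more capital than HFT.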

  • Traditional stat-arb techniques

Traditional techniques of stat arb encompass a variety of strategies. For instance, pairs trading, a widely used and straightforward method during the early days of statistical arbitrage, involves identifying pairs of highly correlated assets, such as stocks of companies within the same industry, like Coca-Cola and Pepsi. Traders closely watch these pairs, waiting for their price relationship to diverge from the historical norm. The corresponding strategy would involve purchasing the underperforming asset while short-selling the overperforming one in anticipation of their prices eventually reverting to the historical average.
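As a minimal sketch of the pairs trade described above, the code below computes a z-score of the log price spread between two assets and maps it to a position. It assumes a hedge ratio of one, a ±2 entry threshold, and invented prices for two hypothetical stocks A and B. A real implementation would estimate the hedge ratio and test the relationship for stability.

```python
import math
from statistics import mean, stdev


def spread_zscore(prices_a, prices_b):
    """Z-score of the latest log spread versus its own earlier history."""
    spreads = [math.log(a) - math.log(b) for a, b in zip(prices_a, prices_b)]
    history, latest = spreads[:-1], spreads[-1]
    return (latest - mean(history)) / stdev(history)


def pair_signal(z, entry=2.0):
    """Short the relative outperformer and buy the underperformer."""
    if z > entry:
        return "short A / long B"
    if z < -entry:
        return "long A / short B"
    return "flat"


# Invented closing prices: A jumps on the last day while B barely moves.
prices_a = [60.0, 60.5, 59.8, 60.2, 60.1, 63.0]
prices_b = [170.0, 171.0, 169.5, 170.5, 170.2, 170.4]

z = spread_zscore(prices_a, prices_b)
print(pair_signal(z))
```

Here A has run ahead of B relative to their recent relationship, so the sketch shorts A and buys B in anticipation of convergence.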

Mean reversion is another prevalent stat-arb strategy. It operates on the premise that price movements of financial instruments are typically mean-reverting. This implies that when prices significantly stray from their historical averages, they are likely to revert to these averages over time. Traders who apply mean reversion strategies seek assets experiencing temporary price deviations and place trades on the expectation of these prices eventually returning to their historical levels. Mean reversion can be an independent strategy or can underpin pairs trading strategies.
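One common way to quantify how quickly a series mean-reverts is to fit a simple AR(1) model and convert its coefficient into a half-life. The sketch below assumes that approach, regressing each period’s change on the previous level by ordinary least squares. It is an illustration of the idea rather than a method taken from any particular desk.

```python
import math


def ar1_half_life(series):
    """Half-life (in periods) implied by an OLS fit of change on lagged level.

    Returns infinity when the fit shows no mean reversion.
    """
    lagged = series[:-1]
    changes = [nxt - cur for cur, nxt in zip(series[:-1], series[1:])]
    n = len(lagged)
    mean_x = sum(lagged) / n
    mean_y = sum(changes) / n
    var_x = sum((x - mean_x) ** 2 for x in lagged)
    if var_x == 0:
        return math.inf
    cov_xy = sum((x - mean_x) * (y - mean_y) for x, y in zip(lagged, changes))
    phi = 1.0 + cov_xy / var_x  # AR(1) persistence coefficient
    if not 0.0 < phi < 1.0:
        return math.inf
    return -math.log(2.0) / math.log(phi)


# A deviation that halves every period should have a half-life of one period.
print(ar1_half_life([8.0, 4.0, 2.0, 1.0, 0.5]))
```

A short half-life relative to the intended holding period suggests the deviation may close in time to trade; a very long or infinite one suggests it may not.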

Index arbitrage is another tactic, exploiting price discrepancies between index futures and their underlying stocks. If the futures are priced above or below the level implied by the index, traders may take simultaneous, opposite long and short positions in the futures and the underlying stocks, profiting from the anticipated price convergence. Speed is crucial for this strategy.
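To illustrate, the sketch below uses the standard cost-of-carry relationship, where the fair futures price is the spot index grown at the interest rate less the dividend yield, and compares it with the traded futures price. The continuous-compounding form, the cost threshold, and all the input numbers are assumptions for the example.

```python
import math


def futures_fair_value(spot, rate, dividend_yield, years_to_expiry):
    """Cost-of-carry fair value with continuous compounding."""
    return spot * math.exp((rate - dividend_yield) * years_to_expiry)


def index_arb_signal(futures_price, fair_value, cost_threshold):
    """Trade only when the mispricing exceeds the assumed round-trip cost."""
    basis = futures_price - fair_value
    if basis > cost_threshold:
        return "sell futures / buy basket"
    if basis < -cost_threshold:
        return "buy futures / sell basket"
    return "no trade"


# Hypothetical inputs: index at 4500, 5% rate, 1.5% dividend yield, 3 months out.
fair = futures_fair_value(4500.0, 0.05, 0.015, 0.25)
print(round(fair, 2), index_arb_signal(fair + 5.0, fair, cost_threshold=2.0))
```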

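Exchange-Traded Fund (ETF) arbitrage revolves around exploiting the price differences between an ETF and the underlying assets it represents. Traders with access to the creation and redemption process can create or redeem ETF shares to capture the price difference, earning a profit that is close to risk-free in theory, though execution and transaction costs still apply.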
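The decision can be sketched as a comparison of the ETF’s premium or discount to net asset value (NAV) against trading costs. The cost figure and prices below are hypothetical, and the sketch ignores practical details such as creation unit sizes and stale NAVs.

```python
def premium_bps(etf_price, nav_per_share):
    """ETF premium (positive) or discount (negative) to NAV in basis points."""
    return (etf_price / nav_per_share - 1.0) * 10_000


def etf_arb_action(etf_price, nav_per_share, cost_bps):
    """Pick the creation or redemption leg once the gap exceeds assumed costs."""
    gap = premium_bps(etf_price, nav_per_share)
    if gap > cost_bps:
        return "create: buy basket, exchange for ETF shares, sell ETF"
    if gap < -cost_bps:
        return "redeem: buy ETF, exchange for basket, sell basket"
    return "no trade"


# Hypothetical quotes against a NAV of 100.00 with 10 bps of all-in costs.
for price in (100.50, 99.50, 100.02):
    print(price, "->", etf_arb_action(price, 100.00, cost_bps=10.0))
```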

It’s worth noting that the definition of statistical arbitrage and its included strategies aren’t universally standardised. Every quantitative portfolio manager contributes their unique interpretation, skills, and market perspectives. Some may adopt a two-week holding period, while others opt for just a few days. While some focus on cash US equities, others diversify their portfolio with equities and futures. This diversity is the essence of market dynamics and its zero-sum nature. The strategies above serve as a simple starting point for understanding the complexities of statistical arbitrage.

Having established a foundational understanding of statistical arbitrage and its historical context, we are now poised to delve into two prominent trends I’ve observed: the shortening of holding periods and the ascent of alternative data. We will explore the possible causes behind these trends and discuss their potential implications in finance.

  • Shortening of Holding Periods

Stat Arb strategies have witnessed a significant change in their holding periods over the years, transitioning from a typical duration of one to two weeks to predominantly intraday or a few days. A combination of factors such as heightened competition, technological advancements, and the growing demand for rapid execution has driven this shift.

With more market participants stepping into this field and alpha signal decay setting in, strategies have progressively adapted to shorter holding periods. It is rare to encounter a pure stat-arb strategy maintaining positions beyond two weeks. Most operate within one to five days, but an increasing proportion of these strategies gravitate towards intraday trading with minimal to no overnight holding.

This change has sparked a convergence of styles among different trading groups. Traditionally, HFT groups like Tower and Jump mainly focused on ultra-HFT strategies such as market making and index arbitrage. Their primary edge was speed, though they undoubtedly incorporated some form of statistical arbitrage. On the other hand, quant firms like WorldQuant and Cubist typically covered horizons of one to two weeks. Over time, these distinct approaches have melded. HFT groups have ventured into intraday and short-term strategies of a few days, while medium-frequency firms have also infiltrated the shorter-term intraday domain in their search for alpha.

As more participants occupy the stat-arb landscape, the alpha diminishes as it gets arbitraged away, and what remains migrates towards shorter-term strategies. It’s important to note that statistical arbitrage doesn’t lend itself to longer-term holding periods. It relies primarily on technical data like price and volume, and the relevance of these factors tends to diminish as the investment horizon extends beyond a month. Beyond this timeframe, technical signals look more like noise, because fundamental data such as earnings reports, financial statements, and economic indicators drive prices more than statistical anomalies do. It is in the MFT to LFT space that both fundamental and alternative data become more useful than technical data.

  • Factors driving this trend

The financial sector faces the twin challenges of escalating competition and the need for accelerated trade execution. Quantitative trading, in particular, is witnessing increased rivalry, with a burgeoning number of players applying quantitative techniques to exploit mispricings and inefficiencies. This surge in competition has led to temporary price discrepancies, once available over extended periods, becoming increasingly short-lived, necessitating rapid identification and execution of trades to take advantage of fleeting opportunities.

In the face of technological progress, high-frequency trading has asserted itself as a leading approach. Enhanced computing power and upgraded trading infrastructure have enabled market participants to process and analyse immense volumes of data and execute trades at unmatched velocities. HFT’s capacity to pinpoint minor price differences within fractions of seconds has significantly impacted financial markets, further diminishing holding periods. This ongoing technological evolution empowers quantitative portfolio managers and researchers to develop intricate statistical arbitrage strategies characterised by increasingly shorter holding periods.

  • Implications of shorter holding periods for quant PMs and researchers:

The transition towards shorter holding periods in stat arb carries several implications for quantitative portfolio managers and researchers. Foremost, the growing emphasis on speed and real-time decision-making mandates that traders remain updated with the newest technology and retain a state-of-the-art trading infrastructure. The arms race for alpha starts with the very technology you go into it with.

Secondly, as holding periods contract, research endeavours increasingly concentrate on discovering and capitalising on more detailed market patterns and inefficiencies. Techniques using AI and deep learning are pushing pattern recognition to new heights.

Finally, the need for more sophisticated risk management and execution algorithms becomes critical to successfully negotiate the challenges tied to intraday trading and mitigate the impact of transaction costs on returns. There is no point in building the world’s best prediction machine if the market slips away by the time you react and execute, and your execution costs eat your alpha.
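The point about costs eating alpha is simple arithmetic, sketched below. The cost components (half spread, slippage, fees) and all the basis-point figures are hypothetical, and the sketch assumes each cost is paid once on entry and once on exit.

```python
def net_edge_bps(gross_alpha_bps, half_spread_bps, slippage_bps, fees_bps):
    """Expected edge per round trip after costs, in basis points.

    Each cost is quoted per side and paid on both entry and exit.
    """
    round_trip_cost = 2 * (half_spread_bps + slippage_bps + fees_bps)
    return gross_alpha_bps - round_trip_cost


# A 5 bps prediction is fully consumed by 2.5 bps of cost per side.
print(net_edge_bps(5.0, half_spread_bps=1.0, slippage_bps=1.0, fees_bps=0.5))
# Reacting more slowly (more slippage) turns the same prediction into a loss.
print(net_edge_bps(5.0, half_spread_bps=1.0, slippage_bps=2.0, fees_bps=0.5))
```

With edges this thin, a basis point of extra slippage is the difference between a viable strategy and a losing one, which is why execution research matters as much as prediction at short horizons.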

The shortening of holding periods was always evident in HFT. Groups battled to get quicker and quicker. They’d dig a straighter 1000km trench for fibre just to shave milliseconds off their execution between NYC and Chicago. They pay big money to co-locate their servers next to the exchange. They went from Java to C++, then to FPGAs, and even to microwave links, all to be quicker. Now that the speed race has basically been won by a couple of big HFT prop firms, and the cost of entry has become far too great, a similar game is playing out in the MFT stat-arb world with shortening holding periods.

However, unlike HFT, which primarily focuses on increasing speed, MFT is all about improving speed and predictive accuracy. While HFT hinges on speed and technical data, MFT leans on prediction and technical data. As competition increases in the MFT arena, it paves the way for the next significant trend — the surge of alternative data!

The Rise of Alternative Data

Alternative data utilisation has seen a rapid surge recently. The industry’s projected expenditure hit $1.7 billion in 2020, indicating a sevenfold jump from just five years before.

The proliferation of Internet usage, the growth of social media, the advent of the Internet of Things, and technological advances facilitating data creation and storage are key drivers behind the exponential increase in alternative data. The World Economic Forum believes that by the end of 2025 we will create 400 times more data per day than in 2012.

Alternative data refers to information not readily available through conventional financial sources like financial statements, analyst reports, or market price data. These non-traditional data sources provide additional insights into market behaviour, enabling traders to identify unique trading opportunities and gain a competitive edge.

  • Examples of alternative data sources:
  • Social media sentiment: The advent of social media platforms like Twitter, Facebook, and Reddit has opened up a vast repository of user-generated content reflecting public sentiment towards companies, products, and market trends. By analysing social media sentiment, traders can gauge investor sentiment and anticipate potential market movements, allowing them to make more informed trading decisions. Or they can ignore it and get squeezed, as happened with GameStop.
  • Satellite imagery: Satellite imagery provides valuable information about various economic activities, such as the level of construction, traffic patterns, and even the number of cars in a retailer’s parking lot. By analysing this data, traders can gain insights into a company’s performance, sales, or supply chain dynamics, which can, in turn, help inform their trading strategies.
  • Credit card transactions: Aggregated credit card transaction data offers insights into consumer spending habits, allowing traders to monitor trends and assess the health of specific companies, sectors, or the broader economy. This information can be especially valuable in predicting earnings announcements or understanding the competitive dynamics within a particular industry.
  • Web Traffic and App Usage Data: Data from website traffic, mobile application usage, and online platforms can offer insights into consumer behaviour, brand popularity, and potential sales trends. For example, increased visits to a retailer’s website or an uptick in app downloads could signal stronger-than-expected quarterly results. However, be wary of betting on page views alone, or you may end up with the next pets.com on your book!
  • News Sentiment Analysis: Natural language processing (NLP) techniques can analyse news articles and press releases to extract sentiment about a particular company or sector. Changes in sentiment can potentially be used to predict future price movements.
  • Geolocation Data: Data from smartphones and GPS devices can reveal patterns in consumer behaviour, such as foot traffic to a retail store or visits to a particular location, which can indicate a business’s popularity or potential sales. I know one strategy analysed the footfall into every Starbucks in the USA. Over the quarters, it could see in real-time whether there were more customers or fewer than the previous quarter, and so predict an earnings miss or beat.
  • Weather Data: Weather patterns can influence consumer behaviour and impact agriculture, retail, and energy operations. For example, hot weather could boost sales for a clothing retailer or impact crop yields for a farming company. This is an area where Citadel supposedly excels, with a team of weather scientists predicting weather patterns.
  • Others include: e-commerce data, supply chain data, public records, healthcare data and more.

  • Alternative Data Usage

As the landscape of statistical arbitrage evolved, the focus expanded beyond traditional technical data, and market participants began exploring alternative data to enhance their trading strategies.

Alternative data has been used in the LFT and multi-factor trading space for many years. Here, alternative data nicely dovetails with fundamental data to enhance insights. But there is a growing trend of combining alternative and technical data in the MFT stat-arb world. Alternative data isn’t used in HFT. Knowing how many people walked into a Starbucks is ultimately pointless in ultra-fast trading.

Incorporating alternative data into statistical arbitrage has significantly diversified the strategies and techniques available to quant PMs and researchers in MFT. By tapping into these new data sources, traders can uncover new signals, develop more robust models, and improve their ability to generate alpha.

The use of alternative data has led to the creation of entirely new trading strategies and enhanced existing ones, allowing for the identification of more subtle and complex relationships between financial instruments and providing additional risk management opportunities.

However, acquiring alternative data doesn’t automatically generate an overflow of returns. As per Bloomberg, some data sets may be less immediately valuable. As Chris Longworth, a senior scientist at GAM Systematic in the U.K., notes, accessing better data is just part of the picture. Equally important is how this data is incorporated into models and how the resulting uncertainties are handled.

  • Factors driving this trend

While the surge can be attributed to enhanced data availability, advancements in analytics, and the pursuit of superior informational edge, I have an underlying personal theory linking this rise with the evolving dynamics within the teams implementing these strategies.

In quantitative trading, senior PMs hold the reins of time-tested stat-arb models. These models, honed over years or even decades, are akin to a Formula One car — carefully engineered and relentlessly fine-tuned for maximum performance. However, in this race for alpha, PMs are wary about sharing the blueprints of their “Formula One cars.” They understandably guard the intellectual property of their strategies, quite rightly limiting junior researchers’ access to avoid the risk of their proprietary methods being taken and replicated or used competitively elsewhere.

While the senior PMs are engrossed in enhancing their high-performance trading models, an interesting shift is observed among junior researchers. They are progressively focusing on novel datasets, especially in cash equities, spurred by the seniors’ justified protective stance over traditional stat-arb strategies. Keen to deliver alpha, junior researchers relish the opportunity to dive into the untapped potential of alternative data. Leveraging the latest modelling techniques and machine learning methodologies, they can extract valuable insights, creating a refreshing alternative to routine tasks like portfolio construction, data cleaning, risk analysis, or execution-type research work.

The trend is driven, in part, by the simple fact that a senior PM doesn’t want their junior too close to their original strategy, so rather than give them historical price data sets, they give them alternative data sets to see if any value can be found.

Another factor is that, initially, low-frequency and multi-factor strategies found the most utility in alternative data, allowing for more accurate long-term forecasts. For example, satellite data determining the frequency of cars in Walmart parking lots could predict earnings and guide investment strategy. Traditional stat-arb focused on exploiting short-term market price inefficiencies in technical data and previously considered such alternative data irrelevant. However, this viewpoint is evolving.
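As a toy illustration of how a count-based data set like this might be turned into a signal, the sketch below compares a quarter’s aggregated car count with the same quarter a year earlier and leans towards an earnings beat or miss. The year-over-year comparison, the 3% threshold, and the counts are all my assumptions. A real model would also have to deal with store openings, weather, image coverage and much else.

```python
def yoy_growth(current_count, year_ago_count):
    """Fractional change versus the same quarter one year earlier."""
    return current_count / year_ago_count - 1.0


def earnings_lean(current_count, year_ago_count, threshold=0.03):
    """Very rough directional view from a change in observed activity."""
    growth = yoy_growth(current_count, year_ago_count)
    if growth > threshold:
        return "lean beat"
    if growth < -threshold:
        return "lean miss"
    return "no view"


# Hypothetical aggregated car counts for one retailer's quarter.
print(earnings_lean(1_080_000, 1_000_000))
print(earnings_lean(930_000, 1_000_000))
print(earnings_lean(1_010_000, 1_000_000))
```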

In recent years, stat-arb has increasingly blended with alternative data. It’s like tweaking the Formula One car’s engine to run on a novel fuel mixture, aiming for short-term price inefficiencies while keeping an eye on potential long-term shifts. This trend emerged particularly during the low-volatility environment of 2016–2018, when groups sought to incorporate alternative data to supplement their work.

The real breakthrough lies in machine learning’s application in statistical arbitrage, particularly deep learning. These algorithms, capable of discerning complex patterns and relationships in data, facilitate the identification of temporary mispricings and market inefficiencies more efficiently. As deep learning algorithms’ insatiable appetite for data grows, we can expect even greater use of alternative data, pushing the boundaries of quantitative trading towards exciting, uncharted territories.

Conclusion

“It is not the smartest or strongest that survive, but the ones most adaptable to change that survive.”

The landscape of statistical arbitrage has evolved dramatically over the past decade, with the increased incorporation of alternative data sources and the shortening of holding periods as two key trends shaping the field in different ways. These changes have brought challenges and opportunities for quant PMs and researchers, requiring them to adapt and innovate to stay competitive.

Integrating alternative data into stat-arb strategies has expanded the range of techniques available to market participants, allowing them to uncover new trading opportunities and improve the overall effectiveness of their models. However, the shift towards shorter holding periods has also emphasised the need for speed, real-time decision-making, excellent execution and advanced risk management.

Statistical arbitrage, in its broadest sense, is here to stay. Traders will continuously seek statistical patterns that offer trading opportunities. However, the conventional understanding of stat-arb as a stand-alone strategy must be updated. Maintaining a competitive edge in today’s market means incorporating additional data, employing new techniques, shortening holding periods, and leveraging advancements in machine learning and other technologies.

As the world of statistical arbitrage continues to change, it is crucial for quant PMs and researchers to remain agile and embrace the opportunities presented by this evolving landscape. By staying at the forefront of technology and continually refining their skills, they can harness the full potential of alternative data, develop cutting-edge strategies, and ultimately succeed in this highly competitive and dynamic industry.

Strategies using alternative data might now be described as ‘data arbitrage’.

What are you seeing?

I hope you liked this and found it helpful.

If you’ve made it this far, please do me a massive favour and ‘Like’ / ‘clap.’

If you loved it, please share it with your network.

For more insights on the quant trading space, follow me on LinkedIn. https://www.linkedin.com/in/henry-booth-quantheadhunter/

Considering a move or need personalised guidance? Contact us today to take the next step in your career journey.

https://www.quantlink.co.uk/

Further Reading:

https://www.ft.com/content/d92e7f30-b4da-48b2-8fff-fd59fda751e1

https://www.fintechnews.org/how-hedge-funds-use-alternative-data-to-make-investments/

https://www.globenewswire.com/news-release/2023/04/19/2649810/0/en/Alternative-Data-Global-Market-Report-2023-Rising-Demand-From-Hedge-Funds-Bolsters-Sector.html

https://www.cityam.com/after-ai-update-bloomberg-looks-to-boost-terminals-with-more-alternative-data/

https://www.bloomberg.com/news/articles/2022-12-16/quant-traders-are-big-winners-in-this-year-s-market-turmoil?leadSource=uverify%20wall

Thanks to R. Brown for his advice & feedback, as well as his ideas on the three things stat-arb needs.
