Although the term alternative data or “alt-data” has been used in hedge funds and proprietary firms for quite some time, there has been a spike in interest by retail investors in recent years. Nowadays, having reliable and novel sources of alternative data has become a major concern of almost all professional market participants.
In this article, I’ll cover the main aspects of alternative data and its usefulness for algorithmic trading, along with a broad list of use cases that researchers have implemented in the past.
What is alternative data in finance?
Alternative data is an umbrella term for nontraditional datasets that are used to estimate the future price change of an asset or the performance of a company. Examples include geolocation data, satellite images, credit card transactions, and website usage metrics, among many others.
Alternative data has gained popularity in the last decade thanks to the falling cost of processing power and the overall increase in the rate at which data is created. Nowadays, almost all relevant participants in the quantitative finance industry have incorporated some type of alternative data.
Given the fierce competition, there is little value left to be extracted from traditional data sources, such as historical price series or balance sheets, and proprietary firms have to creatively develop new sources of alpha in order to stay relevant.
Examples of alternative data in finance
Because “alternative data” is an umbrella term for nontraditional data, there are countless types of it. Hedge funds are known to have used mobile geolocation, credit card transactions, website usage metrics, app store downloads, satellite images, social media sentiment, GitHub activity, and flight tracking, among other types of data, in order to anticipate price changes in the stock market and predict the future performance of companies.
In the following sections, I’ll discuss each of these data types in more detail and describe their common use cases.
Geolocation (foot traffic)
Mobile geolocation data is commonly used as a proxy for the number of people visiting physical retail stores. This is useful not only to estimate a company’s revenue before its release but also as a major explanatory variable for pairs trading strategies.
If properly sampled, the data can be extrapolated to all the retail stores of a given company and become an excellent estimator of the total number of people that potentially purchased a product or service from the company. Of course, in order to arrive at a final revenue projection, researchers have to also estimate the percentage of visitors who make a purchase, in addition to estimating the average purchase value.
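The projection described above boils down to a short chain of multiplications. Here is a minimal sketch of it in Python. The function name, the parameter names, and every input number are hypothetical, and the sketch assumes the geolocation panel captures a known, uniform share of real visitors.

```python
def project_revenue(sampled_visits, sample_coverage, conversion_rate, avg_purchase_value):
    """Turn sampled foot traffic into a rough revenue estimate.

    sample_coverage: assumed share of real visitors captured by the panel (0-1].
    conversion_rate: assumed share of visitors who make a purchase [0-1].
    """
    if not 0 < sample_coverage <= 1:
        raise ValueError("sample_coverage must be in (0, 1]")
    if not 0 <= conversion_rate <= 1:
        raise ValueError("conversion_rate must be in [0, 1]")
    total_visits = sampled_visits / sample_coverage  # extrapolate to all stores
    buyers = total_visits * conversion_rate          # visitors who purchase
    return buyers * avg_purchase_value               # projected revenue


# Hypothetical quarter: 120,000 panel visits, 4% coverage,
# 25% of visitors buy, average ticket of 40 dollars.
estimate = project_revenue(120_000, 0.04, 0.25, 40.0)
print(f"Projected revenue: {estimate:,.0f}")
```

In practice the conversion rate and the average purchase value are the weakest inputs, so it is worth running the projection over a range of values rather than a single guess.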
Website usage and App Store Analytics
Website usage data is analogous to geolocation data in the sense that it is an excellent proxy to estimate the number of clients a company has over a period of time. In its simplest form, researchers use different methods to estimate the number of total daily visitors to websites.
Website usage data is useful not only for estimating the performance of companies that have online stores but also for estimating the user growth of SaaS products. For example, it is common practice to use this alternative data to create statistical arbitrage strategies with streaming companies. In order to yield more reliable forecasts, researchers also follow the number of installations that each service has on different app stores, like the Roku Channel Store, Google Play, or the Apple App Store.
Credit Card Transactions and Email Receipts
Credit card transaction data is one of the most reliable data sources for estimating the average purchase value of a company’s client. Financial market participants use this data to forecast revenue figures before they are disclosed.
When mixed with geolocation data, researchers are able to estimate with astounding precision the number of potential clients visiting retail stores and their average purchase value. Having said that, they still have to make an assumption regarding the percentage of visitors that become buyers.
Because this type of data is difficult to access, and the raw transactions must be cleaned and organized before any value can be extracted from them, most hedge funds resort to paying for subscriptions from one of the many data vendors available.
Bloomberg’s Second Measure is a well-known data vendor providing credit card transaction data.
Satellite images
Satellite images are a valuable source of information for multiple asset classes, especially when used creatively. For example:
- Parking Lots: by counting the number of cars in parking lots, researchers can estimate the number of visitors to shopping centers, which can be used as a proxy for the performance of supermarkets, retail stores, and malls.
- Crop Yields: if used in conjunction with live meteorological data, satellite images can be used to estimate world production of most commodities. If supply and demand curves for the commodity are modeled, its future price can be estimated.
- Factory Output: satellite images are used to estimate the rate of production of manufacturers all over the world. It is known that this data has been previously used to estimate the number of cars produced by Tesla. In fact, RS Metrics is an analytics company that provides this data for all major EV manufacturers.
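To make the crop yield case concrete, here is a minimal sketch of how a production estimate could be turned into a price estimate. It rests on strong assumptions of my own: demand is linear, supply is fixed at the satellite-based production estimate, and every number is invented for illustration. The function and parameter names are hypothetical.

```python
def clearing_price(estimated_supply, demand_intercept, demand_slope):
    """Price at which linear demand absorbs a fixed supply.

    Assumes demand follows Q = demand_intercept - demand_slope * P,
    so the market clears at P = (demand_intercept - Q_supply) / demand_slope.
    """
    if demand_slope <= 0:
        raise ValueError("demand_slope must be positive")
    price = (demand_intercept - estimated_supply) / demand_slope
    return max(price, 0.0)  # a glut cannot push the price below zero here


# Hypothetical commodity: the baseline harvest estimate is 1,100 units,
# and satellite imagery suggests a drought cuts it to 1,000 units.
baseline = clearing_price(1100, demand_intercept=1500, demand_slope=2.0)
drought = clearing_price(1000, demand_intercept=1500, demand_slope=2.0)
print(baseline, drought)
```

The interesting quantity is usually the difference between the two prices, since it shows how much a revised production estimate should move the market under the assumed demand curve.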
This type of data has gained huge popularity in recent years thanks to advances in machine learning and the increase in processing power of today’s GPUs. During the 1990s and the early 2000s, satellite images could only be used for trading purposes by companies with vast amounts of resources at their disposal.
If you’re interested in this type of data, check out OneAtlas by Airbus, which is a reliable and well-known source used in the hedge fund industry.
Social media sentiment
Social media platforms are scraped by many proprietary firms in order to estimate the sentiment towards companies, cryptocurrencies, and even commodities. As is the case with most alternative data sources, social media can be used in many ways:
- Estimate Momentum: by analyzing time series of mentions, opinions, or sentiment about an asset, traders can use live data to anticipate trends in the market. This idea is so widespread that there are countless tutorials on YouTube that teach how to scrape Reddit threads in order to try to spot future price changes. Given the popularity of this data source, there is little value left to be extracted from it.
- React to News: because Twitter has become one of the most popular news sources, it is common practice to use its API (Application Programming Interface) to detect major news events before everyone else. Algorithms are created to automatically go long or short on a given asset depending on whether the sentiment of a news event is positive or negative.
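The long-or-short rule from the last bullet can be sketched as follows. This is only an illustration: it assumes an upstream model (not shown) has already scored each post between -1 (very negative) and +1 (very positive), and the function name, the threshold, and the minimum post count are hypothetical choices rather than recommended values.

```python
from statistics import mean


def news_signal(sentiment_scores, threshold=0.3, min_posts=3):
    """Map per-post sentiment scores in [-1, 1] to 'long', 'short', or 'flat'."""
    if len(sentiment_scores) < min_posts:
        return "flat"  # too few posts to trust the average
    average = mean(sentiment_scores)
    if average >= threshold:
        return "long"
    if average <= -threshold:
        return "short"
    return "flat"


print(news_signal([0.8, 0.6, 0.7]))     # clearly positive coverage
print(news_signal([-0.5, -0.9, -0.4]))  # clearly negative coverage
print(news_signal([0.9]))               # a single post is ignored
```

Requiring a minimum number of posts is a cheap guard against reacting to a single noisy or misclassified message.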
Flight tracking
Flight data has been used to anticipate acquisitions made by the world’s most famous investor, Warren Buffett. Given that he resides in a city with only about 500,000 inhabitants (Omaha, NE), it was very likely that most private jets landing at the city’s airport were somehow related to a meeting with Mr. Buffett. By finding out who the passengers of a plane were before a deal became publicly known, hedge funds could react ahead of the announcement.
GitHub activity
GitHub is the most popular platform used by programmers to work in teams and keep track of the changes in the source code of their projects. The master version of the source code is hosted on GitHub, and changes are recorded via “commits”.
Since most cryptocurrencies are open source and host their code on GitHub, proprietary firms can keep track of the activity in a project. The most common metrics used to estimate a project’s activity and growth are the number of contributors, commits (changes), and stars (similar to an upvote or like).
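As an example of turning commits into a number, the sketch below compares the commit count of the most recent window with the window before it. It assumes the commit dates have already been collected (for instance from the repository’s history). The function name, the 30-day window, and the sample dates are all hypothetical.

```python
from datetime import date, timedelta


def commit_growth(commit_dates, as_of, window_days=30):
    """Relative change in commit count: latest window vs. the one before it.

    Returns None when the earlier window has no commits to compare against.
    """
    recent_start = as_of - timedelta(days=window_days)
    prior_start = as_of - timedelta(days=2 * window_days)
    recent = sum(1 for d in commit_dates if recent_start < d <= as_of)
    prior = sum(1 for d in commit_dates if prior_start < d <= recent_start)
    if prior == 0:
        return None
    return recent / prior - 1.0


commits = [
    date(2024, 1, 10), date(2024, 1, 20),                    # earlier window
    date(2024, 2, 5), date(2024, 2, 15), date(2024, 2, 25),  # latest window
]
print(commit_growth(commits, as_of=date(2024, 3, 1)))  # 3 commits vs. 2
```

The same function works for stars or new contributors if their timestamps are passed in instead of commit dates.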
Job openings
The number of job openings available in a given region or at a specific company is used as a proxy for future growth. It is used not only to estimate the future performance of a company but also to infer its geographical expansion. Companies like LinkUp are well known for supplying this data to hedge funds.
Is trading with alternative data profitable?
Trading with alternative data can definitely be profitable depending on the underlying data, its reliability, and its scarcity. Researchers continually come up with novel approaches and ideas to exploit and extract value from alternative data.
Alternative data strategies suffer from alpha decay, which is none other than the loss of a model’s predictive power over time. It is caused by more and more market participants adopting the same idea. Given enough investors using the same underlying strategy, the alpha will converge to zero. Having said that, the fact that satellite images are already a well-known data source does not mean that quants won’t be able to devise new profitable strategies based on them.
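A simple way to reason about alpha decay is to give a strategy a half-life. The exponential form below is my simplifying assumption rather than an established law, and the function name and numbers are hypothetical. It only illustrates how quickly an edge can shrink once a dataset becomes popular.

```python
def decayed_alpha(initial_alpha, months_elapsed, half_life_months):
    """Remaining alpha if the edge halves every `half_life_months` (assumed form)."""
    if half_life_months <= 0:
        raise ValueError("half_life_months must be positive")
    return initial_alpha * 0.5 ** (months_elapsed / half_life_months)


# Hypothetical strategy: 10% annual alpha at launch, 12-month half-life.
for months in (0, 12, 24, 36):
    print(months, round(decayed_alpha(0.10, months, 12), 4))
```

Under these assumptions the edge never reaches exactly zero, but after a few half-lives it becomes too small to cover trading costs, which has the same practical effect.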
Renaissance Technologies, often described as the most profitable hedge fund in the history of the American stock market, is well known for making extensive use of alternative datasets. Its flagship Medallion fund reportedly achieved average annualized returns of 66% before fees between 1988 and 2018.
How is alternative data created?
Alternative data has been created for decades, but it is only in recent years that it started to play a major role in the financial industry. For example, credit card transaction data has been generated every second for decades, whether or not it was used for trading purposes.
Due to the technical requirements for storing, cleaning, and organizing huge amounts of alternative data, most trading firms cannot justify assigning their own in-house resources to generate the data. This is the reason behind the rise of alternative data vendors.
Having said that, there are also countless examples of hedge funds that scrape data from the internet in order to have their own proprietary datasets. The rationale behind creating and storing alternative datasets in-house is the increased probability of having an edge that is not available to the rest of the market participants. It is common practice to have a small team of backend developers in charge of scraping, cleaning, storing, and transforming the raw data points into actionable information that can be incorporated into the firm’s trading strategies.
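To give a flavor of the cleaning and transforming steps, here is a minimal sketch that deduplicates scraped rows, drops malformed ones, and aggregates amounts per ticker and day. The row layout (`id`, `ticker`, `date`, `amount`) and the ticker `ABC` are hypothetical, and a real pipeline would also need storage, logging, and validation.

```python
from collections import defaultdict


def clean_and_aggregate(raw_rows):
    """Deduplicate rows by id, skip malformed ones, and total amounts per (ticker, date)."""
    seen_ids = set()
    totals = defaultdict(float)
    for row in raw_rows:
        row_id = row.get("id")
        if row_id is None or row_id in seen_ids:
            continue  # missing id or duplicate scrape
        try:
            amount = float(row["amount"])
            key = (row["ticker"].strip().upper(), row["date"])
        except (KeyError, ValueError, AttributeError, TypeError):
            continue  # malformed row: missing field or unparsable amount
        seen_ids.add(row_id)
        totals[key] += amount
    return dict(totals)


rows = [
    {"id": 1, "ticker": " abc ", "date": "2024-01-02", "amount": "10.5"},
    {"id": 1, "ticker": " abc ", "date": "2024-01-02", "amount": "10.5"},  # duplicate
    {"id": 2, "ticker": "ABC", "date": "2024-01-02", "amount": "4.5"},
    {"id": 3, "ticker": "ABC", "date": "2024-01-02", "amount": "n/a"},     # malformed
]
print(clean_and_aggregate(rows))
```

Silently skipping bad rows keeps the sketch short. In a real system it is usually better to count and log them, since a sudden jump in rejected rows often means the scraped source changed its format.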
As we can see, there are countless opportunities for incorporating alternative data into trading strategies. In addition, coming up with novel and creative approaches to publicly known and available data is also a source for generating alpha.
Trading based only on OHLCV data and a few indicators derived from it will most probably not be enough to find reliable and robust strategies. Interest in alternative data, and the search for new sources that comes with it, will only keep increasing in the years to come.