Access to reliable, clean, and live data is one of the most important requirements for developing profitable strategies. Many data vendors cater only to big hedge funds, whose generous yearly data budgets are far beyond the reach of small proprietary firms and retail investors. Over the last couple of years, however, countless fintech companies have emerged that offer investment-grade data to the general public.
In this article, I’ll go through the most relevant and interesting data sources available to retail investors. Although most of these also have paid subscriptions, I only listed the ones that also have good free subscriptions.
I organized the data into categories, but some sources offer more than one type of data and could very well have been classified in another category.
In summary, these are the best data sources freely available for algorithmic trading:
When it comes to stock market data, Polygon.io, Alpaca, Yahoo Finance, the Kenneth French Data Library, Damodaran Data, SimFin, and EODData are the most relevant vendors available. In the case of cryptocurrencies, the most popular providers are Binance, Coinbase, and CoinMarketCap. For economic data, the most reliable statistics are available from FRED, the US Bureau of Labor Statistics, the Bureau of Economic Analysis, and the World Bank. Last but not least, Finnhub, Tiingo, Nasdaq, FlightRadar, and Reddit are websites to consider when it comes to alternative data.
Stock Market Data
Having easy access to up-to-date market data for all major exchanges around the world is almost a requirement for setting up an efficient research pipeline. Although the following list is not exhaustive, it sure is extensive:
Polygon.io is one of the most popular data vendors used by programmers in the financial industry. Because of its API, good documentation, and access to stocks, options, forex, and cryptos, Polygon is definitely an option to keep in mind.
- Website: Link
Alpaca is a relatively new stockbroker that specializes in catering to the algorithmic trading community. It has a very simple user interface, but a very powerful, extensively tested, and well-documented API. It offers intraday data down to 1-minute resolution, not only for all US-based exchanges but also for an ever-increasing number of cryptocurrencies.
It also offers quote and trade streaming via WebSockets and a few alternative data streams, such as news data from Benzinga.
Due to its popularity and ease of use, Yahoo Finance is the source of historical data most used by students and hobbyists. Although not always reliable, it also offers fundamental data, option chains, statistics, and sentiment data for most companies. Yahoo Finance makes it very easy to download all the data in CSV format. Additionally, it has an open API that, although not officially maintained, can be accessed through libraries available in most programming languages.
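As a rough sketch of what working with one of those CSV downloads can look like, the snippet below parses a file in the usual Yahoo Finance column layout (Date, Open, High, Low, Close, Adj Close, Volume) and computes simple returns from the adjusted close. The sample rows are made up for illustration, and the helper names `load_adjusted_closes` and `simple_returns` are my own, not part of any library.

```python
import csv
import io

# Made-up sample rows in the column layout of a Yahoo Finance CSV export.
SAMPLE_CSV = """Date,Open,High,Low,Close,Adj Close,Volume
2024-01-02,100.0,102.0,99.0,101.0,100.0,1000000
2024-01-03,101.0,111.0,100.0,110.0,110.0,1200000
2024-01-04,110.0,110.5,98.0,99.0,99.0,1500000
"""


def load_adjusted_closes(csv_text):
    """Return (date, adjusted close) pairs, skipping rows without a usable price."""
    rows = []
    for record in csv.DictReader(io.StringIO(csv_text)):
        try:
            rows.append((record["Date"], float(record["Adj Close"])))
        except (KeyError, TypeError, ValueError):
            # Assumption: some exports contain placeholder values such as "null".
            continue
    return rows


def simple_returns(prices):
    """Compute period-over-period simple returns from (date, price) pairs."""
    return [
        (date, price / previous_price - 1.0)
        for (_, previous_price), (date, price) in zip(prices, prices[1:])
    ]


closes = load_adjusted_closes(SAMPLE_CSV)
returns = simple_returns(closes)
for date, value in returns:
    print(f"{date}: {value:+.2%}")
```

Using the adjusted close rather than the raw close is a deliberate choice here, since it accounts for splits and dividends. Given how unreliable the source can be, it is worth sanity-checking the result against a second vendor.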
Kenneth French Data Library
The Kenneth French Data Library offers up-to-date data for the Fama-French three-factor model, as well as for related models derived from the seminal 1993 paper that revolutionized the financial markets. It does not offer programmatic access.
- Website: Link
Aswath Damodaran is one of the most prestigious professors and professionals when it comes to valuation. On his website, you can find updated information regarding betas (both levered and unlevered), cost of capital, cash flow and growth rate estimation, multiples valuation, and option pricing models. The website does not offer programmatic access.
- Website: Link
SimFin offers reliable and up-to-date fundamental information. It is very easy to use, has an intuitive user interface, and even offers an API. Additionally, non-programmers can download SimFin’s data in bulk and load it into Excel.
- Website: Link
EODData is a service that allows users to download mainly end-of-day data for a broad list of companies. Although intraday prices are behind a paywall, the free access to daily prices is still valuable because the data is reliable.
- Website: Link
Binance is not only the most liquid cryptocurrency exchange in the world, but also has one of the most reliable and functional APIs available in the crypto community. In addition to offering intraday data and WebSocket streaming, its API also gives access to tick data.
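To give an idea of how little code the intraday data requires, here is a minimal sketch built around Binance's public klines (candlestick) REST endpoint. It assumes the documented response layout, where each candle is an array starting with the open time in milliseconds followed by open, high, low, close, and volume as strings. The `Candle` class and the function names are my own, and the sample row is made up so that the example runs without a network connection.

```python
import json
import urllib.parse
import urllib.request
from dataclasses import dataclass
from datetime import datetime, timezone

KLINES_URL = "https://api.binance.com/api/v3/klines"


@dataclass
class Candle:
    open_time: datetime
    open: float
    high: float
    low: float
    close: float
    volume: float


def build_klines_url(symbol, interval="1m", limit=500):
    """Build the request URL for a symbol such as "BTCUSDT"."""
    query = urllib.parse.urlencode(
        {"symbol": symbol, "interval": interval, "limit": limit}
    )
    return f"{KLINES_URL}?{query}"


def parse_kline(row):
    """Convert one raw kline array into a Candle.

    Assumed layout: [open time (ms), open, high, low, close, volume, ...].
    """
    return Candle(
        open_time=datetime.fromtimestamp(row[0] / 1000, tz=timezone.utc),
        open=float(row[1]),
        high=float(row[2]),
        low=float(row[3]),
        close=float(row[4]),
        volume=float(row[5]),
    )


def fetch_candles(symbol, interval="1m", limit=500):
    """Download and parse candles. Needs network access; not called below."""
    url = build_klines_url(symbol, interval, limit)
    with urllib.request.urlopen(url, timeout=10) as response:
        return [parse_kline(row) for row in json.load(response)]


# Made-up row in the assumed layout, so the parser can be tried offline.
sample_row = [1704067200000, "42283.58", "42554.57", "42261.02", "42475.23", "1271.68"]
candle = parse_kline(sample_row)
print(candle.open_time.isoformat(), candle.close)
```

Calling `fetch_candles("BTCUSDT")` would request the latest 1-minute candles. Prices arrive as strings, so converting them to floats up front keeps the rest of a research pipeline simpler.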
Much like Binance, Coinbase offers an API with intraday cryptocurrency data and WebSockets for live access to market data.
CoinMarketCap is widely known for the breadth of information it makes available. It not only has a very nice user interface for exploring cryptocurrencies, exchanges, and blocks in the blockchain, but also allows for programmatic access via its API. It has useful tools (like price conversion), listings of all major exchanges, global market data, and blockchain exploration, among many others. Although there is a paid version, the free version is well worth using.
Alternative data has had a spike in interest in recent years, and for good reason. Hedge funds and proprietary firms have introduced novel datasets, even ones that seem to be unrelated to financial markets to the naked eye. The following list goes through a few reliable and interesting alternative data vendors.
Finnhub is one of the most interesting fintech data startups that I’m aware of right now. It sources its data from a broad repertoire of data vendors and makes it accessible via a single, consistent, and reliable API.
In addition to offering institutional-grade information regarding stock markets, bonds, ETFs, fundamentals, and estimates, it also offers a broad range of alternative data: social sentiment, USPTO patents, H-1B visa applications, and Senate lobbying, among many more.
In addition to offering traditional datasets like fundamental data and end-of-day prices, Tiingo shines when it comes to its News API. It covers more than 65,000 equity tickers, 4,000 cryptocurrencies, and 75 currencies, and offers historical data dating back to 1995. The free plan allows for 500 requests per hour, which is more than enough for most use cases.
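A limit of 500 requests per hour works out to one request every 3600 / 500 = 7.2 seconds. One simple way to stay under it is to space calls evenly, as in the sketch below. The `Throttle` class is a hypothetical helper of my own rather than part of any Tiingo client, and even spacing is only one of several possible strategies.

```python
import time


class Throttle:
    """Space calls at least `period / max_calls` seconds apart."""

    def __init__(self, max_calls, period, clock=time.monotonic, sleep=time.sleep):
        self.min_interval = period / max_calls
        self._clock = clock
        self._sleep = sleep
        self._next_allowed = None  # earliest time the next call may happen

    def wait(self):
        """Block until the next call is allowed and return the seconds slept."""
        now = self._clock()
        delay = 0.0
        if self._next_allowed is not None and now < self._next_allowed:
            delay = self._next_allowed - now
            self._sleep(delay)
            now = self._next_allowed
        self._next_allowed = now + self.min_interval
        return delay


throttle = Throttle(max_calls=500, period=3600)
print(throttle.min_interval)  # 7.2 seconds between requests
```

Calling `throttle.wait()` right before every request is enough to use it. The clock and sleep functions are injectable so the behavior can be tested without actually waiting.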
Nasdaq Data has one of the most extensive data catalogs available in the industry. Although most of the datasets are behind a paywall, the interface lets you filter them and keep only the ones that are free to access. Since the catalog is simply too big to describe, it’s best if you go ahead and take a look at it yourself.
- Data Catalog: Link
Live flight data, such as that offered by FlightRadar, might seem like an odd addition to this list at first, but hedge funds have used this type of data in the past. By keeping an eye on private jet landings in Omaha (Nebraska), savvy traders could anticipate potential deals made by Warren Buffett before they were disclosed.
- Website: Link
Since the r/WallStreetBets event with GME, everyone is aware of Reddit’s relevance when it comes to investor sentiment. It is also good to know that Reddit offers a generous API, which makes the task of estimating trends a lot easier.
- API: Link
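Once post titles have been pulled from the API, a first rough estimate of a trend can be as simple as counting ticker mentions. The sketch below assumes the titles are already available as plain strings. The watchlist and the sample titles are made up, and matching upper-case words is a deliberately naive heuristic.

```python
import re
from collections import Counter

# Hypothetical watchlist and made-up post titles, for illustration only.
WATCHLIST = {"GME", "AMC", "TSLA"}
SAMPLE_TITLES = [
    "GME to the moon, $GME is not done yet",
    "Why I sold TSLA and bought more GME",
    "AMC earnings thread",
    "Game theory and markets",
]

# Upper-case words of 1 to 5 letters, with or without a leading "$".
TICKER_PATTERN = re.compile(r"\b[A-Z]{1,5}\b")


def count_mentions(titles, watchlist):
    """Count how many titles mention each watched ticker (once per title)."""
    counts = Counter()
    for title in titles:
        mentioned = {
            token for token in TICKER_PATTERN.findall(title) if token in watchlist
        }
        counts.update(mentioned)
    return counts


mentions = count_mentions(SAMPLE_TITLES, WATCHLIST)
for ticker, count in mentions.most_common():
    print(ticker, count)
```

Restricting matches to a watchlist avoids counting ordinary capitalized words such as "I" as tickers. Counting each ticker at most once per title keeps a single enthusiastic post from skewing the totals.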
FRED (Federal Reserve Economic Data), the database of the Federal Reserve Bank of St. Louis, is one of the most widely used sources for economic research and is frequently cited in prestigious journals. It not only offers a wide range of datasets for free, but also keeps them maintained and up to date. It has extensive information regarding labor statistics, census data, economic activity, commodity prices, international trade, and interest rates, just to name a few.
Unlike many other public data vendors, FRED offers an easy-to-use Excel Plugin. Programmers can also access the data programmatically via an API.
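As a minimal sketch of the programmatic route, the snippet below builds a request for FRED's `series/observations` endpoint and parses the JSON it returns. It assumes you have registered for a free API key, and that missing observations are encoded as ".", which is how FRED reports them. The function names are my own and the sample payload is made up so that the example runs offline.

```python
import json
import urllib.parse
import urllib.request

OBSERVATIONS_URL = "https://api.stlouisfed.org/fred/series/observations"


def build_observations_url(series_id, api_key):
    """Build the request URL for a series such as "UNRATE"."""
    query = urllib.parse.urlencode(
        {"series_id": series_id, "api_key": api_key, "file_type": "json"}
    )
    return f"{OBSERVATIONS_URL}?{query}"


def parse_observations(payload):
    """Return (date, value) pairs, dropping missing observations (".")."""
    series = []
    for item in payload.get("observations", []):
        if item["value"] == ".":
            continue
        series.append((item["date"], float(item["value"])))
    return series


def fetch_series(series_id, api_key):
    """Download and parse a series. Needs network access; not called below."""
    url = build_observations_url(series_id, api_key)
    with urllib.request.urlopen(url, timeout=10) as response:
        return parse_observations(json.load(response))


# Made-up payload in the assumed response shape.
sample_payload = {
    "observations": [
        {"date": "2024-01-01", "value": "3.7"},
        {"date": "2024-02-01", "value": "."},
        {"date": "2024-03-01", "value": "3.8"},
    ]
}
observations = parse_observations(sample_payload)
print(observations)
```

Values arrive as strings, so the conversion to float and the handling of missing data are the two steps most worth getting right before the series goes into a model.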
US Bureau of Labor Statistics
As you might have guessed, this source offers very detailed employment and wage data. You can also access the information via an API, although it is rather limited and poorly documented.
Bureau of Economic Analysis
The BEA holds interesting data from multiple categories, such as consumer spending, income and saving, and investment in fixed assets, among others. It even has a “special topics” category that gathers data on economic activity in arts and culture, intellectual property, the space economy, the marine economy, and many others.
As is the case with other public sector databases, the BEA offers an API, but its documentation is a little bit outdated by today’s standards.
The World Bank is one of the most widely cited data sources in econometric journals. Although not all datasets are updated on a regular basis, the sheer amount of freely available data is enough to keep it as one of the go-to sources for modeling. It features a nice user interface for time series plotting, in addition to having an API.
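For reference, here is a small sketch of how the World Bank's Indicators API can be queried. It assumes the v2 JSON format, in which the response is a two-element list (paging metadata first, then the records) and missing values are null. The indicator code `NY.GDP.MKTP.CD` (GDP in current US dollars) is a real one, but the function names are my own and the sample payload is made up.

```python
import json
import urllib.parse
import urllib.request

BASE_URL = "https://api.worldbank.org/v2"


def build_indicator_url(country, indicator, per_page=100):
    """Build the request URL for one country and one indicator."""
    query = urllib.parse.urlencode({"format": "json", "per_page": per_page})
    return f"{BASE_URL}/country/{country}/indicator/{indicator}?{query}"


def parse_indicator(payload):
    """Return (year, value) pairs sorted by year, skipping missing values."""
    if len(payload) < 2 or payload[1] is None:
        return []  # error responses carry no record list
    series = [
        (int(record["date"]), float(record["value"]))
        for record in payload[1]
        if record["value"] is not None
    ]
    return sorted(series)


def fetch_indicator(country, indicator):
    """Download and parse a series. Needs network access; not called below."""
    url = build_indicator_url(country, indicator)
    with urllib.request.urlopen(url, timeout=10) as response:
        return parse_indicator(json.load(response))


# Made-up payload in the assumed shape: [metadata, records].
sample_payload = [
    {"page": 1, "pages": 1, "per_page": 100, "total": 3},
    [
        {"country": {"id": "US", "value": "United States"}, "date": "2022", "value": 3.0},
        {"country": {"id": "US", "value": "United States"}, "date": "2021", "value": None},
        {"country": {"id": "US", "value": "United States"}, "date": "2020", "value": 1.0},
    ],
]
series = parse_indicator(sample_payload)
print(series)
```

Sorting by year and dropping the gaps up front matters here precisely because not every dataset is updated regularly.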