Stock Ticker Symbols for Machine Learning and Artificial Intelligence

I received a few questions from multiple followers regarding a working list of all traded stocks. This is generally followed by an explanation that they want to apply machine-learning or artificial intelligence to predict stock prices. My initial response is always, that sounds like fun, let's start with a data-set that is a bit smaller first and ideally one that you are familiar with.

Capturing ALL traded stocks would become unwieldy rather quickly. Most stock data services have limits even on their paid plans. Using funnhub.io as an example, their most expensive plan at \$500 a month only allows for 300 API calls a minute. That is 216,000 API calls a day to retrieve data and a quick look at gurufocus.com indicates around 120,000 stocks, and I'm not certain that is all of them. At 300 API call an hour it would take half a day to get the latest information and most likely you would want to run multiple API calls on at least some of those symbols. The only way you are going to get faster data than that is to spend many thousands of dollars and if you have that kind of money you wouldn't be asking me how to get all the ticker symbols.

As stated above, let's make something a bit more reasonable. Everyone is familiar with the New York Stock Exchange (NYSE), and whether they know it by its actual name or not, the National Association of Securities Dealers Automated Quotation System (Nasdaq). So let's start with these, and the very first thing you are going to need is a list of all their associated tickers symbols.

If you are just getting started with the stock market and do not yet have a Robinhood account, you can collect a free stock from Robinhood by following this link. Robinhood.com

Now we could scrape this data and indeed if we wanted to constrain our search to the S&P 500 only we could get its list of symbols in three lines of python

import pandas as pd
table=pd.read_html('https://en.wikipedia.org/wiki/List_of_S%26P_500_companies')
df = table[0]

df.head()

	Symbol	Security	SEC filings	GICS Sector	GICS Sub-Industry	Headquarters Location	Date first added	CIK	Founded
0	MMM	3M Company	reports	Industrials	Industrial Conglomerates	St. Paul, Minnesota	1976-08-09	66740	1902
1	ABT	Abbott Laboratories	reports	Health Care	Health Care Equipment	North Chicago, Illinois	1964-03-31	1800	1888
2	ABBV	AbbVie Inc.	reports	Health Care	Pharmaceuticals	North Chicago, Illinois	2012-12-31	1551152	2013 (1888)
3	ABMD	ABIOMED Inc	reports	Health Care	Health Care Equipment	Danvers, Massachusetts	2018-05-31	815094	1981
4	ACN	Accenture plc	reports	Information Technology	IT Consulting & Other Services	Dublin, Ireland	2011-07-06	1467373	1989

Scraping the entire list would be quite cumbersome however as most websites that maintain an entire list break that list into multiple pages. Interestingly enough the website nasdaqtrader.com keeps a list of all symbols.

df = pd.read_csv('ftp://ftp.nasdaqtrader.com/SymbolDirectory/nasdaqtraded.txt', sep='|', index_col=False)

df.head()

	Nasdaq Traded	Symbol	Security Name	Listing Exchange	Market Category	ETF	Round Lot Size	Test Issue	Financial Status	CQS Symbol	NASDAQ Symbol	NextShares
0	Y	A	Agilent Technologies, Inc. Common Stock	N		N	100.0	N	NaN	A	A	N
1	Y	AA	Alcoa Corporation Common Stock	N		N	100.0	N	NaN	AA	AA	N
2	Y	AAA	Listed Funds Trust AAF First Priority CLO Bond...	P		Y	100.0	N	NaN	AAA	AAA	N
3	Y	AAAU	Perth Mint Physical Gold ETF	P		Y	100.0	N	NaN	AAAU	AAAU	N
4	Y	AACG	ATA Creativity Global - American Depositary Sh...	Q	G	N	100.0	N	N	NaN	AACG	N

In order to use this list we need to know some things about it. First lets understand the Listing Exchange Column. This column gives us information regarding what exchange the stock was actually listed on. We can interpret this information using the below list.

A = NYSE MKT
N = New York Stock Exchange (NYSE
P = NYSE ARCA
Z = BATS Global Markets (BATS)
V = Investors' Exchange, LLC (IEXG)

Since we are specifically looking at stocks, we need to remove our ETF's.

no_ETFs = df['Listing Exchange']=='N'
df = df[no_ETFs]
df.head()

	Nasdaq Traded	Symbol	Security Name	Listing Exchange	ETF	Round Lot Size	Test Issue	Financial Status	CQS Symbol	NASDAQ Symbol	NextShares
0	Y	A	Agilent Technologies, Inc. Common Stock	N	N	100.0	N	NaN	A	A	N
1	Y	AA	Alcoa Corporation Common Stock	N	N	100.0	N	NaN	AA	AA	N
9	Y	AAIC	Arlington Asset Investment Corp Class A (new)	N	N	100.0	N	NaN	AAIC	AAIC	N
10	Y	AAIC$B	Arlington Asset Investment Corp 7.00%	N	N	100.0	N	NaN	AAICpB	AAIC-B	N
11	Y	AAIC$C	Arlington Asset Investment Corp 8.250% Seies C...	N	N	100.0	N	NaN	AAICpC	AAIC-C	N

Next, we need to filter out all the test securities using the Test Issue Column.

no_Tests = df['Test Issue']=='N'
df = df[no_Tests]
df.head()

	Nasdaq Traded	Symbol	Security Name	Listing Exchange	ETF	Round Lot Size	Test Issue	Financial Status	CQS Symbol	NASDAQ Symbol	NextShares
0	Y	A	Agilent Technologies, Inc. Common Stock	N	N	100.0	N	NaN	A	A	N
1	Y	AA	Alcoa Corporation Common Stock	N	N	100.0	N	NaN	AA	AA	N
9	Y	AAIC	Arlington Asset Investment Corp Class A (new)	N	N	100.0	N	NaN	AAIC	AAIC	N
10	Y	AAIC$B	Arlington Asset Investment Corp 7.00%	N	N	100.0	N	NaN	AAICpB	AAIC-B	N
11	Y	AAIC$C	Arlington Asset Investment Corp 8.250% Seies C...	N	N	100.0	N	NaN	AAICpC	AAIC-C	N

This list provides various differences in securities by adding a period or dollar sign and denoting this variance after that indicator. In the list above you can see that AAIC is list 3 times. The second and third have \$B and \$C respectively. This indicates that AAIC has multiple classes, normal, class B, and class C stocks. These types of annotation can also denote common stock, preferred stock, etc. For this, we really only need to look at the base ticker symbol.

removeDollar = df['Symbol'].str.contains('$', regex=False) == False
df = df[removeDollar]

removePeriod = df['Symbol'].str.contains('.', regex=False) == False
df = df[removePeriod]

df.head()

	Nasdaq Traded	Symbol	Security Name	Listing Exchange	ETF	Round Lot Size	Test Issue	Financial Status	CQS Symbol	NASDAQ Symbol	NextShares
0	Y	A	Agilent Technologies, Inc. Common Stock	N	N	100.0	N	NaN	A	A	N
1	Y	AA	Alcoa Corporation Common Stock	N	N	100.0	N	NaN	AA	AA	N
9	Y	AAIC	Arlington Asset Investment Corp Class A (new)	N	N	100.0	N	NaN	AAIC	AAIC	N
15	Y	AAN	Aarons Holdings Company, Inc. Common Stock	N	N	100.0	N	NaN	AAN	AAN	N
18	Y	AAP	Advance Auto Parts Inc Advance Auto Parts Inc W/I	N	N	100.0	N	NaN	AAP	AAP	N

Now we need to put the ticker symbols into a list.

symbols = df['Symbol'].to_list()
print(len(symbols))
symbols[:10]

2537





['A', 'AA', 'AAIC', 'AAN', 'AAP', 'AAT', 'AB', 'ABB', 'ABBV', 'ABC']

We now have a rather large list of ticker symbols and better yet, we can parse information from finnhub.io under 10 minutes rather than taking 12 hours. Yahoo finance has a 2000 API hit per hour limit, so I am only going to grab the first 100 ticker symbols.

First we are going to write a function to retrieve the last close price of each stock in our list.

def getLastAdjustedClose(symbol):
    import datetime
    end = datetime.date.today()
    start = datetime.date.today() - datetime.timedelta(days=7)
    df = pdr.DataReader(symbol, "yahoo", start, end).reset_index()
    return(df['Adj Close'].tail(1))

Next, I am going to loop through our entire list and get the last adjusted close price

import pandas_datareader as pdr   
import datetime

symbol_dict = dict()
for symbol in symbols[:100]:
    adjclose = getLastAdjustedClose(symbol)
    symbol_dict[symbol] = adjclose

pd.DataFrame.from_dict(symbol_dict).round()

	A	AA	AAIC	AAN	AAP	AAT	AB	ABB	ABBV	ABC	...	ALG	ALK	ALL	ALLE	ALLY	ALSN	ALTG	ALUS	ALV	ALX
4	111.0	19.0	3.0	63.0	144.0	29.0	32.0	27.0	101.0	100.0	...	136.0	48.0	100.0	113.0	29.0	40.0	9.0	10.0	89.0	287.0

1 rows × 100 columns

Stock Ticker Symbols for Machine Learning and Artificial Intelligence

Jeremiah Nelson

Jeremiah Nelson

As always, open an interactive version of this notebook at binder.org

And don't forget to claim your free stock

Quick and Simple Stock Technical Analysis using Jupyter

Using Docker to Edit your Ghost Blog Theme

Quick and Simple Stock Technical Analysis using Jupyter

As always, open an interactive version of this notebook at binder.org

And don't forget to claim your free stock

Subscribe to Make Time Labs

Subscribe to Make Time Labs