Stock Ticker Symbols for Machine Learning and Artificial Intelligence

I received a few questions from multiple followers regarding a working list of all traded stocks. This is generally followed by an explanation that they want to apply machine-learning or artificial intelligence to predict stock prices. My initial response is always, that sounds like fun, let's start with a data-set that is a bit smaller first and ideally one that you are familiar with.

Capturing ALL traded stocks would become unwieldy rather quickly. Most stock data services have limits even on their paid plans. Using funnhub.io as an example, their most expensive plan at \$500 a month only allows for 300 API calls a minute. That is 216,000 API calls a day to retrieve data and a quick look at gurufocus.com indicates around 120,000 stocks, and I'm not certain that is all of them. At 300 API call an hour it would take half a day to get the latest information and most likely you would want to run multiple API calls on at least some of those symbols. The only way you are going to get faster data than that is to spend many thousands of dollars and if you have that kind of money you wouldn't be asking me how to get all the ticker symbols.

As stated above, let's make something a bit more reasonable. Everyone is familiar with the New York Stock Exchange (NYSE), and whether they know it by its actual name or not, the National Association of Securities Dealers Automated Quotation System (Nasdaq). So let's start with these, and the very first thing you are going to need is a list of all their associated tickers symbols.

If you are just getting started with the stock market and do not yet have a Robinhood account, you can collect a free stock from Robinhood by following this link. Robinhood.com

Now we could scrape this data and indeed if we wanted to constrain our search to the S&P 500 only we could get its list of symbols in three lines of python

import pandas as pd
table=pd.read_html('https://en.wikipedia.org/wiki/List_of_S%26P_500_companies')
df = table[0]

df.head()
Symbol Security SEC filings GICS Sector GICS Sub-Industry Headquarters Location Date first added CIK Founded
0 MMM 3M Company reports Industrials Industrial Conglomerates St. Paul, Minnesota 1976-08-09 66740 1902
1 ABT Abbott Laboratories reports Health Care Health Care Equipment North Chicago, Illinois 1964-03-31 1800 1888
2 ABBV AbbVie Inc. reports Health Care Pharmaceuticals North Chicago, Illinois 2012-12-31 1551152 2013 (1888)
3 ABMD ABIOMED Inc reports Health Care Health Care Equipment Danvers, Massachusetts 2018-05-31 815094 1981
4 ACN Accenture plc reports Information Technology IT Consulting & Other Services Dublin, Ireland 2011-07-06 1467373 1989

Scraping the entire list would be quite cumbersome however as most websites that maintain an entire list break that list into multiple pages. Interestingly enough the website nasdaqtrader.com keeps a list of all symbols.

df = pd.read_csv('ftp://ftp.nasdaqtrader.com/SymbolDirectory/nasdaqtraded.txt', sep='|', index_col=False)

df.head()
Nasdaq Traded Symbol Security Name Listing Exchange Market Category ETF Round Lot Size Test Issue Financial Status CQS Symbol NASDAQ Symbol NextShares
0 Y A Agilent Technologies, Inc. Common Stock N N 100.0 N NaN A A N
1 Y AA Alcoa Corporation Common Stock N N 100.0 N NaN AA AA N
2 Y AAA Listed Funds Trust AAF First Priority CLO Bond... P Y 100.0 N NaN AAA AAA N
3 Y AAAU Perth Mint Physical Gold ETF P Y 100.0 N NaN AAAU AAAU N
4 Y AACG ATA Creativity Global - American Depositary Sh... Q G N 100.0 N N NaN AACG N

In order to use this list we need to know some things about it. First lets understand the Listing Exchange Column. This column gives us information regarding what exchange the stock was actually listed on. We can interpret this information using the below list.

  • A = NYSE MKT
  • N = New York Stock Exchange (NYSE
  • P = NYSE ARCA
  • Z = BATS Global Markets (BATS)
  • V = Investors' Exchange, LLC (IEXG)

Since we are specifically looking at stocks, we need to remove our ETF's.

no_ETFs = df['Listing Exchange']=='N'
df = df[no_ETFs]
df.head()
Nasdaq Traded Symbol Security Name Listing Exchange Market Category ETF Round Lot Size Test Issue Financial Status CQS Symbol NASDAQ Symbol NextShares
0 Y A Agilent Technologies, Inc. Common Stock N N 100.0 N NaN A A N
1 Y AA Alcoa Corporation Common Stock N N 100.0 N NaN AA AA N
9 Y AAIC Arlington Asset Investment Corp Class A (new) N N 100.0 N NaN AAIC AAIC N
10 Y AAIC$B Arlington Asset Investment Corp 7.00% N N 100.0 N NaN AAICpB AAIC-B N
11 Y AAIC$C Arlington Asset Investment Corp 8.250% Seies C... N N 100.0 N NaN AAICpC AAIC-C N

Next, we need to filter out all the test securities using the Test Issue Column.

no_Tests = df['Test Issue']=='N'
df = df[no_Tests]
df.head()
Nasdaq Traded Symbol Security Name Listing Exchange Market Category ETF Round Lot Size Test Issue Financial Status CQS Symbol NASDAQ Symbol NextShares
0 Y A Agilent Technologies, Inc. Common Stock N N 100.0 N NaN A A N
1 Y AA Alcoa Corporation Common Stock N N 100.0 N NaN AA AA N
9 Y AAIC Arlington Asset Investment Corp Class A (new) N N 100.0 N NaN AAIC AAIC N
10 Y AAIC$B Arlington Asset Investment Corp 7.00% N N 100.0 N NaN AAICpB AAIC-B N
11 Y AAIC$C Arlington Asset Investment Corp 8.250% Seies C... N N 100.0 N NaN AAICpC AAIC-C N

This list provides various differences in securities by adding a period or dollar sign and denoting this variance after that indicator. In the list above you can see that AAIC is list 3 times. The second and third have \$B and \$C respectively. This indicates that AAIC has multiple classes, normal, class B, and class C stocks. These types of annotation can also denote common stock, preferred stock, etc. For this, we really only need to look at the base ticker symbol.

removeDollar = df['Symbol'].str.contains('$', regex=False) == False
df = df[removeDollar]

removePeriod = df['Symbol'].str.contains('.', regex=False) == False
df = df[removePeriod]

df.head()
Nasdaq Traded Symbol Security Name Listing Exchange Market Category ETF Round Lot Size Test Issue Financial Status CQS Symbol NASDAQ Symbol NextShares
0 Y A Agilent Technologies, Inc. Common Stock N N 100.0 N NaN A A N
1 Y AA Alcoa Corporation Common Stock N N 100.0 N NaN AA AA N
9 Y AAIC Arlington Asset Investment Corp Class A (new) N N 100.0 N NaN AAIC AAIC N
15 Y AAN Aarons Holdings Company, Inc. Common Stock N N 100.0 N NaN AAN AAN N
18 Y AAP Advance Auto Parts Inc Advance Auto Parts Inc W/I N N 100.0 N NaN AAP AAP N

Now we need to put the ticker symbols into a list.

symbols = df['Symbol'].to_list()
print(len(symbols))
symbols[:10]
2537





['A', 'AA', 'AAIC', 'AAN', 'AAP', 'AAT', 'AB', 'ABB', 'ABBV', 'ABC']

We now have a rather large list of ticker symbols and better yet, we can parse information from finnhub.io under 10 minutes rather than taking 12 hours. Yahoo finance has a 2000 API hit per hour limit, so I am only going to grab the first 100 ticker symbols.

First we are going to write a function to retrieve the last close price of each stock in our list.

def getLastAdjustedClose(symbol):
    import datetime
    end = datetime.date.today()
    start = datetime.date.today() - datetime.timedelta(days=7)
    df = pdr.DataReader(symbol, "yahoo", start, end).reset_index()
    return(df['Adj Close'].tail(1))

Next, I am going to loop through our entire list and get the last adjusted close price

import pandas_datareader as pdr   
import datetime

symbol_dict = dict()
for symbol in symbols[:100]:
    adjclose = getLastAdjustedClose(symbol)
    symbol_dict[symbol] = adjclose

pd.DataFrame.from_dict(symbol_dict).round()
    
A AA AAIC AAN AAP AAT AB ABB ABBV ABC ... ALG ALK ALL ALLE ALLY ALSN ALTG ALUS ALV ALX
4 111.0 19.0 3.0 63.0 144.0 29.0 32.0 27.0 101.0 100.0 ... 136.0 48.0 100.0 113.0 29.0 40.0 9.0 10.0 89.0 287.0

1 rows × 100 columns

As always, open an interactive version of this notebook at binder.org

And don't forget to claim your free stock