This project is a machine learning model that uses PyTorch to analyze and trade stocks using real-time market data from the Alpaca API. The model was trained on two years of minute-by-minute historical data scraped from Polygon.io and is specifically optimized for short-term trades (day-trading) on NASDAQ.
This ML Stock Trading Script includes a data_scraper directory, which contains the code used to collect and organize two years of minute-by-minute historical data for all stocks on NASDAQ. Each stock's historical data is stored in a csv file under its own directory in historical_data.
Using the historical data, we then calculate the daily ATR (Average True Range) of each stock, which allows us to identify the stocks with the highest volatility. By choosing the stock with the highest volatility, we ensure the stock is suitable for short-term trading.
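For context, the ATR is a rolling average of the daily true range, where the true range accounts for gaps against the previous close. The sketch below shows a minimal version of the calculation with pandas; it assumes daily OHLC bars (e.g. minute data resampled to daily) with hypothetical column names and a 14-day window, and is an illustration rather than the project's exact code.

```python
import pandas as pd

def daily_atr(daily_bars: pd.DataFrame, window: int = 14) -> pd.Series:
    """Average True Range over daily OHLC bars.

    Assumes 'high', 'low', and 'close' columns (hypothetical names); the
    project's actual column names, smoothing, and window may differ.
    """
    prev_close = daily_bars["close"].shift(1)
    true_range = pd.concat(
        [
            daily_bars["high"] - daily_bars["low"],    # intraday range
            (daily_bars["high"] - prev_close).abs(),   # gap up vs. prior close
            (daily_bars["low"] - prev_close).abs(),    # gap down vs. prior close
        ],
        axis=1,
    ).max(axis=1)
    return true_range.rolling(window).mean()
```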
After forking this repository, the first thing you should do is pip install all packages in requirements.txt. This can be accomplished by running the following command in the terminal.
$ pip install -r requirements.txt

After all the requirements have been successfully installed, create a secrets.env file within the data_scraper directory and include the following line, replacing <API_KEY> with your API key from Polygon.io.
POLYGON_API_KEY = "<API_KEY>"

Now cd into the data_scraper directory. When the program, polygon_scraper.py, is run, it will create a csv file with the following naming format: SYMBL-startDate-endDate.csv. To run the program, run the following command in the terminal.
$ python3 polygon_scraper.py

Note that there are multiple directories, each with a dedicated purpose. There are a total of three directories within this project.
The data_scraper directory includes the code used to collect and organize two years of minute-by-minute historical data for all stocks on NASDAQ.
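For reference, downloading minute aggregates from Polygon.io's REST API looks roughly like the sketch below. This is a hypothetical, simplified version of what polygon_scraper.py might do: the ticker and date range are placeholders, pagination of long responses is omitted, and the key is loaded from the secrets.env file created above.

```python
import os

import pandas as pd
import requests
from dotenv import load_dotenv

# Load the Polygon API key from secrets.env (see the setup instructions above).
load_dotenv("secrets.env")
API_KEY = os.getenv("POLYGON_API_KEY")

# Hypothetical example: two years of minute bars for AAPL.
symbol, start, end = "AAPL", "2021-01-01", "2023-01-01"
url = (
    f"https://api.polygon.io/v2/aggs/ticker/{symbol}/range/1/minute/{start}/{end}"
    f"?adjusted=true&sort=asc&limit=50000&apiKey={API_KEY}"
)

response = requests.get(url, timeout=30)
response.raise_for_status()
bars = response.json().get("results", [])
# NOTE: longer responses are paginated via a next_url field, omitted here for brevity.

# Save the bars using the SYMBL-startDate-endDate.csv naming format.
pd.DataFrame(bars).to_csv(f"{symbol}-{start}-{end}.csv", index=False)
```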
The real_time_data directory includes three files: config.py, news_stream.py, and stock_stream.py. The config.py file includes the Alpaca API's websocket URLs as well as the Alpaca API keys used by news_stream and stock_stream. The news_stream.py file uses websockets to stream market-related and financial news from Benzinga.com in real time; Benzinga is a financial news outlet that many brokers rely on as a primary source of market news. The stock_stream.py file uses websockets to stream real-time data for NASDAQ stocks. It is important to note that the program only runs while the market is open (9:30 am to 4:00 pm EST); otherwise, the program simply gives a warning: WARNING: NASDAQ Market is CLOSED.
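The market-hours guard in stock_stream.py might look something like this minimal sketch (the function name, the America/New_York timezone, and the weekend check are assumptions; market holidays are not handled):

```python
from datetime import datetime, time
from zoneinfo import ZoneInfo

def market_is_open() -> bool:
    """Return True during regular NASDAQ hours, 9:30 am to 4:00 pm Eastern."""
    now = datetime.now(ZoneInfo("America/New_York"))
    if now.weekday() >= 5:  # Saturday (5) or Sunday (6)
        return False
    return time(9, 30) <= now.time() <= time(16, 0)

if not market_is_open():
    print("WARNING: NASDAQ Market is CLOSED")
```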
The tickers_dataset directory includes multiple files: Ticker_Data_Filtering.py, Filtered_NASDAQ_Listings.csv, Raw_NASDAQ_Listings.csv, Symbols_First_Half.csv, and Symbols_Second_Half.csv. The Ticker_Data_Filtering.py file processes Raw_NASDAQ_Listings.csv to filter out all unnecessary data and creates a new csv file, Filtered_NASDAQ_Listings.csv. Filtered_NASDAQ_Listings.csv is then split into two separate files for processing: Symbols_First_Half.csv and Symbols_Second_Half.csv.
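As a rough outline, the filtering and splitting steps might look like the sketch below (a hypothetical simplification of Ticker_Data_Filtering.py; the columns kept from the raw listing are assumptions):

```python
import pandas as pd

# Load the raw NASDAQ listing and keep only the columns of interest
# (column names are assumed here; the actual raw file may differ).
raw = pd.read_csv("Raw_NASDAQ_Listings.csv")
filtered = raw[["Symbol", "Name"]].dropna()
filtered.to_csv("Filtered_NASDAQ_Listings.csv", index=False)

# Split the filtered listing in half so each half can be processed separately.
midpoint = len(filtered) // 2
filtered.iloc[:midpoint].to_csv("Symbols_First_Half.csv", index=False)
filtered.iloc[midpoint:].to_csv("Symbols_Second_Half.csv", index=False)
```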