Forecasting News Coverage
Forecasting is a journey. We learned a lot along the way. We made mistakes (we still do). In our blog we try to document those learnings as they happen.
At Predicto, it’s important to iterate and experiment with different model architectures, data, and ideas quickly. This led us to develop a no-code forecasting tool that allows anyone to experiment with Deep Learning forecasting using their own data. This post is a continuation of Continuous Integration & Experimentation.
Today, we’ll go through an example that showcases the simplicity of datafloat.ai, the No-Code AI Forecasting Platform, and how it lets anyone create explainable Deep Learning forecasting models with a few clicks, no deep learning experience required. Datafloat is the platform that powers hundreds of models used by Predicto and generates hundreds of forecasts daily.
Problem definition
Our goal for this post is to forecast the expected news coverage of Nasdaq companies. At Predicto, we gather news for all Nasdaq-100 companies daily and perform news analysis. We believe it would be useful to forecast the total number of articles released for a company in the next 2 weeks, as it might help us understand when something out of the ordinary is happening. To achieve this, we can use data such as past daily article counts, stock price, and stock trading volume. We have all this data in our database, and we update it daily. For our case study, we’ll focus on Tesla.
Datafloat solution
Let’s see what the process of creating a new Deep Learning model looks like in Datafloat.
Step 1. We select the type of model we want to train. In our case we want to generate forecasts.
Step 2. We select the data source. Here we can provide either CSV files or a database connection string. In this case we choose to use our Predicto database. Datafloat will connect and load the database schema, which will allow us to generate a model template from any combination of database tables. Using a database makes it easy to automate future daily forecasts, since our data updates daily.
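Under the hood, “connect and load the database schema” corresponds to inspecting the tables and columns available through the connection string. The sketch below is our own illustration of that idea (with a made-up connection string), not Datafloat’s code:

```python
# Rough illustration of loading a database schema from a connection string.
# The connection string below is a made-up placeholder, not a real credential.
from sqlalchemy import create_engine, inspect

engine = create_engine("postgresql://user:password@host/PredictoDb")  # placeholder
inspector = inspect(engine)

# Tables and their columns: the raw material a model template is built from.
for table in inspector.get_table_names():
    columns = [col["name"] for col in inspector.get_columns(table)]
    print(table, columns)
```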
Step 3. Now we are ready to create our model template. Model templates can be used to generate one or more models with different parameters. Let’s see how this is done.
In the first section (Step 3a below), we provide some generic model parameters. You can leave the defaults or tweak them. The most important ones are the `Lookback horizon` and `Lookahead horizon` parameters, which define the input window size and the forecast size. A lookback horizon of 45 and a lookahead horizon of 15 means we want our deep learning model to use the last 45 days in order to forecast the next 15 days (we talk about days because our samples happen to be daily; depending on the sampling rate of your data, they could be hours, minutes, or years).
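Conceptually, those two parameters describe a sliding window over the series. The sketch below is our own illustration of that windowing, not Datafloat’s internal code:

```python
# A minimal sketch of what a 45-day lookback / 15-day lookahead split means.
import numpy as np

def make_windows(series: np.ndarray, lookback: int = 45, lookahead: int = 15):
    """Slice a daily series into (input window, forecast target) pairs."""
    X, y = [], []
    for start in range(len(series) - lookback - lookahead + 1):
        X.append(series[start : start + lookback])                           # last 45 days
        y.append(series[start + lookback : start + lookback + lookahead])    # next 15 days
    return np.array(X), np.array(y)

daily_articles = np.arange(120, dtype=float)  # stand-in for a daily article count series
X, y = make_windows(daily_articles)
print(X.shape, y.shape)  # (61, 45) (61, 15)
```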
The second section is where the magic happens (Step 3b below). Datafloat loads our database schema, and we can select the tables we want to use, and of course our features, just by clicking on them.
For every table we choose to use, we need to define:
- a `TimeIndex` column, which will be used to join our tables and construct our data sets (required)
- one or more `Feature` columns, or a `Prediction` column (required)
- a filtering clause to keep only the data we are interested in (optional)
So in the example below, we are instructing Datafloat to:
- use data from table `ArticleDailyStats` where `Stock = TSLA` and the date is after `2020-01-01`
- use data from table `StockHistoryData` where `Stock = TSLA` and the date is after `2020-01-01`
- forecast the `TotalArticles` column using the features `TotalArticles`, `SClose`, and `SVolume`
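To make the template concrete, the data it describes boils down to something like the query in this sketch. This is our own reconstruction (treating `Date` as the shared `TimeIndex` column is an assumption); Datafloat builds and runs its own queries:

```python
# Our reconstruction of the data set the template above describes.
import pandas as pd
from sqlalchemy import create_engine

engine = create_engine("postgresql://user:password@host/PredictoDb")  # placeholder

query = """
SELECT a.Date, a.TotalArticles, s.SClose, s.SVolume
FROM ArticleDailyStats a
JOIN StockHistoryData s
  ON s.Date = a.Date AND s.Stock = a.Stock
WHERE a.Stock = 'TSLA' AND a.Date > '2020-01-01'
ORDER BY a.Date
"""
df = pd.read_sql(query, engine)  # one row per day: the target plus two extra features
```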
We can add any other features (now or later) that we think might have predictive power for forecasting the total number of articles, but for now we’ll keep it simple.
Datafloat is responsible for fetching all this data to train our model and for generating new forecasts when the time comes.
Step 4. We are now ready to create a model from our template! By clicking “Create Model”, Datafloat picks up the request and starts training the model in the cloud. For more details on how model training happens in the cloud, you can read Continuous Integration & Experimentation.
Depending on the size of our data and our model’s layers, training might take from a few minutes to several hours. When it completes, we are notified and can inspect the model. We are presented with graphs that show how our model fits the training data and how it performs on the validation set that was held aside.
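For intuition, the validation check behind those graphs amounts to holding out the most recent part of the series and measuring error on it, roughly as in this sketch (our illustration, with a naive baseline standing in for the trained model):

```python
# Sketch of a chronological train/validation split and a simple error metric.
import numpy as np

def mae(actual: np.ndarray, predicted: np.ndarray) -> float:
    return float(np.mean(np.abs(actual - predicted)))

values = np.random.default_rng(0).poisson(30, size=400).astype(float)  # stand-in series
split = int(len(values) * 0.8)                 # keep the most recent 20% aside
train, val = values[:split], values[split:]

naive_forecast = np.full_like(val, train[-1])  # trivial baseline a real model should beat
print("validation MAE of a naive baseline:", mae(val, naive_forecast))
```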
If everything looks good, then we are ready to use our deep learning model and actually do some forecasting! Which leads us to…
Step 5. Schedule and inspect a forecast. The only thing we need to do here is provide a date. Datafloat will automatically connect to the database, extract the required input data, feed it to our trained model, and generate a forecast.
Datafloat also provides an API that lets us automate new forecasts to run at scheduled times.
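Automating that from code could look something like the snippet below. To be clear, the endpoint, payload fields, and authentication shown here are placeholders we invented for illustration, not Datafloat’s documented API:

```python
# Hypothetical example of scheduling a forecast through an HTTP API.
# Endpoint, payload fields, and auth header are made-up placeholders.
import requests

response = requests.post(
    "https://datafloat.ai/api/forecasts",           # placeholder URL
    headers={"Authorization": "Bearer <api-key>"},  # placeholder auth
    json={"modelId": "tsla-news-coverage", "forecastDate": "2021-03-01"},
)
response.raise_for_status()
print(response.json())
```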
Once the forecast is scheduled, Datafloat retrieves our model, connects to our database, extracts the input data (45 days of news coverage, stock price, and volume), feeds it to the model, and generates the forecast for the next 15 days, as we instructed while designing our model in Step 3. Every forecast comes with confidence intervals that indicate how confident our model is about the prediction.
Datafloat also generates an explainability section for every forecast. This section gives us an idea of the features and time periods that influenced the model’s forecast. For a detailed explanation of how this works, please read Explaining Financial Deep Learning Forecasts | by Predicto | The Startup | Medium. In our forecast, it appears that the biggest influence came from the pattern of past total articles, along with, interestingly, some influence from the stock’s trading volume (see Step 5c).
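For a feel of how such attributions can be computed, one common technique (not necessarily the one Datafloat uses; see the linked post for the details) is input-gradient saliency, sketched here for a toy model:

```python
# Input-gradient saliency: how sensitive is the forecast to each input feature
# at each past time step? A generic sketch, not Datafloat's implementation.
import torch

def saliency(model: torch.nn.Module, window: torch.Tensor) -> torch.Tensor:
    """window: (lookback, n_features). Returns |d forecast / d input|."""
    x = window.clone().requires_grad_(True)
    forecast = model(x.unsqueeze(0)).sum()  # sum the 15 forecast days into a scalar
    forecast.backward()
    return x.grad.abs()  # (lookback, n_features): per-day, per-feature influence

# Toy model: flatten 45 days x 3 features (articles, close, volume) -> 15 days.
model = torch.nn.Sequential(torch.nn.Flatten(), torch.nn.Linear(45 * 3, 15))
scores = saliency(model, torch.randn(45, 3))
print(scores.shape)  # torch.Size([45, 3])
```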
Step 6. As our data evolves and new measurements are saved to our database, Datafloat automatically updates every forecast’s evaluation with error metrics (see Step 6a below). So after a few days we can revisit our forecast to see how accurate the prediction was.
Datafloat also offers evaluation metrics per model over time, which lets us monitor a model’s overall performance across all generated forecasts. For more details on this, please refer to Monitoring Forecasting Models at Scale and Understanding Deep Learning Forecasts over Time. The platform also supports scheduled model retraining as our data evolves.
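As a sketch of the idea, once the actual values for a forecast window arrive, per-forecast error metrics like these can be computed and tracked over time (our illustration, not Datafloat’s metric pipeline):

```python
# Per-forecast error metrics, computed once the actuals for a window are known.
import numpy as np

def evaluate_forecast(actual: np.ndarray, predicted: np.ndarray) -> dict:
    """Error metrics for one 15-day forecast."""
    err = actual - predicted
    return {
        "MAE": float(np.mean(np.abs(err))),
        "RMSE": float(np.sqrt(np.mean(err ** 2))),
        "MAPE_%": float(np.mean(np.abs(err / actual)) * 100),  # assumes actual != 0
    }

actual = np.array([31.0, 28, 35, 30, 29, 33, 40, 38, 27, 30, 32, 31, 36, 29, 30])
predicted = np.array([30.0, 30, 33, 31, 28, 34, 36, 37, 29, 30, 31, 32, 34, 30, 31])
print(evaluate_forecast(actual, predicted))
```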
What’s next?
We hope you enjoyed this article. This was a brief introduction to Datafloat.ai, the No-Code AI Forecasting Platform. Datafloat was created to satisfy the increasing demands of Predicto, but we quickly realized it can be used to forecast anything at scale, and it is also the perfect tool for model experimentation. We plan to keep extending it with new functionality and new deep learning tasks.
For any question you might have, leave a comment or get in touch!
Stay safe and see you soon!