This article describes how we build the HydroForecast seasonal model for an individual basin. The seasonal forecast looks weeks to months ahead.
The model uses a 10-day time step and provides forecasts up to 12 months into the future.
Using our neural network approach, we train a “base” model across ~470 sites in North America. The goal of this model is to learn general hydrological principles that hold across a wide variety of basins. The model is driven by historical weather from ECMWF’s ERA5 reanalysis, satellite observations of snow and vegetation from the MODIS and VIIRS sensors, drainage characteristics, and snow water equivalent from NSIDC’s SNODAS. Note that this model is trained only on historical reanalysis from ERA5, not on any weather forecast data. This is what later allows us to force the model with different forecast traces and still obtain a realistic and accurate hydrological response.
This base reanalysis model is the starting point for models designed for a specific basin. Using a machine learning technique called transfer learning, we adjust the base model’s parameters by “tuning” it to a specific site. The tuned model generally shows improved accuracy over the base model across a broad range of metrics. After tuning, the model is very good at understanding the hydrological conditions of the basin (such as snowpack or soil moisture) and at translating future weather into future flows.
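The idea behind tuning can be illustrated with a toy example. The sketch below is not HydroForecast’s model: a two-feature linear model stands in for the neural network, and all site data are synthetic and hypothetical. It shows the core of transfer learning as fine-tuning: train once on data pooled from many sites, then start from those learned weights and continue training on a single site.

```python
import random


def predict(w, b, x):
    """Linear model: a stand-in for the real neural network."""
    return sum(wi * xi for wi, xi in zip(w, x)) + b


def mse(w, b, data):
    return sum((predict(w, b, x) - y) ** 2 for x, y in data) / len(data)


def train(data, w, b, lr=0.1, epochs=200):
    """Full-batch gradient descent on mean squared error, starting from (w, b)."""
    w = list(w)
    n = len(data)
    for _ in range(epochs):
        grad_w = [0.0] * len(w)
        grad_b = 0.0
        for x, y in data:
            err = predict(w, b, x) - y
            for i, xi in enumerate(x):
                grad_w[i] += 2 * err * xi / n
            grad_b += 2 * err / n
        w = [wi - lr * g for wi, g in zip(w, grad_w)]
        b -= lr * grad_b
    return w, b


def make_site(rng, coef, bias, n=50):
    """Hypothetical site: two scaled inputs in [0, 1], flow = linear response + noise."""
    data = []
    for _ in range(n):
        x = [rng.random(), rng.random()]
        y = sum(c * xi for c, xi in zip(coef, x)) + bias + rng.gauss(0, 0.05)
        data.append((x, y))
    return data


rng = random.Random(0)

# "Base" model: trained on many hypothetical sites with varied behaviour.
base_data = []
for _ in range(20):
    base_data += make_site(rng, [rng.uniform(0.5, 1.5), rng.uniform(0.5, 1.5)], rng.uniform(-0.2, 0.2))
base_w, base_b = train(base_data, [0.0, 0.0], 0.0)

# Transfer learning: start from the base weights and keep training on one site only.
site_data = make_site(rng, [2.0, 0.3], 0.5)
tuned_w, tuned_b = train(site_data, base_w, base_b, lr=0.1, epochs=100)

print(f"base  MSE on site: {mse(base_w, base_b, site_data):.4f}")
print(f"tuned MSE on site: {mse(tuned_w, tuned_b, site_data):.4f}")
```

Starting from the base weights rather than from zero is the point of the design: the tuned model inherits what was learned across many basins and only has to adjust for the behaviour of the target site.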
To create seasonal scenarios operationally, we force the model with historical weather patterns. We call this the analog trace approach.
At each forecast issue time, the analog model combines its understanding of the current conditions of the basin with possible future weather drawn from 40 years (1982-2021) of historical ERA5 weather and from operational weather forecasts. Combining these two pieces of information yields 40 scenarios of realistic future flows, one for each historical year. Each of these 40 scenarios is represented by a full distribution of forecasted flows, as shown in the plot below.
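A minimal sketch of the analog trace idea follows, under loudly simplified assumptions: a toy linear-reservoir model (`simulate_flows`, hypothetical) stands in for the tuned neural network, its starting storage stands in for the model’s view of current basin conditions, the weather archive is random hypothetical precipitation rather than ERA5, operational forecasts are ignored, and each scenario is a single trace rather than a full distribution. What it does show is the structure: one shared starting state, one scenario per historical year.

```python
import random

YEARS = range(1982, 2022)  # 40 historical years, 1982-2021 inclusive


def simulate_flows(initial_storage, precip_trace, k=0.1):
    """Toy linear-reservoir model standing in for the tuned neural network.

    `initial_storage` plays the role of current basin conditions (snowpack,
    soil moisture); `precip_trace` is the future weather for one scenario.
    """
    storage = initial_storage
    flows = []
    for precip in precip_trace:
        storage += precip
        flow = k * storage  # outflow proportional to stored water
        storage -= flow
        flows.append(flow)
    return flows


def analog_scenarios(initial_storage, weather_archive, horizon):
    """One scenario per historical year: same starting state, that year's weather."""
    return {
        year: simulate_flows(initial_storage, weather_archive[year][:horizon])
        for year in YEARS
    }


# Hypothetical archive: one precipitation value per 10-day step for each year.
rng = random.Random(42)
archive = {year: [rng.uniform(0.0, 5.0) for _ in range(36)] for year in YEARS}

scenarios = analog_scenarios(initial_storage=100.0, weather_archive=archive, horizon=36)
print(len(scenarios), len(scenarios[1982]))  # 40 36
```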
To create our final forecast, we have two strategies that depend on the forecast horizon:
Our research shows that in some basins, we can improve the forecast in the first 30 days by creating what we call synthetic traces instead of analog traces. For forecast horizons over 30 days, we always use the analog trace approach described in the previous section.
In this approach, we produce 40 streamflow scenarios by forcing the model with 40 weather traces derived from the most recent GEFS (Global Ensemble Forecast System) weather forecast. Distributions are then aggregated over all scenarios rather than over a selection of them.
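The horizon rule above can be written as a small decision function. This is a sketch only: the per-basin flag `basin_uses_synthetic` is a hypothetical stand-in for the research finding that synthetic traces help in some basins, and lead days are assumed to be counted from 1.

```python
SYNTHETIC_MAX_LEAD_DAY = 30  # synthetic traces are only ever used for days 1-30


def trace_source(lead_day: int, basin_uses_synthetic: bool) -> str:
    """Return which trace type feeds a given lead day (1-based)."""
    if lead_day < 1:
        raise ValueError("lead_day is 1-based")
    if lead_day <= SYNTHETIC_MAX_LEAD_DAY and basin_uses_synthetic:
        return "synthetic"
    return "analog"  # days 31+ always use analog traces


print([trace_source(d, True) for d in (1, 30, 31, 365)])
# ['synthetic', 'synthetic', 'analog', 'analog']
```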
The traces shown in the “traces” view are the means of the individual scenarios. Within the first 30 days they can be analog or synthetic; from day 31 onward they are always analog. The “distribution” view aggregates the full distributions of the scenarios (not only their means) to give a more complete picture of the possible flows. This aggregation differs slightly depending on the trace type (analog or synthetic) and the forecast horizon:
Note that the 90% confidence interval derived from the quantiles will almost certainly be wider than the range of trace means. The exception is analog traces for days 1-30: there we aggregate only a subset of the scenarios, so the most extreme scenarios may not be included in the final distribution.
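Why the interval is usually wider than the spread of trace means can be seen in a small simulation. The sketch assumes that “aggregating distributions” means pooling samples from every scenario and taking the 5th and 95th percentiles; the scenario data are hypothetical, and the rule used to pick a subset (keep the 20 least extreme scenarios) is purely illustrative, since the article does not say how the subset is chosen.

```python
import random
import statistics


def trace_means(scenarios):
    """Traces view: one value per scenario, the mean of its flow distribution."""
    return [statistics.fmean(samples) for samples in scenarios]


def pooled_interval(scenarios, subset=None):
    """Distribution view: 5th and 95th percentiles of samples pooled over scenarios.

    `subset` (hypothetical) lists the scenario indices to aggregate, mimicking the
    analog days 1-30 case where only some scenarios enter the final distribution.
    """
    chosen = scenarios if subset is None else [scenarios[i] for i in subset]
    pooled = [x for samples in chosen for x in samples]
    cuts = statistics.quantiles(pooled, n=20)  # cut points at 5%, 10%, ..., 95%
    return cuts[0], cuts[-1]


rng = random.Random(7)
# 40 hypothetical scenarios for one time step, each a distribution of 200 flow samples.
centers = [rng.uniform(80.0, 120.0) for _ in range(40)]
scenarios = [[rng.gauss(c, 15.0) for _ in range(200)] for c in centers]

means = trace_means(scenarios)
low, high = pooled_interval(scenarios)
print(f"range of trace means:        {min(means):.1f} to {max(means):.1f}")
print(f"90% interval, all scenarios: {low:.1f} to {high:.1f}")

# Hypothetical subset rule: keep the 20 scenarios whose means are least extreme.
middle = sorted(range(len(scenarios)), key=lambda i: means[i])[10:30]
sub_low, sub_high = pooled_interval(scenarios, subset=middle)
print(f"90% interval, subset only:   {sub_low:.1f} to {sub_high:.1f}")
```

Because each scenario carries its own spread, the pooled tails extend beyond the most extreme scenario means; dropping the outer scenarios pulls those tails back in.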
In the final post-processing step, we replace the seasonal forecast quantiles for days 1-10 with the quantiles from our short-term model, to take advantage of its higher accuracy over those first 10 days. Because the short-term model does not produce traces, the trace view and the distribution view can disagree during the first 10 days of the forecast.
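The replacement step amounts to splicing two quantile series. A minimal sketch, assuming (hypothetically) that both forecasts are stored as a mapping from quantile level to a list of flows indexed by lead day; the function name and data layout are illustrative, not HydroForecast’s actual format.

```python
def splice_short_term(seasonal, short_term, replace_days=10):
    """Overwrite the first `replace_days` lead days of each seasonal quantile series.

    Both inputs map a quantile level (e.g. 0.05, 0.5, 0.95) to a list of flows
    indexed by lead day (index 0 = day 1). Traces are not touched here, which is
    why the traces and distribution views can disagree in days 1-10.
    """
    spliced = {}
    for level, values in seasonal.items():
        short = short_term.get(level)
        if short is None or len(short) < replace_days:
            raise ValueError(f"short-term forecast lacks days 1-{replace_days} for quantile {level}")
        spliced[level] = list(short[:replace_days]) + list(values[replace_days:])
    return spliced


seasonal = {0.05: [10.0] * 15, 0.5: [20.0] * 15, 0.95: [30.0] * 15}
short_term = {0.05: [12.0] * 10, 0.5: [21.0] * 10, 0.95: [27.0] * 10}
final = splice_short_term(seasonal, short_term)
print(final[0.5][0], final[0.5][10])  # 21.0 20.0
```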