Deep learning is becoming an increasingly important part of hydrological forecasting, often achieving prediction accuracies that are significantly better than those of traditional models (1, 2, 3). Despite its successes, deep learning is still new to the field of hydrology and is not widely used outside of academia. There are still many unanswered questions about how these models perform. These concerns are especially important to hydropower utilities and others who manage our water systems and keep them healthy. Traditional hydrology models are based (sometimes loosely) on physical principles; such models have been used for years and are trusted by the water management community.
As deep learning models are not explicitly based on physical laws, it can be harder to trust them, even when the accuracy improvements look as good as they do. Given the potential for deep learning to substantially benefit water management by providing more accurate predictions and forecasts, it is crucial that we better understand the strengths and weaknesses of this new approach. The research laid out here furthers that understanding by testing whether it is possible to improve streamflow model accuracy by combining a traditional hydrology model with a deep learning approach.
One of the simplest ways to incorporate physical context (although not physical principles, per se) into machine learning or deep learning is post-processing, in which the outputs of a conceptual or physics-based model are used as inputs to a machine learning model. Post-processing can be used to improve model predictions and also to help understand deficiencies in different types of models.
To test post-processing for hydrologic prediction, we used the CAMELS dataset from NCAR. This dataset includes daily streamflow observations over 30 years from 672 basins in the continental United States. We benchmarked the Sacramento Soil Moisture Accounting model with a Snow-17 interface (SAC-SMA), the base rainfall-runoff model that is an integral part of the NOAA National Weather Service flood forecasting system, against a Long Short-Term Memory (LSTM) deep learning network. Both of these models were then benchmarked against a post-processing configuration, in which SAC-SMA states and outputs, along with weather data, were fed as inputs into the LSTM.
Model inputs to both SAC-SMA and the LSTM include weather data (daily precipitation, temperature, radiation); the LSTM additionally receives catchment attributes related to soil, climate, and vegetation. The SAC-SMA model was calibrated to each basin using the CAMELS observation data. More details of the experiment can be found in this paper by Nearing et al. (2020; https://eartharxiv.org/53te4).
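As a rough illustration of the post-processing setup described above, the sketch below assembles the per-day input vectors that would be handed to an LSTM for one basin: weather forcings, SAC-SMA states and outputs, and static catchment attributes repeated on every day. The function name, the variable layout, and all numbers are hypothetical and chosen only for illustration; the actual experiment's data pipeline is described in the paper, not here.

```python
def build_postprocessing_inputs(forcings, sacsma, attributes):
    """Return one feature vector per day for a single basin.

    forcings:   per-day lists of weather inputs, e.g. [precip, temp, radiation]
    sacsma:     per-day lists of SAC-SMA states and outputs
    attributes: static catchment attributes, appended unchanged to every day
    """
    if len(forcings) != len(sacsma):
        raise ValueError("forcings and SAC-SMA series must cover the same days")
    # Concatenate dynamic forcings, conceptual-model output, and static attributes.
    return [list(f) + list(s) + list(attributes) for f, s in zip(forcings, sacsma)]


# Hypothetical three-day example; every value here is made up.
forcings = [[0.0, 5.1, 210.0], [12.4, 4.3, 95.0], [3.2, 6.0, 150.0]]
sacsma = [[40.0, 1.1], [46.5, 2.8], [45.0, 2.2]]  # e.g. one storage state, simulated runoff
attributes = [0.35, 812.0]                         # e.g. forest fraction, mean elevation
features = build_postprocessing_inputs(forcings, sacsma, attributes)
```

Dropping the `sacsma` columns from this layout gives the inputs of the standalone LSTM, which is what makes the two configurations directly comparable.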
We measured model performance using three metrics: correlation (higher values are better), total bias (zero is best), and the average error in timing of peak flow events (lower values are better). The takeaway from this benchmarking exercise is that, for overall correlation and peak timing, there is a lot to be gained by adding an LSTM to an existing SAC-SMA model, but adding SAC-SMA does not improve the LSTM's correlation or peak timing. This suggests that the LSTM on its own captures most of the key information required for accurate predictions.
There was substantially less improvement to the LSTM from adding SAC-SMA-related inputs than there was improvement to SAC-SMA from post-processing with the LSTM. There was, however, a small improvement over the standard LSTM in terms of overall bias: bias was reduced in 313 of 531 basins (59%). This improvement arises because the LSTM is not constrained by conservation laws, whereas SAC-SMA conserves mass.
So, under what conditions do we see improvements from combining deep learning and conceptual modeling in this simple way? Mapping improvements in individual CAMELS catchments across the US does not yield much insight into where we see improvements over the standalone SAC-SMA. We can expect improvements to the correlation between simulated and observed hydrographs (Nash-Sutcliffe Efficiency) and to the timing of peak flow events in most basins across the country. However, post-processing does not preserve the overall water balance, and the total long-term flow bias is not always improved by post-processing. This indicates a need for embedding explicit physical principles (e.g., conservation of mass) into the deep learning model.
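To make the three metrics concrete, here is a minimal sketch of how they could be computed for one basin. The exact definitions used in the experiment are given in the paper; the versions below are assumptions: Pearson correlation, total bias expressed as a fraction of observed volume, and peak timing measured as the mean absolute offset in days between each observed peak and the largest simulated value within a small window. The `peak_days` argument and the `window` default are hypothetical.

```python
import math


def pearson_r(obs, sim):
    """Pearson correlation between observed and simulated flow."""
    n = len(obs)
    mean_o, mean_s = sum(obs) / n, sum(sim) / n
    cov = sum((o - mean_o) * (s - mean_s) for o, s in zip(obs, sim))
    var_o = sum((o - mean_o) ** 2 for o in obs)
    var_s = sum((s - mean_s) ** 2 for s in sim)
    return cov / math.sqrt(var_o * var_s)


def total_bias(obs, sim):
    """Difference in total flow volume, as a fraction of the observed volume."""
    return (sum(sim) - sum(obs)) / sum(obs)


def peak_timing_error(obs, sim, peak_days, window=3):
    """Mean absolute offset (days) between each observed peak and the
    largest simulated value within +/- `window` days of it."""
    offsets = []
    for day in peak_days:
        lo, hi = max(0, day - window), min(len(sim), day + window + 1)
        sim_peak = max(range(lo, hi), key=lambda i: sim[i])
        offsets.append(abs(sim_peak - day))
    return sum(offsets) / len(offsets)


# Made-up ten-day series with observed peaks on days 2 and 7.
obs = [1, 2, 8, 3, 1, 1, 2, 9, 4, 1]
sim = [1, 3, 6, 4, 1, 1, 2, 5, 8, 2]
r = pearson_r(obs, sim)
bias = total_bias(obs, sim)
timing = peak_timing_error(obs, sim, peak_days=[2, 7])
```

In this toy series the second simulated peak arrives one day late, so the timing error is nonzero even though the total volume is nearly right, which is why the three metrics are reported separately.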
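The distinction between hydrograph fit and water balance can be seen in a small example. The sketch below uses the standard definition of Nash-Sutcliffe Efficiency (one minus the ratio of squared simulation error to the variance of the observations) together with a simple relative volume error; the helper names and the five-day series are hypothetical. A simulation with the right shape but 20% too much water still scores a high NSE.

```python
def nse(obs, sim):
    """Nash-Sutcliffe Efficiency: 1 is a perfect fit, 0 matches the observed mean."""
    mean_obs = sum(obs) / len(obs)
    squared_error = sum((s - o) ** 2 for o, s in zip(obs, sim))
    obs_variance = sum((o - mean_obs) ** 2 for o in obs)
    return 1.0 - squared_error / obs_variance


def relative_volume_error(obs, sim):
    """Long-term water balance error as a fraction of observed volume."""
    return (sum(sim) - sum(obs)) / sum(obs)


# Made-up series: the simulation tracks the hydrograph shape exactly
# but carries 20% more water than was observed.
obs = [1.0, 2.0, 8.0, 3.0, 1.0]
sim = [1.2 * q for q in obs]

fit = nse(obs, sim)                          # stays above 0.9
volume_error = relative_volume_error(obs, sim)  # about +20%
```

A model trained to maximize a fit score like this has no built-in reason to close the water balance, which is one way to read the call for explicit mass conservation above.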
The figure below shows correlations between different states of the SAC-SMA model and improvements among forecasts in the CAMELS basins between (i) the standalone models (SAC-SMA and LSTM w/o SAC-SMA) vs. (ii) the post-processing model. Darker colors indicate strong correlations or anti-correlations between different hydrologic states and basins where deep learning provides strong benefit.
The traditional hydrology model, SAC-SMA, improved in basins with high snow water equivalent (i.e., snow-dominated basins) and also in basins with high potential evapotranspiration (i.e., water-limited basins). Kratzert et al. (2019) showed how and why the LSTM is effective at modeling snow and snowmelt processes without training on snow data directly, and we see that effect here.
The LSTM improved in basins with higher lower-zone soil moisture content, which are likely basins with higher baseflow percentages. This suggests that the conceptual SAC-SMA model provides information about subsurface hydrological processes that the data-driven LSTM lacks, which makes sense because we have no training data that directly describe subsurface processes.
Figure Key: SWE = Snow Water Equivalent, RAIM = Rain + Snowmelt, PET = Potential Evapotranspiration, ET = Total Evapotranspiration, MOD_RUN = Predicted Streamflow, LZ variables = components of the Lower Zone Soil Moisture, UZ variables = components of the Upper Zone Soil Moisture, ADIMC = Water Content Related to Impervious Areas.
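The kind of analysis summarized in the figure can be sketched as follows: for each SAC-SMA state, correlate its basin-average value with the per-basin change in a skill score between the standalone and post-processed models. This is only an assumed outline of the procedure, not the authors' code; the five basins, the state values, and the skill scores are all invented, and only the state names SWE and PET come from the figure key.

```python
import math


def pearson_r(x, y):
    """Pearson correlation between two equal-length sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / math.sqrt(var_x * var_y)


# Hypothetical basin-average SAC-SMA states for five basins.
mean_states = {
    "SWE": [0.0, 10.0, 50.0, 120.0, 200.0],
    "PET": [900.0, 1400.0, 700.0, 1100.0, 800.0],
}
# Hypothetical skill scores for the same five basins.
score_sacsma = [0.70, 0.68, 0.60, 0.50, 0.45]
score_postprocessed = [0.75, 0.74, 0.72, 0.70, 0.71]

# Per-basin improvement from post-processing.
improvement = [p - s for p, s in zip(score_postprocessed, score_sacsma)]

# One correlation per state: the quantity a cell of such a figure would encode.
state_vs_improvement = {
    name: pearson_r(values, improvement) for name, values in mean_states.items()
}
```

In this invented data, improvement grows with SWE, so the SWE entry comes out strongly positive, mirroring the snow-dominated pattern described above. A state that is constant across basins would have zero variance and would need to be skipped before calling `pearson_r`.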
Deep learning provides significantly more accurate streamflow simulations, on average, than traditional hydrology models that are based on physical principles like conservation of mass. However, because deep learning lacks these physical principles, it has certain potential weaknesses. There are currently several ongoing efforts in hydrology (e.g., 4, 5, 6) to combine physics with machine learning, and the post-processing example provided here is one of the simplest approaches. Nevertheless, this simple experiment shows that there is significant potential for combining traditional hydrology models with deep learning, and that the benefit of this combination will depend on (i) what target variables we are interested in (e.g., long-term water balance vs. short-term hydrograph correlations) and (ii) the dominant hydrological regimes of the target catchments (e.g., snow-dominated vs. baseflow-driven).
For more information, see a full write-up of this experiment at eartharxiv.org/53te4.