Being able to predict, with reasonable certainty, how a river will behave in the future is critical to preparing communities and ecosystems for what lies ahead. Making sure a predictive model is accurate, and accounts for factors like changes in climate and land use, requires a lot of computing power.
Dan Myers, a graduate student in the department of geography at Indiana University Bloomington, used IU’s high performance computing resources to ensure his hydrologic models of rivers are as accurate as possible.
“Anyone can create a model and make a guess that could be terribly wrong,” Myers said. “We want to make sure that what we’re predicting is something that’s reasonable. We take existing data, like the elevation of the landscape, the types of soil, the different land uses, and also the historic climate patterns – air temperatures, rainfall, and snowfall – then use the outputs from massive global climate models that other researchers have run to try to predict what’s going to happen in the atmosphere throughout the 21st century.”
Myers has run many thousands of different models using Big Red 3 and Carbonate, studying different watersheds and making sure his findings are consistent. These models could help inform the design of future urban stormwater systems to account for future flooding. Or if the models show a rise in water temperatures, they can help fisheries managers understand that they may need to adjust their management practices in response.
Myers fine-tunes the models he creates by evaluating them against historical data, so he can show that his models do a fairly good job of predicting stream flow. In doing so, though, he noticed a problem.
“When researchers make environmental models, they will often take the time period of the historical stream flow data and they’ll divide it into two parts,” Myers said. “One part they use to fine-tune the models, and they call that the calibration data set. The second part is an independent set of data, that they then use to test the model and ensure it’s performing well with new data. They call that the validation data set. When researchers choose time periods for those data sets, they usually just pick a number. For example, maybe they’ll use the historical data from 1960 to 1980 for the calibration data set, and 1981 to 1999 for the validation. The problem that my research identified is that there could have been a dam built on the river, or the soft soils that absorb rain water could have been replaced with parking lots. All these different land uses cause the flow of the stream to become so much different when it rains. When you’re fine-tuning a model to one time period of data, and evaluating it based on another time period of data, you could get different results just because the data are different.”
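The split-sample approach Myers describes can be sketched in code. The sketch below is purely illustrative: the flow series, the simple trend model, and the dam scenario are all invented for the example, and Nash-Sutcliffe efficiency (NSE) stands in for whatever goodness-of-fit metric a given study uses.

```python
# Illustrative split-sample calibration/validation (hypothetical data):
# fit a simple trend model on one period, then test it on an
# independent later period during which a dam changed the flows.

def nse(observed, simulated):
    """Nash-Sutcliffe efficiency: 1 is a perfect fit; values at or
    below 0 mean the model is no better than the observed mean."""
    mean_obs = sum(observed) / len(observed)
    num = sum((o - s) ** 2 for o, s in zip(observed, simulated))
    den = sum((o - mean_obs) ** 2 for o in observed)
    return 1 - num / den

def fit_line(xs, ys):
    """Ordinary least-squares line: the 'fine-tuning' step here."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    slope = (sum((x - mx) * (y - my) for x, y in zip(xs, ys))
             / sum((x - mx) ** 2 for x in xs))
    return slope, my - slope * mx

# Hypothetical annual flows (m^3/s): a wet trend through 1980, then a
# dam built in 1981 sharply reduces flows in the validation period.
cal_years = list(range(1960, 1981))
val_years = list(range(1981, 2000))
cal_obs = [100 + 0.8 * (y - 1960) + (y % 5 - 2) for y in cal_years]
val_obs = [60 + (y % 5 - 2) for y in val_years]

slope, intercept = fit_line(cal_years, cal_obs)
cal_sim = [slope * y + intercept for y in cal_years]
val_sim = [slope * y + intercept for y in val_years]

nse_cal = nse(cal_obs, cal_sim)   # high: the model fits its own era
nse_val = nse(val_obs, val_sim)   # negative: the river changed regime
print(round(nse_cal, 2), round(nse_val, 2))
```

Because the validation period reflects a different regime than the calibration period, the model scores well on one and fails on the other, which is exactly the hazard Myers describes.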
Such practices can have a large effect on the accuracy of model results. Myers used this experience to outline recommendations to help other researchers avoid the pitfall: chiefly, to determine whether anything happened to the river being studied between the calibration and validation time periods. He also suggests taking a careful look at the mean and median stream flows, along with any outliers such as extreme floods or very low flows, to ensure the data are consistent between the two time frames.
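That consistency check can be approximated with a few lines of code. This is a rough sketch with invented flow values and an arbitrary 25% tolerance, not a published screening criterion:

```python
# Compare summary statistics of the calibration and validation periods
# before trusting a split-sample evaluation (hypothetical data).

from statistics import mean, median

def summarize(flows):
    return {"mean": mean(flows), "median": median(flows),
            "min": min(flows), "max": max(flows)}

def consistent(period_a, period_b, tolerance=0.25):
    """Flag the split as suspect if the mean or median differs by more
    than `tolerance` (as a fraction) between the two periods."""
    sa, sb = summarize(period_a), summarize(period_b)
    return all(abs(sa[k] - sb[k]) / abs(sa[k]) <= tolerance
               for k in ("mean", "median"))

calibration = [98, 102, 95, 110, 101, 99, 104]  # pre-dam flows, m^3/s
validation = [55, 60, 52, 64, 58, 57, 61]       # post-dam flows, m^3/s

print(consistent(calibration, validation))  # False: the periods differ
```

A failed check would not invalidate a model by itself, but it signals that the two time periods may not be comparable and that the cause (a dam, land-use change, climate shift) is worth investigating before drawing conclusions.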
Such discoveries and predictions would not have been possible without the help of IU’s high performance computers, Myers said.
“The IU supercomputing resources have allowed us to finish up this study,” he said. “It took about a year and a half between when we started and when the results were published. But without them, if I were running the models on my desktop computer, I would have hardly made any progress. I’d be a PhD student for a hundred years.”