Drawing
of water by Leonardo da Vinci
“Occam's Razor”, attributed to the Franciscan friar and theologian William of Ockham, is
the time-tested principle of parsimony, economy, or succinctness used in
problem-solving. It states that “among
competing hypotheses, the hypothesis with the fewest assumptions should be
selected.” Leonardo da Vinci, in his never-ending quest for beauty and truth, had
a similar perspective: “Simplicity is
the ultimate sophistication.”
Unfortunately, in modeling there is often a natural
bias to ignore this principle, with a preference for advanced modeling
approaches over simpler alternatives, as if greater complexity automatically meant better
model performance. Advanced numerical
modeling is used in virtually every engineering and scientific discipline, including
airplane design, climate modeling, bridge design, and hydrological modeling. Its theoretical underpinnings date back to
John von Neumann, widely considered one of the greatest mathematicians of the 20th
century and a father of the modern computer.
By discretizing the spatial domain into a grid of discrete
cells or elements, numerical models can represent the natural
temporal and spatial complexity of real-world systems. These powerful models can realistically
represent the physics of extremely complex phenomena, such as climate, that vary
over both space and time.
Source: United States Geological Survey
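To make the idea of discretization a little more concrete, here is a minimal sketch in Python (using NumPy) of a one-dimensional diffusion problem solved on a grid of discrete cells. The domain, grid size, time step, diffusion coefficient, and boundary values are all arbitrary assumptions chosen for illustration; they do not come from any particular model discussed here.

```python
import numpy as np

# Minimal sketch of spatial discretization: a 1-D diffusion equation
# approximated with an explicit finite-difference scheme. The continuous
# domain is split into discrete cells, and the physics is marched forward
# in time on that grid. All values below are illustrative assumptions.

nx = 50                    # number of grid cells
dx = 1.0 / nx              # cell width
diffusivity = 1.0          # assumed diffusion coefficient
dt = 0.4 * dx**2           # time step kept below the stability limit (0.5*dx**2/D)

u = np.zeros(nx)           # initial state: zero everywhere
u[0] = 1.0                 # fixed value imposed at the left boundary

for _ in range(500):       # march the solution forward in time
    u_new = u.copy()
    # central-difference approximation of the second spatial derivative
    u_new[1:-1] = u[1:-1] + diffusivity * dt / dx**2 * (u[2:] - 2 * u[1:-1] + u[:-2])
    u_new[0], u_new[-1] = 1.0, 0.0   # re-impose the boundary conditions
    u = u_new

print(u.round(3))          # approximate profile after 500 time steps
```

Even this toy grid makes the trade-off visible: every refinement of the grid multiplies the number of unknowns, which is exactly why full two- and three-dimensional models become so data and computation intensive.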
However, by their very
nature, these models are extremely data-intensive, often requiring thousands,
tens of thousands, or even millions of inputs, which translates into
significantly higher development time and costs. The logical question is whether a simpler, and
therefore less expensive, model would suffice, particularly when it provides
an acceptable degree of accuracy and, in some cases, accuracy similar or even superior
to that of a far more complex model.
The assumption that a
more complex model is always inherently superior to one that is
mathematically simpler and/or requires fewer input variables not only
runs counter to da Vinci's insight and Occam's Razor, it also
contradicts common sense. The reality is
that any model is only as good as its data; the old adage “garbage in, garbage
out” is as true in mathematical modeling as it is in any other endeavor where
the outcome is a byproduct of the inputs.
In applications where the data are sparse and/or have extreme variability
and uncertainty, or the phenomenon of interest is complex, even the most
“advanced” model will often provide less accuracy than desired. In addition,
even the most complex model will not work if its underlying physical and/or
mathematical assumptions do not match the physical system of interest. Conversely, many simple models
capture enough of the essential physics to simulate or predict the system behavior of
interest to the necessary accuracy while requiring much less data and
development time.
When selecting an appropriate model, some basic questions that should first be answered include:
- What are the objectives of the modeling analysis?
- What models are theoretically capable of meeting the necessary objectives, including simulating the system behavior of interest and achieving the necessary degree of accuracy at a sufficient level of confidence?
- What data and information are necessary and available for performing and benchmarking the modeling?
- What models can be used with the available data and information?
- What are the relative advantages and disadvantages of each candidate model, including anticipated performance, development time, validation, ease of use, and updating?
Some of the answers to
these questions are not straightforward.
For example, there is often uncertainty as to how well a model will
perform until at least some preliminary modeling simulations have been
performed. Furthermore, as part of the
process of accepting or rejecting a model, the common modeling protocol of “validation”
arises, where a model’s prediction capability is assessed by attempting to accurately
reproduce historical events. However,
even when a “good” historical match or validation is achieved, it is almost
always limited to a relatively short historical period that may or may not be
representative of future conditions. This
echoes the common Wall Street refrain that “past performance is no guarantee
of future success”.
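As a rough illustration of that validation step, the short Python sketch below calibrates a “simple” and a “complex” model on one part of a synthetic record and then tests both on a held-out later period. The data, the model forms, and the split are all assumptions made purely for illustration, not a description of any real validation exercise.

```python
import numpy as np

# Sketch of hindcast-style "validation": calibrate two candidate models on an
# early historical period, then test them on a later, held-out period.
# The synthetic "observations" and both model forms are illustrative assumptions.

rng = np.random.default_rng(0)
t = np.arange(40.0)
observed = 2.0 + 0.5 * t + rng.normal(scale=2.0, size=t.size)  # noisy linear trend

calib, valid = slice(0, 30), slice(30, 40)   # calibration vs. validation split

# "Simple" model: a straight line. "Complex" model: a high-order polynomial.
simple = np.polynomial.Polynomial.fit(t[calib], observed[calib], deg=1)
complex_ = np.polynomial.Polynomial.fit(t[calib], observed[calib], deg=9)

def rmse(model, period):
    return np.sqrt(np.mean((model(t[period]) - observed[period]) ** 2))

print(f"simple  model: calibration RMSE {rmse(simple, calib):.2f}, "
      f"validation RMSE {rmse(simple, valid):.2f}")
print(f"complex model: calibration RMSE {rmse(complex_, calib):.2f}, "
      f"validation RMSE {rmse(complex_, valid):.2f}")
```

The complex model typically hugs the calibration period more closely, yet it can do noticeably worse on the held-out period, a small-scale version of “past performance is no guarantee of future success.”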
What can be said with
confidence, and will be demonstrated in future blogs, is that a much simpler
model will often provide the same level of accuracy as a significantly more
complex model and yet cost many thousands, even tens of thousands of dollars
less. Modelers should always begin by
asking whether they can achieve the required modeling objectives with a
relatively simple model. At the very
least, a simple model can generate preliminary results and provide insights
into the nature of the problem and data requirements, which will support possible
future advanced modeling efforts. In
other cases, the simple model will be all that is necessary to achieve the
modeling objectives at the required level of accuracy and confidence
while avoiding unnecessary costs. There
are, of course, situations where a significantly more complex model is absolutely
necessary to provide the required predictive capability. However, model complexity should be
increased only as far as is warranted, and no further.
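One common, if imperfect, way to make “only as complex as warranted” concrete is a parsimony criterion such as the Akaike Information Criterion (AIC), which rewards goodness of fit but charges a penalty for every additional parameter. The sketch below applies it to synthetic data; the data, the candidate polynomial models, and the degrees compared are illustrative assumptions only.

```python
import numpy as np

# Sketch of a parsimony criterion: AIC rewards fit quality but penalizes
# extra parameters, so complexity is added only when it "pays for itself".
# Data and candidate models below are illustrative assumptions.

rng = np.random.default_rng(1)
x = np.linspace(0.0, 10.0, 60)
y = 1.5 * x + rng.normal(scale=1.0, size=x.size)   # the underlying truth is a simple line

def aic(degree):
    """AIC (up to an additive constant) for a least-squares polynomial fit."""
    model = np.polynomial.Polynomial.fit(x, y, deg=degree)
    residuals = y - model(x)
    n, k = x.size, degree + 1                      # k = number of fitted parameters
    return n * np.log(np.mean(residuals**2)) + 2 * k

for degree in (1, 3, 5, 9):
    print(f"degree {degree}: AIC = {aic(degree):.1f}")
```

With data like these, the lowest AIC usually falls on the low-degree model: the extra terms improve the fit slightly, but not enough to justify the added parameters, which is Occam's Razor expressed as arithmetic.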
As the great Albert
Einstein, who knew a thing or two about mathematical models, famously observed:
"Everything should be made as
simple as possible, but not simpler."
Einstein with his seemingly simple but profound
equation, considered by many to be the most elegant and beautiful in history.