Sunday, October 5, 2014

Simplicity versus Complexity in Mathematical Modeling

Drawing of water by Leonardo da Vinci

“Occam's Razor”, attributed to the Franciscan theologian William of Ockham, is the time-tested principle of parsimony, economy, or succinctness used in problem-solving. It states that “among competing hypotheses, the hypothesis with the fewest assumptions should be selected.” Leonardo da Vinci, in his never-ending quest for beauty and truth, is credited with a similar perspective: “Simplicity is the ultimate sophistication.”

Unfortunately, in modeling, there is often a natural bias to ignore this principle, with a preference for advanced modeling approaches over simpler alternatives, as if added complexity automatically improved model performance. Advanced numerical modeling is used in virtually every engineering and scientific discipline, including airplane design, climate modeling, bridge design, and hydrological modeling. Its theoretical underpinnings date back to John von Neumann, widely regarded as one of the greatest mathematicians of the 20th century and a father of the modern-day computer. By discretizing the spatial domain into a grid of discrete cells or elements, numerical models can represent the natural temporal and spatial complexity of real-world systems. These powerful models can realistically represent the physics of extremely complex phenomena like climate, which varies over both space and time.

Source:  United States Geological Survey
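To make the idea of grid discretization concrete, here is a minimal sketch in Python. It is not taken from any particular modeling package: it assumes a one-dimensional diffusion problem solved with a basic explicit finite-difference scheme, and the function name `diffuse_1d` and all parameter values are made up for illustration. Note that every cell needs its own starting value, which hints at why input requirements grow so quickly as grids are refined or extended to two and three dimensions.

```python
def diffuse_1d(initial, alpha, dx, dt, steps):
    """Explicit finite-difference sketch of du/dt = alpha * d2u/dx2.

    `initial` holds one value per grid cell; the two end cells are held fixed.
    """
    r = alpha * dt / dx ** 2
    if r > 0.5:
        # Standard stability limit for this explicit scheme.
        raise ValueError("unstable: alpha * dt / dx**2 must not exceed 0.5")
    u = list(initial)
    for _ in range(steps):
        new = u[:]
        for i in range(1, len(u) - 1):
            # Each cell is updated from itself and its two neighbours.
            new[i] = u[i] + r * (u[i - 1] - 2 * u[i] + u[i + 1])
        u = new
    return u


# Illustrative setup (assumed values): 11 cells with a single spike in the middle.
n_cells = 11
initial = [0.0] * n_cells
initial[n_cells // 2] = 1.0

result = diffuse_1d(initial, alpha=1.0, dx=1.0, dt=0.25, steps=20)
print([round(v, 4) for v in result])
```

Even this toy grid requires an initial value for each of its 11 cells plus a material property (`alpha`); a real three-dimensional model may need such values for every one of thousands or millions of cells.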

However, by their very nature, these models are extremely data intensive, often requiring thousands, tens of thousands, or even millions of inputs, which translates into significantly higher development time and costs. The logical question is why a simpler and therefore less expensive model would not suffice, particularly when it provides an acceptable degree of accuracy and, in some cases, accuracy similar or even superior to that of a far more complex model.

The assumption that a more complex model is always inherently superior to a model that is mathematically simpler and/or requires fewer input variables not only contradicts the keen intellect of da Vinci and Occam’s Razor, it also contradicts common sense. The reality is that any model is only as good as its data; the old adage “garbage in, garbage out” is as true in mathematical modeling as it is in any other endeavor where the outcome is a byproduct of the inputs. In applications where the data are sparse and/or highly variable and uncertain, or the phenomenon of interest is complex, even the more “advanced” model will often fall short of the desired accuracy. In addition, even the most complex model will not work if its underlying physical and/or mathematical assumptions do not match the physical system of interest. Conversely, many simple models capture enough of the essential physics to simulate or predict the system behavior of interest to the necessary accuracy, while requiring much less data and development time.

When selecting an appropriate model, some basic questions that should first be answered include:
  • What are the objectives of the modeling analysis?
  • What models are theoretically capable of meeting the necessary objectives, including simulating the system behavior of interest and achieving the necessary degree of accuracy at a sufficient level of confidence?
  • What data and information are necessary and available for performing and benchmarking the modeling?
  • What models can be used with the available data and information?
  • What are the relative advantages and disadvantages of each candidate model, including anticipated performance, development time, validation, ease of use, and updating?
Some of the answers to these questions are not straightforward. For example, there is often uncertainty as to how well a model will perform until at least some preliminary modeling simulations have been performed. Furthermore, as part of the process of accepting or rejecting a model, the common modeling protocol of “validation” arises, where a model’s prediction capability is assessed by how accurately it reproduces historical events. However, even when a “good” historical match or validation is achieved, it is almost always limited to a relatively short historical period that may or may not be representative of future conditions. This echoes the common Wall Street refrain that “past performance is no guarantee of future success”.
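The validation idea described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not a recommended workflow: the data values are invented, the “simple” model is a least-squares straight line, and the “complex” model is a polynomial forced through every calibration point (one parameter per observation). The helper names `fit_line`, `fit_interpolant`, and `rmse` are hypothetical.

```python
import math


def fit_line(xs, ys):
    """Simple model: least-squares straight line (two parameters)."""
    n = len(xs)
    x_mean = sum(xs) / n
    y_mean = sum(ys) / n
    sxx = sum((x - x_mean) ** 2 for x in xs)
    sxy = sum((x - x_mean) * (y - y_mean) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = y_mean - slope * x_mean
    return lambda x: intercept + slope * x


def fit_interpolant(xs, ys):
    """Complex model: Lagrange polynomial through every calibration point."""
    def predict(x):
        total = 0.0
        for i, (xi, yi) in enumerate(zip(xs, ys)):
            term = yi
            for j, xj in enumerate(xs):
                if j != i:
                    term *= (x - xj) / (xi - xj)
            total += term
        return total
    return predict


def rmse(model, xs, ys):
    """Root-mean-square error of a model against observations."""
    return math.sqrt(sum((model(x) - y) ** 2 for x, y in zip(xs, ys)) / len(xs))


# Invented observations: a roughly linear trend with measurement noise.
calib_x = [0, 1, 2, 3, 4, 5]            # period used to build the models
calib_y = [1.3, 2.6, 5.5, 6.7, 9.4, 10.5]
valid_x = [6, 7, 8]                      # later period held back for validation
valid_y = [13.2, 14.8, 17.1]

simple = fit_line(calib_x, calib_y)
complex_model = fit_interpolant(calib_x, calib_y)

for name, model in [("simple", simple), ("complex", complex_model)]:
    print(name,
          "calibration RMSE:", round(rmse(model, calib_x, calib_y), 3),
          "validation RMSE:", round(rmse(model, valid_x, valid_y), 3))
```

In this contrived case the complex model matches the calibration period perfectly because it also fits the noise, and it then performs far worse than the straight line on the held-back period. Real cases are rarely this stark, but the comparison shows why a good historical match alone says little about future performance.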

What can be said with confidence, and will be demonstrated in future posts, is that a much simpler model will often provide the same level of accuracy as a significantly more complex model while costing thousands, or even tens of thousands, of dollars less. Modelers should always begin by asking whether they can achieve the required modeling objectives with a relatively simple model. At the very least, a simple model can generate preliminary results and provide insights into the nature of the problem and its data requirements, which will support possible future advanced modeling efforts. In other cases, the simple model will be all that is needed to achieve the modeling objectives at the required level of accuracy and confidence while avoiding unnecessary costs. There are of course situations where a significantly more complex model is essential to providing the required prediction capability. However, model complexity should be increased only as far as is warranted, and no further.
As the great Albert Einstein, who knew a thing or two about mathematical models, is widely quoted as saying:

"Everything should be made as simple as possible, but not simpler."

Einstein with his seemingly simple but profound equation, considered by many the most elegant and beautiful in history.