Sunday, October 5, 2014

Simplicity versus Complexity in Mathematical Modeling

Drawing of water by Leonardo da Vinci

“Occam's Razor”, attributed to the Franciscan theologian William of Ockham, is the time-tested principle of parsimony, economy, or succinctness used in problem-solving.  It states that “among competing hypotheses, the hypothesis with the fewest assumptions should be selected.”  Leonardo da Vinci, in his never-ending quest for beauty and truth, had a similar perspective: “Simplicity is the ultimate sophistication.”

Unfortunately, in modeling, there is often a natural bias to ignore this principle, with a preference for advanced modeling approaches over simpler alternatives, as if increasing complexity by itself increased model performance.  Advanced numerical modeling is used in virtually every engineering and scientific discipline, including airplane design, climate modeling, bridge design, and hydrological modeling.  Its theoretical underpinnings date back to John von Neumann, widely considered the greatest mathematician of the 20th century and a father of the modern-day computer.  By discretizing the spatial domain into a grid of discrete cells or elements, numerical models can represent the natural temporal and spatial complexity of real-world systems.  These powerful models can realistically represent the physics of extremely complex phenomena, like climate, that vary over both space and time.
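To make the idea of discretization concrete, here is a minimal sketch (not from the original post, and deliberately toy-scale): a one-dimensional rod is divided into grid cells and the heat equation is stepped forward with explicit finite differences. The same basic idea, at vastly larger scale and in more dimensions, underlies the climate and groundwater models described above.

```python
def diffuse(u, alpha=0.1, steps=100):
    """Advance temperatures u on a 1-D grid of cells.

    alpha is the dimensionless diffusion number D*dt/dx**2
    (the explicit scheme is stable only for alpha <= 0.5).
    Boundary cells are held fixed.
    """
    u = list(u)
    for _ in range(steps):
        # each interior cell is updated from its two neighbors
        u = [u[0]] + [
            u[i] + alpha * (u[i - 1] - 2 * u[i] + u[i + 1])
            for i in range(1, len(u) - 1)
        ] + [u[-1]]
    return u

# a hot spike in the middle of a cold rod smooths out over time
initial = [0.0] * 4 + [100.0] + [0.0] * 4
final = diffuse(initial)
```

Even this tiny model illustrates the data appetite of the approach: every cell needs an initial value and, in a real application, its own material properties and boundary conditions.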

Source:  United States Geological Survey

However, by their very nature, these models are extremely data intensive, often requiring thousands, tens of thousands, or even millions of inputs, which translates into significantly higher development time and cost.  The logical question is why a simpler, and therefore less expensive, model would not suffice, particularly when it provides an acceptable degree of accuracy, and in some cases accuracy similar or even superior to that of a far more complex model.

The assumption that a more complex model is always inherently superior to a model that is mathematically simpler and/or has fewer input requirements not only contradicts the keen intellect of da Vinci and Occam’s Razor; it also contradicts common sense.  The reality is that any model is only as good as its data; the old adage “garbage in, garbage out” is as true in mathematical modeling as in any other endeavor where the outcome is a byproduct of the inputs.  In applications where the data are sparse or have extreme variability and uncertainty, or where the phenomenon of interest is complex, even the most “advanced” model will often provide less than the desired accuracy.  In addition, even the most complex model will not work if its underlying physical and/or mathematical assumptions do not match the physical system of interest.  By contrast, many simple models adequately capture the essential physics to simulate or predict the system behavior of interest to the necessary accuracy, while requiring much less data and development time.

When selecting an appropriate model, some basic questions that should first be answered include:
  • What are the objectives of the modeling analysis?
  • What models are theoretically capable of meeting the necessary objectives, including simulating the system behavior of interest and achieving the necessary degree of accuracy at a sufficient level of confidence?
  • What data and information are necessary and available for performing and benchmarking the modeling?
  • What models can be used with the available data and information?
  • What are the relative advantages and disadvantages of each candidate model, including anticipated performance, development time, validation, ease of use, and updating?
Some of the answers to these questions are not straightforward.  For example, there is often uncertainty about how well a model will perform until at least some preliminary simulations have been run.  Furthermore, as part of the process of accepting or rejecting a model, the common modeling protocol of “validation” arises, in which a model’s predictive capability is assessed by attempting to accurately reproduce historical events.  However, even when a “good” historical match or validation is achieved, it is almost always limited to a relatively short historical period that may or may not be representative of future conditions.  This leads to the common Wall Street refrain that “past performance is no guarantee of future success.”
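The validation pitfall can be sketched in a few lines of code. The example below is hypothetical (all numbers are invented): a “complex” model that memorizes a short historical flow record achieves a perfect validation score, yet on a later, unseen period it is beaten by a simple long-term-mean model.

```python
import statistics

history = [12.0, 15.0, 11.0, 14.0, 10.0]   # short calibration record
future  = [16.0, 10.0, 17.0, 12.0, 18.0]   # conditions the record never saw

# "complex" model: memorizes the historical record exactly
memorized = dict(enumerate(history))
def complex_model(t):
    return memorized.get(t, history[-1])    # extrapolates poorly off-record

# simple model: long-term mean of the record
mean_flow = statistics.mean(history)
def simple_model(t):
    return mean_flow

def rmse(model, data, offset=0):
    """Root-mean-square error of a model over a data window."""
    n = len(data)
    return (sum((model(offset + i) - y) ** 2
                for i, y in enumerate(data)) / n) ** 0.5

hist_err_complex = rmse(complex_model, history)                     # 0.0: perfect "validation"
fut_err_complex  = rmse(complex_model, future, offset=len(history))
fut_err_simple   = rmse(simple_model,  future, offset=len(history))
```

The complex model’s flawless historical match says nothing about the future period, which is exactly the point of the Wall Street refrain quoted above.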

What can be said with confidence, and will be demonstrated in future blogs, is that a much simpler model will often provide the same level of accuracy as a significantly more complex model while costing many thousands, even tens of thousands, of dollars less.  Modelers should always begin by asking whether they can achieve the required modeling objectives with a relatively simple model.  At the very least, a simple model can generate preliminary results and provide insights into the nature of the problem and its data requirements, which will support possible future advanced modeling efforts.  In other cases, the simple model will be all that is necessary to achieve the modeling objectives at the required level of accuracy and confidence while avoiding unnecessary costs.  There are, of course, situations where a significantly more complex model is absolutely necessary to provide the required predictive capability.  However, model complexity should be increased only as warranted, and no further.
As the great Albert Einstein, who knew a thing or two about mathematical models, famously observed:  

"Everything should be made as simple as possible, but not simpler."

Einstein with his seemingly simple but profound equation, considered by many to be the most elegant and beautiful in history.

Monday, September 8, 2014

The Art and Science of Mathematical Modeling

Drawing of Water Lifting Device by Leonardo da Vinci

The renowned psychologist Daniel Kahneman, co-winner of the Nobel Prize in Economics in 2002, argues that humans are hard-wired to find cause-and-effect relationships even when none exist (the “clustering illusion”, as when Londoners perceived patterns in the random German bombing of the city during World War II).  Humans take comfort in believing that we can discover patterns or relationships that can be used to accurately predict, exploit, and even change future outcomes.  We have a proven history of constructing mathematically based computer models that, by executing a series of instructions, accurately simulate and predict systems of interest, even complex ones.  However, despite our best attempts, models remain gross simplifications of complex real-world systems and are thus inherently prone to prediction errors.

The huge disparity in complexity between the real world and our most advanced mathematical models is illustrated in the book “Probably Approximately Correct” by the pre-eminent computational theorist Dr. Leslie Valiant.  In it, he recounts the following story:

“In 1947 John von Neumann, the famously gifted mathematician, was keynote speaker at the first annual meeting of the Association for Computing Machinery.  In his address, he said that future computers would get along with just a dozen instruction types, a number known to be adequate for expressing all of mathematics.  He went on to say that one need not be surprised at this small number, since 1,000 words were known to be adequate for most situations in real life, and mathematics was only part of life, and a very simple part at that.  The audience reacted with hilarity.  This provoked von Neumann to respond: ‘If people do not believe that mathematics is simple, it is only because they do not realize how complicated life is.’”

Still, mathematical models are developed, sometimes at significant cost and effort, for a very good reason: even with relatively simple assumptions and mathematics, they often predict much better than “dead reckoning” or “gut feelings.”  The benefits include higher returns and security with reduced costs and risks, which delivers enormous value to the user and often to society.  Models, when developed and used properly, limit inherent human biases and emotions and provide a more objective framework for simulating and understanding a system of interest.  To this end, we use models in a variety of real-world applications, including forecasting of things like weather, forest fires, and water demand.  Sometimes the models are extremely accurate; other times, we are striving only to improve on a “best guess,” where even a small increase in forecasting accuracy produces significant benefits over the long run.
Modeling is often as much an art as it is a science.  On the science side, a fundamental understanding of the governing physical laws or processes is necessary not only for constructing an acceptable model, but also for understanding its limitations and the conditions under which it may fail.  There are often multiple approaches, with varying levels of complexity, that may achieve similar modeling performance and capability.  Constructing an appropriate model that produces the desired performance often requires a combination of expertise, experience, and flexibility.  A dose of humility also helps.

Like mathematics and computational theory, modeling is evolving, and as prediction needs and expectations change, new creative modeling methodologies will continue to emerge that improve and expand modeling capabilities.  Many models today are capable of providing high prediction accuracy for complex systems, which can help humans better manage resources and even avert possible adverse outcomes, including disasters.  In water resources, modeling has helped minimize potentially devastating consequences like saltwater intrusion in Orange County, California, and land subsidence in Venice, Italy.  At its best, modeling can help safeguard our environment, our water, and even our cultural and artistic treasures.