
RE: [Swarm-Modelling] comparing models


From: Marshall, James A R
Subject: RE: [Swarm-Modelling] comparing models
Date: Wed, 3 Sep 2003 10:49:20 +0100

Hi,
  it seems to me that everyone so far has been talking about statistical
validation and similar techniques based on observing the model's outputs and
comparing them with the target system (although I've only skimmed Michael's
long email). Glen rightly points out that a model able to reproduce some
observable results from the target system is not necessarily a correct model,
and may not even be useful for its intended purpose (which is likely to be
predictive). I like to come at the problem from a slightly different angle...
a "good" model is one with explanatory power: presumably your model will be of
interest because the predictive results it produces are in some way
non-intuitive (otherwise why have the model?). In that case a good approach is
to dissect the model to try to explain how these non-intuitive results are
produced, formulate an explanation, and relate it back to the target system to
see if that explanation is reasonable and understandable. If it's not, it may
be that the model does not accurately capture some important aspect of the
target system; but if you're lucky, the explanation may give you an insight
into how the target system really works, rather than how you think it works.
This is a nice situation to be in, because your knowledge has been increased
not solely through the prediction of a non-intuitive result, but through a
better understanding of the system you're modelling.
  Unfortunately, the more complicated/complex your model, the harder it may be
to analyse and to formulate explanations for observed behaviour. I've found
this a good approach for the simpler models I've worked with, though.
  All this is summarised in what I consider to be a very important paper by
Di Paolo et al. which I think others on the list have already recommended:

Di Paolo, E. A., Noble, J. & Bullock, S. (2000) Simulation models as opaque
thought experiments. In M. Bedau, J. McCaskill & N. Packard (Eds.),
Proceedings of the seventh international conference on artificial life (pp.
497-506). Cambridge MA: MIT Press.

        James

---
Dr James A R Marshall
Complex Systems Modelling Group (COSMIC)
Department of Earth Science and Engineering
Imperial College London
Tel: +44 (0)20 7594 7493
Fax: +44 (0)20 7594 7444
Container World Project - http://www.ese.ic.ac.uk/research/containerworld/



-----Original Message-----
From: Michael McDevitt [mailto:address@hidden
Sent: 02 September 2003 21:46
To: address@hidden
Subject: Re: [Swarm-Modelling] comparing models

An extract from a paper I wrote for the Navy earlier this year that
generally pertains to any modeling effort:

      “Model validation is 'substantiation that a computerized model within
its domain of applicability possesses a satisfactory range of accuracy
consistent with the intended application of the model' (Schlesinger et al.
1979). Model verification is often defined as 'ensuring that the computer
program of the computerized model and its implementation are correct'. Model
accreditation determines if a model satisfies specified criteria according to
a specified process.


      A related topic is model credibility. Model credibility is concerned
with developing in (potential) users the confidence in a model, and in the
information derived from the model, that they need in order to be willing to
use the model and the derived information.” (Sargent, 1999)


      “Conceptual model validity is determining that (1) the theories and
assumptions underlying the conceptual model are correct, and (2) the model
representation of the problem entity and the model's structure, logic, and
mathematical and causal relationships are 'reasonable' for the intended
purpose of the model.”

      “The primary validation techniques used for these evaluations are face
validation and traces. Face validation has experts on the problem entity
evaluate the conceptual model to determine if it is correct and reasonable for
its purpose. This usually requires examining the flowchart or graphical model,
or the set of model equations. The use of traces is the tracking of entities
through each sub-model and the overall model to determine if the logic is
correct and if the necessary accuracy is maintained.” (Sargent, 1999)


      Whatever tests and techniques are applied, the litmus test for
validation is that the conceptual model is valid with respect to the problem
and the purpose of the model. “Conceptual model validity is determining that
(1) the theories and assumptions underlying the conceptual model are correct,
and (2) the model representation of the problem entity and the model's
structure, logic, and mathematical and causal relationships are 'reasonable'
for the intended purpose of the model.” (Sargent, 1999)  Remember, “all models
are wrong, some are useful” (Sterman, 2000). A model should be a
simplification of reality that provides insight. Creating a full-fidelity
(scale) model of reality is usually prohibitively expensive, and even
impossible in most cases. Regardless, it may not provide any additional
utility for the added expense.


      The theories and assumptions underlying the model should be tested using
mathematical analysis and statistical methods on problem entity data when
available. Examples of theories and assumptions are linearity, independence,
stationarity, and Poisson arrivals. Examples of applicable statistical methods
are fitting distributions to data, estimating parameter values from the data,
and plotting the data to determine if they are stationary. In addition, all
theories used should be reviewed to ensure they were applied correctly.”
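
As an illustration (not part of the quoted text), a minimal Python sketch of
this kind of assumption check on hypothetical inter-arrival data might look
like the following; the synthetic data are placeholders, and note that fitting
parameters before a KS test makes the p-value somewhat optimistic:

# Checking a "Poisson arrivals" assumption: Poisson arrivals imply
# exponentially distributed inter-arrival times.
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)
interarrivals = rng.exponential(scale=2.0, size=500)  # stand-in for real data

# Fit an exponential distribution and test the fit (Kolmogorov-Smirnov).
loc, scale = stats.expon.fit(interarrivals, floc=0.0)
ks_stat, p_value = stats.kstest(interarrivals, "expon", args=(loc, scale))
print(f"fitted mean inter-arrival time: {scale:.2f}")
print(f"KS statistic {ks_stat:.3f}, p-value {p_value:.3f}")

# Crude stationarity check: compare the first and second half of the data.
first, second = np.split(interarrivals, 2)
_, half_p = stats.ttest_ind(first, second, equal_var=False)
print(f"first-half vs second-half means differ? p = {half_p:.3f}")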


      “Computerized model verification is defined as ensuring that the
computer programming and implementation of the conceptual model is correct.”

      “The major factor affecting verification is whether a simulation
language or a higher level programming language such as C or C++ is used. The
use of a special-purpose simulation language generally will result in having
fewer errors than if a general-purpose simulation language is used, and using
a general-purpose simulation language will generally result in having fewer
errors than if a general-purpose higher level language is used. (The use of a
simulation language also usually reduces the programming time required and the
flexibility.)” (Sargent, 1999)

      The next step is determining that the computer simulation (software,
programming, and algorithms) represents the intended conceptual model
faithfully and accurately enough for the purpose of the effort. Any number of
tests can be used to compare the simulation's behavior with real system
behavior. Are the mathematical functions and relationships logically correct,
and do they produce the postulated behavior? Is the model a black-box model
with hidden algorithms that produce results that appear to mimic reality but
have no logical or causal relationship to reality?

      Before a model can be accredited for use for the purpose for which it
was designed, it must satisfy the sponsor that it is credible and that it is
operationally valid under most circumstances, as required by its application.


      “Operational validity is concerned with determining that the model's
output behavior has the accuracy required for the model's intended purpose
over the domain of its intended applicability.” (Sargent, 1999)  This is done
using different approaches depending upon whether the system being modeled is
"observable". Observable implies that there are appropriate data available for
analysis. Sargent (1999) summarizes the basic approaches to determining a
model's operational validity in a table; both approaches (comparison and
exploring model behavior) are usually required to foster confidence that the
model is useful for its intended purpose.

      “'Comparison' means comparing/testing the model and system input-output
behaviors, and 'explore model behavior' means to examine the output behavior
of the model using appropriate validation techniques, and usually includes
parameter variability-sensitivity analysis. Various sets of experimental
conditions from the domain of the model's intended applicability should be
used both for comparison and for exploring model behavior. To obtain a high
degree of confidence in a model and its results, comparisons of the model's
and system's input/output behaviors for several different sets of experimental
conditions are usually required.” (Sargent, 1999)
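
For example, a minimal Python sketch (not from Sargent) of such an
input-output comparison over several experimental conditions could look like
this; run_model() and the system_data values are hypothetical placeholders:

import numpy as np
from scipy import stats

rng = np.random.default_rng(1)

def run_model(arrival_rate, n_reps=200):
    # Hypothetical stand-in for the simulation model's output at one condition.
    return rng.normal(loc=10.0 / arrival_rate, scale=1.0, size=n_reps)

# Observed system outputs for the same experimental conditions (placeholders).
system_data = {
    0.5: rng.normal(20.0, 1.1, 150),
    1.0: rng.normal(10.0, 1.0, 150),
    2.0: rng.normal(5.0, 0.9, 150),
}

for arrival_rate, observed in system_data.items():
    simulated = run_model(arrival_rate)
    ks_stat, p_value = stats.ks_2samp(simulated, observed)
    verdict = "consistent" if p_value > 0.05 else "discrepant"
    print(f"arrival_rate={arrival_rate}: KS p-value {p_value:.3f} ({verdict})")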

      “Data are needed for three purposes: (1) for building the conceptual
model, (2) for validating the model, and (3) for performing experiments with
the validated model. In model validation we are concerned only with the first
two types of data.” Computer model verification requires the third type.
(Sargent, 1999)

      Various validation techniques (and tests) can be used in model
validation and verification. The techniques can be used either subjectively
or objectively. By “objectively,” we mean using some type of statistical
test or mathematical procedure, e.g.,
hypothesis tests and confidence intervals. A combination of techniques is
generally used.
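
As an illustration of an "objective" technique (my addition, not from
Sargent), a confidence interval on the difference between mean model output
and mean system output can be computed from placeholder data along these
lines:

import numpy as np

rng = np.random.default_rng(2)
model_out = rng.normal(10.2, 1.5, 100)   # replace with simulation outputs
system_out = rng.normal(10.0, 1.4, 80)   # replace with real-system observations

diff = model_out.mean() - system_out.mean()
se = np.sqrt(model_out.var(ddof=1) / len(model_out) +
             system_out.var(ddof=1) / len(system_out))
low, high = diff - 1.96 * se, diff + 1.96 * se   # approximate 95% interval
print(f"mean difference {diff:.2f}, 95% CI ({low:.2f}, {high:.2f})")
# If the interval lies entirely within a pre-agreed accuracy band, the output
# measure is acceptable for the model's purpose; otherwise it is not.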

      John Sterman's "Checklist for the Model Consumer" includes twenty
questions that aid in determining a model's utility as part of the validation
process:
 
      1.  What is the problem at hand?
      2.  What is the problem addressed by the model?
      3.  What is the boundary of the model?
      4.  What factors are endogenous? Exogenous? Excluded?
      5.  Are soft variables included?
      6.  Are feedback effects properly taken into account?
      7.  Does the model capture possible side effects, both harmful and
          beneficial?
      8.  What is the time horizon relevant to the problem?
      9.  Does the model include as endogenous components those factors that
          may change significantly over the time horizon?
      10. Are people assumed to act rationally and to optimize their
          performance?
      11. Does the model take non-economic behavior (organizational realities,
          non-economic motives, political factors, cognitive limitations) into
          account?
      12. Does the model assume people have perfect information about the
          future and about the way the system works, or does it take into
          account the limitations, delays, and errors in acquiring information
          that plague decision makers in the real world?
      13. Are appropriate time delays, constraints, and possible bottlenecks
          taken into account?
      14. Is the model robust in the face of extreme variations in input
          assumptions?
      15. Are the policy recommendations derived from the model sensitive to
          plausible variations in its assumptions?
      16. Are the results of the model reproducible? Or are they adjusted
          ("add factored") by the model builder?
      17. Is the model currently operated by the team that built it?
      18. How long does it take for the model team to evaluate a new
          situation, modify the model, and incorporate new data?
      19. Is the model documented? Is the documentation publicly available?
      20. Can third parties use the model and run their own analyses with it?

Figure 13: Twenty Questions to Assess Usefulness (Sterman, 1991)


      The following techniques from Sargent (1999) are commonly used for
validating and verifying the sub-models and overall model.


      “Animation: The model’s operational behavior is displayed graphically
as the model moves through time. For example, the movements of parts through
a factory during a simulation are shown graphically.


      Comparison to Other Models: Various results (e.g., outputs) of the
simulation model being validated are compared to results of other (valid)
models. For example, (1) simple cases of a simulation model may be compared to
known results of analytic models, and (2) the simulation model may be compared
to other simulation models that have been validated.


      Degenerate Tests: The degeneracy of the model's behavior is tested by
appropriate selection of values of the input and internal parameters. For
example, does the average number in the queue of a single server continue to
increase with respect to time when the arrival rate is larger than the service
rate?
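
As an illustration (not part of Sargent's text), a minimal Python sketch of
this particular degenerate test, using a toy single-server queue rather than a
production simulator, could be:

# Degenerate test: with arrival rate > service rate, the number in the queue
# of a single-server (M/M/1) system should keep growing over time.
import random

def mm1_queue_length(arrival_rate, service_rate, horizon=10_000.0, seed=0):
    """Return (time, number-in-system) samples for a simple M/M/1 queue."""
    rng = random.Random(seed)
    t, queue, samples = 0.0, 0, []
    next_arrival = rng.expovariate(arrival_rate)
    next_departure = float("inf")
    while t < horizon:
        if next_arrival < next_departure:
            t = next_arrival
            queue += 1
            next_arrival = t + rng.expovariate(arrival_rate)
            if queue == 1:                      # server was idle, start service
                next_departure = t + rng.expovariate(service_rate)
        else:
            t = next_departure
            queue -= 1
            next_departure = (t + rng.expovariate(service_rate)
                              if queue > 0 else float("inf"))
        samples.append((t, queue))
    return samples

samples = mm1_queue_length(arrival_rate=1.2, service_rate=1.0)
early = sum(q for _, q in samples[:1000]) / 1000
late = sum(q for _, q in samples[-1000:]) / 1000
print(f"mean queue length early: {early:.1f}, late: {late:.1f}")  # late >> early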


      Event Validity: The “events” of occurrences of the simulation model
are compared to those of the real system to determine if they are similar.
An example of events is deaths in a fire department simulation.


      Extreme Condition Tests: The model structure and output should be
plausible for any extreme and unlikely combination of levels of factors in the
system; e.g., if in-process inventories are zero, production output should be
zero.


      Face Validity: "Face validity" is asking people knowledgeable about the
system whether the model and/or its behavior are reasonable. This technique
can be used in determining if the logic in the conceptual model is correct and
if a model's input-output relationships are reasonable.


      Fixed Values: Fixed values (e.g., constants) are used for various
model input and internal variables and parameters. This should allow the
checking of model results against easily calculated values.
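
For instance (my addition, not from Sargent), a fixed-values check can be
written as a small unit test against a hand-calculated value;
workstation_throughput() below is a hypothetical deterministic sub-model, and
the same style of test also covers simple extreme-condition checks:

import math

def workstation_throughput(arrival_rate, service_time, n_servers):
    """Hypothetical deterministic sub-model: throughput limited by capacity."""
    capacity = n_servers / service_time
    return min(arrival_rate, capacity)

def test_fixed_values():
    # Hand calculation: 2 servers at 0.5 h/job can do 4 jobs/h; arrivals are
    # 3 jobs/h, so throughput must equal the arrival rate of 3 jobs/h.
    assert math.isclose(workstation_throughput(3.0, 0.5, 2), 3.0)
    # Extreme condition: zero arrivals must give zero throughput.
    assert math.isclose(workstation_throughput(0.0, 0.5, 2), 0.0)

test_fixed_values()
print("fixed-value checks passed")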


      Historical Data Validation: If historical data exist (or if data are
collected on a system for building or testing the model), part of the data is
used to build the model and the remaining data are used to determine (test)
whether the model behaves as the system does. (This testing is conducted by
driving the simulation model with either samples from distributions or
traces.)
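
A minimal Python sketch of such a build/hold-out split (not from Sargent;
calibrate() and simulate() are hypothetical placeholders for the real model)
might be:

import numpy as np
from scipy import stats

rng = np.random.default_rng(3)
history = rng.normal(12.0, 2.0, size=400)        # placeholder system records
build, holdout = history[:300], history[300:]    # e.g., 75% build / 25% test

def calibrate(data):
    """Estimate (placeholder) model parameters from the build portion."""
    return {"mean": data.mean(), "std": data.std(ddof=1)}

def simulate(params, n, rng):
    """Drive the (placeholder) model with the calibrated parameters."""
    return rng.normal(params["mean"], params["std"], size=n)

params = calibrate(build)
model_out = simulate(params, len(holdout), rng)
ks_stat, p_value = stats.ks_2samp(model_out, holdout)
print(f"held-out comparison: KS statistic {ks_stat:.3f}, p-value {p_value:.3f}")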


      Historical Methods: The three historical methods of validation are
rationalism, empiricism, and positive economics. Rationalism assumes that
everyone knows whether the underlying assumptions of a model are true. Logic
deductions are used from these assumptions to develop the correct (valid)
model. Empiricism requires every assumption and outcome to be empirically
validated. Positive economics requires only that the model be able to predict
the future and is not concerned with a model's assumptions or structure
(causal relationships or mechanism).


      Internal Validity: Several replications (runs) of a stochastic model are
made to determine the amount of (internal) stochastic variability in the
model. A high amount of variability (lack of consistency) may cause the
model's results to be questionable and, if typical of the problem entity, may
question the appropriateness of the policy or system being investigated.


      Multistage Validation: Naylor and Finger (1967) proposed combining the
three historical methods of rationalism, empiricism, and positive economics
into a multistage process of validation. This validation method consists of
(1) developing the model's assumptions on theory, observations, general
knowledge, and function, (2) validating the model's assumptions where possible
by empirically testing them, and (3) comparing (testing) the input-output
relationships of the model to the real system.


      Operational Graphics: Values of various performance measures, e.g.,
number in queue and percentage of servers busy, are shown graphically as the
model moves through time; i.e., the dynamic behaviors of performance
indicators are visually displayed as the simulation model moves through time.


      Parameter Variability-Sensitivity Analysis: This technique consists of
changing the values of the input and internal parameters of a model to
determine the effect upon the model's behavior and its output. The same
relationships should occur in the model as in the real system. Those
parameters that are sensitive, i.e., cause significant changes in the model's
behavior or output, should be made sufficiently accurate prior to using the
model. (This may require iterations in model development.)
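
As an illustration (not part of Sargent's text), a one-at-a-time sensitivity
sweep can be sketched in a few lines of Python; model_output() is a
hypothetical response function of two input parameters:

import numpy as np

def model_output(arrival_rate, service_time):
    """Hypothetical stand-in for a full simulation run (mean number in queue)."""
    rho = arrival_rate * service_time
    return rho / (1.0 - rho) if rho < 1.0 else np.inf

baseline = {"arrival_rate": 0.6, "service_time": 1.0}
base_out = model_output(**baseline)

for name in baseline:
    for factor in (0.9, 1.1):                       # +/- 10% around the baseline
        perturbed = dict(baseline, **{name: baseline[name] * factor})
        out = model_output(**perturbed)
        change = (out - base_out) / base_out
        print(f"{name} x{factor:.1f}: output changes by {change:+.1%}")
# Parameters whose +/-10% perturbation moves the output strongly are the ones
# that must be estimated most carefully before the model is used.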


      Predictive Validation: The model is used to predict (forecast) the
system behavior, and then comparisons are made between the system's behavior
and the model's forecast to determine if they are the same. The system data
may come from an operational system or from experiments performed on the
system, e.g., field tests.


      Traces: The behavior of different types of specific entities in the
model are traced (followed) through the model to determine if the model’s
logic is correct and if the necessary accuracy is obtained.


      Turing Tests: People who are knowledgeable about the operations of a
system are asked if they can discriminate between system and model outputs.”
(Sargent, 1999)





      References:


Sargent, R. (1999). Validation and verification of simulation models
[Electronic version]. Proceedings of the 1999 Winter Simulation Conference.
Retrieved September 10, 2002, from
http://www.informs-cs.org/wsc99papers/prog99.html

Schlesinger, et al. (1979). Terminology for model credibility. Simulation,
32(3), 103-104.

Sterman, J. (1991). A Skeptic's Guide to Computer Models [Electronic version].
Retrieved April 5, 2002, from
http://sysdyn.mit.edu/sdep/Roadmaps/RM9/D-4101-1.pdf

Sterman, J. (2000). Business Dynamics: Systems Thinking and Modeling for a
Complex World. Boston, Massachusetts: Irwin McGraw-Hill.








All the Best,

Mike McDevitt
Senior Analyst & Modeler
CACI Dynamic Systems Inc.
858-695-8220 x1457


 

From: address@hidden
Sent by: address@hidden
To: address@hidden
Date: 09/02/2003 01:20 PM
Subject: Re: [Swarm-Modelling] comparing models

Andy Cleary writes:
 > >So, as to your questions about which techniques are best, just pick a
 > >few, do the work, write down the results.  Pick a few more, do the
 > >work, write down the results.  Etc.  If a sizable sampling of
 > >techniques (e.g. 3 statistical, 2 from feature extraction, 1
 > >state-space reconstruction, 2 in signal analysis) all give you a
 > >certain result (e.g. model 1 and model 2 lead to the same
 > >conclusions), then it may be worth pointing that out to some audience.
 >
 > I don't disagree with you, but if you tried selling this as "validation"
to
 > people used to *physics*, you would not get very far.
 >
 > Or to make it more concrete, *I* have not gotten very far in the same
 > circumstances...

Can you give a list of the validation techniques you have used and,
perhaps, a breakdown of which ones were mildly successful and which
ones were definitely not successful?

--
glen e. p. ropella              =><=                           Hail Eris!
H: 503.630.4505                              http://www.ropella.net/~gepr
M: 971.219.3846                               http://www.tempusdictum.com

_______________________________________________
Modelling mailing list
address@hidden
http://www.swarm.org/mailman/listinfo/modelling





