Re: [Swarm-Modelling] comparing models
From: Michael McDevitt
Subject: Re: [Swarm-Modelling] comparing models
Date: Tue, 2 Sep 2003 13:46:05 -0700
An extract from a paper I wrote for the Navy earlier this year that generally
pertains to any modeling effort:
“Model validation is “substantiation that a computerized model within its domain of applicability possesses a satisfactory range of accuracy consistent with the intended application of the model” (Schlesinger et al. 1979). Model verification is often defined as “ensuring that the computer program of the computerized model and its implementation are correct”. Model accreditation determines if a model satisfies specified criteria according to a specified process. A related topic is model credibility. Model credibility is concerned with developing in (potential) users the confidence they need in a model, and in the information derived from the model, such that they are willing to use the model and the derived information.” (Sargent, 1999)
“Conceptual model validity is determining that (1) the theories and assumptions underlying the conceptual model are correct, and (2) the model representation of the problem entity and the model’s structure, logic, and mathematical and causal relationships are “reasonable” for the intended purpose of the model.” (Sargent, 1999)
“The primary validation techniques used for these evaluations are face validation and traces. Face validation has experts on the problem entity evaluate the conceptual model to determine if it is correct and reasonable for its purpose. This usually requires examining the flowchart or graphical model, or the set of model equations. The use of traces is the tracking of entities through each sub-model and the overall model to determine if the logic is correct and if the necessary accuracy is maintained.” (Sargent, 1999)
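As a concrete aside (my sketch, not Sargent’s): a minimal trace can just log an entity’s state at every hand-off between sub-models so the logic can be checked by hand. The sub-models and entity fields here are hypothetical:

    import logging

    logging.basicConfig(level=logging.DEBUG, format="%(message)s")
    log = logging.getLogger("trace")

    def arrival_submodel(entity):
        # Entity enters the queue in the (hypothetical) arrival sub-model.
        entity["queue_position"] = 1
        log.debug("after arrival sub-model: %s", entity)
        return entity

    def service_submodel(entity):
        # Entity is served and leaves the queue.
        entity["queue_position"] = 0
        entity["served"] = True
        log.debug("after service sub-model: %s", entity)
        return entity

    entity = {"id": 42, "served": False}
    service_submodel(arrival_submodel(entity))

Each log line gives a reviewer a snapshot of the entity at a sub-model boundary, which is exactly what is needed to walk the logic by hand.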
Whatever tests and techniques are applied, the litmus test for validation is that the conceptual model is valid with respect to the problem and the purpose of the model.
Remember, “all models are wrong, some are useful” (Sterman, 2000). A model should be a simplification of reality that provides insight. Creating a full-fidelity (scale) model of reality is usually prohibitively expensive, and in most cases impossible; regardless, it may not provide any additional utility for the added expense.
The theories and assumptions underlying the model should be tested using mathematical analysis and statistical methods on problem entity data when available. Examples of theories and assumptions are linearity, independence, stationarity, and Poisson arrivals. Examples of applicable statistical methods are fitting distributions to data, estimating parameter values from the data, and plotting the data to determine if they are stationary. In addition, all theories used should be reviewed to ensure they were applied correctly.”
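To make that concrete (my sketch, not from the paper): a Poisson-arrivals assumption can be checked by fitting an exponential distribution to observed inter-arrival times, testing the fit, and eyeballing stationarity. The data here are a synthetic stand-in for real problem-entity data:

    import numpy as np
    from scipy import stats

    # Hypothetical stand-in for observed inter-arrival times; real
    # problem-entity data would be loaded here instead.
    rng = np.random.default_rng(1)
    interarrival_times = rng.exponential(scale=2.0, size=200)

    # Estimate the rate parameter from the data (maximum likelihood).
    loc, scale = stats.expon.fit(interarrival_times, floc=0.0)
    print(f"estimated arrival rate: {1.0 / scale:.3f} per time unit")

    # Kolmogorov-Smirnov goodness-of-fit test against the fitted distribution.
    stat, p_value = stats.kstest(interarrival_times, "expon", args=(loc, scale))
    print(f"KS statistic = {stat:.3f}, p = {p_value:.3f}")

    # Crude stationarity check: compare first- and second-half means;
    # a large difference suggests the series is not stationary.
    half = len(interarrival_times) // 2
    print("first-half mean: ", interarrival_times[:half].mean())
    print("second-half mean:", interarrival_times[half:].mean())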
“Computerized model verification is defined as ensuring that the computer
programming and implementation of the conceptual model is correct.”
“The major factor affecting verification is whether a simulation language or a higher-level programming language such as C or C++ is used. The use of a special-purpose simulation language generally will result in having fewer errors than if a general-purpose simulation language is used, and using a general-purpose simulation language will generally result in having fewer errors than if a general-purpose higher-level language is used. (The use of a simulation language also usually reduces the programming time required and the flexibility.)” (Sargent, 1999)
Determining that the computer simulation, software, programming, and algorithms faithfully represent the intended conceptual model, accurately enough for the purpose of the effort, is the next step. Any number of tests can be used to compare the simulation’s behavior with real system behavior. Are the mathematical functions and relationships logically correct, and do they result in the postulated behavior? Is the model a black-box model with hidden algorithms that produce results that appear to mimic reality but have no logical or causal relationship to reality?
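One way to make such checks repeatable (a sketch of mine; the sub-model and its expected relationships are hypothetical) is to encode the postulated behavior as unit tests against individual sub-model functions:

    import unittest

    def logistic_growth(pop, rate, capacity):
        """One time step of a hypothetical logistic-growth sub-model."""
        return pop + rate * pop * (1.0 - pop / capacity)

    class TestLogisticGrowth(unittest.TestCase):
        def test_zero_population_stays_zero(self):
            self.assertEqual(logistic_growth(0.0, 0.1, 100.0), 0.0)

        def test_population_at_capacity_is_stable(self):
            self.assertEqual(logistic_growth(100.0, 0.1, 100.0), 100.0)

        def test_population_grows_below_capacity(self):
            self.assertGreater(logistic_growth(50.0, 0.1, 100.0), 50.0)

    if __name__ == "__main__":
        unittest.main()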
Before a model can be accredited for use for the purpose for which it was designed, it must satisfy the sponsor that it is credible and that it is operationally valid under most circumstances, as required by its application.
“Operational validity is concerned with determining that the model’s
output behavior has the accuracy required for the model’s intended purpose over
the domain of its intended applicability.” (Sargent, 1999) This is done using
different approaches
depending upon whether the system being modeled is “observable”.
Observable implies that there are appropriate data available for analysis. The
following table summarizes the basic approaches to determining a model’s
operational validity. Both
approaches are usually required to foster confidence that the model is useful
for its intended purpose.
“‘Comparison’ means comparing/testing the model and system input-output behaviors, and “explore model behavior” means to examine the output behavior of the model using appropriate validation techniques and usually includes parameter variability-sensitivity analysis. Various sets of experimental conditions from the domain of the model’s intended applicability should be used for both comparison and exploring model behavior. To obtain a high degree of confidence in a model and its results, comparisons of the model’s and system’s input/output behaviors for several different sets of experimental conditions are usually required.” (Sargent, 1999)
“Data are needed for three purposes: (1) for building the conceptual
model, (2) for validating the model, and (3) for performing experiments with
the validated model. In model validation we are concerned only with the first
two types of data.”
Computer model verification requires the third type. (Sargent, 1999)
Various validation techniques (and tests) can be used in model validation
and verification. The techniques can be used either subjectively or
objectively. By “objectively,” we mean using some type of statistical test or
mathematical procedure, e.g.,
hypothesis tests and confidence intervals. A combination of techniques is
generally used.
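For example (my sketch, with made-up numbers), an objective comparison of model and system output for one experimental condition could use a two-sample t-test plus a confidence interval on the difference of means:

    import numpy as np
    from scipy import stats

    # Hypothetical output samples from independent replications.
    system_output = np.array([12.1, 11.8, 12.5, 12.0, 11.9, 12.3])
    model_output = np.array([12.4, 12.0, 12.6, 12.2, 12.1, 12.5])

    # Two-sample t-test: is the difference in mean output significant?
    t_stat, p_value = stats.ttest_ind(system_output, model_output)
    print(f"t = {t_stat:.3f}, p = {p_value:.3f}")

    # Approximate 95% confidence interval on the difference of means
    # (normal approximation, for brevity).
    diff = model_output.mean() - system_output.mean()
    se = np.sqrt(model_output.var(ddof=1) / len(model_output)
                 + system_output.var(ddof=1) / len(system_output))
    print(f"95% CI on mean difference: [{diff - 1.96 * se:.3f}, {diff + 1.96 * se:.3f}]")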
John Sterman’s “Checklist for the Model Consumer” includes twenty questions that aid in determining a model’s utility as part of the validation process:
1. What is the problem at hand?
2. What is the problem addressed by the model?
3. What is the boundary of the model?
4. What factors are endogenous? Exogenous? Excluded?
5. Are soft variables included?
6. Are feedback effects properly taken into account?
7. Does the model capture possible side effects, both harmful and beneficial?
8. What is the time horizon relevant to the problem?
9. Does the model include as endogenous components those factors that may change significantly over the time horizon?
10. Are people assumed to act rationally and to optimize their performance?
11. Does the model take non-economic behavior (organizational realities, non-economic motives, political factors, cognitive limitations) into account?
12. Does the model assume people have perfect information about the future and about the way the system works, or does it take into account the limitations, delays, and errors in acquiring information that plague decision makers in the real world?
13. Are appropriate time delays, constraints, and possible bottlenecks taken into account?
14. Is the model robust in the face of extreme variations in input assumptions?
15. Are the policy recommendations derived from the model sensitive to plausible variations in its assumptions?
16. Are the results of the model reproducible? Or are they adjusted (add factored) by the model builder?
17. Is the model currently operated by the team that built it?
18. How long does it take for the model team to evaluate a new situation, modify the model, and incorporate new data?
19. Is the model documented? Is the documentation publicly available?
20. Can third parties use the model and run their own analyses with it?
Figure 13: Twenty Questions to Assess Usefulness (Sterman, 1991)
The following techniques from Sargent (1999) are commonly used for
validating and verifying the sub-models and overall model.
“Animation: The model’s operational behavior is displayed graphically as
the model moves through time. For example, the movements of parts through a
factory during a simulation are shown graphically.
Comparison to Other Models: Various results (e.g., outputs) of the
simulation model being validated are compared to results of other (valid)
models. For example, (1) simple cases of a simulation model may be compared to
known results of analytic
models, and (2) the simulation model may be compared to other simulation models
that have been validated.
Degenerate Tests: The degeneracy of the model’s behavior is tested by
appropriate selection of values of the input and internal parameters. For
example, does the average number in the queue of a single server continue to
increase with respect to
time when the arrival rate is larger than the service rate?
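(A quick sketch of mine for that queue example, using the Lindley recursion for successive waiting times; with arrival rate above service rate the waits should grow without bound:)

    import random

    def mm1_waits(arrival_rate, service_rate, n_customers, seed=0):
        """Successive waiting times in an M/M/1 queue (Lindley recursion)."""
        rng = random.Random(seed)
        wait, waits = 0.0, []
        for _ in range(n_customers):
            service = rng.expovariate(service_rate)
            interarrival = rng.expovariate(arrival_rate)
            wait = max(0.0, wait + service - interarrival)
            waits.append(wait)
        return waits

    # Degenerate case: arrivals outpace service, so late waits should
    # dwarf early waits.
    waits = mm1_waits(arrival_rate=1.2, service_rate=1.0, n_customers=5000)
    q = len(waits) // 4
    print("early mean wait:", sum(waits[:q]) / q)
    print("late mean wait: ", sum(waits[-q:]) / q)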
Event Validity: The “events” of occurrences of the simulation model are
compared to those of the real system to determine if they are similar. An
example of events is deaths in a fire department simulation.
Extreme Condition Tests: The model structure and output should be plausible for any extreme and unlikely combination of levels of factors in the system; e.g., if in-process inventories are zero, production output should be zero.
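(Again a sketch of mine, with a hypothetical production sub-model, showing how such extreme conditions can be pinned down as assertions:)

    def production_step(inventory, capacity):
        # Hypothetical sub-model: output is limited by both in-process
        # inventory and machine capacity.
        return min(inventory, capacity)

    assert production_step(inventory=0, capacity=50) == 0       # no inventory -> no output
    assert production_step(inventory=10000, capacity=50) == 50  # capacity bound holds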
Face Validity: “Face validity” is asking people knowledgeable
about the system whether the model and/or its behavior are reasonable. This
technique can be used in determining if the logic in the conceptual model is
correct and if a model’s
input-output relationships are reasonable.
Fixed Values: Fixed values (e.g., constants) are used for various model
input and internal variables and parameters. This should allow the checking of
model results against easily calculated values.
Historical Data Validation: If historical data exist (or if data are collected on a system for building or testing the model), part of the data is used to build the model and the remaining data are used to determine (test) whether the model behaves as the system does. (This testing is conducted by driving the simulation model with either samples from distributions or traces.)
Historical Methods: The three historical methods of validation are rationalism, empiricism, and positive economics. Rationalism assumes that everyone knows whether the underlying assumptions of a model are true. Logical deductions are used from these assumptions to develop the correct (valid) model. Empiricism requires every assumption and outcome to be empirically validated. Positive economics requires only that the model be able to predict the future and is not concerned with a model’s assumptions or structure (causal relationships or mechanisms).
Internal Validity: Several replications (runs) of a stochastic
model are made to determine the amount of (internal) stochastic variability
in the model. A high amount of variability (lack of consistency) may cause the
model’s results to be
questionable and, if typical of the problem entity, may question the
appropriateness of the policy or system being investigated.
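(A sketch of how such a replication study might look; run_model is a hypothetical stand-in for one run of a real stochastic simulation:)

    import random
    import statistics

    def run_model(seed):
        """Hypothetical stand-in for one replication of a stochastic model."""
        rng = random.Random(seed)
        return sum(rng.gauss(10.0, 2.0) for _ in range(100))

    # Replicate with different seeds and summarize the internal variability.
    results = [run_model(seed) for seed in range(30)]
    mean = statistics.mean(results)
    stdev = statistics.stdev(results)
    print(f"mean = {mean:.2f}, stdev = {stdev:.2f}, CV = {stdev / mean:.3f}")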
Multistage Validation: Naylor and Finger (1967) proposed combining the three historical methods of rationalism, empiricism, and positive economics into a multistage process of validation. This validation method consists of (1) developing the model’s assumptions on theory, observations, general knowledge, and function, (2) validating the model’s assumptions where possible by empirically testing them, and (3) comparing (testing) the input-output relationships of the model to the real system.
Operational Graphics: Values of various performance measures, e.g.,
number in queue and percentage of servers busy, are shown graphically as the
model moves through time; i.e., the dynamic behaviors of performance indicators
are visually displayed
as the simulation model moves through time.
Parameter Variability–Sensitivity Analysis: This technique consists of
changing the values of the input and internal parameters of a model to
determine the effect upon the model’s behavior and its output. The same
relationships should occur in the
model as in the real system. Those parameters that are sensitive, i.e., cause
significant changes in the model’s behavior or output, should be made
sufficiently accurate prior to using the model. (This may require iterations in
model development.)
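(A one-at-a-time sensitivity sweep, sketched with the same kind of hypothetical stand-in model:)

    def run_model(growth_rate):
        """Hypothetical deterministic model with one input parameter."""
        pop = 1.0
        for _ in range(50):
            pop += growth_rate * pop * (1.0 - pop / 100.0)
        return pop

    baseline = run_model(0.10)
    for rate in (0.05, 0.08, 0.10, 0.12, 0.15):
        output = run_model(rate)
        print(f"growth_rate={rate:.2f} -> output={output:.2f} "
              f"({(output - baseline) / baseline:+.1%} vs. baseline)")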
Predictive Validation: The model is used to predict (forecast) the
system behavior, and then comparisons are made between the system’s behavior
and the model’s forecast to determine if they are the same. The system data may
come from an operational
system or from experiments performed on the system, e.g., field tests.
Traces: The behavior of different types of specific entities in the model is traced (followed) through the model to determine if the model’s logic is correct and if the necessary accuracy is obtained.
Turing Tests: People who are knowledgeable about the operations of a
system are asked if they can discriminate between system and model outputs.”
(Sargent, 1999)
References:

Sargent, R. (1999). Validation and Verification of Simulation Models [Electronic Version]. Proceedings of the 1999 Winter Simulation Conference. Retrieved September 10, 2002, from http://www.informs-cs.org/wsc99papers/prog99.html

Schlesinger, et al. (1979). Terminology for Model Credibility. Simulation, 32(3), pp. 103–104.

Sterman, J. (1991). A Skeptic’s Guide to Computer Models [Electronic Version]. Retrieved April 5, 2002, from http://sysdyn.mit.edu/sdep/Roadmaps/RM9/D-4101-1.pdf

Sterman, J. (2000). Business Dynamics: Systems Thinking and Modeling for a Complex World. Boston, MA: Irwin McGraw-Hill.
All the Best,
Mike McDevitt
Senior Analyst & Modeler
CACI Dynamic Systems Inc.
858-695-8220 x1457
address@hidden
To: address@hidden
Sent by: address@hidden
Date: 09/02/2003 01:20 PM
Subject: Re: [Swarm-Modelling] comparing models
Please respond to: modelling
Andy Cleary writes:
> >So, as to your questions about which techniques are best, just pick a
> >few, do the work, write down the results. Pick a few more, do the
> >work, write down the results. Etc. If a sizable sampling of
> >techniques (e.g. 3 statistical, 2 from feature extraction, 1
> >state-space reconstruction, 2 in signal analysis) all give you a
> >certain result (e.g. model 1 and model 2 lead to the same
> >conclusions), then it may be worth pointing that out to some audience.
>
> I don't disagree with you, but if you tried selling this as "validation" to
> people used to *physics*, you would not get very far.
>
> Or to make it more concrete, *I* have not gotten very far in the same
> circumstances...
Can you give a list of the validation techniques you have used and,
perhaps, a breakdown of which ones were mildly successful and which
ones were definitely not successful?
--
glen e. p. ropella =><= Hail Eris!
H: 503.630.4505 http://www.ropella.net/~gepr
M: 971.219.3846 http://www.tempusdictum.com
_______________________________________________
Modelling mailing list
address@hidden
http://www.swarm.org/mailman/listinfo/modelling