Re: [lmi] Product editor API


From:	Greg Chicares
Subject:	Re: [lmi] Product editor API
Date:	Tue, 18 Oct 2005 15:32:36 +0000
User-agent:	Mozilla Thunderbird 1.0.2 (Windows/20050317)
On 2005-10-18 13:15 UTC, Vadim Zeitlin wrote:
> On Tue, 18 Oct 2005 02:46:29 +0000 Greg Chicares <address@hidden> wrote:
> 
> GC> On 2005-10-18 0:21 UTC, Vadim Zeitlin wrote:
> GC> > 
> GC> >  I'm designing the API of the MultiDimGrid class (suggestions for better
> GC> > names would be gratefully accepted) which is going to be used in the
> GC> > product editor window and I have a couple of questions:
> GC> > 
> GC> > 1. I suppose we're going to have an arbitrary number N of axes (i.e. not
> GC> >    limited nor fixed) and each axis can take a different number of values.

I think I created some confusion by saying 'mutable' without first defining
the intended sense. There are at least three possible senses:

(A) A hardcoded constant changes. Today, a program says
  int const N = 5;
and tomorrow the source changes to
  int const N = 6;
where N might be the number of axes or the number of enumerators for one
particular axis. The design has mutated: the source changed.

(B) A constant read at initialization changes. The number of axes and the
number of enumerators for each axis are read from a configuration file.
The file looks like this today:
  <axes>
    <axis name=fruit>
      <enumerator>apple</enumerator>
      <enumerator>pear</enumerator>
      <enumerator>banana</enumerator>
    </axis>
    <axis name=vegetable>
      <enumerator>carrot</enumerator>
  ...
and tomorrow we add an enumerator for another fruit, or add a new axis
for types of bread. The design has mutated: the configuration file changed,
but the source code didn't. End users would be forbidden to change that
file. Only we could change it. Somehow, lmi would need to handle an axis
or enumerator added or removed this way; the advantage is that the editor's
source code wouldn't need to change.

(C) End users can change anything at run time. We don't want that.

Of these three, (A) would meet our needs, but would be difficult to
maintain; (B) would be best; and (C) is not wanted at all for lmi. I'm
guessing that (B), the ideal, is practicable.
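To make sense (B) concrete, here is a minimal C++ sketch, not lmi code, of reading axis definitions once at initialization. It assumes a simplified one-line-per-axis format such as `fruit: apple pear banana` in place of the XML shown above; the names `axis_definition` and `read_axis_definitions()` are hypothetical.

```cpp
// Sketch of sense (B): axes are read once at initialization, then never
// changed. A simplified "name: enumerator enumerator ..." line format
// stands in for the real configuration file (an assumption).
#include <cassert>
#include <istream>
#include <sstream>
#include <string>
#include <vector>

struct axis_definition
{
    std::string              name;
    std::vector<std::string> enumerators;
};

// Reads lines like "fruit: apple pear banana"; lines without a colon are
// ignored.
std::vector<axis_definition> read_axis_definitions(std::istream& is)
{
    std::vector<axis_definition> axes;
    std::string line;
    while(std::getline(is, line))
    {
        std::string::size_type const colon = line.find(':');
        if(std::string::npos == colon)
        {
            continue;
        }
        axis_definition axis;
        axis.name = line.substr(0, colon);
        std::istringstream rest(line.substr(colon + 1));
        std::string word;
        while(rest >> word)
        {
            axis.enumerators.push_back(word);
        }
        axes.push_back(axis);
    }
    return axes;
}
```

The editor would call this once at startup and keep the result const; adding an axis for bread would then mean editing the file, not the source.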

> GC> lmi needs some small number of axes. Small, but mutable, not fixed.
> 
>  I thought the axis would be fixed but could be disabled/hidden at the GUI
> level. I thought that changing the "cardinality" of the data model on the
> fly probably wouldn't be a good idea. But maybe we could swap just the data
> part of the control (e.g. when another item is selected in the tree).

End users can never add a new axis. They can disable axes that they
don't want to use for a particular entity.

End users can't change the number of enumerators for an axis, with one
important exception: they can change the number of 'Durations'. If an
entity varies by 'Gender' and 'Duration', e.g.,
  Gender: {Female, Male, Unisex}
  Duration: [0..3]
then they can change the upper 'Duration' this way
  Duration: [0..5]
(though they can't change the lower limit: it's always [0..N]), but
they can't change 'Gender':
  Gender: {Female, Other} // Not allowed.
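A minimal sketch of that rule, assuming a hypothetical `editor_axis` class (not an existing lmi or wxWidgets type): every axis keeps its enumerators fixed, and only an axis constructed as 'Duration' accepts a new upper bound. The lower bound is implicitly zero and cannot be set at all.

```cpp
// Hypothetical sketch: only 'Duration' [0..N] lets end users change N;
// every other axis keeps a fixed list of enumerators.
#include <cassert>
#include <cstddef>
#include <string>
#include <utility>
#include <vector>

class editor_axis
{
  public:
    // An axis with a fixed list of enumerators, such as 'Gender'.
    editor_axis(std::string name, std::vector<std::string> enumerators)
        : name_(std::move(name))
        , enumerators_(std::move(enumerators))
    {}

    // The 'Duration' axis: consecutive integers [0..upper_bound].
    static editor_axis duration(int upper_bound)
    {
        editor_axis a("Duration", {});
        a.is_duration_ = true;
        a.upper_bound_ = upper_bound;
        return a;
    }

    std::string const& name() const {return name_;}

    std::size_t size() const
    {
        return is_duration_
            ? static_cast<std::size_t>(upper_bound_) + 1
            : enumerators_.size();
    }

    // Changes nothing and returns false unless this is 'Duration' and
    // n is nonnegative. The lower bound is always zero.
    bool set_upper_bound(int n)
    {
        if(!is_duration_ || n < 0)
        {
            return false;
        }
        upper_bound_ = n;
        return true;
    }

  private:
    std::string              name_;
    std::vector<std::string> enumerators_;
    bool                     is_duration_ = false;
    int                      upper_bound_ = 0;
};
```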

> GC> Each axis can take on various values. The set of possible values for
> GC> each axis is mutable.
> 
>  During run-time? I.e. is it possible for this set to change (because of a
> program action, I realize that the user can't do it himself) while control
> is being used?

End users can change only the upper bound of 'Duration': they can change
only 'N' in [0..N]. The legacy editor permits this.

Explaining the rationale may make the design clearer. I speak of the
'enumerators' for each axis because they really are a fixed set for
all axes except 'Duration'. When data vary by 'Gender', if there are
data values for 'Female', then values for 'Male' are almost certainly
required. If data aren't required for the third 'Gender', which is
'Unisex', then it's not too harsh to require users to fill the 'Unisex'
slice with zeros: that convention greatly simplifies the design.

'Duration' is really a set of consecutive integers always starting with
zero and continuing through some integer N that's variable at run time.
The data model is more uniform if we agree to view this set of integers
as generating N+1 'enumerators': {'0','1',...,'N'}; and indeed we can
generate strings "0", "1" etc. and use those in the program. When I
speak of 'enumerators' abstractly, think of string lists, not C enums.
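Generating those strings is trivial. This sketch (the function name is hypothetical) produces N+1 labels; the same function would serve for 'Issue Age' with N = 99.

```cpp
// Sketch: the range [0..n] viewed as n+1 string enumerators "0".."n".
#include <cassert>
#include <string>
#include <vector>

std::vector<std::string> consecutive_enumerators(int n)
{
    std::vector<std::string> labels;
    for(int j = 0; j <= n; ++j)
    {
        labels.push_back(std::to_string(j));
    }
    return labels;
}
```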

'Issue Age' is really the constant set of consecutive integers [0..99].
Again, we can think of it as a string list.

When end users run lmi, it's always in the context of a particular
client, who has exactly one 'Gender', exactly one 'Underwriting Class',
and so on. The client has an age, too. The database might have an entity
'Fee' that varies only by 'Gender' and 'Duration':
  'Female': 5 in the first year; 2 in the second; 0 thereafter
  'Male'  : 7 in the first year; 3 in the second; 0 thereafter
Suppose, just for the purpose of this discussion, that we regard ages
past 100 as unattainable. It would be possible to represent that entity
as a matrix
  {
    {5, 2, 0,...,0}, // 100 elements for 'Female'
    {7, 3, 0,...,0}  // 100 elements for 'Male'
  }
but we don't actually do that. Instead, recognizing that almost every
entity that varies by 'Duration' does so for only a few years, becoming
a constant after those few years, we represent it as
  {
    {5, 2, 0}, // 3 elements for 'Female'
    {7, 3, 0}  // 3 elements for 'Male'
  }
where the last element is implicitly duplicated forever.
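Expanding that compact representation to an explicit vector takes only a few lines. This is a sketch assuming a nonempty input; `expand_compact()` is a hypothetical name, not an lmi function.

```cpp
// Sketch: expand the compact form, whose last element is implicitly
// repeated forever, to an explicit vector of the requested length.
#include <cassert>
#include <cstddef>
#include <vector>

std::vector<double> expand_compact(std::vector<double> const& compact, std::size_t length)
{
    assert(!compact.empty()); // There must be a last element to repeat.
    std::vector<double> full;
    full.reserve(length);
    for(std::size_t j = 0; j < length; ++j)
    {
        full.push_back(j < compact.size() ? compact[j] : compact.back());
    }
    return full;
}
```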

The way these entities are accessed is through a call like
  std::vector<double> v = QueryDatabase("Male", [other axes], Age);
The length of the vector is always 100-Age. That function is sufficient
for all of lmi's needs: we only ever want a vector of values that vary
by 'Duration', never a matrix or anything else. Only the database
editor ever addresses the complete entity with all its axes.
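Here is a hedged sketch of what such a query could look like for the 'Fee' entity above. `query_fee()` is a hypothetical stand-in for `QueryDatabase()`, the values are the illustrative ones from this message, and issue ages are assumed to lie in [0..99], matching the 'Issue Age' axis. Because 'Fee' doesn't vary by age, age only determines the length of the result.

```cpp
// Hypothetical stand-in for QueryDatabase() on the 'Fee' entity, which
// varies only by 'Gender' and 'Duration'. Each slice is stored in compact
// form (last value repeats), and issue age is assumed to be in [0..99].
#include <cassert>
#include <cstddef>
#include <map>
#include <stdexcept>
#include <string>
#include <vector>

std::map<std::string, std::vector<double>> const fee_by_gender =
{
    {"Female", {5, 2, 0}},
    {"Male",   {7, 3, 0}},
};

// Returns one value per duration: 100 - age elements.
std::vector<double> query_fee(std::string const& gender, int age)
{
    if(age < 0 || 99 < age)
    {
        throw std::out_of_range("issue age must be in [0..99]");
    }
    // map::at() throws std::out_of_range for an unknown gender.
    std::vector<double> const& compact = fee_by_gender.at(gender);
    std::size_t const length = static_cast<std::size_t>(100 - age);
    std::vector<double> v;
    v.reserve(length);
    for(std::size_t j = 0; j < length; ++j)
    {
        v.push_back(j < compact.size() ? compact[j] : compact.back());
    }
    return v;
}
```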

> GC> Yet it might be a very good idea to let the program itself set them
> GC> dynamically upon initialization: then we could read axis values from a
> GC> configuration file, and wouldn't need to hardcode them in the source.
> 
>  Sure.

Good. That's (B) above.

> GC> >    The only question is whether these values can always be handled as
> GC> >    strings or is it important to keep type information for them, i.e.
> GC> >    also handle integer (and maybe real/float?) values?
> ...
> GC> Most axes are string-valued. Users would be baffled by integers: they'd
> GC> struggle to remember whether 'Female' is 0, 1, or something else.
> 
>  Of course, we do need to have string values [too]. The question is whether
> we want to have numeric values as well.
> 
> GC> But some axes are naturally integer, e.g., 'Duration'.
> 
>  Or "Age"... In fact this is why I initially thought about ints support:
> using an array of ~100 strings instead of 16..160 (this should be enough
> hopefully...) range doesn't seem very elegant.

Bear with me, please, as I try to reason through this.

Two varieties of elegance contend with each other here.

One variety: the "Issue Age" and "Duration" axes really mean sets of
consecutive integers starting with zero, so it's more elegant to
represent them by the highest number.

Another variety: there are about a half dozen axes, and most really
mean constant lists of strings; "Issue Age" and "Duration" can be
represented as lists of strings, and it's simpler to treat all axes
that same way. Then we may assert that simplicity is elegance,
though, as I think this through, I believe I'll lose this debate.

The first variety does appear to have the greater elegance. Yet, in
the context of actual practice, we find that the "Issue Age" axis is
seldom used, and "Duration" usually has only a small number of values;
so imposing the second variety of elegance (or, at least, simplicity)
does little actual violence.

However, the reason why "Duration" is usually small and "Issue Age"
is rarely used is that lmi has a different way of treating entities
that vary greatly across those axes. They're treated as 'tables'.
It just so happens that our industry has a common XML schema for
such 'tables', and software to support them. Any entity that fits
the 'table' paradigm well is likely to wind up in a 'table'. We use
many standard tables that are published by an international authority
in this common format, and it makes sense to keep using authoritative
tables that way because we don't have to worry about data-conversion
errors.

Yet we store much proprietary data in the same table format. That has
its advantages (we can share tables with other systems that use the
common format), but from the end user's POV it's a disadvantage. What
the end user sees in lmi's editor for such an entity is indexes into
these tables: e.g., female nonsmoker might be table number 123 in
table "mortality_rates", while male smoker might be table 126. The
user has to perform the indirection: make a note that table 123 is
wanted, then load a separate program to display that table. Users
would rather have lmi perform the indirection, and display the table's
contents directly in the editor. That enhancement would be welcomed
by important users.

Does that dispel the arguments in favor of using arrays of strings for
every axis? Well, it does eliminate the argument that the upper bounds
for the integral axes "Duration" and "Issue Age" are likely to be low.
I can still maintain that users are not likely to slice across these
axes in the editor: that viewing 100 durations by 100 ages for female
smokers is more useful than viewing values by gender and smoking
for age 53, duration 12. Yet I hesitate to impose my viewpoint on
users with respect to an enhancement we haven't yet made.

So I think you're right, and thank you for continuing to question this.
Design decisions made years ago have ossified, and it's become hard
for me to think beyond them. But it really is better to use integers
here, and a spin control would be more usable.

> GC> >    So should we handle integer axis specially or just treat everything as
> GC> >    text?
> GC> 
> GC> I'd suggest keeping it simple, for now at least, and just using strings;
> GC> though others on the list are welcome to jump in with different ideas.
> 
>  Ok, so I'll use just strings for the axis values.

Oh...you just talked me out of that.

If just using strings for all axis values makes the task much simpler,
then that's OK, and we can fill two comboboxes with [0..N]. But if it's
not too much harder to let axes be either enumerations (fixed string
lists) or ranges of consecutive integers starting at zero, then that's
really much better.
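If both kinds of axis are supported, one way to model them is sketched below, assuming a C++17 compiler for `std::variant`; the type and function names are hypothetical. The GUI could test `wants_spin_control()` to choose between a spin control and a combobox.

```cpp
// Hypothetical sketch: an axis is either an enumeration (a fixed list of
// strings) or a range of consecutive integers [0..upper_bound].
#include <cassert>
#include <cstddef>
#include <string>
#include <type_traits>
#include <variant>
#include <vector>

struct enumeration_axis   {std::vector<std::string> enumerators;};
struct integer_range_axis {int upper_bound;};

using axis = std::variant<enumeration_axis, integer_range_axis>;

// Number of distinct values along an axis.
std::size_t axis_cardinality(axis const& a)
{
    return std::visit([](auto const& x) -> std::size_t
        {
        using T = std::decay_t<decltype(x)>;
        if constexpr(std::is_same_v<T, enumeration_axis>)
            return x.enumerators.size();
        else
            return static_cast<std::size_t>(x.upper_bound) + 1;
        }, a);
}

// True when a spin control, rather than a combobox, suits the axis.
bool wants_spin_control(axis const& a)
{
    return std::holds_alternative<integer_range_axis>(a);
}
```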

> GC> > 2. Along the same lines, what about the values in the grid? Here I suppose
> GC> >    we do want to have integers and reals and also strings. Is there
> GC> >    anything else?
[...]
>  To summarize the point (2): we'll allow all standard wxGrid types
> (including strings, ints, doubles, bools) but disallow setting the type for
> each cell individually, instead it will have to be selected for the entire
> grid.

Agreed.
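As an illustration of that agreement, here is a sketch (hypothetical names; it does not use wxGrid) in which the cell type is a template parameter fixed for the whole grid, and the values for any number of axes live in one flat container indexed in row-major order. A `std::deque` is used instead of a `std::vector` so that `bool` cells work with ordinary references, since `std::vector<bool>` hands out proxy objects.

```cpp
// Hypothetical sketch: the cell type T is fixed for the whole grid, and
// values for any number of axes are stored flat in row-major order.
#include <cassert>
#include <cstddef>
#include <deque>
#include <stdexcept>
#include <utility>
#include <vector>

template<typename T>
class multidim_data
{
  public:
    // One extent per axis, e.g. {3, 4} for 'Gender' by 'Duration' [0..3].
    explicit multidim_data(std::vector<std::size_t> extents)
        : extents_(std::move(extents))
    {
        std::size_t n = 1;
        for(std::size_t e : extents_)
        {
            n *= e;
        }
        values_.resize(n);
    }

    T&       at(std::vector<std::size_t> const& index)       {return values_[offset(index)];}
    T const& at(std::vector<std::size_t> const& index) const {return values_[offset(index)];}

  private:
    // Row-major offset; throws if a coordinate is missing or out of range.
    std::size_t offset(std::vector<std::size_t> const& index) const
    {
        if(index.size() != extents_.size())
        {
            throw std::invalid_argument("wrong number of coordinates");
        }
        std::size_t k = 0;
        for(std::size_t j = 0; j < index.size(); ++j)
        {
            if(extents_[j] <= index[j])
            {
                throw std::out_of_range("coordinate out of range");
            }
            k = k * extents_[j] + index[j];
        }
        return k;
    }

    std::vector<std::size_t> extents_;
    // std::deque, not std::vector, so that T = bool yields a real bool&.
    std::deque<T>            values_;
};
```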

> GC> > 3. I want to separate the data access from the GUI control
> ...
> GC> The only interface lmi needs, aside from the GUI, is read_file() and
> GC> write_file().
> 
>  Still, I think it would be unwise to directly read grid contents in
> read_file() and directly put data in the grid in write_file(). IMO an
> intermediate GUI-independent data structure providing an abstract interface
> is still needed. I.e. instead of
> 
>                  +------+                            +------+
>                  | File | <------------------------> | Grid |
>                  +------+                            +------+
> 
> I'd rather have
> 
>             +------+               +-------+               +------+
>             | File | <-----------> | Model | <-----------> | Grid |
>             +------+               +-------+               +------+
> 
> where "Model" isolates the file routines from the GUI stuff.
> 
> GC> The goal is not to write a general-purpose database that can serve data
> GC> to lmi.
> 
>  No, I understand this. I just propose to isolate the GUI code from this
> database, whatever it is. I think it would facilitate its maintenance
> because it would make it easier to test and to change.

OK, we don't disagree at all. The 'File' and 'Grid' must exist, and
whether there's a 'Model' in between is just an implementation detail
that I'll leave to your discretion. The way you propose to handle
that detail seems obviously correct.
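One possible shape for that 'Model' layer, sketched with hypothetical names: the GUI code sees only the abstract `grid_model` interface, so an in-memory implementation can stand in for a file-backed one in tests. `render_as_text()` merely plays the role of the grid here.

```cpp
// Hypothetical sketch of File <-> Model <-> Grid: GUI code depends only on
// the abstract grid_model; a file-backed model would implement it too.
#include <cassert>
#include <cstddef>
#include <sstream>
#include <string>
#include <vector>

class grid_model
{
  public:
    virtual ~grid_model() = default;
    virtual std::size_t rows() const = 0;
    virtual std::size_t cols() const = 0;
    virtual double value(std::size_t row, std::size_t col) const = 0;
    virtual void set_value(std::size_t row, std::size_t col, double v) = 0;
};

// In-memory model, e.g. for testing GUI code without any file format.
class memory_model : public grid_model
{
  public:
    memory_model(std::size_t rows, std::size_t cols)
        : rows_(rows), cols_(cols), data_(rows * cols, 0.0)
    {}
    std::size_t rows() const override {return rows_;}
    std::size_t cols() const override {return cols_;}
    double value(std::size_t r, std::size_t c) const override {return data_.at(r * cols_ + c);}
    void set_value(std::size_t r, std::size_t c, double v) override {data_.at(r * cols_ + c) = v;}

  private:
    std::size_t         rows_;
    std::size_t         cols_;
    std::vector<double> data_;
};

// Stand-in for the grid: it reads cells only through grid_model.
std::string render_as_text(grid_model const& m)
{
    std::ostringstream os;
    for(std::size_t r = 0; r < m.rows(); ++r)
    {
        for(std::size_t c = 0; c < m.cols(); ++c)
        {
            os << (c ? " " : "") << m.value(r, c);
        }
        os << '\n';
    }
    return os.str();
}
```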
Current Thread
[lmi] Product editor API, Vadim Zeitlin, 2005/10/17
- Re: [lmi] Product editor API, Greg Chicares, 2005/10/17
  - Re[2]: [lmi] Product editor API, Vadim Zeitlin, 2005/10/18
    - Re: [lmi] Product editor API, Greg Chicares <=
    - Re[2]: [lmi] Product editor API, Vadim Zeitlin, 2005/10/18
    - Re[2]: [lmi] Product editor API: changing the axis dynamically, Vadim Zeitlin, 2005/10/19
    - Re: [lmi] Product editor API: changing the axis dynamically, Greg Chicares, 2005/10/19