
Re: [lmi] organization of XML actuarial tables


From: Greg Chicares
Subject: Re: [lmi] organization of XML actuarial tables
Date: Thu, 26 Apr 2012 01:11:47 +0000
User-agent: Mozilla/5.0 (Windows NT 5.1; rv:10.0.2) Gecko/20120216 Thunderbird/10.0.2

On 2012-04-24 15:35Z, Václav Slavík wrote:
> Hi,
> 
> there's one thing we didn't discuss about the new actuarial
> tables format: how should the tables be organized into files?

I'll answer your question more directly below, but first I'd like to
enunciate the principle that guides all particular answers: viz.,
we must introduce no error.

> Currently, there are just a few files (qx_cso.dat, qx_ins.dat, ...)
> with lots of tables in each of them; the tables are referenced by
> both filenames and table numbers in .database files. In the public
> data files, table numbers are globally unique, two different .dat
> files don't define the same table; I assume this is always the case.

It's not. Table numbers must be unique in any given '.dat' file, and
they are unique across all '.dat' files from SOA (their subdivision
between "qx_cso" and "qx_ins" is artificial and needless); but we
have a sizable non-public '.dat' file whose table numbers do collide
with table numbers in the public files.
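
So for as long as table numbers survive at all, the only safe lookup
key is the (file, number) pair, not the number alone. A minimal
sketch of that idea (the map and the second file name are invented
for illustration):

    #include <map>
    #include <string>
    #include <utility>

    // A table is identified by its source file and its number,
    // because numbers alone can collide across files.
    typedef std::pair<std::string,int> table_key;

    std::map<table_key,std::string> descriptions;

    void note_collision()
    {
        // The same number can denote different tables in
        // different files:
        descriptions[table_key("qx_cso.dat"   , 35)] = "public";
        descriptions[table_key("nonpublic.dat", 35)] = "non-public";
    }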

> Also, multi-dimensional tables are represented as a series of simple
> tables in the SOA data files, plus an index in our .database file
> (DB_CurrCoiTable being a typical example; this is something we want
> to get rid of, replacing it with proper multi-dimensional tables).

That's the way SOA designed their format. In an actuary's operative
reality, "1980 CSO tables, age nearest birthday" is a gestalt; but
SOA stores it as nine slices, which aren't distinguished structurally
from slices of 1941 tables. Every time we want to use that 1980 table,
we have to code all nine slices; e.g., a typical product might have:

    double T7702COI[9] =
        {
         39,  37,  35,  // female   sm, ns, us
         45,  57,  41,  // male     sm, ns, us
        111, 109, 107,  // unisex   sm, ns, us
        };
    Add(database_entity(DB_Irc7702QTable, NDIMS, dims7702, T7702COI));

We want to get rid of that--to name the applicable gestalt instead
of specifying the way SOA sliced it up. One opportunity for a gross
but noticeable error beats nine opportunities for opaque errors.
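
To make the contrast concrete, here's a rough sketch--every name in
it is hypothetical--of what addressing the gestalt directly might
look like:

    #include <string>
    #include <vector>

    enum gender  {female, male, unisex};
    enum smoking {smoker, nonsmoker, unismoke};

    // Hypothetical class: one gestalt name plus axis values
    // replaces nine separately-coded SOA table numbers.
    class multi_table
    {
      public:
        explicit multi_table(std::string const& name) : name_(name) {}
        // Rates for one cell; a real implementation would read the
        // XML file designated by name_ and select the (g,s) slice.
        std::vector<double> rates(gender g, smoking s) const;
      private:
        std::string name_;
    };

    multi_table q7702("CSO1980ByGenderAndSmoker");

The nine-element array then collapses to a single name.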

> How should we organize XML files for the tables? Should I preserve this
> bundling of multiple tables into a single file? From an outsider's point
> of view, having one XML file per table makes the most sense — with
> multi-dimensional tables such as DB_CurrCoiTable put into a single file
> and with the files named descriptively, rather than by a number.

We should do what makes the most sense from your "outsider's" POV.
You understand the data's meaning, because you thought to ask about
the problem domain. I suppose the SOA format's designers knew too
little about the solution domain and didn't think to ask about it.

> Assuming we want to use the one-file-per-table approach, there's also the
> issue of getting reasonable names and converting multiple SOA tables
> into a single multi-dimensional one. I don't think this can be
> automated easily, can it?

It cannot be automated. This is a matter of restoring structure that
was obliterated by fragmenting the gestalts.

> Would it be reasonable to do the conversion in two phases like this:
> 
> (1) Convert SOA tables to simple XML files, without multi-dimensional
> tables and using SOA table numbers, e.g. "35.table" or "qx_ins.35.table"
> (if the .dat file matters and multiple sources for the same table number
> may be used on the same machine). Remove the binary SOA loader code.
> Nothing would change for the users; the UI would remain the same; only
> the data files would be replaced.
> 
> (2) Clean up the data files later: remove unneeded tables (?), merge
> multi-dimensional ones, change references to tables into .table file
> references and get rid of table numbers.

The guiding principle of ensuring freedom from error enters here.
I'm sure error is more likely if we perform the work in two steps.

> ? Or would it be less of a burden to do everything — including step (2),
> which involves updates to user data files — at once?

We should make the change all at once. (BTW, end users don't have
their own mortality-table files--we furnish them, so they're all
under our direct control. And they may occasionally change a table
number in the database, but that'll happen approximately zero times
in 2012.) Using the example above:

    double T7702COI[9] =
        {
         39,  37,  35,  // female   sm, ns, us
         45,  57,  41,  // male     sm, ns, us
        111, 109, 107,  // unisex   sm, ns, us
        };
    Add(database_entity(DB_Irc7702QTable, NDIMS, dims7702, T7702COI));

we shouldn't go through an intermediate phase like this:

    char const* T7702COI[9] =
        {
         "table_39",  "table_37",  "table_35",  // female   sm, ns, us
...
        };
    Add(database_entity(DB_Irc7702QTable, NDIMS, dims7702, T7702COI));

(with collateral intermediate changes to the code that reads and writes
'database_entity' objects) when we ultimately want something like:

    Add(database_entity(DB_Irc7702QTable, "CSO1980ByGenderAndSmoker"));

The crucial motivation is to make errors less likely; I think a one-step
approach will also save some work, but that's just a bonus. It's really
hard to adhere to a zero-defects standard when working on intermediate
changes that we know we're going to replace.

I'm thinking we should tackle this as follows. Write the new code that
will be added to the lmi repository, and unit test it, leaving the old
code in place. Then, wherever we call the old code, also call the new
code, and compare the results to make sure they're perfectly identical.
When all tests pass, we remove that validation scaffolding and the old
code, and just call the new code.
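
In outline, the scaffolding might look something like this (both
reader functions are stand-ins for whatever the old and new code
actually expose; the dummy bodies just keep the sketch
self-contained):

    #include <cassert>
    #include <vector>

    // Stand-ins: in the repository these would be the existing
    // SOA binary reader and the new XML reader.
    std::vector<double> rates_from_soa_file(int)
        {return std::vector<double>(100, 0.01);}
    std::vector<double> rates_from_xml_file(int)
        {return std::vector<double>(100, 0.01);}

    // Temporary scaffolding: call both, and demand identical
    // results (identical, not merely "close") before retiring
    // the old code.
    std::vector<double> rates(int table_number)
    {
        std::vector<double> const old_rates = rates_from_soa_file(table_number);
        std::vector<double> const new_rates = rates_from_xml_file(table_number);
        assert(old_rates == new_rates);
        return new_rates;
    }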

The difficult part of this strategy is ensuring complete coverage.
A query into a select table retrieves only a subset of the rates, but
we want to test them all. I'll have to give that more thought.
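
For instance, for a select-and-ultimate table, complete coverage
probably means sweeping both axes--every issue age and every
duration the table supports. Something like this, with invented
signatures and bounds:

    #include <cassert>

    // Hypothetical accessors returning the rate for one cell.
    double old_select_rate(int table_number, int age, int duration);
    double new_select_rate(int table_number, int age, int duration);

    // A single query touches only one issue age's rates; to test
    // the whole table, iterate over every cell.
    void compare_whole_select_table(int table_number)
    {
        for(int age = 0; age <= 99; ++age)
            {
            for(int dur = 0; dur < 100 - age; ++dur)
                {
                assert(   old_select_rate(table_number, age, dur)
                       == new_select_rate(table_number, age, dur));
                }
            }
    }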


