From: Greg Chicares
Subject: Re: [lmi] actuarial tables format (was Re: Terse list of valuable projects)
Date: Wed, 21 Mar 2012 20:57:46 +0000
User-agent: Mozilla/5.0 (Windows NT 5.1; rv:10.0.2) Gecko/20120216 Thunderbird/10.0.2

On 2012-03-21 18:24Z, Václav Slavík wrote:
> 
> On 12 Mar 2012, at 18:09, Greg Chicares wrote:
>> Today, we're using only those two axes. Going forward, we want to add
>> other axes; additional axes are limited to those already used in
>> 'dbindex.hpp'.
> ...
>> Many of these tables are shared among different "products". Some are quite
>> large, so we'd want to avoid duplicating identical tables. Thus, in the
>> "TgCOI" example above, 39, 37, etc. are pointers to tables; instead, we'd
>> want a single pointer to a single table that has more dimensions.
> 
> Would a format along the lines of the following [1] suit you?

Yes. (As your footnote points out, we don't yet need to choose final names
for the axes.)

> For a simple 1D table:
> 
> <table>
>     <age min="31" max="33">
>         <value>0.198</value>
>         <value>0.194</value>
>         <value>0.190</value>
>     </age>
> </table>

Perfect.
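
A minimal sketch of how a reader might consume this 1D form, assuming
Python's standard xml.etree.ElementTree module. The helper name
read_1d_table is hypothetical and not part of lmi:

```python
import xml.etree.ElementTree as ET

XML_1D = """
<table>
    <age min="31" max="33">
        <value>0.198</value>
        <value>0.194</value>
        <value>0.190</value>
    </age>
</table>
"""

def read_1d_table(xml_text):
    """Return {age: rate} for a one-axis table (hypothetical helper)."""
    age = ET.fromstring(xml_text).find("age")
    lo, hi = int(age.get("min")), int(age.get("max"))
    values = [float(v.text) for v in age.findall("value")]
    # The min and max attributes determine how many <value>s must follow.
    if len(values) != hi - lo + 1:
        raise ValueError("expected %d values, found %d" % (hi - lo + 1, len(values)))
    return dict(zip(range(lo, hi + 1), values))

rates = read_1d_table(XML_1D)
```

Here rates[32] is the second <value>, because the ages run consecutively
from min to max.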

> For 2D select table:
> 
> <table>
>     <select period="3">
>         <age min="31" max="33">
>             <value>0.198</value>
>             <value>0.194</value>
>             <value>0.190</value>
>         </age>
>         <age min="31" max="33">
>             <value>0.198</value>
>             <value>0.194</value>
>             <value>0.190</value>
>         </age>
>         <age min="31" max="33">
>             <value>0.198</value>
>             <value>0.194</value>
>             <value>0.190</value>
>         </age>
>     </select>
> </table>
> 
> Notice that I make the age/duration the innermost axis; I think that's
> consistent with sample.database and is generally the most reasonable.
> Please let me know if I got this wrong.

I'm not quite sure, because the <value> triplets are all the same.
Let me answer in terms of table_256() in 'actuarial_table_test.cpp':

        //     1        2        3
        {0.00106 ,0.00140 ,0.00165 // 10
        ,0.00113 ,0.00148 ,0.00175 // 11
        ,0.00120 ,0.00157 ,0.00184 // 12
...
        ,0.06520 ,0.10486 ,0.13557 // 80

There, the rows are primary, and columns are secondary: that is,
an actuary who looks up values in this table would select a single
row (ignoring all other rows) and read values across the columns.
If a 12-year-old person is to be insured, we want this row vector:
        ,0.00120 ,0.00157 ,0.00184 // 12
OTOH, a column vector such as this:
              2
        0.00140
        0.00148
        0.00157
is never interesting. So I suppose this example might be written thus
(with some xml comments referring to class actuarial_table):

<table>
    <select period="3"/>           <!-- NOTE 1 -->
    <age min="10" max="80">        <!-- min_age(), max_select_age() -->
        <age_index="10">           <!-- NOTE 2 -->
            <value>0.00106</value>
            <value>0.00140</value>
            <value>0.00165</value>
        </age_index>
        <age_index="11">
            <value>0.00113</value>
            <value>0.00148</value>
            <value>0.00175</value>
        </age_index>
    ...

NOTE 1: instead of
  <table>
      <select period="3"/>
would
  <table type="select" select_period="3">
be preferable? In the public Society of Actuaries interface, it's this:
  ///   3    [unsigned] char: Table type: {A, D, S} --> {age, duration, select}
which is a property of the whole table.

[It became clear to me later that we're using "select" in two
different ways; but by then I had already written most of this
email, and balked at rewriting the whole thing...]

NOTE 2: I'm not sure <age_index> is the best way to express this,
because it seems that redundancy should be avoided:
    <age min="10" max="80">  <!-- This says the first age is "10",   -->
        <age_index="10">     <!-- so repeating "10" here seems wrong -->
(and <age_index="10"> doesn't look like good xml to me anyway),
but repeating the bounds each time
    <age min="10" max="80">
for age 10, 11, 12, ... 80 doesn't seem right to me either. For a given
row in the two-dimensional table, the age is bound to a value. If you
give me the 2-D table above and ask me someone's probability of death,
I have to ask "what age?"; if you say the age is 12, then I can say
the probability is 0.00120 ,0.00157 ,0.00184 for this year, next year,
and the year after that. (When replying that way, I don't consider
whether the age next year, 13, lies between some minimum and some
maximum--I know it must, or else there could be no location for
0.00157 in the table.)
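
The row-oriented lookup described above can be sketched as follows. This
is only an illustration in Python using the three rows quoted from
table_256(); the names ROWS, MIN_AGE, and select_row are hypothetical:

```python
# Fragment of table_256(): one row per select age, one column per duration.
MIN_AGE = 10
ROWS = [
    [0.00106, 0.00140, 0.00165],  # 10
    [0.00113, 0.00148, 0.00175],  # 11
    [0.00120, 0.00157, 0.00184],  # 12
]

def select_row(age):
    """Return the whole row vector for one select age (hypothetical helper)."""
    index = age - MIN_AGE
    if not 0 <= index < len(ROWS):
        raise LookupError("age %d is outside this fragment" % age)
    # Only the age is checked: every duration in the row exists by construction.
    return ROWS[index]

row_12 = select_row(12)
```

The only question the lookup asks is "what age?"; it never extracts a
column.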

Perhaps a textbook reference will be clearer, such as page 24 here:
  http://www.math.purdue.edu/~rcp/STAT472/Notes/select.pdf

> Another thing is that I prefer to make the axis name part of the markup
> (rather than using e.g. <axis type="select">). That's for validation:
> the latter couldn't have select-specific validation in XML Schema
> (analogous to the "version" attribute in the other schemas we discussed
> recently).

Yes, amenability to validation is an important design criterion.

The word "select" maps to multiple concepts in the solution domain.
A table can be "select" in the sense mentioned above:
  ///   3    [unsigned] char: Table type: {A, D, S} --> {age, duration, select}
and a row of that table such as
        //     1        2        3
        ,0.00120 ,0.00157 ,0.00184 // 12  <-- THIS ROW
gives rates for "select" age twelve (I called it "age_index" above,
but "select_age" would have been more descriptive; and now I guess
your <select> meant exactly that).

Given these two types of tables:
  aggregate tables, which vary by "attained" age (or duration) only
  select tables,    which vary by "select" age and duration
is it important to use a common name for both "age" axes, or
is it better to use different names in the markup? (There's a
third type, select-and-ultimate tables, but I hesitate to
introduce that complication at this point.) I think your answer
to that question is that it's better to distinguish the names;
otherwise, we have nothing more than Axis1, Axis2, ... and
xml attributes to say what the Axes mean, and that's weak xml.
I certainly agree (if I've understood your answer correctly).

> Consequently, the order in which the axes are nested matters,
> you could have <age> at the higher level and <select> under it.
> I think that's A Good Thing, even for additional axes.

Yes.

Just as an example with tags that would be meaningful to an
actuary, a select table might have major axis "select_age"
and minor axis "duration"; an attained-age table might have
a unique axis "attained_age"; and a durational table might
have a unique axis "duration". If it's troubling to use
"duration" for two different kinds of axes, then the minor
axis in a select table can be called "select_duration".
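
To make that naming concrete, here is a small sketch that maps the
{A, D, S} table-type codes to the axis names suggested above. The names
are illustrative only (we haven't chosen final ones), and the
dictionary and function are hypothetical, not anything in lmi:

```python
# Axis names, major axis first, keyed by the SOA table-type code.
AXES_BY_TABLE_TYPE = {
    "A": ("attained_age",),           # aggregate: attained age only
    "D": ("duration",),               # durational: duration only
    "S": ("select_age", "duration"),  # select: select age, then duration
}

def axes_for(table_type):
    """Return the axis names for a table-type code (hypothetical helper)."""
    return AXES_BY_TABLE_TYPE[table_type]

select_axes = axes_for("S")
```

If reusing "duration" for two kinds of axes is troubling, the second
entry for "S" would become "select_duration".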

> Speaking of which, this is what a TgCOI-like larger table would look like:
> 
> <table>
>     <smoker>
>         <item for="smoker">
>             <gender>
>                 <item for="female">
>                     <select period="2">
>                         <age min="31" max="33">
>                             <value>0.198</value>
>                             <value>0.194</value>
>                             <value>0.190</value>
>                         </age>
>                         <age min="31" max="33">...likewise...</age>
>                     </select>
>                 </item>
>                 <item for="male">...likewise...</item>
>                 <item for="unisex">...likewise...</item>
>             </gender>
>         </item>
>         <item for="nonsmoker">
>             <gender>
>                 <item for="female">
>                     <select period="2">
>                         <age min="31" max="33">...likewise...</age>
>                         <age min="31" max="33">...likewise...</age>
>                     </select>
>                 </item>
>                 <item for="male">...likewise...</item>
>                 <item for="unisex">...likewise...</item>
>             </gender>
>         </item>
>     </smoker>
> </table>
> 
> Notice that I didn't use <item for> under <select> in my previous example,
> because <select>'s children are well-defined: <select period="N"> has N child
> <age> (or <duration>?) elements for sequential select values of 1, 2, ..., N.

Yes. In actuarial English: a select table has M rows, one for every
age from the minimum age to the maximum age, inclusive. And each of
those rows has N values, where N is what we call the "select period".
You ask whether the N values are "age" or "duration"; conventionally
we'd say "duration", although "age" isn't quite erroneous. In the
table_256() fragment repeated here, the right-hand comments give a
column vector that we'd call "select age", and the top comment gives
a row vector we'd call "duration":

        //     1        2        3
        {0.00106 ,0.00140 ,0.00165 // 10
        ,0.00113 ,0.00148 ,0.00175 // 11
        ,0.00120 ,0.00157 ,0.00184 // 12

And there's a special actuarial notation for the second column of
the third row: 0.00157 is the value for "[12] + 1".
  0.00120 is for [12] + 0
  0.00157 is for [12] + 1
  0.00184 is for [12] + 2
I.e., if a person age 12 is issued an insurance policy today and
is still alive two years from today, then the probability of being
dead three years from today is 0.184%. It's not incorrect to say
that 0.00157 is a rate for a thirteen-year-old who was insured
one year ago; but we'd always notate that as [12] + 1, so it is
clearer under that convention to think of "+ 1" as the duration
rather than 13 as the "attained" age.
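
As a sketch of the "[x] + t" convention, assuming the same three-row
fragment and zero-based durations (so the column headed "1" is t = 0),
a lookup might be written like this in Python. The function names are
hypothetical:

```python
MIN_AGE = 10
ROWS = [
    [0.00106, 0.00140, 0.00165],  # [10] + 0, [10] + 1, [10] + 2
    [0.00113, 0.00148, 0.00175],  # [11] + 0, [11] + 1, [11] + 2
    [0.00120, 0.00157, 0.00184],  # [12] + 0, [12] + 1, [12] + 2
]

def select_rate(select_age, duration):
    """Rate for [select_age] + duration, with duration counted from zero."""
    return ROWS[select_age - MIN_AGE][duration]

def attained_age(select_age, duration):
    """Attained age implied by [select_age] + duration."""
    return select_age + duration

rate = select_rate(12, 1)   # the value notated [12] + 1
age_now = attained_age(12, 1)
```

The attained age is derivable from the pair, but the pair is what
indexes the table.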

> Likewise for <value>s under <age>, the min and max attributes describe
> them sufficiently.

Yes. We choose not to contemplate "compact" tables like this:
  duration [0,5) : value 1.234
  duration [5,10): value 2.345
A table really might have values that happen to change only
quinquennially, but we'll specify each value:
  0 1.234
  1 1.234
  ...
choosing verbose simplicity over "compactness".
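
If source data ever arrives in the banded form, it could be expanded
once before being written out. A minimal sketch, assuming half-open
[start, stop) bands as in the example above; the function name expand is
hypothetical:

```python
def expand(bands):
    """Expand [(start, stop, value), ...] into one value per duration."""
    out = []
    for start, stop, value in bands:
        # Each band must begin exactly where the previous one ended.
        if start != len(out) or stop <= start:
            raise ValueError("bands must be contiguous and start at 0")
        out.extend([value] * (stop - start))
    return out

verbose = expand([(0, 5, 1.234), (5, 10, 2.345)])
```

The stored table then has ten explicit values, and a reader never needs
to interpret intervals.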

> The order and enumeration values for the "smoker" or "gender" axes,
> on the other hand, are not so clear [2]. And even if there is some
> universally used, commonly understood order in the field,

There is not.

> the order in the states enumeration is simply unrememberable and no
> human reader could make sense of it when studying a sub-table in the
> middle of that long list.

Yes. One...two...three...four...five...six: I think that's Connecticut.
Except some insurance companies count it as the seventh. It's really
unthinkable to page down through enough elements to reach Wyoming.

> For that reason, I prefer explicitness here: use <item> with the
> enumeration value to which the subtable under it applies.

Yes.

> This has the side-effect of making it possible to omit some values
> (e.g. have a table for only male & female data). Is that desirable?

Yes, that would often be useful.
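
A sketch of how a reader could handle explicit <item for="..."> elements,
including omitted enumerators, assuming Python's standard
xml.etree.ElementTree. The sample is a cut-down version of the TgCOI-like
table (the <select> level is left out for brevity), and find_subtable is
a hypothetical helper:

```python
import xml.etree.ElementTree as ET

XML_NESTED = """
<table>
    <smoker>
        <item for="smoker">
            <gender>
                <item for="female">
                    <age min="31" max="33">
                        <value>0.198</value>
                        <value>0.194</value>
                        <value>0.190</value>
                    </age>
                </item>
            </gender>
        </item>
    </smoker>
</table>
"""

def find_subtable(root, **wanted):
    """Descend one <axis><item for="..."> level per keyword argument.

    Returns the innermost <item>, or None if an enumerator was omitted.
    """
    node = root
    for _ in wanted:
        axis = node[0] if len(node) else None
        if axis is None or axis.tag not in wanted:
            raise KeyError("no enumerated axis found under <%s>" % node.tag)
        items = [i for i in axis.findall("item") if i.get("for") == wanted[axis.tag]]
        if not items:
            return None  # this enumerator is not present in the table
        node = items[0]
    return node

root = ET.fromstring(XML_NESTED)
female_smoker = find_subtable(root, smoker="smoker", gender="female")
male_smoker = find_subtable(root, smoker="smoker", gender="male")
```

Because each subtable is labelled, the lookup depends on the "for" value
and not on the position of the <item> in the file.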

> What do you think? Did I miss something important?

Yes: select-and-ultimate tables. Maybe I should ask you to look at
the textbook reference above first; it probably explains this more
clearly than I would. The lmi function table_256() cited above
gives an example.

> [2] I'm sure the values of "gender" are obvious to you and the values
> "unisex" or "unismoker" only surprise outsiders and only briefly.

Yes. A "unisex" hairdresser serves both male and female clients.
"Unisex" fashions are worn by members of both genders. "Unisex"
insurance rates are the same for male and female, as opposed to
gender-distinct rate tables. Mortality rates are generally
distinguished by tobacco use, smoker rates being higher than
nonsmoker; rates that aren't so distinguished are, by analogy
with "unisex", called "unismoke".


