[lmi] Contemplating the infinite
From: Greg Chicares
Subject: [lmi] Contemplating the infinite
Date: Fri, 25 Jun 2010 01:26:49 +0000
User-agent: Thunderbird 2.0.0.24 (Windows/20100228)
http://svn.savannah.gnu.org/viewvc?view=rev&root=lmi&revision=5011
| Use infinity in product files
It's worth mentioning the motivation for this change and describing its
scope and limits. The goal is essentially to write this:
  <item>inf</item>
for infinity in product files instead of this:
  <item>1797693134...4124858368</item>
where the latter number has over three hundred digits and is the largest
finite double-precision number (DBL_MAX in C).
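To see where those digits come from, here is a minimal sketch that writes a finite double in fixed notation with no exponent. The helper name all_integer_digits() is invented for illustration; it is not lmi's own formatting code, and it assumes a standard library that prints the exact decimal expansion of large values.

```cpp
#include <cassert>
#include <cfloat>
#include <cmath>
#include <iomanip>
#include <limits>
#include <sstream>
#include <string>

// DBL_MAX and std::numeric_limits<double>::max() name the same value.
static_assert(std::numeric_limits<double>::max() == DBL_MAX, "same value");

// Hypothetical helper (not lmi code): write a finite double in fixed
// notation, with no exponent and no fractional digits.
std::string all_integer_digits(double d)
{
    assert(std::isfinite(d));
    std::ostringstream oss;
    oss << std::fixed << std::setprecision(0) << d;
    return oss.str();
}
```

With such a library, all_integer_digits(DBL_MAX) is 309 characters long, beginning with 1797693134 and ending with 4124858368.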
The concept of infinity naturally arises because some charges apply only
up to a finite maximum--say, two percent of premium up to $1M--while
others apply without limit. Writing the largest representable number is
good enough because it always gives the correct answer; so does infinity.
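As a rough illustration of why either value works, consider a charge that applies to premium only up to some limit. The function capped_charge() and the constant no_limit below are invented names, and reading "two percent of premium up to $1M" as a rate applied to the first $1M of premium is an assumption, not a description of lmi's actual calculation.

```cpp
#include <algorithm>
#include <cassert>
#include <limits>

// Hypothetical marker for "applies without limit".
constexpr double no_limit = std::numeric_limits<double>::infinity();

// Hypothetical helper (not lmi code): apply 'rate' to the part of
// 'premium' that does not exceed 'limit'.
double capped_charge(double premium, double rate, double limit)
{
    assert(0.0 <= premium && 0.0 <= limit);
    return rate * std::min(premium, limit);
}
```

Because std::min(premium, limit) returns the premium whenever the premium is below the limit, passing no_limit or the largest finite double as the limit gives the same result for any finite premium.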
Hitherto, we've maintained a program that creates product files, but we
want to throw that away and maintain the product files directly, now that
they're xml. A three-hundred-digit number is an unreadable nuisance.
We don't want to tempt anyone to type '1' and then lean on the '0' key
until the result looks "big enough". Just type 'i', 'n', 'f'.
Whenever we write a number in our xml files, we write every digit that
could be significant, the way it's written here:
http://www.treasurydirect.gov/NP/BPDLogin?application=np
because lmi has to deal with some very large numbers that are in fact
precisely determined. We never truncate as in scientific notation.
If we ever need more than double precision offers, then we'll have to
change to a more precise number system.
We could treat DBL_MAX as a special case for input and output, but that
would impose a small speed penalty on every number we write--and we
write many numbers. But the standard library already treats infinity as
a special case, and we pay for that already, so we may as well use it.
Almost all the xml files we read and write now use the same code, so they
work the same way. The xml -> xml.fo -> pdf pipeline for output is an
exception, but infinity should never arise there. Another exception is
'.fund' files: we plan to revamp them next year when we add a fund-
selection GUI, so we needn't spend a lot of time on them now. Similarly,
because the 7702A implementation is soon to be rewritten, 'mec*.?pp' files
have largely been left alone. Normal input files ('.cns', '.ill') do
support infinity, but it's not obvious whether that will prove useful.
If you enter 'inf' in, say, the GUI "Dumpin" field, you'll get a message:
  inf is not normalized
You can get around that by using a text editor:
  <Dumpin>inf</Dumpin>
but you'll probably get a different error message.
Product files are smaller with 'inf' instead of DBL_MAX. For the eighty-two
products we support (including subplans):
      inf    DBL_MAX
   116 KB     299 KB  *.strata
  7.46 MB    8.73 MB  *.database
Smaller files load faster, and the 'numeric_io_test' unit test measures the
speed of reading 'inf' as opposed to normalized numbers, though regression-
test timings [0] show no significant overall effect on lmi's performance.
Other occurrences of DBL_MAX and std::numeric_limits<double>::max() (which
means the same thing) remain in HEAD. I'm not going to consider changing
them now if they aren't directly related to xml, product files, or the GUI,
particularly in light of this article
http://www.cygnus-software.com/papers/x86andinfinity.html
that cautions about performance problems. It's pointless to count clock
cycles for numeric formatting when you're reading a 300-digit number from
disk: it's a pretty safe bet that reading 'inf' will be faster. But there's
no need to change unrelated numerical calculations now, and good reason to
measure the effect on run time if and when we eventually do change them.
There are problems converting 'inf' to and from string when really old
standard libraries are used, but I don't consider it too severe to require
a correct C89 implementation twenty-one years after the standard was issued.
We have unit tests to catch such problems.
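A check of roughly the following kind would expose a library that mishandles 'inf'. It is only a sketch, not lmi's actual unit test: parse_number(), format_number(), and check_infinity_round_trip() are invented names, and it uses strtod() and snprintf() directly rather than lmi's own conversion code.

```cpp
#include <cassert>
#include <cmath>
#include <cstdio>
#include <cstdlib>
#include <stdexcept>
#include <string>

// Hypothetical helper: parse a whole string as a double with strtod().
double parse_number(std::string const& s)
{
    char* end = nullptr;
    double const d = std::strtod(s.c_str(), &end);
    // Reject empty input and trailing garbage.
    if(end == s.c_str() || '\0' != *end)
        throw std::invalid_argument("not a number: " + s);
    return d;
}

// Hypothetical helper: format with printf's "%f" conversion.
std::string format_number(double d)
{
    char buf[512]; // ample: DBL_MAX needs 316 characters with "%f"
    std::snprintf(buf, sizeof buf, "%f", d);
    return buf;
}

// Fails an assertion if the library cannot convert infinity both ways.
void check_infinity_round_trip()
{
    assert(std::isinf(parse_number("inf")));
    assert(0.0 < parse_number("inf"));
    assert(parse_number("-inf") < 0.0);
    assert("inf" == format_number(parse_number("inf")));
    assert("-inf" == format_number(parse_number("-inf")));
}
```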
---------
[0] "regression-test timings"
no product-file or code changes:
system_test(): 240888 milliseconds
system_test(): 237652 milliseconds
system_test(): 244026 milliseconds
/opt/lmi/data[0]$for z in *.database *.strata; do sed -i $z \
  -e'/17976/s/17976[0-9]*58368/INF/g'; done
but no code changes:
system_test(): 236891 milliseconds
'inf' code changes but no product-file changes:
system_test(): 243333 milliseconds
system_test(): 241016 milliseconds
both 'inf' code changes and product-file changes:
system_test(): 237957 milliseconds
system_test(): 241477 milliseconds
system_test(): 242895 milliseconds
system_test(): 242523 milliseconds