Re: [lmi] [lmi-commits] valyuta/000 e9efb62: See 'README.branch'
From: Greg Chicares
Subject: Re: [lmi] [lmi-commits] valyuta/000 e9efb62: See 'README.branch'
Date: Sat, 15 Aug 2020 16:48:30 +0000
User-agent: Mozilla/5.0 (X11; Linux x86_64; rv:68.0) Gecko/20100101 Thunderbird/68.9.0
[Resending because the original bounced--see:
https://savannah.nongnu.org/support/index.php?110291
which is thought to have been resolved.]
On 2020-08-14 23:54, Greg Chicares wrote:
> On 2020-08-14 15:43, Vadim Zeitlin wrote:
>> On Tue, 11 Aug 2020 18:08:44 -0400 (EDT) Greg Chicares
>> <gchicares@sbcglobal.net> wrote:
>>
>> GC> branch: valyuta/000
>> GC> commit e9efb62472281faac7965276adc1eb148c20fc4a
>> GC> Author: Gregory W. Chicares <gchicares@sbcglobal.net>
>> GC> Commit: Gregory W. Chicares <gchicares@sbcglobal.net>
>> GC>
>> GC> See 'README.branch'
>
> Let me begin by thanking you for what I've recently renamed as 'monnaie'.
> On this branch, I tried to create a tiny version that I could twiddle
> easily, just for experimentation, and it was very convenient to refer to
> your well-tested implementation. (It didn't stay tiny, but that's the way
> experimentation goes.)
>
>> I'm not sure if I'm even supposed to look at it already,
>
> Not really. It's exploratory, throwaway work. I wanted to move it aside,
> but I didn't want to expunge it yet.
>
>> but after doing
>> this quickly, I have a couple of contradictory remarks:
>>
>> First, I'm really surprised that the version using the currency class runs
>> twice slower than the version using doubles.
>
> It deliberately sacrifices efficiency in favor of development speed,
> because I'm still trying to understand the problem precisely, and to
> envision a solution.
>
> Notionally, of course, we work with integral cents, and perform
> rounding where necessary to stay in the integer domain. For example,
> if you buy a jacket for $69.95, and the sales tax rate is 6.35%,
> the $4.441825 tax is rounded in a way prescribed by regulation,
> which in Connecticut seems to be what std::round() notionally does
> (rounding to nearest, and resolving halfway cases away from zero),
> resulting in a total bill of 69.95 + 4.44 = $74.39 .
>
> But that's not what std::round() actually produces. Instead,
> working with DECIMAL_DIG precision:
>
> 00 0000000111111111122 <-- count of significant digits
> 12 3456789012345678901 <-- 21 = DECIMAL_DIG
> 74.3918249999999972033 raw value
> 74.3900000000000005684 result of std::round()
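>
> For concreteness, here is a minimal standalone sketch (not lmi code)
> that reproduces those two figures, assuming "rounding to cents" is
> done by scaling, applying std::round(), and scaling back:
>
>   #include <cmath>
>   #include <cstdio>
>
>   int main()
>   {
>       double raw     = 69.95 * 1.0635;                  // notionally 74.391825
>       double rounded = std::round(raw * 100.0) / 100.0; // notionally 74.39
>       std::printf("%.19f raw value\n", raw);
>       std::printf("%.19f rounded to cents\n", rounded);
>   }
>
> On a typical x86_64 build (SSE arithmetic) this prints the two dusty
> values tabulated above; neither is exactly 74.391825 or exactly 74.39.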
>
> Notionally, we have 74 and 39/100 dollars, or 7439 cents.
> Physically, we have some exact 53-bit binary mantissa that's
> quite close to some 16-significant digit decimal number, but
> doesn't equal 7439/100: it has a tiny amount of "dust". Then,
> for a life insurance illustration, we crank it through a hundred
> years of monthly accumulation, and it comes out very "dusty": the
> dust can easily exceed one cent, and maybe quite a few dollars,
> which affects accuracy.
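>
> To see how far the dust can drift, here is a toy model (nothing like
> lmi's actual monthiversary processing; the $10,000 deposit, the 5%
> annual rate, and the 1200-month horizon are all made up) that
> accumulates one value without ever rounding, and another as integral
> cents with each month's interest rounded before it's applied:
>
>   #include <cmath>
>   #include <cstdio>
>
>   int main()
>   {
>       double const m = std::pow(1.05, 1.0 / 12.0) - 1.0; // monthly rate
>       double    naive = 10000.00;                        // never rounded
>       long long cents = 1000000;                         // same $10,000.00
>       for(int month = 0; month < 1200; ++month)
>       {
>           naive *= 1.0 + m;
>           cents += static_cast<long long>(std::round(cents * m));
>       }
>       std::printf("never rounded : %.2f cents\n", 100.0 * naive);
>       std::printf("integral cents: %lld\n", cents);
>   }
>
> After a century of compounding, the two results typically disagree by
> more than a cent (often by much more), because each month's sub-cent
> difference itself earns interest for all the remaining months.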
>
> Or we can round at each step, and still store floating-point
> dollars. That means spending time on each step to remove the
> dust, so that (in theory at least) it can't build up. Then
> we're storing 74.3900000000000005684, which is notionally
> "rounded to integral cents" yet does not exactly represent
> any integral number of cents. Even if we round so fastidiously
> and often that accuracy is unaffected, we still have a value
> that might come out either slightly higher or slightly lower,
> and AIUI compilers are still allowed to choose either the
> upper or lower neighbor, and to choose a different neighbor
> when any line of code changes. We have 140 MB of regression-
> test results that we perpetually compare after each code
> change, and often they drift--sometimes, even if we only
> change a single comment in the source. Even with automated
> analysis that filters out the least significant regressions,
> it's still a laborious job. Why not just store 7439 cents?
> Then if that changes, say, to 7441, we have a real difference
> that's either valid or invalid, and we can decide which.
>
> Even if we store 7439 cents, how do we later multiply it by
> another floating-point factor--say, a merchant's discount
> of 2% for paying cash? Do we first convert it to double
> (74.3900000000000005684), manipulate it as dollars, and
> then multiply by one hundred and round? Probably we redo the
> calculation from scratch:
>
> 69.950000000000002842171 dusty
> * 0.979999999999999982236 dusty^2
> * 1.063499999999999889866 dusty^3
> = 72.903988499999996975021
> = 7290 cents (but removing that dust was costly)
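>
> In code, that "redo it from scratch and round once at the end"
> approach might look like this minimal sketch (the price, discount,
> and tax figures are the ones quoted just above; not lmi code):
>
>   #include <cmath>
>   #include <cstdio>
>
>   int main()
>   {
>       double net = 69.95 * 0.98 * 1.0635;  // notionally 72.9039885, but dusty
>       long long cents = static_cast<long long>(std::round(net * 100.0));
>       std::printf("%lld cents\n", cents);  // 7290
>   }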
>
> And what actually happens when we add notional integers? If I have
> $1.31 and you give me a 25¢ coin, two 5¢ coins, and three 1¢ coins,
> we certainly don't want to convert each of
> {$1.31, $0.25, 2 * $0.05, 3 * $0.01}
> to a slightly inexact double, add them together, and convert back
> to one of the representable neighbors of 1.69 . Accuracy aside,
> that's a lot of costly conversions.
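>
> Keeping everything in integral cents, the same addition is exact and
> involves no conversions at all (again, just a sketch):
>
>   #include <cstdint>
>   #include <cstdio>
>
>   int main()
>   {
>       std::int64_t total = 131;      // $1.31 on hand
>       total += 25;                   // one 25-cent coin
>       total += 2 * 5;                // two 5-cent coins
>       total += 3 * 1;                // three 1-cent coins
>       std::printf("%lld cents\n", static_cast<long long>(total)); // 169
>   }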
>
> Now, it may seem ideal to store everything as a fixed-point decimal
> number, e.g., so that the tax rate is 635/10000: IOW, use a generic
> fixed-point type fixed<N> (sketched just below), so that the tax rate
> is fixed<10000>{635}.
> That would work tidily for many of the values that we read from
> database tables, although it would require considerable effort.
> But in practice it doesn't help much, because sometimes we still
> have to multiply by annuity factors that involve life contingencies
> as well as interest (they're the roots of polynomials of degree
> <= 1200, which even if rational are unlikely to be exact decimals).
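>
> A bare-bones version of such a type (purely hypothetical, not an lmi
> class) is just an integer plus a compile-time scale:
>
>   #include <cstdint>
>   #include <cstdio>
>
>   // value is units/N, held exactly as a scaled integer
>   template<std::int64_t N>
>   struct fixed
>   {
>       std::int64_t units;
>       double to_double() const {return static_cast<double>(units) / N;}
>   };
>
>   int main()
>   {
>       fixed<10000> tax_rate {635};   // exactly 635/10000 = 6.35%
>       fixed<100>   price    {6995};  // exactly $69.95
>       std::printf("%f %f\n", tax_rate.to_double(), price.to_double());
>   }
>
> It handles exact decimal constants from the database tidily, but, as
> noted just above, it can't make an annuity factor exact.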
>
> But let's step back and reconsider. "Universal life insurance" has
> an "account value" that's just a generalization of a bank account.
> It always contains an exact integral number of cents, just because
> that's the way accounting works. Increments and decrements are
> rounded before they're applied, so that the net amount remains
> integral. Our problem is that, following common practice, we think
> of these amounts as dollars with two decimals. So isn't the solution
> just to hold these values as integral cents? As long as we've done
> all the rounding correctly, we'll get the right answer in the end;
> and comparing regression-test results will be easy if we've stored
> integral (accounting) values as exact integers.
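>
> One step of that "generalized bank account" then looks something like
> this sketch (hypothetical rate and charge; not lmi code): round each
> increment or decrement to cents, and the balance stays integral:
>
>   #include <cmath>
>   #include <cstdint>
>   #include <cstdio>
>
>   std::int64_t monthly_step(std::int64_t account_cents,
>                             double monthly_rate,
>                             std::int64_t charge_cents)
>   {
>       account_cents -= charge_cents;                  // already integral
>       double interest = account_cents * monthly_rate; // fractional cents
>       account_cents += static_cast<std::int64_t>(std::round(interest));
>       return account_cents;                           // still integral
>   }
>
>   int main()
>   {
>       std::int64_t av = 1000000;              // $10,000.00
>       av = monthly_step(av, 0.004074, 1250);  // one month, $12.50 charge
>       std::printf("%lld cents\n", static_cast<long long>(av));
>   }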
>
> IOW, Japan gets this right (since they took the sen and the rin
> out of circulation).
>
> And it doesn't matter if we store 7290 cents as
> (double)(7290.0)
> or
> (int64_t)(7290)
> because they're both exact. There are actually good arguments for
> preferring integer-valued floating point:
> - a "composite" is a linear combination of individual illustrations,
> where the weights needn't be integral;
> - a composite may reflect "partial mortality", where the weights
> vary by duration and are definitely non-integral; and
> - illustrations can be "scaled" (see Ledger::AutoScale()) so that
> they print thousands or millions instead of single dollars.
> But that design decision is easily deferred by making it a typedef.
> We might even find that one is significantly faster than the other,
> and I wouldn't be surprised if the silicon turns out to favor floating
> point because it's been optimized to improve graphics performance.
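>
> The deferral itself costs almost nothing; something along these lines
> (hypothetical alias name, not what the branch actually uses):
>
>   #include <cstdint>
>
>   // Pick the representation in exactly one place.  Whole-cent values
>   // like 7290 are exact either way; only the floating variant can
>   // also carry the non-integral weights listed above.
>   using cents_rep = double;            // exact integers up to 2^53
>   // using cents_rep = std::int64_t;   // exact integers up to ~9.2e18
>
>   int main()
>   {
>       cents_rep c = 7290;              // exactly 7290 cents either way
>       return c == 7290 ? 0 : 1;
>   }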
>
> But that's all just theoretical and speculative. The harsh reality is
> that rounding had never been performed in a handful of cases (which
> this branch led me to discover, and fix on the trunk). And now we have
> all this weird code like
> return round_max_premium()(ldbl_eps_plus_one_times(temp * a_specamt / a_mode));
> that has to be sorted out before we can put a 64-bit lmi into
> production, because each such instance was intended to solve an
> actual problem that we don't want to come crawling out of its grave.
> (See also lmi commit f97235aed, its comments, and its commit message.)
>
>> With an optimizing compiler,
>> abstraction penalty should be at most a few percents and I'd expect it to
>> be more than compensated by the gain of speed due to using integers instead
>> of doubles.
>
> (I'm agnostic on the question whether integers are faster than doubles.
> I did read some online discussions, but they were long on conjectures
> and short on facts.)
>
>> Would it be worth trying to profile both versions to see where
>> does this slowdown come from?
>
> No. I introduced lots of silly conversions on that branch, knowing that
> they'd be prohibitively costly. I even changed the speed test beforehand
> so that I could measure the cost. Furthermore, I used the currency class
> for only a representative handful of variables, so when I need to add
> one of that handful of variables to another outside that handful, I get
> a gratuitous and costly conversion.
>
>> Of course, in the grand scheme of things we
>> know that to achieve anything close to the maximal theoretical performance
>> we need to explore parallelism, either by using threads, or by using SIMD,
>> or, ideally, both of them, and no amount of single-threaded code
>> optimizations can result in anything comparable, but perhaps we could still
>> at least avoid making things slower.
>
> Noted. That's outside the scope of what I'm trying to achieve here.
> What I really want to do is migrate from 32- to 64-bit (hence, from
> x87 to sse) without any surprises (e.g., rectifying all those dubious
> ldbl_eps_plus_one_times() calls)...changing from fractional dollars
> to integral cents along the way, and also making regression-test
> results more stable and easier to check.
>
>> Second, and going in a completely opposite direction, I'm also somewhat
>> surprised that you didn't use this opportunity to add checks for integer
>> overflow to the various operations. This is, of course, very unlikely to
>> happen, as 64-bit integers can represent amounts up to ~92 quadrillions,
>> but I half-expected you to still care about this possibility.
>
> That's another reason to prefer integer-valued doubles to int64_t.
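>
> For reference, a checked int64_t addition is easy enough to sketch
> with the GCC/Clang builtin--this only illustrates Vadim's suggestion,
> and is not something this branch does:
>
>   #include <cstdint>
>   #include <stdexcept>
>
>   std::int64_t checked_add(std::int64_t a, std::int64_t b)
>   {
>       std::int64_t sum;
>       if(__builtin_add_overflow(a, b, &sum))
>           throw std::overflow_error("currency addition overflowed");
>       return sum;
>   }
>
>   int main()
>   {
>       return 7439 == checked_add(7290, 149) ? 0 : 1;
>   }
>
> An integer-valued double, by contrast, has no overflow to trap: above
> 2^53 cents it simply stops representing every integer exactly.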
>