[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]
Re: [lmi] Would manual optimization help here?
From: |
Greg Chicares |
Subject: |
Re: [lmi] Would manual optimization help here? |
Date: |
Mon, 29 Mar 2021 21:21:23 +0000 |
User-agent: |
Mozilla/5.0 (X11; Linux x86_64; rv:78.0) Gecko/20100101 Thunderbird/78.4.0 |
On 3/27/21 8:18 PM, Vadim Zeitlin wrote:
> On Sat, 27 Mar 2021 18:47:49 +0000 Greg Chicares <gchicares@sbcglobal.net>
> wrote:
>
> GC> On 3/26/21 11:01 PM, Vadim Zeitlin wrote:
> GC> > On Fri, 26 Mar 2021 14:59:46 +0000 Greg Chicares
> <gchicares@sbcglobal.net> wrote:
> GC> [... caching a double outside a loop, to avoid indexing a vector inside
> the loop...]
[...snip: caching numerous distinct doubles outside a loop does avoid
vector indexing inside the loop, but indexing is cheap, this strategy
may thrash the cache, and the AoS vs. SoA idea seems likelier to help;
but that would be a bigger task than we can undertake now, so...]
> GC> Maybe someday we'll rework that code. There isn't much of it (it's
> GC> mostly just ihs_acctval.cpp + ihs_avmly.cpp), but that's where lmi
> GC> spends the lion's share of run time.
>
> I have to say that it's not the first time I suffer from cognitive
> dissonance due to the fact that, on one hand, performance is clearly
> supposed to be important for lmi, but OTOH, we don't really do anything to
> improve it, even if it sometimes seems that it wouldn't be that hard to do
It's not that we never do anything to improve it, e.g.:
git show 6620be4ea5589
git show 365cbb477dfdc
git show f2e2cd7b6be5d
but rather that we take our time making some improvements that are
obviously worthwhile. But there's a reason for that.
> it. Of course, I could be wrong about this particular optimization, but
> things like using SSE instead of x87 under MSW really should provide
> important performance gains without even spending much time on it, and yet
> we still haven't done it, even though I must be complaining about this
> since at least a decade.
Tomorrow we could change approximately one line in a makefile and start
releasing 64-bit binaries that would be twice as fast. But there would
be a very large number of regressions, some of which might be frank
errors. Investigating them all would take a great deal of effort.
Instead, here's the plan, which is well underway:
- switch from 'double' to currency for dollars-and-cents amounts,
eliciting an enormous number of regressions
- rework rounding with great care, to make sure it does exactly what
we want, in discrete steps--each step resolves a category of
regressions and improves lmi's accuracy
- now the 64- vs. 32-bit regressions are much less vast, and more
tractable, because 64- and 32-bit currency calculations are
very similar if not identical
At the end, we'll see that flipping the switch from 32 to 64 has
become easy because the difficulties largely arose from defectively
little intermediate rounding. But all that testing has been hard.
The next steps will include
- ferreting out every use of LDBL_EPSILON (as you've been advising
for many years) and replacing each with something better; and
- replacing every other use of 'long double', and replacing all with
code that doesn't use the x87;
with careful testing at each step.
A tangent concerning the cost of testing: here's a tiny commit that
fixed an ancient defect:
git show 6dfcbf61
but required serious effort to test. And this related commit:
git show f2a0fce5
is just about as trivial, but I alone spent two long days doing
nothing but testing it, and I'm not the only tester; and this
long-winded commit message:
git show bcd3f130
(don't read it, just weigh it) is just a TL;DR summary of what I saw
while testing f2a0fce5. Before I begin any testing, I write down what
I expect to observe; in this case, I was repeatedly surprised, because
I hadn't thought through all the ramifications in enough detail. And
I've found important mistakes through this sort of testing.
It was necessary for lmi development to take that tangent, because
the tax law changed. But when I wrote that code around 2001, I didn't
have time to finish it, so I had to take a fresh look, and I saw
things that couldn't be unseen, so I couldn't commit 6dfcbf61 and
f2a0fce5 without stepping back and making things right. Just figuring
out what "right" would be took whole days of thought, because it's
quite a complicated matter.
But, as gothic novels inform us, that's what happens when you open
a door that's long been locked and step into a room that no one has
entered for many years (you might find Miss Havisham's wedding cake).
A couple days ago, you reminded me that you had gone through such a
door, on a particular errand:
> And the MemberSymbolTable speed up patch from
> https://github.com/let-me-illustrate/lmi/pull/114 is more than 6 years old
> by now. I realize that there are many other priorities, but it still
> surprises me that this one is so far down the list.
...so, still recovering from that vaccine, with trepidation, I
followed.
That's a complicated set of classes. In order to review any change
at all, I had to refamiliarize myself with it. It had long ago been
paged out not just to L1 -> L2 -> L3, but to a vault in the Svalbard
of my forgotten memory. It costs a lot of mental energy to reload it
so that I can see all the pieces and their interconnections. Maybe
that's just exceptionally costly for me--I don't know--but, for me
at least, that cost is so great that, once we start changing it, I
want to make everything as good as possible before paging it back out.
As I often do, I studied the changes, then put them aside and tried
to write them myself from scratch, to make sure I understood them,
and also to see what other ideas might occur to me while doing that.
ff0e80aac Remove unused any_member<>::object_ field
I quickly reproduced this one. I'm left wondering why I ever added
the duplicative member. It was used only here:
- LMI_ASSERT(object_);
- return &(object_->*pmd);
and the revision seems almost mechanical, so I'm confident that
it's correct. The only question I had concerns the placement of the
added 'const' here:
+ holder_type* const holder = static_cast<holder_type*>(content_);
Let's see:
T const* p; // cannot modify the object pointed to
T* const q; // cannot make pointer point to another object
Would it be even better to write 'const' everywhere we can?
+ holder_type const*const holder = static_cast<holder_type*>(content_);
7e47a9c3a Make any_member<> movable
This one was much harder for me, because I don't think I've ever
written move functions. I searched for something like what Coplien's
1991 book called the "orthodox canonical form" of special members,
updated for a new century--but was surprised to find none, neither
online nor in my books. The way Ilya did it:
- std::move the base class
- std::exchange the pointer [deleting it in operator=(&&)]
seems to be fairly canonical as of C++14; my only question about
it is whether we should test for self-assignment, or at least
assert that it doesn't happen (and, in any case, I'd want to add
a self-assignment unit test).
But wait--is it possible to avoid writing special members, by
replacing the raw pointer with a std::unique_ptr?
And is swap() just dead code that should be removed?
a6e80e5ef Use a sorted vector instead of std::map<> in MemberSymbolTable
I've been struggling with this one. Obviously std::map isn't the
ideal data structure, but what would be ideal?
As an experiment, I tried just changing std::map to std::unordered_map,
just because it was suggested here:
http://scottmeyers.blogspot.com/2015/09/should-you-be-using-something-instead.html
but that seemed to make things worse. The measurement I used was:
- load that 1691-cell census we've used before
- press Ctrl-+ repeatedly, fast enough that screen updates remain off
- watch the timing on the status line
which is primitive, but doesn't require any code change.
OTOH, applying all commits from PR 114 seemed to reduce that measured
timing by one-half to two-thirds, so there's a demonstrable benefit.
But then consider:
+ auto const i = std::lower_bound(member_names_.begin(),
member_names_.end(), s);
- // TODO ?? This would appear to be O(N^2).
- auto i = std::lower_bound(member_names_.begin(), member_names_.end(), s);
AFAICT, the comment was wrong in that lower_bound is O(log N). However,
isn't insertion into a sorted std::vector O(N)? Have we optimized lookups
at the cost of pessimizing insertions?
Here's how this class is used:
- each client "ascribes" members in its ctor (AFAICS, only there)
- there may be hundreds of "ascribed" members
- they're ascribed in non-sorted order
- "ascription" happens often; lookup, probably much oftener
Wouldn't it be better to
- "ascribe" everything into a container with a cheap push_back()
- then lock "ascription" and sort what has been "ascribed"
- and then look up random elements by string-name, but iterate
with, say, std::vector::iterator?
Would the best structure be two vectors (names and values) both
sorted in the same order? Or might it be simpler to maintain
one vector<pair<name,value>>? I realize we use member_names() often,
but do we use it only to iterate across names, so that in the future
we might not need a separate member_names() accessor (or its
underlying separate vector) at all?
And there's an auxiliary free function member_state() at the bottom
of that file that surely wants a better data structure too.
cdfc8d505 Allow computing MemberSymbolTable member index and reusing it
My first reaction was that the changes in 'census_view.cpp'
seem awfully verbose. My next thought was that we could just add
an operator[](int), with an obvious implementation, and regain the
compactness of CensusView::class_parms_from_class_name().
But then again...instead of
for(int j = 0; j < lmi::ssize(vector); ++j) {whatever();}
we can just iterate across the container without explicit indexing.
Furthermore, while I realize your immediate goal was to make the
census manager faster, there's a lot more we ought to do at the
same time. MemberSymbolTable<>::assign(), just to pick one
obvious example, should use natural iterators instead of
iterating by name. Let me also say again, as I've said before,
that I'm very wary of touching class any_member because it is
central to lmi and used pervasively, so we dare not change it
without plenty of careful testing.
Right now, another external need has arisen, and I have to clear
the decks and take care of it (in the proprietary repository), so
I'm writing this email, turning away from the odd/any_member-opt
branch, and closing this door now that we know what's behind it.
I'd be glad to reopen it later, but first let's make a plan, to
be sure we cover everything that needs doing when we go back in.
As for the AoS vs. SoA thing:
> So, to return to this code, while I'd be glad to look at it (because I
> definitely need to understand it better, before doing anything else) and
> trying to optimize it, I'm just not sure if doing this is going to be
> really useful or if it would just end up being one more thing I annoy you
> about.
...let's close that door and leave it locked for a long time,
because it looks like more work and less profit than redesigning
class any_member.
After working in the proprietary repository, I'll probably
return to the taxation work, because that's nearly done and I
have a plan for finishing it. Surprising as it may seem, what
I've been doing with PETE is a prerequisite for doing that
work cleanly--it's not merely a divertissement.
Then we could choose between
- currency, rounding, LDBL_EPSILON --> 64-bit, and
- any_member, data structures, census manager etc.
Of those two, I'm pretty sure the first one should take
priority because it's known to double lmi's speed.