[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]
Re: [lmi] Big census [Was: Census-manager Update()]
From: |
Greg Chicares |
Subject: |
Re: [lmi] Big census [Was: Census-manager Update()] |
Date: |
Sun, 02 Nov 2014 03:18:31 +0000 |
User-agent: |
Mozilla/5.0 (Windows NT 5.1; rv:24.0) Gecko/20100101 Thunderbird/24.6.0 |
On 2014-11-02 01:40Z, Greg Chicares wrote:
>
> By private email I'll send you a case of similar size that includes
> no proprietary or personal information.
Here's another experimental use for that big census. [The results I
report below are for an actual production case of almost exactly the
same size, but that probably doesn't make much difference.]
While I was cleaning up some really old code in the census manager,
I came across this:
// cell_parms().swap(expurgated_cell_parms); // TODO ?? Would this be better?
cell_parms() = expurgated_cell_parms;
At a quick glance, yes, obviously swap() must be better...I guess.
But before committing such a change, I decided to measure it.
Try deleting one cell at a time, repeatedly. Here are the timings
I got, in milliseconds, for: operator=() ; swap as given in the
alternative above; and then--grasping at straws because swap()
did poorly--swap() followed by calling
expurgated_cell_parms.resize(0);
. (The only reason I tried resize() is that the (not so useful)
"Mem Usage" reported by the msw-xp "task manager" went way up
with swap(); it shouldn't really make a difference, because that
vector soon goes out of scope and gets destroyed.)
swap +
assign swap resize
39793 45055 36759
38946 108912 82720
40189 58270 46892
40065 76559
39712 45839
47603 74584
77469
I didn't perform a uniform number of trials each way because
40000 msec is long enough to be irksome. Maybe there aren't
enough trials, but it does appear that using swap() increases
the variance quite a bit, without decreasing the mean.
But what's really important is that either way takes too long,
so I tried the patch below[0] and observed:
Same test as above:
6262
6275
6234
6241
"Harsh" test: deleted ten non-contiguous cells near beginning
46973
47217
46783
The speed is comparable even with the "harsh" test, and an order
of magnitude better for single-cell deletion. But there must exist
an algorithm that delivers about the same performance for the
"harsh" test as for the simple case (because my patch calls
std::vector::erase() in a loop).
Searching online, I found that others had the same problem: that
the "erase-remove" idiom seems like a good fit, but what we're
dealing with here is indexes, not elements:
http://stackoverflow.com/questions/4115279/most-efficient-way-of-erasing-deleting-multiple-stdvector-elements-while-retai
http://stackoverflow.com/questions/6609547/erasing-elements-in-stlvector-by-using-indexes/21148808
This seems like a better idea:
http://www.drdobbs.com/cpp/object-swapping-part-2-algorithms-can-of/232602979
Sadly for me, I haven't the time to work on this any more, but
maybe you can. As a first step, I'd like to commit the patch
below[0] unless you see something incorrect in it. I'm really
starting to think, though, that maybe we should have a container
of smart pointers instead--but that's a big step, and I can't
make time to consider doing it myself.
---------
[0] "so I tried the patch below":
Index: census_view.cpp
===================================================================
--- census_view.cpp (revision 6018)
+++ census_view.cpp (working copy)
@@ -1491,23 +1491,12 @@
LMI_ASSERT(cell_parms().size() == n_items);
- std::vector<Input> expurgated_cell_parms;
- expurgated_cell_parms.reserve
- (n_items - n_sel_items
- );
-
- for(unsigned int j = 0; j < cell_parms().size(); ++j)
+ for(int j = erasures.size() - 1; 0 <= j; --j)
{
- if(!contains(erasures, j))
- {
- expurgated_cell_parms.push_back(cell_parms()[j]);
- }
+ cell_parms().erase(erasures[j] + cell_parms().begin());
}
- LMI_ASSERT(expurgated_cell_parms.size() == n_items - n_sel_items);
+ LMI_ASSERT(cell_parms().size() == n_items - n_sel_items);
-// cell_parms().swap(expurgated_cell_parms); // TODO ?? Would this be
better?
- cell_parms() = expurgated_cell_parms;
-
#if !wxCHECK_VERSION(2,9,3)
// Remove selection to work around wx-2.9.2 bug in GetSelections()
// (we'll set it again below).