
[lmi] Optimizing opening big census files


From: Vadim Zeitlin
Subject: [lmi] Optimizing opening big census files
Date: Sun, 30 Nov 2014 22:25:24 +0100

 Hello,

 While looking into optimizing census updates, I also tried to optimize
loading the (big) census in the first place, as it was taking annoyingly
long. The first thing I did was write a small patch showing the time
taken by loading; see 0001-Show-time-taken-by-loading-the-census.patch.
This patch is not meant to be applied to LMI, but is included in case you
want to benchmark this code yourself, where it could be useful. To give
some concrete numbers: the current unmodified version of LMI from trunk,
built with gcc 3.4, takes 14.0 seconds to load the big census here (this
is the time of opening it for the second and subsequent times; it takes
slightly longer the first time, but I consistently opened the census 5
times and took the smallest of the timings for all tests). Also, just for
comparison, the MSVC 12 build of LMI does it in 11.5 seconds.
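
 For reference, the kind of measurement the timing patch performs can be
sketched roughly as below. The helper name time_seconds is invented for
illustration; the real 0001 patch instruments the census document's load
path rather than a generic callable:

```cpp
#include <chrono>

// A minimal sketch of wall-clock timing around an arbitrary callable,
// returning elapsed seconds. steady_clock is used because it is
// monotonic and thus suitable for interval measurement.
template<typename F>
double time_seconds(F&& f)
{
    auto const t0 = std::chrono::steady_clock::now();
    f();
    auto const t1 = std::chrono::steady_clock::now();
    return std::chrono::duration<double>(t1 - t0).count();
}
```

Taking the least of five runs, as described above, is then just a matter
of calling this in a loop and keeping the minimum.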

 Second, I profiled the census loading code and got a surprise (which, of
course, was in itself quite reassuring, as everybody knows you're supposed
to be surprised by profiling results): I expected XML parsing to take a
significant amount of time, but it turned out to account for only ~10% of
the loading time. Practically all the rest is spent constructing Input
objects, i.e. inside push_back() and hence in the Input copy ctor, where
the time is shared between AscribeMembers() and DoAdaptExternalities().
Hence I decided to concentrate my efforts on this code.

 Third, I quickly lost all hope of significantly optimizing it. In
principle, it should be possible to reduce the loading time by 50% (with
some really dirty hacks I could achieve sub-5s times with the MSVC
version), if only we could avoid redoing the same initialization of Input
again and again. E.g. we definitely shouldn't be creating the
product_database object 4000+ times, nor reading and parsing the same XML
product file the same number of times. And it looks like it should be
possible to just copy the map/vector constructed by AscribeMembers() once,
instead of painstakingly reinitializing them several thousand times. But
these are not the kind of micro-optimizations I had in mind when I started
looking at this code; changing this would require some non-trivial
modifications, and I suspect you don't want to embark on such a project
right now. If I'm mistaken about this and you would like to [try to]
optimize this code, please let me know; I do believe it should be possible
to improve it significantly.
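
 The "initialize once, copy many times" idea can be sketched as follows.
Widget and its counter are invented stand-ins: the real candidates would
be Input and the state built by AscribeMembers()/DoAdaptExternalities(),
and the real change would be anything but this trivial:

```cpp
#include <map>
#include <string>
#include <vector>

// Invented stand-in for Input: default construction is expensive
// (it rebuilds a member map from scratch), while copying the
// already-built map is comparatively cheap.
struct Widget
{
    static int default_ctor_calls;

    Widget()
    {
        ++default_ctor_calls;
        // Stand-in for AscribeMembers(): painstaking reinitialization.
        for(int i = 0; i < 100; ++i)
            members_["field" + std::to_string(i)] = i;
    }

    std::map<std::string,int> members_;
};
int Widget::default_ctor_calls = 0;

// Current pattern: default-construct every element, paying the
// expensive initialization n times.
std::vector<Widget> make_naive(int n)
{
    std::vector<Widget> v;
    for(int i = 0; i < n; ++i) v.push_back(Widget());
    return v;
}

// Proposed pattern: pay the expensive initialization once for a
// prototype, then copy it n times.
std::vector<Widget> make_from_prototype(int n)
{
    Widget const prototype;
    return std::vector<Widget>(n, prototype);
}
```

With 4000+ cells, the difference between running the expensive
initialization once versus 4000+ times is exactly where the hoped-for
50% would come from.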


 Finally, to avoid writing off all the time spent on this as a complete
loss, I did find two micro-optimizations which help at least a little;
please see the attached patches. They are completely independent, and 0,
1, or 2 of them could be applied. The first one (patch number 2) just
avoids the slow istream_to_string() call. While the comment there claims
that this is the most efficient solution, it's definitely less efficient
than avoiding copying the file data (~40MiB) at all, as the patch does.
Applying it brings the time down from 14.0 to 13.0 seconds (the ".0"s are
suspicious, but this is just what I measured) for me, and I don't see any
reason not to use the more efficient xmlwrapp function.

 The last patch (number 3) gets rid of more copying, this time in the
Input copy ctor. As I wrote above, the really expensive part is not the
copying part of the copy ctor, but the part that it shares with the
default ctor. Still, extra copying never helps, and applying this patch
reduces the time to 12.0 seconds (yes, another ".0"; all I can say is
that the numbers for MSVC are 11.5, 10.9, and 10.1, i.e. comfortingly
less round while still showing the same general tendency, so hopefully
the gcc results are not completely false). This patch does have one small
drawback: the progress shown in the status bar is now visibly non-uniform,
as there is a long delay after the case and class defaults are read, while
the 4000+ Input objects are being default-initialized, during which the
progress remains stuck at "2 cells". But I think it's still worth
applying, as a slightly less accurate progress indicator is arguably less
important than (even if only slightly) better loading speed.
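
 The shape of the change can be sketched like this. Cell and its copy
counter are invented for illustration; the real patch does the equivalent
for Input inside multiple_cell_document, and read() stands in for the
actual XML parsing:

```cpp
#include <cstddef>
#include <string>
#include <vector>

// Invented stand-in for Input; counts copy-constructions so the
// saving is visible.
struct Cell
{
    static int copies;
    Cell() = default;
    Cell(Cell const& other) : value(other.value) { ++copies; }
    Cell& operator=(Cell const&) = default;
    void read(std::string const& xml) { value = xml; } // stand-in for parsing
    std::string value;
};
int Cell::copies = 0;

// Before the patch: parse each cell into a temporary, then copy it
// into the vector (plus further copies on vector reallocation).
std::vector<Cell> load_with_copies(std::vector<std::string> const& xml_cells)
{
    std::vector<Cell> v;
    for(auto const& x : xml_cells)
    {
        Cell c;
        c.read(x);
        v.push_back(c); // copy ctor runs here
    }
    return v;
}

// After the patch: default-construct all cells up front, then parse
// into each element in place. The up-front construction is the long
// pause during which the progress indicator sits at "2 cells".
std::vector<Cell> load_in_place(std::vector<std::string> const& xml_cells)
{
    std::vector<Cell> v(xml_cells.size());
    for(std::size_t i = 0; i < xml_cells.size(); ++i)
        v[i].read(xml_cells[i]); // no copy ctor invocations at all
    return v;
}
```

The in-place variant trades the uniform progress updates for the absence
of per-cell copies, which is the drawback/benefit trade-off described
above.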


 Please let me know if you have any questions about these patches and,
especially, if you'd like me to continue looking into optimizing Input
construction code.

 Thanks in advance,
VZ

Attachment: 0001-Show-time-taken-by-loading-the-census.patch
Description: Text document

Attachment: 0002-Parse-XML-file-directly-without-reading-it-into-a-st.patch
Description: Text document

Attachment: 0003-Parse-XML-into-Input-in-place-in-multiple_cell_docum.patch
Description: Text document

