Re: [lmi] testing actuarial_table performance


From: Greg Chicares
Subject: Re: [lmi] testing actuarial_table performance
Date: Mon, 11 Jun 2012 15:39:35 +0000
User-agent: Mozilla/5.0 (Windows NT 5.1; rv:11.0) Gecko/20120312 Thunderbird/11.0

On 2012-05-31 15:38Z, Václav Slavík wrote:
> 
>>> We could still cache the data on disk e.g. as a memory-mapped dump if the
>>> performance proves to be a problem.
>> 
>> I hesitate to venture too far beyond ISO-standard C++ or to add dependencies
>> on more libraries (such as boost::mapped_file). All lmi calculations (not
>> just the server mentioned above) are available through command-line programs
>> that don't depend on wx, so I don't want to introduce a dependency on a wx
>> implementation here.
> 
> The important thing is to use something easily and quickly written and read;
> it doesn't have to be memory mapping (come to think of it, that wouldn't play
> well with using std:: containers anyway). Any kind of binary dump would
> do, and because it would be just a cache, we could change the format at will
> while still keeping the advantages of an editable original format.

Okay.
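
To make that concrete, a binary dump could be as small as the sketch below.
It's illustrative only: write_cache(), read_cache(), and the bare
vector<double> are invented stand-ins for whatever actuarial_table
actually stores.

    #include <cstddef>
    #include <fstream>
    #include <string>
    #include <vector>

    // Write the dump; the format is private to these two functions and
    // can change at will, exactly because it's only a cache.
    void write_cache(std::string const& path, std::vector<double> const& v)
    {
        std::ofstream os(path.c_str(), std::ios::binary);
        std::size_t const n = v.size();
        os.write(reinterpret_cast<char const*>(&n), sizeof n);
        if(n)
            os.write(reinterpret_cast<char const*>(&v[0]), n * sizeof(double));
    }

    // Return false when no usable dump exists, so the caller falls back
    // to parsing the original xml (and then calls write_cache()).
    bool read_cache(std::string const& path, std::vector<double>& v)
    {
        std::ifstream is(path.c_str(), std::ios::binary);
        if(!is)
            return false;
        std::size_t n = 0;
        is.read(reinterpret_cast<char*>(&n), sizeof n);
        if(!is)
            return false;
        v.resize(n);
        if(n)
            is.read(reinterpret_cast<char*>(&v[0]), n * sizeof(double));
        return !is.fail();
    }

A real version would also store a format version and the xml file's
timestamp, so that stale dumps invalidate themselves.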

>> Any OS we'd target probably uses memory-mapped IO for its process loader, so
>> couldn't we secure this advantage by compiling tables into a shared library,
>> and let the OS do memory-mapped IO implicitly? If that's a good approach,
> 
> I think it's more trouble than it's worth. Such a DLL must be maintained,
> shipped, kept up to date. An on-disk cache has the crucial advantage that
> it works mostly automatically.

Okay.

I was thinking that putting tables in a DLL might make the program start
up faster, because no table would need to be parsed from xml. OTOH, it might
make startup slower, because all tables would be read, whether they're
needed or not. Either way, we could gain those advantages and disadvantages
by preloading every table into a cache file included in our distribution.
And a cache would be preferable, anyway, in case any user modifies a table.

>>> Or at least automate the translation
>>> of an XML actuarial table into corresponding C++ code ("hard coded table")...
>> 
>> If we do this, and put the generated code in a shared library, then we could
>> distribute one shared library instead of hundreds of files. (Actually, we'd
>> want two libraries: one for proprietary data, and one for nonproprietary.)
>> I think that would make distribution to end users easier and more reliable.
> 
> Wouldn't that be more easily solved by, for example, packing all the data
> files into a ZIP archive?

Yes.

Packaging a distribution to end users is still a largely manual process,
and I was thinking that putting tables in a DLL would be one way to
automate a portion of that. But putting them in an archive would be just
as easy to automate (although we'd introduce a dependency on an archive
library, if the archives are to be used directly at run time). Anyway,
the best way to make a largely manual process more robust is to automate
it completely.
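
For the record, reading a member straight out of such an archive is only
a few calls; here's a sketch using libzip, named purely as one candidate
archive library ('tables.zip' is a hypothetical name, and the error
handling every call here would need is elided):

    #include <zip.h>

    #include <cstddef>
    #include <string>
    #include <vector>

    // Sketch: slurp one member of 'tables.zip' into memory with libzip.
    std::vector<char> load_member(std::string const& member)
    {
        int error = 0;
        zip* z = zip_open("tables.zip", 0, &error);
        struct zip_stat st;
        zip_stat(z, member.c_str(), 0, &st);
        std::vector<char> buf(static_cast<std::size_t>(st.size));
        zip_file* f = zip_fopen(z, member.c_str(), 0);
        zip_fread(f, &buf[0], buf.size());
        zip_fclose(f);
        zip_close(z);
        return buf;
    }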

>> And if that's a good idea, then we might do the same for product-database
>> files ('.policy', '.database', etc.). Then we could remove the primitive
>> caching of those files (Input::CachedProductName_ etc.), which preserves
>> the contents of at most one most-recently-used '.database' file
> 
> Shouldn't I rewrite actuarial_table caching in a reusable manner instead?
> That's not significantly more complicated and could be immediately used
> for the database files without the need for large changes.

Yes, that's a really good idea.
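
Something along these lines, presumably (a sketch only: the names and the
std::shared_ptr choice are invented, and it assumes the C++11 support in
gcc-4.7):

    #include <map>
    #include <memory>
    #include <string>
    #include <utility>

    // Sketch of a cache reusable for any file-backed type that is
    // constructible from its key; actuarial_table would need a key
    // type bundling filename and table number.
    template<typename Value, typename Key = std::string>
    class file_cache
    {
      public:
        static std::shared_ptr<Value const> get(Key const& key)
        {
            static std::map<Key, std::shared_ptr<Value const>> cache;
            auto i = cache.find(key);
            if(cache.end() == i)
                {
                std::shared_ptr<Value const> p(new Value(key));
                i = cache.insert(std::make_pair(key, p)).first;
                }
            return i->second;
        }
    };

Handing out shared_ptr copies rather than bare pointers would also
address the pointer-validity worry I raise below.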

>> (In this part:
>>        actuarial_table *t = new actuarial_table(filename, number);
>>        s_cache[key] = t;
>>        return *t;
>> is there a significant performance advantage to using pointers? Maybe I'm
>> just superstitious, but I worry that pointers can be invalid, so I try
>> really hard to avoid them. We don't have std::map::emplace() in libstdc++
>> even with gcc-4.7:
>>  http://gcc.gnu.org/onlinedocs/libstdc++/manual/status.html#status.iso.200x
>> but, until we have it, couldn't we insert a small dummy actuarial_table
>> object into the map and then swap() in the real one? or just let the copy
>> ctor be called, if it doesn't cost much?)
> 
> I don't think there is (although it may be a bit more significant with
> multi-dimensional tables later). It just seems wasteful to me to store big
> objects like this by value in std::map — especially when all manipulations
> of s_cache are performed in a single small function that is rather
> straightforward.

Help me understand this--I don't see how any advantage would arise
(assuming we have std::map::emplace(), as we eventually will).
Either way, we store a large object somewhere in memory, and access
it by its address, right? Or were you just thinking of the overhead
of copying, until we have std::map::emplace()?
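
To spell out the swap() idea I had in mind (assuming actuarial_table were
given a cheap default ctor and a member swap(), neither of which it has
today, and that s_cache holds tables by value):

    #include <map>
    #include <string>
    #include <utility>

    actuarial_table const& cached_table(std::string const& filename, int number)
    {
        typedef std::pair<std::string,int> key_type;
        static std::map<key_type,actuarial_table> s_cache;
        key_type const key(filename, number);
        std::map<key_type,actuarial_table>::iterator i = s_cache.find(key);
        if(s_cache.end() == i)
            {
            actuarial_table t(filename, number); // parse xml only here
            // Copying the default-constructed dummy is cheap; swapping
            // the parsed table in avoids copying it, even without emplace().
            i = s_cache.insert(std::make_pair(key, actuarial_table())).first;
            i->second.swap(t);
            }
        // Nodes of a std::map never relocate, so this reference stays
        // valid for the cache's lifetime.
        return i->second;
    }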


