Subject: Re: [lmi] [lmi-commits] master 9c510ad 16/22: Measure elapsed time for MD5 data-file validation
From: Greg Chicares
Date: Mon, 30 Mar 2020 00:34:04 +0000
User-agent: Mozilla/5.0 (X11; Linux x86_64; rv:68.0) Gecko/20100101 Thunderbird/68.5.0
On 2020-03-29 17:58, Vadim Zeitlin wrote:
>
> First of all, thanks a lot for merging this pull request
Thank you and Ilya--it was easy to review because everything's so clear.
> On Sat, 28 Mar 2020 18:23:38 -0400 (EDT) Greg Chicares <address@hidden> wrote:
[...]
> GC> Measure how long it takes to validate MD5 files by two methods:
> GC> - an external md5sum program, as in the past; and
> GC> - internally, as now.
>
> I'd expect there to be a constant difference (in favour of the internal
> calculation), as the 2 methods use more or less the same code, but in the
> external case we also have to pay the penalty for shelling out to another
> process.
[...snip timings from 9c510ad08bb commit message...]
> This would tend to indicate that this penalty is of order of 100ms.
[...timings for another scenario...]
> However here it's more like 150ms.
[...another set of timings...]
> And now it's down to 65ms.
>
> I don't know if the benchmarking data confirm or infirm my hypothesis, to
> be honest. The differences of a few dozen milliseconds could well be due
> to external factors when working with files on a not completely idle
> system.
I just dismissed those differences and concluded as you hypothesized.
> GC> Is the extra security worth the extra delay?
>
> I don't know about this either; ~100ms is already noticeable and I think
> it could easily be worse on slower machines and/or when using slower
> storage.
Yes, that's why we need to test it on corporate laptops.
The lmi startup speed has already improved by something like 100 ms on my
machine, and probably rather more on those laptops.
The change proposed for measurement and discussion is to revalidate all
data files each time one of these reports is prepared:
- PDF illustration
- PDF group quote (defect: escapes validation today)
and perhaps also these (which are "not to be shared with the public",
but I can't say that instruction is always followed):
- group roster
- "print case to spreadsheet"
A single PDF illustration takes several hundred milliseconds here;
adding seventy msec introduces a small delay that might however be
noticeable. As for the group reports (the last three), perhaps we
could easily revalidate once per group rather than once per cell,
so that the amortized cost of revalidation would be negligible.
Alternatively, we might decide that this isn't worth worrying about,
and just reduce the number of "TODO ??" markers from 350 to 349.
Or of course we could obscurify the
*.database *.funds *.policy *.rounding *.strata
files somehow, and validate only 'expiry' ('configurable_settings.xml'
should probably be ignored anyway). Or, at the other extreme, we could
revalidate everything all the time, and then de-obscurify the '*.xst'
files.
> In fact, the main reason for writing this reply is that I'm surprised by
> how long it takes to compute the hash. Using "openssl speed md5", I get
> ~300MB/s for 64 byte blocks and while I don't know how big the files we're
> verifying are exactly, I'm pretty sure that they're nowhere close to 30MB
> in size. So I wonder if it could be useful to profile the code doing this
> calculation to check if we're not doing something stupid?
Sounds worthwhile.