[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]
Re: [lmi] Help modifying 'rate_table.cpp'
From: |
Greg Chicares |
Subject: |
Re: [lmi] Help modifying 'rate_table.cpp' |
Date: |
Thu, 15 Dec 2016 06:02:29 +0000 |
User-agent: |
Mozilla/5.0 (X11; Linux x86_64; rv:45.0) Gecko/20100101 Icedove/45.4.0 |
On 2016-12-14 18:16, Vadim Zeitlin wrote:
> On Wed, 14 Dec 2016 14:50:23 +0000 Greg Chicares <address@hidden> wrote:
>
> GC> The patch below [0] shows what I'm trying to accomplish: given a flawed
> GC> historical binary file whose values are inconsistent with its stated
> GC> number of decimals, write a text file that states the number of decimals
> GC> that seems to have been intended and formats values to that corrected
> GC> precision.
[...and fix the CRC too...]
> [...] I'd actually prefer to do this on reading the table,
> not when writing it. For me logically adjusting the number of decimals used
> is part of validation and so should be done as early as possible, i.e. in
> validate().
Thanks, I did it that way. Pushed 82b89079ffcdd3702b4c09b3c5455e4a7a3650e0:
Detect and repair decimal-precision inconsistencies in rate tables
> GC> Or
> GC> maybe the right place for this change is
> GC> soa_v3_format::table_impl::write_as_text()
> GC> so that it would be excluded from
> GC> soa_v3_format::table_impl::write_as_binary()
> GC> but that breaks the implicit assumption that those two functions
> GC> share the same implementation.
>
> I think it's nice to reuse the same implementation for both of them. The
> idea was that we could add write_as_xml() (or maybe write_as_json() or
> whatever) later relatively easily, if needed.
Yes, I realize that now. At first, I wondered why functions like
writer::write_values() were implemented in each of two namespaces.
That does make it harder to search for a particular one. I guess
I'm used to it now, but for a while I was thinking life would be
easier if the implementations had a redundant tag, thus:
binary_format::writer::write_values()
text_format::writer::write_values()
I initially thought I'd find a common base class, with these
two functions being virtual. But maybe the things in different
namespaces aren't similar enough for polymorphism.
It still seems odd, though, to have do_write() but not do_read().
> GC> But there's another possible problem. I had imagined that changing the
> GC> number of decimals could affect only write operations, but its value
> GC> is used in some read operations as well. In parse_single_value(), it's
> GC> used to set 'num_spaces_allowed', which is used only for validation;
> GC> there, I'd hesitate to "fix" it, because that might prevent reading
> GC> a table. It's also used here:
> GC>
> GC> // After the max select age only the last column remains,
> just
> GC> // skip the spaces until it.
> GC> skip_spaces
> GC>
> (*select_period_*text_format::get_value_width(*num_decimals_)
> GC> ,start
> GC> ,current
> GC> ,line_num
> GC> );
> GC>
> GC> and would the same reasoning apply there? I suppose that serves a
> GC> validation purpose, too: if the stated number of decimals is not
> GC> consistent with the format of the text table being read, then
> GC> that might indicate a serious problem (a different problem than the
> GC> one addressed by the patch below, which tries to fix an inconsistency
> GC> between the stated number of decimals and binary values).
>
> If we assume that the number of decimals in the text format always
> corresponds to the reality, then it shouldn't matter because it's never
> going to be adjusted anyhow in this case then. If we do encounter a text
> file with mismatching number of decimals, then it will result in an error
> when trying to read it -- but this would seem like the right thing to do,
> wouldn't it?
Sounds good. As the commit message explains in detail, that should be
the ultimate design, but for now I've made the code issue a mere warning
and then correct the error--so that defective old tables can readily be
fixed. I hope to get them all fixed soon, because it's tedious to clean
up all these megabytes of old data and I'd like to get past that.
> The only problem is that the error message won't clearly say
> that the problem is due to a mismatch between the number of decimals
> specified and the precision actually used for the values, but as long as we
> suppose that this is never going to happen anyhow, the quality of error
> message is probably not very important.
Take a look--I think the code I just committed manages to do that.
> GC> Vadim, where do you think I should make this change?
>
> In validate().
Doing it there, instead of in writer::write_values() (in the "text"
namespace IIRC) as in my original sketch, also brought the very
welcome incidental benefit that I didn't have to deal with objects
of type 'boost::optional const&'. I realize it's awfully late for
code review, but what was the rationale for 'optional'? I realize
that some fields such as "Construction method:" may be absent, but
for text fields like that, wouldn't an empty string do as well? Or
were any non-text fields missing--e.g., "Maximum select age:" for
a table that is not select (so that the field wouldn't really have
any meaning--but I thought it had a required value nonetheless)?
BTW, I looked over my latest change before committing it, and saw
that I had written
*hash_value_ = compute_hash_value();
with an asterisk, where a few lines lower you had
hash_value_ = correct_hash_value;
with no asterisk. Now, hash_value_ is 'boost::optional<uint32_t>',
so '*hash_value_' invokes 'T& operator*()' and we can assign a T
through that reference. But how does the line with no asterisk
work? The RHS of the assignment is 'correct_hash_value':
auto const correct_hash_value = compute_hash_value();
and because it's auto we look up the function:
unsigned long table_impl::compute_hash_value() const
and find that the type is 'unsigned long'. Oh, wait, I see:
optional& operator=(T const&)
makes it work without an asterisk. But...was it really a good
design decision for boost to provide that assignment operator?
Even though it seems convenient, isn't it astonishing that these
two statements:
*p = 42;
p = 42;
are equivalent?