[lmi] Help modifying 'rate

lmi

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

[lmi] Help modifying 'rate_table.cpp'

From:	Greg Chicares
Subject:	[lmi] Help modifying 'rate_table.cpp'
Date:	Wed, 14 Dec 2016 14:50:23 +0000
User-agent:	Mozilla/5.0 (X11; Linux x86_64; rv:45.0) Gecko/20100101 Icedove/45.4.0

The patch below [0] shows what I'm trying to accomplish: given a flawed
historical binary file whose values are inconsistent with its stated
number of decimals, write a text file that states the number of decimals
that seems to have been intended and formats values to that corrected
precision. Everything was fine until I realized that these corrections
require the hash value to be recalculated, so I added this line:

+        hash_value_ = compute_hash_value();

in the patch below, but that line doesn't compile: both hash_value_ and
compute_hash_value() are hidden from this context.

I suppose this means I should move these changes from the vicinity of
text_format::writer::write_values(), probably to table_impl::do_write().
Then the number of decimals and hash value would be recalculated in
  text_format::writer::write_values()
which is what's really wanted, and also, accidentally, in
  binary_format::writer::write_values()
which isn't desired but might be okay (although I'm not sure whether
that might cause hash values to differ between binary and text). Or
maybe the right place for this change is
    soa_v3_format::table_impl::write_as_text()
so that it would be excluded from
    soa_v3_format::table_impl::write_as_binary()
but that breaks the implicit assumption that those two functions
share the same implementation.

But there's another possible problem. I had imagined that changing the
number of decimals could affect only write operations, but its value
is used in some read operations as well. In parse_single_value(), it's
used to set 'num_spaces_allowed', which is used only for validation;
there, I'd hesitate to "fix" it, because that might prevent reading
a table. It's also used here:

                // After the max select age only the last column remains, just
                // skip the spaces until it.
                skip_spaces
                    
(*select_period_*text_format::get_value_width(*num_decimals_)
                    ,start
                    ,current
                    ,line_num
                    );

and would the same reasoning apply there? I suppose that serves a
validation purpose, too: if the stated number of decimals is not
consistent with the *format* of the text table being read, then
that might indicate a serious problem (a different problem than the
one addressed by the patch below, which tries to fix an inconsistency
between the stated number of decimals and binary values).

Vadim, where do you think I should make this change?

---------

[0] "The patch below":

diff --git a/rate_table.cpp b/rate_table.cpp
index bbc3a04..84f89c2 100644
--- a/rate_table.cpp
+++ b/rate_table.cpp
@@ -607,15 +607,121 @@ void writer::write_table_type(table_type tt)
         ;
 }
 
+namespace
+{
+std::size_t deduce_number_of_decimals(double arg)
+{
+    std::string s = value_cast<std::string>(arg);
+
+    // Early exit: no decimal point means zero decimals.
+    if(std::string::npos == s.find('.'))
+        {
+        return 0;
+        }
+
+    std::size_t d = 0;
+
+    // Strip leading blanks and zeros.
+    std::string::size_type q = s.find_first_not_of(" 0");
+    if(std::string::npos != q)
+        {
+        s.erase(0, q);
+        }
+
+    // Strip trailing blanks.
+    std::string::size_type r = s.find_last_not_of(" ");
+    if(std::string::npos != r)
+        {
+        s.erase(1 + r);
+        }
+
+    // Preliminary result is number of characters after '.'.
+    // (Decrement for '.' unless nothing followed it.)
+    d = s.size() - s.find('.');
+    if(d) --d;
+
+    // Length of stripped string is number of significant digits
+    // (on both sides of the decimal point) plus one for the '.'.
+    // If this total exceeds 15--i.e., if there are more than 14
+    // significant digits--then there may be excess precision.
+    // In that case, keep only the first 15 digits (plus the '.',
+    // for a total of 16 characters), because those digits are
+    // guaranteed to be significant for IEEE754 double precision;
+    // drop the rest, which may include arbitrary digits. Then
+    // drop any trailing string that's all zeros or nines, and
+    // return the length of the remaining string. This wrongly
+    // truncates a number whose representation requires 15 or 16
+    // digits when the last one or more decimal digit is a nine,
+    // but that doesn't matter for the present use case: rate
+    // tables aren't expected to have more than about eight
+    // decimal places; and this function will be called for each
+    // number in a table and the maximum result used, so that
+    // such incorrect truncation can only occur if every number
+    // in the table is ill-conditioned in this way.
+    if(15 < s.size())
+        {
+        s.resize(16);
+        if('0' == s.back() || '9' == s.back())
+            {
+            d = s.find_last_not_of(s.back()) - s.find('.');
+            }
+        }
+
+    return d;
+}
+
+std::size_t deduce_number_of_decimals(std::vector<double> const& values)
+{
+    std::size_t required_decimals = 0;
+    for(auto v: values)
+        {
+        required_decimals = std::max(required_decimals, 
deduce_number_of_decimals(v));
+        }
+    return required_decimals;
+}
+} // Unnamed namespace.
+
 void writer::write_values
         (std::vector<double> const& values
-        ,boost::optional<uint16_t> const& num_decimals
+        ,boost::optional<uint16_t> const& putative_num_decimals
         ,boost::optional<uint16_t> const& min_age
         ,boost::optional<uint16_t> const& max_age
         ,boost::optional<uint16_t> const& select_period
         ,boost::optional<uint16_t> const& max_select_age
         )
 {
+    uint16_t specified_decimals =
+          putative_num_decimals
+        ? *putative_num_decimals
+        : throw std::logic_error("Number of decimals must be specified.")
+        ;
+    uint16_t required_decimals = deduce_number_of_decimals(values);
+    if(required_decimals < specified_decimals)
+        {
+        warning()
+            << "Table specifies " << specified_decimals << " decimals"
+            << " but " << required_decimals << " would suffice."
+            << LMI_FLUSH
+            ;
+        }
+    uint16_t corrected_decimals = specified_decimals;
+// Later, change '<' to '!=' to trim trailing zeros too (and expunge
+// the warning above); but for now, adjust only for lost precision.
+    if(required_decimals < specified_decimals)
+        {
+        warning()
+            << "Table specifies " << specified_decimals << " decimals"
+            << " but " << required_decimals << " were necessary."
+            << "\nThis flaw has been corrected, and the CRC recalculated."
+            << LMI_FLUSH
+            ;
+        corrected_decimals = required_decimals;
+        hash_value_ = compute_hash_value();
+        }
+    // The optional<>const& argument cannot be modified, so construct
+    // a new object with the corrected value.
+    boost::optional<uint16_t> const num_decimals(corrected_decimals);
+
     write(e_field_min_age            , min_age             );
     write(e_field_max_age            , max_age             );
     write(e_field_select_period      , select_period       );

[Prev in Thread]

Current Thread

[Next in Thread]

[lmi] Help modifying 'rate_table.cpp', Greg Chicares <=
- Re: [lmi] Help modifying 'rate_table.cpp', Vadim Zeitlin, 2016/12/14
  - Re: [lmi] Help modifying 'rate_table.cpp', Greg Chicares, 2016/12/15
    - Re: [lmi] Help modifying 'rate_table.cpp', Vadim Zeitlin, 2016/12/15
    - Re: [lmi] Help modifying 'rate_table.cpp', Greg Chicares, 2016/12/15
    - Re: [lmi] Help modifying 'rate_table.cpp', Vadim Zeitlin, 2016/12/15

Prev by Date: Re: [lmi] Numerics
Next by Date: Re: [lmi] Numerics
Previous by thread: Re: [lmi] Numerics
Next by thread: Re: [lmi] Help modifying 'rate_table.cpp'
Index(es):
- Date
- Thread