Re: [lmi] rate_table_tool: merge a whole directory


From: Greg Chicares
Subject: Re: [lmi] rate_table_tool: merge a whole directory
Date: Sat, 3 Dec 2016 22:22:20 +0000
User-agent: Mozilla/5.0 (X11; Linux x86_64; rv:45.0) Gecko/20100101 Icedove/45.4.0

On 2016-12-01 15:28, Greg Chicares wrote:
> On 2016-12-01 02:03, Greg Chicares wrote:
> [...]
>> New hypothesis: The original binary tables may be slightly inaccurate.
[...]
> Proven.

Furthermore, converting the value manually as in HEAD at this moment
is less accurate than using std::strtod().

For example, convert 0.02066781 to a double-precision floating-point
number using both methods:

  0.020667810000000001736 --> +1736 manual arithmetic
  0.020667809999999998266 --> -1734 strtod()
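
A minimal sketch of how such a comparison can be reproduced outside lmi.
The names by_arithmetic(), fixed21(), and tail_distance() are hypothetical
helpers written for this illustration, not lmi functions. by_arithmetic()
imitates the "value /= std::pow(10, ...)" step visible in the patch quoted
below, and tail_distance() imitates its trailing-digits comparison. It
folds at half of 10^9, whereas the patch folds at 5e5; the two agree for
every tail shown in the output below. Whether the two conversions actually
differ depends on how the platform evaluates floating-point arithmetic, so
this sketch makes no claim that they will.

```cpp
#include <cmath>
#include <cstdlib>
#include <iomanip>
#include <sstream>
#include <string>

// Hypothetical stand-in for the manual conversion: the fractional digits
// taken as an integer, divided by a power of ten, plus the integer part.
double by_arithmetic(long int_part, long frac_digits, int num_decimals)
{
    double value = static_cast<double>(frac_digits);
    value /= std::pow(10.0, num_decimals);
    value += static_cast<double>(int_part);
    return value;
}

// Render a double in fixed notation with 21 decimals, as the patch does.
std::string fixed21(double x)
{
    std::ostringstream oss;
    oss << std::fixed << std::setprecision(21) << x;
    return oss.str();
}

// Distance of the last nine decimals from the nearest run of zeros.
// Assumes 's' has at least nine characters and ends in nine digits.
long tail_distance(std::string const& s)
{
    long const modulus = 1000000000L;
    long tail = std::stol(s.substr(s.size() - 9));
    if(modulus / 2 < tail) tail = modulus - tail;
    return tail;
}
```

For the example above, one would compare
tail_distance(fixed21(by_arithmetic(0, 2066781, 8))) with
tail_distance(fixed21(std::strtod("0.02066781", nullptr))); the smaller
result marks the conversion that landed closer to the decimal text.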

Detailed demonstration: I applied this rough experimental patch (which
accidentally includes numerous other experimental changes in my working
copy, but it's obvious which part is relevant here):

----------8<----------8<----------8<----------8<----------8<----------8<----------
diff --git a/rate_table.cpp b/rate_table.cpp
index e123077..797826a 100644
--- a/rate_table.cpp
+++ b/rate_table.cpp
@@ -24,8 +24,11 @@
 #include "rate_table.hpp"
 
 #include "alert.hpp"
+#include "assert_lmi.hpp"
 #include "crc32.hpp"
+#include "miscellany.hpp"               // ios_in_binary(), ios_out_trunc_binary()
 #include "path_utility.hpp"
+#include "value_cast.hpp"
 
 #include <boost/filesystem/convenience.hpp>
 #include <boost/filesystem/exception.hpp>
@@ -51,6 +54,7 @@
 #include <cstring>                      // std::strncmp()
 #include <iomanip>
 #include <ios>
+#include <iostream>                     // std::cerr
 #include <istream>
 #include <limits>
 #include <map>
@@ -192,52 +196,6 @@ T get_value_or(boost::optional<T> const& o, U v)
     return o ? *o : v;
 }
 
-template<typename T>
-struct open_file_traits;
-
-template<>
-struct open_file_traits<fs::ifstream>
-{
-    static std::ios_base::openmode get_mode() { return std::ios_base::in; }
-    static char const* describe_access() { return "reading"; }
-};
-
-template<>
-struct open_file_traits<fs::ofstream>
-{
-    static std::ios_base::openmode get_mode() { return std::ios_base::out; }
-    static char const* describe_access() { return "writing"; }
-};
-
-// Helper function opening the stream for reading or writing the given file and
-// throwing an exception on error. It shouldn't be used directly, prefer to use
-// the more readable open_{text,binary}_file() helpers below.
-template<typename T>
-void open_file(T& ifs, fs::path const& path, std::ios_base::openmode mode)
-{
-    ifs.open(path, open_file_traits<T>::get_mode() | mode);
-    if(!ifs)
-        {
-        fatal_error()
-            << "file '" << path << "' could not be opened for "
-            << open_file_traits<T>::describe_access()
-            << std::flush
-            ;
-        }
-}
-
-template<typename T>
-inline void open_text_file(T& fs, fs::path const& path)
-{
-    open_file(fs, path, static_cast<std::ios_base::openmode>(0));
-}
-
-template<typename T>
-inline void open_binary_file(T& fs, fs::path const& path)
-{
-    open_file(fs, path, std::ios_base::binary);
-}
-
 // Functions doing the same thing as istream::read() and ostream::write()
 // respectively, but taking void pointers and this allowing to avoid ugly casts
 // to char in the calling code.
@@ -1567,6 +1525,8 @@ double table_impl::parse_single_value
     ,int line_num
     )
 {
+    char const* beginning = current;
+
     // The number of spaces before the value should be at least one,
     // and no greater than (gap_length, plus one if the number of
     // decimals is zero, because get_value_width() assumes, contrary
@@ -1653,6 +1613,51 @@ double table_impl::parse_single_value
     value /= std::pow(10, *num_decimals_);
     value += res_int_part.num;
 
+    if(value != std::strtod(beginning, NULL))
+        {
+        double x_arith  = value;
+        double x_strtod = std::strtod(beginning, NULL);
+
+        std::ostringstream oss;
+        oss << std::fixed << std::setprecision(21) << x_arith;
+        std::string s_arith = oss.str();
+        oss.str(std::string());
+        oss.clear();
+        oss << std::fixed << std::setprecision(21) << x_strtod;
+        std::string s_strtod = oss.str();
+
+        int const N = 9;
+        int i_arith  = value_cast<int>(s_arith .substr(s_arith .size() - N));
+        int i_strtod = value_cast<int>(s_strtod.substr(s_strtod.size() - N));
+        if(i_arith  > 5e5) i_arith  = 1e9 - i_arith;
+        if(i_strtod > 5e5) i_strtod = 1e9 - i_strtod;
+
+        if     (i_strtod <  i_arith) std::cerr << "strtod is better" << std::endl;
+        else if(i_strtod == i_arith) std::cerr << "same"             << std::endl;
+        else                         std::cerr << "strtod is WORSE"  << std::endl;
+
+        std::cerr // reset flags later, if we keep this
+            << location_info(line_num, beginning - start + 1)
+            << "\nDiscrepancy:\n"
+//            << std::hex << std::setw(8) << std::setfill('0')
+//            << "  " << std::hex << std::setw(8) << value << " arithmetic\n"
+//            << "  " << std::hex << std::setw(8) << std::strtod(beginning, NULL) << " strtod()\n"
+//            << "  " << std::setprecision(21) << value << " arithmetic\n"
+//            << "  " << std::setprecision(21) << std::strtod(beginning, NULL) << " strtod()\n"
+//            << "  " << std::fixed << std::setprecision(21) << x_arith  << " arith\n"
+//            << "  " << std::fixed << std::setprecision(21) << x_strtod << " strtod()\n"
+            << "  " << s_arith  << " " << i_arith  << " arith\n"
+            << "  " << s_strtod << " " << i_strtod << " strtod\n"
+            << std::endl
+            ;
+        }
+//    LMI_ASSERT(value == std::strtod(beginning, NULL));
+
+#if 0
+    std::string s(beginning, current);
+    return value_cast<double>(s);
+#endif // 0
+
     return value;
 }
 
@@ -2306,11 +2311,14 @@ unsigned long table_impl::compute_hash_value() const
 
 table table::read_from_text(fs::path const& file)
 {
+    fs::ifstream ifs(file, ios_in_binary());
+    if(!ifs) fatal_error() << "Unable to open '" << file << "'." << LMI_FLUSH;
+
     try
         {
-        fs::ifstream ifs;
-        open_text_file(ifs, file);
-
+//    fs::ifstream ifs(file, ios_in_binary());
+//    if(!ifs) fatal_error() << "Unable to open '" << file << "'." << LMI_FLUSH;
+// move above?
         return table(table_impl::create_from_text(ifs));
         }
     catch(std::runtime_error const& e)
@@ -2347,8 +2355,8 @@ table table::read_from_text(std::string const& text)
 
 void table::save_as_text(fs::path const& file) const
 {
-    fs::ofstream ofs;
-    open_text_file(ofs, file);
+    fs::ofstream ofs(file, ios_out_trunc_binary());
+    if(!ofs) fatal_error() << "Unable to open '" << file << "'." << LMI_FLUSH;
 
     impl_->write_as_text(ofs);
 }
@@ -2523,17 +2531,16 @@ database_impl::database_impl(fs::path const& path)
     :path_(path)
 {
     fs::path const index_path = get_index_path(path);
-
-    fs::ifstream index_ifs;
-    open_binary_file(index_ifs, index_path);
+    fs::ifstream index_ifs(index_path, ios_in_binary());
+    if(!index_ifs) fatal_error() << "Unable to open '" << index_path << "'." << LMI_FLUSH;
     read_index(index_ifs);
 
     // Open the database file right now to ensure that we can do it, even if we
     // don't need it just yet. As it will be used soon anyhow, delaying opening
     // it wouldn't be a useful optimization.
-    auto const ifs = std::make_shared<fs::ifstream>();
-    open_binary_file(*ifs, get_data_path(path));
-
+    fs::path const data_path = get_data_path(path);
+    auto const ifs = std::make_shared<fs::ifstream>(data_path, ios_in_binary());
+    if(!*ifs) fatal_error() << "Unable to open '" << data_path << "'." << LMI_FLUSH;
     data_is_ = ifs;
 }
 
@@ -2921,7 +2928,8 @@ void database_impl::save(fs::path const& path)
                     )
                 ,description_(description)
             {
-            open_binary_file(ofs_, temp_path_);
+            ofs_.open(temp_path_, ios_out_trunc_binary());
+            if(!ofs_) fatal_error() << "Unable to open '" << temp_path_ << "'." << LMI_FLUSH;
             }
 
             void close()
---------->8---------->8---------->8---------->8---------->8---------->8----------

Then I used that code to convert hundreds of proprietary tables,
which produced this output:

/tmp[0]$wine /opt/lmi/bin/rate_table_tool.exe --accept --file=proprietary --merge=/tmp/rate_tables
strtod is better
 at position 448 at line 52
Discrepancy:
  0.014887009999999999132 868 arith
  0.014887010000000000867 867 strtod

same
 at position 100 at line 65
Discrepancy:
  0.000205070000000000014 14 arith
  0.000205069999999999986 14 strtod

same
 at position 28 at line 27
Discrepancy:
  0.000157120000000000014 14 arith
  0.000157119999999999986 14 strtod

strtod is better
 at position 652 at line 32
Discrepancy:
  0.020667810000000001736 1736 arith
  0.020667809999999998266 1734 strtod

same
 at position 208 at line 42
Discrepancy:
  0.000527450000000000054 54 arith
  0.000527449999999999946 54 strtod

same
 at position 232 at line 52
Discrepancy:
  0.001882089999999999892 108 arith
  0.001882090000000000108 108 strtod

same
 at position 16 at line 72
Discrepancy:
  0.002695460000000000217 217 arith
  0.002695459999999999783 217 strtod

same
 at position 52 at line 19
Discrepancy:
  0.000078560000000000007 7 arith
  0.000078559999999999993 7 strtod

same
 at position 52 at line 23
Discrepancy:
  0.000078560000000000007 7 arith
  0.000078559999999999993 7 strtod

strtod is better
 at position 88 at line 78
Discrepancy:
  0.011484339999999999132 868 arith
  0.011484340000000000867 867 strtod

same
 at position 100 at line 46
Discrepancy:
  0.000065289999999999993 7 arith
  0.000065290000000000007 7 strtod

same
 at position 148 at line 30
Discrepancy:
  0.000130579999999999986 14 arith
  0.000130580000000000014 14 strtod

same
 at position 100 at line 67
Discrepancy:
  0.006450950000000000434 434 arith
  0.006450949999999999566 434 strtod

same
 at position 388 at line 33
Discrepancy:
  0.000714119999999999946 54 arith
  0.000714120000000000054 54 strtod

same
 at position 256 at line 31
Discrepancy:
  0.000810020000000000054 54 arith
  0.000810019999999999946 54 strtod

same
 at position 532 at line 21
Discrepancy:
  0.001609779999999999892 108 arith
  0.001609780000000000108 108 strtod

same
 at position 316 at line 25
Discrepancy:
  0.000447829999999999973 27 arith
  0.000447830000000000027 27 strtod

same
 at position 460 at line 30
Discrepancy:
  0.002039550000000000217 217 arith
  0.002039549999999999783 217 strtod

same
 at position 16 at line 44
Discrepancy:
  0.000157120000000000014 14 arith
  0.000157119999999999986 14 strtod

same
 at position 52 at line 69
Discrepancy:
  0.001731330000000000108 108 arith
  0.001731329999999999892 108 strtod

strtod is better
 at position 112 at line 70
Discrepancy:
  0.004664760000000000434 434 arith
  0.004664759999999999567 433 strtod

strtod is better
 at position 232 at line 68
Discrepancy:
  0.020848619999999998265 1735 arith
  0.020848620000000001734 1734 strtod

same
 at position 652 at line 24
Discrepancy:
  0.004430349999999999566 434 arith
  0.004430350000000000434 434 strtod

same
 at position 100 at line 67
Discrepancy:
  0.006450950000000000434 434 arith
  0.006450949999999999566 434 strtod

same
 at position 64 at line 14
Discrepancy:
  0.000065289999999999993 7 arith
  0.000065290000000000007 7 strtod

same
 at position 220 at line 58
Discrepancy:
  0.006702740000000000434 434 arith
  0.006702739999999999566 434 strtod

same
 at position 448 at line 33
Discrepancy:
  0.003401099999999999783 217 arith
  0.003401100000000000217 217 strtod

same
 at position 640 at line 22
Discrepancy:
  0.002765710000000000217 217 arith
  0.002765709999999999783 217 strtod

same
 at position 196 at line 23
Discrepancy:
  0.000078560000000000007 7 arith
  0.000078559999999999993 7 strtod

same
 at position 112 at line 48
Discrepancy:
  0.000178529999999999986 14 arith
  0.000178530000000000014 14 strtod

strtod is better
 at position 232 at line 68
Discrepancy:
  0.020848619999999998265 1735 arith
  0.020848620000000001734 1734 strtod

same
 at position 316 at line 32
Discrepancy:
  0.000314240000000000027 27 arith
  0.000314239999999999973 27 strtod

same
 at position 100 at line 67
Discrepancy:
  0.006450950000000000434 434 arith
  0.006450949999999999566 434 strtod

same
 at position 304 at line 38
Discrepancy:
  0.001428239999999999892 108 arith
  0.001428240000000000108 108 strtod

same
 at position 136 at line 46
Discrepancy:
  0.000628480000000000054 54 arith
  0.000628479999999999946 54 strtod

same
 at position 172 at line 55
Discrepancy:
  0.001902609999999999892 108 arith
  0.001902610000000000108 108 strtod

same
 at position 304 at line 15
Discrepancy:
  0.000314240000000000027 27 arith
  0.000314239999999999973 27 strtod

same
 at position 268 at line 18
Discrepancy:
  0.000314240000000000027 27 arith
  0.000314239999999999973 27 strtod

same
 at position 124 at line 22
Discrepancy:
  0.000039280000000000003 3 arith
  0.000039279999999999997 3 strtod

same
 at position 232 at line 59
Discrepancy:
  0.002877000000000000217 217 arith
  0.002876999999999999783 217 strtod

same
 at position 340 at line 26
Discrepancy:
  0.000714119999999999946 54 arith
  0.000714120000000000054 54 strtod

same
 at position 40 at line 15
Discrepancy:
  0.000065289999999999993 7 arith
  0.000065290000000000007 7 strtod

same
 at position 184 at line 27
Discrepancy:
  0.000314240000000000027 27 arith
  0.000314239999999999973 27 strtod

same
 at position 76 at line 67
Discrepancy:
  0.004793429999999999566 434 arith
  0.004793430000000000434 434 strtod

same
 at position 196 at line 69
Discrepancy:
  0.014131640000000000867 867 arith
  0.014131639999999999133 867 strtod

strtod is better
 at position 4 at line 61
Discrepancy:
  0.054417599999999996530 3470 arith
  0.054417600000000003468 3468 strtod

same
 at position 184 at line 20
Discrepancy:
  0.000130579999999999986 14 arith
  0.000130580000000000014 14 strtod

same
 at position 256 at line 39
Discrepancy:
  0.000911050000000000054 54 arith
  0.000911049999999999946 54 strtod

same
 at position 4 at line 28
Discrepancy:
  0.000019640000000000002 2 arith
  0.000019639999999999998 2 strtod

same
 at position 100 at line 67
Discrepancy:
  0.006450950000000000434 434 arith
  0.006450949999999999566 434 strtod

same
 at position 388 at line 21
Discrepancy:
  0.000447829999999999973 27 arith
  0.000447830000000000027 27 strtod

strtod is better
 at position 124 at line 69
Discrepancy:
  0.008971989999999999132 868 arith
  0.008971990000000000867 867 strtod

same
 at position 220 at line 25
Discrepancy:
  0.000261159999999999973 27 arith
  0.000261160000000000027 27 strtod

same
 at position 208 at line 42
Discrepancy:
  0.000527450000000000054 54 arith
  0.000527449999999999946 54 strtod

same
 at position 232 at line 52
Discrepancy:
  0.001882089999999999892 108 arith
  0.001882090000000000108 108 strtod

same
 at position 16 at line 72
Discrepancy:
  0.002695460000000000217 217 arith
  0.002695459999999999783 217 strtod

same
 at position 76 at line 61
Discrepancy:
  0.000130579999999999986 14 arith
  0.000130580000000000014 14 strtod

same
 at position 76 at line 25
Discrepancy:
  0.000065289999999999993 7 arith
  0.000065290000000000007 7 strtod

strtod is better
 at position 796 at line 25
Discrepancy:
  0.015056719999999999132 868 arith
  0.015056720000000000867 867 strtod

same
 at position 340 at line 49
Discrepancy:
  0.003644200000000000217 217 arith
  0.003644199999999999783 217 strtod

strtod is better
 at position 484 at line 35
Discrepancy:
  0.005139130000000000434 434 arith
  0.005139129999999999567 433 strtod

same
 at position 100 at line 67
Discrepancy:
  0.006450950000000000434 434 arith
  0.006450949999999999566 434 strtod

same
 at position 148 at line 18
Discrepancy:
  0.000065289999999999993 7 arith
  0.000065290000000000007 7 strtod

same
 at position 64 at line 14
Discrepancy:
  0.000065289999999999993 7 arith
  0.000065290000000000007 7 strtod

same
 at position 220 at line 58
Discrepancy:
  0.006702740000000000434 434 arith
  0.006702739999999999566 434 strtod

strtod is better
 at position 448 at line 47
Discrepancy:
  0.013715689999999999132 868 arith
  0.013715690000000000867 867 strtod

same
 at position 4 at line 53
Discrepancy:
  0.001044639999999999892 108 arith
  0.001044640000000000108 108 strtod

strtod is better
 at position 412 at line 55
Discrepancy:
  0.016205110000000001735 1735 arith
  0.016205109999999998266 1734 strtod

strtod is better
 at position 448 at line 47
Discrepancy:
  0.013715689999999999132 868 arith
  0.013715690000000000867 867 strtod

strtod is better
 at position 736 at line 24
Discrepancy:
  0.008526829999999999132 868 arith
  0.008526830000000000867 867 strtod

same
 at position 604 at line 39
Discrepancy:
  0.011092049999999999133 867 arith
  0.011092050000000000867 867 strtod

same
 at position 208 at line 42
Discrepancy:
  0.000527450000000000054 54 arith
  0.000527449999999999946 54 strtod

same
 at position 232 at line 52
Discrepancy:
  0.001882089999999999892 108 arith
  0.001882090000000000108 108 strtod

same
 at position 16 at line 72
Discrepancy:
  0.002695460000000000217 217 arith
  0.002695459999999999783 217 strtod

Number of tables merged: 594
/tmp[0]$

Many of those discrepancies are marked "same", meaning only that the
technique used here isn't powerful enough to discern which converted
value is correctly rounded, though it's not likely that both are.
But even this weak technique shows cases where we should prefer
std::strtod(), and none where we shouldn't prefer it; I think that's
reason enough to conclude that we should use the library function
rather than trying to do this conversion ourselves.
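
As a rough illustration of what using the library function could look
like, here is a minimal sketch. parse_with_strtod() is a hypothetical
helper, not lmi code, and the error messages are placeholders. It relies
only on documented std::strtod() behavior: the end pointer equals the
start pointer when nothing was converted, and errno is set to ERANGE when
the result is out of range. Note that std::strtod() skips leading
whitespace, honors the current C locale's decimal point, and also accepts
forms such as "inf", "nan", and hexadecimal floats, so a real parser would
presumably still validate the field's layout separately.

```cpp
#include <cerrno>
#include <cstdlib>
#include <stdexcept>

// Hypothetical helper: convert the number starting at 'begin' and, if
// 'end_out' is non-null, report where the conversion stopped.
double parse_with_strtod(char const* begin, char const** end_out)
{
    errno = 0;
    char* end = nullptr;
    double const value = std::strtod(begin, &end);
    if(end == begin)
        {
        throw std::runtime_error("no number found");
        }
    if(ERANGE == errno)
        {
        throw std::runtime_error("number out of range");
        }
    if(end_out)
        {
        *end_out = end;
        }
    return value;
}
```

Returning the end position lets the caller continue scanning the same
line for the next value, much as the existing code advances 'current'.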



