Re: [lmi] rate_table_tool: merge a whole directory


From: Vadim Zeitlin
Subject: Re: [lmi] rate_table_tool: merge a whole directory
Date: Sun, 4 Dec 2016 18:28:47 +0100

On Sat, 3 Dec 2016 22:22:20 +0000 Greg Chicares <address@hidden> wrote:

GC> Furthermore, converting the value manually as in HEAD at this moment
GC> is less accurate than using std::strtod().

 I spent some time (more than I should have, probably...) trying to
understand how this can happen and I think I finally did, so I'd like to
share it with you just in case it can be useful. Please note that I still
haven't found any reason to prefer doing the parsing manually rather than
using strtod() other than performance (using strtod() is exactly two times
slower in my tests, but considering its absolute performance it still
doesn't matter), so you can just skip the rest of this message if
you're not interested in the details.

 So, I wrote a small test program that just dumbly iterates over all
possible strings of the form "0.ddddd" with the given number of digits
after the point and compares the result of using strtoull() and dividing
by the corresponding power of 10 manually with the result of calling
strtod(). Here it is:
---------------------------------- >8 --------------------------------------
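#include <array>
#include <chrono>
#include <cmath>
#include <cstdint>    // uint64_t
#include <cstdlib>    // std::strtoull(), std::strtod()
#include <functional> // std::ref()
#include <iomanip>
#include <iostream>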

#ifdef USE_THREADS
    #include <mutex>
    #include <thread>

    std::mutex mutex_cout;
#endif

#ifndef NUM_DECIMALS
    #error Use -DNUM_DECIMALS=n on the command line
#endif

const double POWER_10 = std::pow(10, NUM_DECIMALS);
using number_as_text = std::array<char, NUM_DECIMALS + 3>;

void check_parse(number_as_text const& t)
{
    auto d1 = std::strtoull(&t[2], nullptr, 10) / POWER_10,
         d2 = std::strtod(&t[0], nullptr);

    if (d1 != d2) {
#ifdef USE_THREADS
        std::lock_guard<std::mutex> lock(mutex_cout);
#endif

        std::cout << std::fixed << std::setprecision(21);
        std::cout << t.data() << " " << d1 << " " << d2 << "\n";
    }
}

void check_all_digits_at(number_as_text& t, int pos)
{
    for (char c = '0'; c <= '9'; ++c) {
        t[pos + 2] = c;

        if (pos == NUM_DECIMALS - 1)
            check_parse(t);
        else
            check_all_digits_at(t, pos + 1);
    }
}

int main()
{
    number_as_text t;
    t.fill('0');
    t[1] = '.';
    t.back() = '\0';

    using namespace std::chrono;
    const auto start = steady_clock::now();

#ifdef USE_THREADS
    std::array<std::thread, 10> threads;
    std::array<number_as_text, 10> texts;

    for (int n = 0; n < 10; ++n) {
        auto& t_fixed_first = texts[n];
        t_fixed_first = t;
        t_fixed_first[2] = static_cast<char>('0' + n);

        threads[n] = std::thread(check_all_digits_at,
                                 std::ref(t_fixed_first), 1);
    }

    for (auto& thr: threads) {
        thr.join();
    }
#else
    check_all_digits_at(t, 0);
#endif

    std::cout << "Checked " << static_cast<std::uint64_t>(POWER_10)
              << " numbers in "
              << duration_cast<milliseconds>(steady_clock::now() - start).count()
              << "ms\n";

    return 0;
}
---------------------------------- >8 --------------------------------------

 Running this program for the number of decimals up to 10 (but you'd better
use threads if you use this precision, it takes roughly 10 minutes with
them and so would take an hour and a half without) doesn't detect any
discrepancies when it's compiled with either MSVS 2015 or g++ 4.9.1 under
Linux x64. However, compiling it with the MinGW 4.9.1 compiler that lmi uses
or (and this is the part which took me an embarrassingly long time to
understand) in 32-bit mode under Linux shows plenty of discrepancies
starting from NUM_DECIMALS=6 (there are none for 5).

 Of course, after realizing that the difference in gcc behaviour was due to
the difference in architecture, it didn't take me long to understand that
this was due to using x87 floating point instructions in 32-bit builds. And,
indeed, adding "-mfpmath=sse -msse2" to the compilation options makes all
the checks pass even when using MinGW 4.9.1 (I only tested it up to 8
decimals though, as, without the std::thread support that it lacks, anything
beyond that starts taking too long).

 I won't repeat the arguments I have already made (several times, I
believe) for switching to the use of SSE instructions in lmi, but it's just
clearly annoying to have different (and non-standard, not to mention
slower) behaviour in lmi code than with the other compilers/architectures.
I realize switching to SSE is not a priority (and probably never will be),
but it would still be nice to stop using x87.

 Regards,
VZ

