Re: [lmi] Automated GUI testing, revisited
From: Greg Chicares
Subject: Re: [lmi] Automated GUI testing, revisited
Date: Thu, 04 Dec 2014 14:10:48 +0000
User-agent: Mozilla/5.0 (Windows NT 5.1; rv:24.0) Gecko/20100101 Thunderbird/24.6.0
On 2014-11-14 15:43Z, Vadim Zeitlin wrote:
> On Wed, 12 Nov 2014 01:55:21 +0000 Greg Chicares <address@hidden> wrote:
[...]
> GC> Now I'm wondering whether 'wx_test.conf' is really necessary.
>
> The current file was added solely for the purpose of storing
> machine-specific timing statistics, so if we don't need them any more, it's
> indeed not necessary.
Yes, I absolutely want to get rid of that automated timing comparison.
I ran the test suite yesterday, and once again a test "failed"
because the time differed by more than ten percent...and I suspect
that this "error" caused some other subtest not to run when I really
did want it to be run. I didn't spend time verifying this suspicion,
because I really don't want any automated timing comparison at all.
Instead, we'll capture the stderr output and compare that visually
to saved output.
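(To make that concrete, a minimal sketch of what I have in mind; the
exact invocation and the dated file name are illustrative only:
  wx_test 2>wx_test-20141204T1410Z
i.e., redirect stderr to a dated file that joins the saved series.)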
> However a local config file might be useful for other things which are
> specific to a particular machine or a particular installation, e.g. as I
> proposed yesterday, we could use it to specify the machine-specific
> directory containing the test files. So I'm not so sure we should hurry to
> get rid of it.
Okay, let's not rush that decision.
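(For the record, if we do keep the file, I picture machine-specific
entries in the same key=value style as the timing lines quoted below,
e.g., with a purely illustrative key name and path:
  test_file_directory=/opt/lmi/test
so that each machine carries its own local value.)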
> GC> I think a similar workflow for 'wx_test' makes the most sense.
> GC> We'll run 'wx_test' often (maybe whenever we run a system test),
> GC> and save a series of results, e.g. as 'wx_test-20141022T2055Z'
> GC> etc. to parallel 'md5sums-20141022T2055Z' above. Comparing a new
> GC> file side by side against a saved touchstone[0] lets us see what
> GC> changed, e.g.
> GC> + MSEC_0.cns run=434 disk=11000 spreadsheet=710
> GC> - MSEC_0.cns run=441 disk=11022 spreadsheet=708
(BTW, I will soon propose that we write the about-dialog version string,
and expiry dates, to stderr so that we can compare them in exactly the
same fashion. We'll fold this into our accustomed workflow, which is
heavily based on diffing flat-text output of successive test runs.)
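(Concretely, the comparison is just a diff of two saved files; the
file names here are illustrative only:
  diff -u wx_test-touchstone wx_test-20141204T1410Z
whose output would resemble the '+'/'-' lines quoted above.)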
> The trouble is that this output is the same on all machines while the
> execution times are not. I don't know how much variance there can be, but I
> strongly suspect it can go well beyond 10%.
Suppose Kim and I run the same test at the same time and see:
MSEC_0.cns run=400 disk=10000 spreadsheet=700 [Kim]
MSEC_0.cns run=555 disk=11111 spreadsheet=777 [Greg's fancy machine]
MSEC_0.cns run=777 disk=23456 spreadsheet=999 [Greg's old machine]
That's absolutely fine. Our timings are not going to be similar.
But on each machine we'll save a series of output files that will
be comparable with other files in the same series on the same machine.
> (although IME it's not so small either, I had to bump up the tolerance to
> 10% after regularly getting false positives with 5%) run-time variations
> between the runs on the same machine, it wouldn't help at all with
> comparing outputs from different machines.
Let's stop doing that. We try 5%, and it doesn't work, so we try 10%,
figuring that will "fail" less often, but knowing it can still "fail";
then it fails, and we have to spend time thinking about it...
> GC> so we don't need timings like this
> GC> time_run=434
> GC> time_disk=11000
> GC> time_spreadsheet=710
> GC> in 'wx_test.conf'. But then the configuration file isn't needed
> GC> at all.
>
> We definitely can implement it like you suggest above and if the tests, or
> at least this particular test involving the timings, will only ever run on
> a single machine, then I agree it's a good solution. But if we want to be
> able to check on any arbitrary machine that the timings haven't changed too
> much, e.g. after making some change to the code, then it isn't and I still
> believe that some kind of a local configuration file is needed.
>
> Please let me know what your decision about this is.
Thanks for thinking all of this through and pointing out the pitfalls.
These considerations might have much greater weight with a different
audience. But we're an actuarial department: we're very comfortable
with judging numerical differences at a glance. Automated comparisons
that would help others just get in our way.