From: Paolo Molaro
Subject: Re: [DotGNU]Could someone test this on all platforms ?
Date: Sun, 9 Feb 2003 12:58:23 +0100
User-agent: Mutt/1.4i

On 02/09/03 Rhys Weatherley wrote:
> > This assumes PNetMark is a valid benchmark program: it's not enough for
> > a benchmark to be well-understood, it should also reflect usage
> > patterns, for example, to have any respectable results.
> > Hopefully some independent expert will write a benchmark for the CLR
> > or an existing benchmark may be ported and validated.
> > In the mean time the field is open for armchair benchmarkers! :-)
> 
> PNetMark is a direct port of the CaffeineMark from Java to C#, with only one 
> difference - it uses C#'s 2D arrays in the Float benchmark instead of 
> arrays of arrays, as in Java.  You are welcome to verify this for yourself, 
> and to submit patches if you do find a discrepancy.  The CaffeineMark has 

A quick Google search turned up only pages saying that the CaffeineMark
source code is not available. Do you have a URL handy?
It seems CaffeineMark has 9 or 11 tests, depending on the version, while
PNetMark has only 5, so even assuming CaffeineMark is a valid benchmark,
PNetMark is in no way comparable to it. Yes, some of the tests may
require features not implemented by pnet (or Mono, for that matter),
but that simply means you can't compare PNetMark with CaffeineMark.
The CaffeineMark people report that they fixed their benchmark in
version 3.0 to use proper tests, so that the execution engine actually
executes the code. At least in the Logic benchmark in PNetMark this is
not the case: the Run() function has no observable effect, so it could
legally be implemented with a simple 'return;', and the scores it gives
are not very meaningful. Note that writing the benchmark tests
correctly would reduce the score of an optimizing engine and favour
pnet.
One other thing stands out in the PNetMark benchmarks: none of them
check that the execution engine executed the code correctly. I think
this is a requirement for a benchmark; a score is of no use if the code
was executed incorrectly.
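
To make the point concrete, here is a minimal sketch (mine, not the
actual PNetMark code) of how a logic-style kernel can both defeat
dead-code elimination and let the harness verify the result:

=== cut cut cut
using System;

class LogicKernel
{
    // The checksum depends on every iteration's branches, so the loop
    // is observable and cannot legally be folded into 'return;'.
    static long Run(int iterations)
    {
        long check = 0;
        bool flag = false;
        for (int i = 0; i < iterations; i++)
        {
            flag = !flag;               // true on even i
            if (flag && (i & 1) == 0)
                check += i;             // taken for every even i
            else
                check -= 1;             // taken for every odd i
        }
        return check;
    }

    static void Main()
    {
        long result = Run(1000000);
        // Sum of even i in [0, 999998] is 249,999,500,000; the 500,000
        // odd iterations then subtract 500,000 from it.
        Console.WriteLine(result == 249999000000L
            ? "ok, the score means something"
            : "the engine executed the code incorrectly");
    }
}
=== cut cut cut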

> been widely used to benchmark the compute performance of JVM's, which is why 
> I chose it.

It is completely fine to start from an existing, widely used benchmark,
but that doesn't mean the benchmark is valid. BTW, you say the same
thing yourself in the PNetMark FAQ. Let me quote:

=== cut cut cut
6. Vendor X's engine has a higher PNetMark than you. What does that mean?

Basically nothing. Using PNetMark to compare Portable.NET with other CLR
implementations will probably give bogus results.
=== cut cut cut

> I was extremely careful not to embellish it, or to slant it towards pnet.  
> Since version 0.0.1, the only changes have been to the build system, FAQ, and 
> detection of runtime engines that don't have floating point (for benchmarking 
> embedded CLR's).  The core code remains unchanged.

I never said it was slanted towards pnet; for all I know, a proper
benchmark could give better results for pnet. The point is that, for
example, the Magnification values need a rationale. By changing a
magnification you can tweak the final score from PNetMark: this doesn't
mean the values were tweaked to give a doctored result, but it does
mean, IMHO, that there is no justification for their current values.
For example, the value for the string benchmark is 2771 and for the
float one it's 4449. This roughly means the float benchmark is given
1.6x more weight: what is the rationale for that? If someone studied a
representative corpus of CLR code and came up with mostly the same
magnification factors, I'd have no problem accepting the values. As
things stand right now, there is no evidence that such a study has been
done.
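
To illustrate how the magnifications leak into the final figure, here
is a toy scoring model. I'm assuming CaffeineMark-style arithmetic
(sub-score = throughput times magnification, overall score = geometric
mean); PNetMark's exact formula may differ, but the point holds for any
formula that is monotonic in the magnifications:

=== cut cut cut
using System;

class Scoring
{
    // Assumed formula, CaffeineMark convention: iterations per second,
    // scaled by the test's magnification.
    static double SubScore(double iterations, double seconds,
                           double magnification)
    {
        return iterations / seconds * magnification;
    }

    static double Overall(double[] subScores)
    {
        double product = 1.0;
        foreach (double s in subScores)
            product *= s;
        return Math.Pow(product, 1.0 / subScores.Length);
    }

    static void Main()
    {
        // Identical raw throughput on both tests, yet the float test
        // counts 4449/2771 ~= 1.6x more in its sub-score, and any
        // change to a magnification moves the overall figure with it.
        double str = SubScore(1000, 1.0, 2771);
        double flt = SubScore(1000, 1.0, 4449);
        Console.WriteLine("string={0:F0} float={1:F0} overall={2:F0}",
                          str, flt,
                          Overall(new double[] { str, flt }));
    }
}
=== cut cut cut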
On almost the same topic, consider the string benchmark: it only calls
three functions:
        StringBuilder::Append (string)
        StringBuilder::ToString ()
        String::IndexOf (string, int)

I doubt anyone would consider those three functions representative of
the CLR's performance on strings. At a minimum I would add
String::Equals(), String::get_Chars(), and a couple of string
constructors; see the sketch below. But to choose the right methods to
test, someone has to study the use of String and StringBuilder across a
corpus of CLR programs. If you have done that study, please share the
results with us; as it is, I find it hard to believe that those three
methods are representative.
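
Here is a sketch of the kind of wider coverage I mean; the method mix
and the constants are placeholders I picked by hand, which is exactly
what a corpus study should decide instead:

=== cut cut cut
using System;
using System.Text;

class StringKernel
{
    static long Run(int iterations)
    {
        long check = 0;
        for (int i = 0; i < iterations; i++)
        {
            StringBuilder sb = new StringBuilder();
            sb.Append("hello ");
            sb.Append("world");
            string s = sb.ToString();
            check += s.IndexOf("world", 0);  // what PNetMark already calls
            if (s.Equals("hello world"))     // String::Equals
                check++;
            check += s[i % s.Length];        // String::get_Chars
            string t = new string('x', 8);   // a string constructor
            check += t.Length;
        }
        return check;                        // verifiable checksum, as above
    }

    static void Main()
    {
        Console.WriteLine(Run(100000));
    }
}
=== cut cut cut
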
As a simple example, the current Mono gets about 1650 in the string
benchmark. It took five minutes to write a patch:

StringBuilder.cs |   15 ++++++++++++++-
  1 files changed, 14 insertions(+), 1 deletion(-)

to get a score of 5330 and an improvement of the total score from about
6500 to 8000, a 23% improvement. Now, after the patch, can we consider
Mono generally 23% faster? The answer is no, and the reason is that the
benchmark is not representative and hence, IMHO, not useful.
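
The patch itself is not reproduced here, but as a purely hypothetical
illustration of how narrow such a win can be, consider a builder that
caches its ToString() result: the benchmark's repeated Append/ToString
pattern gets served almost for free, with no general speedup at all.

=== cut cut cut
// Hypothetical code, not the actual Mono patch.
using System;

class CachingBuilder
{
    char[] buffer = new char[16];
    int length;
    string cached;                  // invalidated on every mutation

    public CachingBuilder Append(string s)
    {
        if (length + s.Length > buffer.Length)
        {
            char[] bigger = new char[Math.Max(buffer.Length * 2,
                                              length + s.Length)];
            Array.Copy(buffer, bigger, length);
            buffer = bigger;
        }
        s.CopyTo(0, buffer, length, s.Length);
        length += s.Length;
        cached = null;              // keep the fast path honest
        return this;
    }

    public override string ToString()
    {
        if (cached == null)
            cached = new string(buffer, 0, length);
        return cached;              // repeated calls skip the copy
    }
}
=== cut cut cut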

> If you would care to suggest a better benchmark to port, then do so.  But 

It looks like neither CaffeineMark nor JMark has sources available, so
I'm not sure there are ready-to-port Java benchmarks. A good start may
be the benchmarks from http://www.bagley.org/~doug/shootout/ (there is
a port of some of the programs to C# at
http://dada.perl.it/shootout/). Those benchmarks check that they were
executed correctly, and there are many more tests, so the result should
be more representative; they also take into account more factors, like
memory usage and startup speed (hey, pnet is fast to start, so don't
say I'm proposing them to play it down ;-). The shootout may have some
shortcomings, but it seems like a better starting point than
CaffeineMark.
I would also keep two separate scores for two different kinds of
benchmarks: those that test the execution engine and those that test
the library implementation. Of course a total of the two can be
computed as well, but I think it makes sense to keep the two results
distinct; a rough sketch follows below.
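
Something along these lines, say (the geometric mean and the numbers
are mine, purely to show the shape of the report):

=== cut cut cut
using System;

class SplitScores
{
    static double GeoMean(double[] scores)
    {
        double log = 0.0;
        foreach (double s in scores)
            log += Math.Log(s);
        return Math.Exp(log / scores.Length);
    }

    static void Main()
    {
        // Placeholder numbers, just to show the two-score report.
        double[] engine  = { 1200.0, 950.0, 1100.0 }; // loops, logic, ...
        double[] library = { 1650.0, 800.0 };         // string, I/O, ...
        double e = GeoMean(engine);
        double l = GeoMean(library);
        Console.WriteLine("engine={0:F0} library={1:F0} combined={2:F0}",
                          e, l, GeoMean(new double[] { e, l }));
    }
}
=== cut cut cut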

> don't insult my professionalism by implying that PNetMark is not 
> independently written or verifiable just because it has "pnet" in its name.

You said yourself that it is just a direct port of the CaffeineMark, so
why you take it personally, I don't know. Moreover, I just repeated on
the list what you wrote yourself in the PNetMark FAQ: the results from
PNetMark are probably bogus. I'm usually happy when people agree with
me; it looks like you get upset instead, so I'll try not to agree with
you in public ever again ;-) ;-)
I didn't say that PNetMark was unverifiable: see my comments above
about what I think of the validity of the benchmarks and of how the
scores are computed; your comments on the issues raised are welcome.

Cheers.

lupus

-- 
-----------------------------------------------------------------
address@hidden                                     debian/rules
address@hidden                             Monkeys do it better

