bug-gnubg
Re: Skill level names


From: Lasse Hjorth Madsen
Subject: Re: Skill level names
Date: Mon, 8 Jul 2024 23:47:47 +0200

Thanks for pointing this out, Tim. I also think it is more appropriate to divide the sum of errors by the total number of moves, rather than the number of unforced moves.

From a statistical point of view, whenever you subset, you risk introducing bias. Here, we subset all moves to unforced moves only. In this case we may create a bias that favors stronger players, as they probably know better than the average player how to avoid getting a lot of forced non-moves while on the bar.

I generally think it's better to correct past errors than to replicate them, so I think gnubg should do just that, and divide by all moves.

I don't expect many players to agree, though.

/Lasse

On Mon, 8 Jul 2024 at 21:46, Timothy Y. Chow <tchow@math.princeton.edu> wrote:
Ian Shaw wrote:

> The scaling of the PR values comes historically from Snowie, which used
> the sum of both players' moves as the divisor. Gnubg uses only the
> player's unforced moves, which naturally means gnubg error rates are at
> least double Snowie error rates. When XG was created Xavier calculated
> the error rates using the same method as gnubg, but then divided by 2 to
> scale them to match the Snowie Error Rate, which is what most people
> were familiar with.
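
To make the scaling concrete, here is a minimal sketch of the three divisors as described in the quoted paragraph. The function names and the sample session are hypothetical, and the halved variant follows only the simplified description above (the gnubg method divided by 2), not XG's actual definition of a decision.

```python
def snowie_style_rate(total_error, player_moves, opponent_moves):
    """Player's summed error divided by the moves of both players."""
    return total_error / (player_moves + opponent_moves)


def gnubg_style_rate(total_error, player_unforced_moves):
    """Player's summed error divided by the player's unforced moves only."""
    return total_error / player_unforced_moves


def halved_gnubg_rate(total_error, player_unforced_moves):
    """The gnubg-style rate divided by 2, as in the simplified description."""
    return gnubg_style_rate(total_error, player_unforced_moves) / 2


# Hypothetical session: all numbers are made up for illustration.
total_error = 0.6            # equity given up by one player
player_moves = 30            # all of that player's moves
opponent_moves = 30          # all of the opponent's moves
player_unforced_moves = 24   # that player's moves with a real choice

snowie = snowie_style_rate(total_error, player_moves, opponent_moves)
gnubg = gnubg_style_rate(total_error, player_unforced_moves)
halved = halved_gnubg_rate(total_error, player_unforced_moves)

print(f"Snowie-style: {snowie:.4f}")
print(f"gnubg-style:  {gnubg:.4f}")
print(f"halved:       {halved:.4f}")
```

When both players make about the same number of moves, the gnubg-style rate comes out at least twice the Snowie-style rate, because its divisor is at most half as large.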

XG's definition of PR is rather complicated because of its definition of a
"decision":

http://timothychow.net/cg/www.bgonline.org/forums/197598.html

Since PR has become a de facto standard, it makes sense to try to
replicate it. But replicating PR would require some additional programming
since it's not quite the same as GNU's native error rate calculation.

I'm not in favor of dropping Snowie ER entirely. It has its merits, or
rather, PR has its pathologies. Neil Robins pointed out one surprising
example here:

https://www.bgonline.org/forums/webbbs_config.pl?read=154585

More generally, as I've stated numerous times on rec.games.backgammon and
BGOnline, eliminating forced or obvious moves from the denominator has
some strange consequences that most people don't seem to appreciate. One
reason we divide the total equity lost by the length of the session is so
that errors are weighted according to their *frequency of occurrence in
actual play*. If a very unusual type of decision arises and I botch it,
then that should not count against me as much as a very common type of
decision that I mess up (assuming both types of mistake cost 0.05 each,
say). So far so good.

But now think about what happens if we delete forced moves from the
denominator. That means that errors occurring in games with a lot of
forced moves hurt our PR more than errors occurring in games with no
forced moves. In two separate games, I might make an error of exactly the
same size, but in one game I get unlucky and get closed out. My PR will
probably suffer more in the game where I have bad luck, because I'll be
dividing my equity loss by a smaller number. Is this what we really want
from PR? Maybe, maybe not. It's not obvious to me. A large majority of the
backgammon community has somehow gone along with this way of doing things
without thinking it through, or even recognizing that there is something
to think about here.
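
The effect described above can be sketched with two hypothetical games that contain the same single 0.05 error. The `Game` class, its field names, and the roll counts are assumptions made for illustration; they are not taken from gnubg.

```python
from dataclasses import dataclass


@dataclass
class Game:
    rolls: int          # the player's rolls in this game
    forced: int         # rolls with no real choice (e.g. dancing on the bar)
    equity_lost: float  # total equity the player gave up in this game

    def rate_per_roll(self):
        """Divide by every roll, forced or not."""
        return self.equity_lost / self.rolls

    def rate_per_unforced(self):
        """Divide by unforced rolls only."""
        return self.equity_lost / (self.rolls - self.forced)


# Same error size, same game length; only the luck differs.
open_game = Game(rolls=25, forced=0, equity_lost=0.05)
closed_out_game = Game(rolls=25, forced=10, equity_lost=0.05)

for name, game in [("open", open_game), ("closed out", closed_out_game)]:
    print(f"{name:>10}: per roll {game.rate_per_roll():.5f}, "
          f"per unforced roll {game.rate_per_unforced():.5f}")
```

Dividing by all rolls gives both games the same rate. Dividing by unforced rolls makes the identical error look worse in the game where the player was closed out, since 0.05 is spread over 15 decisions instead of 25.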

Somehow people have come to conceptualize a backgammon session as a
sequence of quiz problems, where the only role of the denominator is to
measure the length of the quiz, but in reality there can be correlations
(or anti-correlations) between the *types* of decisions you're presented
with in a game and the *number* of decisions in the game. By messing with
the denominator in a funny way, PR produces some strange and
hard-to-understand effects. ER has the advantage of keeping things simple:
the denominator is just the number of rolls. That is the most obvious
measure of length, and it has the advantage of being simple to understand.

If GNU stops calculating Snowie ER, then it will be very difficult to
extract this potentially illuminating and instructive statistic from a
backgammon session.

Tim

