emacs-devel

Re: [Emacs-diffs] master f18af6c: Audit use of lsh and fix glitches


From: Pip Cet
Subject: Re: [Emacs-diffs] master f18af6c: Audit use of lsh and fix glitches
Date: Fri, 24 Aug 2018 18:00:10 +0000

On Thu, Aug 23, 2018 at 3:56 PM Stefan Monnier <address@hidden> wrote:
> > Well, it isn't `==' if we're building with
> > --enable-check-lisp-object-type,
>
> Currently, the resulting assembly code should be pretty close even in
> that case, tho.

Okay; I wasn't sure whether we were talking about a literal "people
use == on Lisp_Objects" problem or not. The resulting assembly code is
indeed equivalent to a `==', it seems.

> The whole purpose of hash-consing (for me) is to avoid turning EQ into
> something like:
>
>     if (BIGNUMP (x))
>         return slow_eq (x, y);
>     else
>         return x == y;
>
> What we do in slow_eq is largely irrelevant: the problem is the cost of
> `if (BIGNUMP (x))`, both in terms of code size and processing time.

I understand that.

In fact, the cost is fairly (and, to me, surprisingly) high, about 1%.
That is, indeed, way more than I thought, and I'll have to look at the
assembler code to figure out why it's so expensive.

But what I want is only about one tenth as bad (in terms of code size)
as what you describe:

the code you don't want:
 13 .text         00227f8e  0000000000419a00  0000000000419a00  00019a00  2**4
the code you do want (i.e. vanilla):
 13 .text         001d945e  0000000000419a00  0000000000419a00  00019a00  2**4
the code I want:
 13 .text         001e0b3e  0000000000419a00  0000000000419a00  00019a00  2**4

(I got it down to 24816 bytes of code size difference with another
compiler, but still using the standard make flags on an x86_64
pc-linux-gnu system.)
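
The "one tenth" figure can be rechecked from the .text sizes in the three objdump lines above. This is only a sketch of the arithmetic; the helper name text_growth is made up for illustration and is not part of any tool.

```c
#include <assert.h>

/* Sizes are the third column of the `objdump -h` lines quoted above.
   text_growth is a hypothetical helper, not an Emacs or binutils function. */
static long text_growth(long vanilla, long variant)
{
  assert(variant >= vanilla);   /* both variants add code to vanilla */
  return variant - vanilla;
}

/* text_growth(0x1d945e, 0x227f8e) == 322352 bytes (branch on every EQ)
   text_growth(0x1d945e, 0x1e0b3e) ==  30432 bytes (the third variant)
   30432 / 322352 is roughly 0.094, i.e. about one tenth. */
```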

The performance penalty is a quarter of a clock cycle per problematic
EQ on this machine, though obviously anything in that range depends
on your precise CPU architecture and on the surrounding insns that
affect superscalar scheduling. We call EQ a lot: there are between 15
and 32 billion problematic calls in the temacs/emacs invocations to
rebuild all .elc files in the Emacs distribution. ("Problematic" means
that gcc wasn't able to prove that one argument to EQ couldn't possibly
be a bignum, and had to emit a conditional branch insn.) So that also
works out to something in the 1% range. (However, upon inspection it
turns out that adding the debugging code makes gcc fail to optimize
away many of the problematic calls, so the actual number may be much
less than 1%.)
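
As a rough check of the scale involved, the extra cycle count follows directly from the two numbers above (a quarter cycle per call, 15 to 32 billion calls). The function name extra_cycles is an assumption for illustration, and turning cycles into a percentage needs the total build time, which is not given here.

```c
/* Hypothetical helper: extra clock cycles spent on the added branch,
   assuming the measured cost of a quarter cycle per problematic EQ. */
static unsigned long long extra_cycles(unsigned long long problematic_calls)
{
  return problematic_calls / 4;
}

/* 15 billion calls -> 3.75 billion cycles
   32 billion calls -> 8 billion cycles */
```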

The stripped emacs binary is also about 1% larger with my code.

The difference between the code you suggested and mine is mostly
NILP(), though my test code optimizes away all tag bit checks if the
compiler proves either argument is definitely not a bignum, or that
the arguments have different tag bits. I'm also using
__builtin_expect, something I think should be okay in this
exceptionally hot single location.
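
The shape of such an EQ can be sketched as follows. This is not the actual test code and does not use Emacs's real object layout: the tagged-word type obj_t, the tag values, struct bignum and every function name here are assumptions made up for illustration. Only __builtin_expect is a real GCC/Clang builtin.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical tagged word: the low 3 bits hold a type tag. */
typedef uintptr_t obj_t;
enum { TAG_BITS = 3, TAG_MASK = 7, TAG_FIXNUM = 0, TAG_BIGNUM = 5 };

/* Stand-in for a heap-allocated bignum; a real one holds many limbs. */
struct bignum { _Alignas(8) long long value; };

static inline unsigned tag_of(obj_t x) { return (unsigned)(x & TAG_MASK); }

static inline obj_t make_fixnum(uintptr_t n)
{
  return (n << TAG_BITS) | TAG_FIXNUM;
}

static inline obj_t make_bignum(struct bignum *p)
{
  assert(((uintptr_t)p & TAG_MASK) == 0);   /* low bits must be free */
  return (uintptr_t)p | TAG_BIGNUM;
}

/* Slow path: compare the numbers behind two bignum pointers. */
static bool slow_eq(obj_t x, obj_t y)
{
  const struct bignum *a = (const struct bignum *)(x & ~(uintptr_t)TAG_MASK);
  const struct bignum *b = (const struct bignum *)(y & ~(uintptr_t)TAG_MASK);
  return a->value == b->value;
}

/* Fast path first; the bignum test is marked unlikely.  If the compiler
   can prove that either argument is not a bignum, or that the tags
   differ, the whole second test folds to false and only `x == y` is left. */
static inline bool eq_sketch(obj_t x, obj_t y)
{
  if (x == y)
    return true;
  if (__builtin_expect(tag_of(x) == TAG_BIGNUM && tag_of(y) == TAG_BIGNUM, 0))
    return slow_eq(x, y);
  return false;
}
```

With this shape, a call such as eq_sketch (x, make_fixnum (0)) can compile down to a plain word comparison, since the second argument's tag is known at compile time.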


