bug-gperf
[Top][All Lists]
Advanced

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

Re: [bug-gperf] shrinking the asso_values array


From: Mark Stankus
Subject: Re: [bug-gperf] shrinking the asso_values array
Date: Thu, 5 Sep 2019 09:58:20 -0700

Ok, thanks.

Mark Stankus

On Wed, Sep 4, 2019 at 5:13 PM Bruno Haible <address@hidden> wrote:
>
> Hi Mark,
>
> > FOURTH SUGGESTION ( MEDIUM DIFFICULTY )
> >
> >   I was wondering if it would be worthwhile to use an "if" statement
> > and a shorter static array asso_values in the "hash" function in some
> > cases. The reason would be that the array is 256 bytes and loading
> > different parts of the array could use up to 4 cache lines. Loading
> > caches lines is very expensive from the point of view of the CPU
> > (sometimes 100 cycles) as opposed to an if statement (which is a lot less,
> > but is blocking.
> >
> >   I looked at permut2.gperf in the examples directory of the gperf
> > distribution. I will talk about this and next suggestion via examples
> > related to that file.
> >
> >   In our case, the largest hash value is 5 and all but four entries in
> > the array are 6.
> > Recall that in the perumt2.gperf example, only the letters x, y and z are 
> > used.
> >
> >   Instead of
> >
> >   return len + asso_values[(unsigned char) str[1] + 1] +
> > assoc_values[(unsigned char)[str[0]];
> >
> > in the hash function, there could be an optimized(?) version of the 
> > following
> >
> >   unsigned int returned_value = 6;
> >   if ( len <= 2 && 'x' <= str[1] && str[1] <= 'z' &&  'x' <= str[0] &&
> > str[0] <= 'z' )
> >  {
> >     unsigned char first = (unsigned char) str[1] - 'x' + 1;
> >     unsigned char second = (unsigned char) str[0] - 'x';
> >     returned_value len + modified_asso_values[first] +
> > modified_asso_values[second];
> >   }
> >   return  returned_value;
>
> On today's CPUs, with their long pipelines of prepared instructions, the
> condition you are writing there would cost 5 conditional jumps. Or, when
> we leave out the redundant 'len <= 2' test (already done by the caller)
> and assume an optimizing compiler that optimizes
>   'x' <= str[1] && str[1] <= 'z'
> to
>   (unsigned int)(str[1] - 'x') <= 'z' - 'x'
> we still have 2 conditional jumps.
>
> > If either letter is out of range, the array is not accessed at all.
>
> Optimizing cache behaviour at the expense of conditional jumps is not
> good. You need a reasonable balance between both.
>
> Bruno
>



reply via email to

[Prev in Thread] Current Thread [Next in Thread]