bug-coreutils
[Top][All Lists]
Advanced

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

Re: bsearch utility


From: Paul Eggert
Subject: Re: bsearch utility
Date: Fri, 02 Sep 2005 11:35:28 -0700
User-agent: Gnus/5.1007 (Gnus v5.10.7) Emacs/21.4 (gnu/linux)

Sorav Bansal <address@hidden> writes:

> I have now ported look.c to the latest CVS repository too. In doing
> so, I noticed that the new keycompare() function no longer calls
> 'trim_trailing_blanks()'. Its not clear to me, why these calls are not
> necessary anymore. Is that because if the keys match till the last
> non-blank character, then you allow the number of trailing blanks to
> be considered as a tie-breaker?

Yes.  POSIX requires this and other 'sort' implementations do it that
way, so we thought it better to be compatible.  Please see the
2004-04-25 ChangeLog entry.

> These calls are needed for 'look' though. For now, I am adding the
> calls and commenting with XXX in compare.h.

I guess they'll need to be conditional on whether it's 'look' rather
than 'sort'.

>>I don't see the need for the -B option, or the -l option.  Can't
>>we omit them?
>>
> I thought '-l' could be useful for looking into unsorted files. '-B'
> can be used for setting the buffer size which affects performance,
> especially when doing linear search.

But doesn't 'grep' do the same job that look -l would?  Perhaps I'm
missing something here.

> However, I agree that they are not particularly necessary.

OK, then (unless I'm missing something) let's please omit them for
now.  We can always put them in later if there's a need.

Speaking of which, the most important thing needed for this change is
documentation.  We need a section in doc/coreutils.texi.  I'd put it
just after the 'tsort invocation' section.  Also, a couple of lines
in NEWS.

>>Isn't fseeko kind of a loser, performance-wise?  Wouldn't it
>>be faster if you used lseek, and took block boundaries into
>>account?  I suppose this is more of a tuning thing tho.
>>
> How do you propose using block boundaries into account? I am not sure
> what you mean, so I am leaving the fseeko() calls in place right now.

For POSIX-conformance reasons, fseeko is required to do an underlying
lseek to the exact same position; in effect, buffering is turned off.
Thus you'll often get better performance by seeking to buffer
boundaries, even if you use fseeko.  Often it's easier just to use
lseek.

This is just a performance issue, not a correctness issue, so we
shouldn't let it slow us down too much.  Better to get it right
(first) and then speed it up.  (But I thought I'd mention it anyway.
:-)

> This time, I have tried my best to match the style in
> coreutils. Please point me to specific areas where I am not conforming
> to the coding style and I would be glad to correct them.

These are all fairly minor, but they'll have to be done anyway so
we appreciate your help.

Comments should look like this:

  /* Use English sentences.  Break long lines
     like this.  Put two spaces after each period.  */

Prefer "const *" to "*".

Don't use "register".

Don't put a space before ";".

Don't parenthesize the expression X in "return X;".

Use x2realloc or x2nrealloc to double a buffer's size, instead of
checking for size overflow by hand.

Prefer 'bool' to 'int' where either will do.

Thanks again for helping with this!




reply via email to

[Prev in Thread] Current Thread [Next in Thread]