bug-coreutils
[Top][All Lists]
Advanced

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

bug#9086: ls --color: 30% speed-up and case-insensitive suffixes [Re: bu


From: Pádraig Brady
Subject: bug#9086: ls --color: 30% speed-up and case-insensitive suffixes [Re: bug#9086
Date: Tue, 09 Oct 2012 13:51:03 +0100
User-agent: Mozilla/5.0 (X11; Linux x86_64; rv:13.0) Gecko/20120615 Thunderbird/13.0.1

On 10/09/2012 01:32 PM, Jim Meyering wrote:
Pádraig Brady wrote:
On 10/09/2012 12:52 PM, Jim Meyering wrote:> Eric Blake wrote:
On 07/26/2011 02:33 PM, marcel partap wrote:
Here's a patch. Adds STRCASEEQ_LEN macro for case insensitive extension
matching.
#regards/marcel.

Your patch would make the new behavior unconditional.  But I like case
sensitivity, and think that case insensitivity should be an opt-in
process that I request, with coordination between dircolors to
generate a new string for LS_COLORS to be honored by ls.  Furthermore,
the patch is lacking in NEWS, documentation, and testsuite coverage.

Additionally, you should be aware that strncasecmp() has undefined
behavior in non-C multibyte locales.  It would probably be better to
use c_strncasecmp(), so that you are guaranteed defined behavior
regardless of the current locale.

Would you care to tackle those additional issues?  And are you set up
for copyright assignment, since the patch will probably be non-trivial
by that point in time?

I saw this while going back through old bugs, so started poking around.
How about this: if a suffix is not matched, convert it to lower case
and check again.  That way, anyone who cares to highlight .TaR or .TAR
differently from .tar can still easily do so, yet names ending with
uppercase .TAR, .JPG, .FLAC, etc. will now be highlighted by default.

Does anyone think it's worth making this new fallback-to-case-insensitivity
an option?

Not me anyway.

While looking at that, I realized that comparing each name in an ls'd
directory with each of nearly 100 suffixes should leave nontrivial room
for improvement.  Sure enough, once I'd replaced that linear search
with a hash-table lookup, I see a measurable improvement.

and taking best-of-10 times, I see a 0.34 -> 0.23 (~30%) improvement.

Very nice.

Of course, I have deliberately used the case that shows the most improvement:
    - a directory with very many files
    - no suffix match, so the old code searches all suffixes
    - using tmpfs: minimal stat overhead

@@ -4343,21 +4412,11 @@ print_color_indicator (const struct fileinfo *f, bool 
symlink_target)
       }

     /* Check the file's suffix only if still classified as C_FILE.  */
-  ext = NULL;
-  if (type == C_FILE)
-    {
-      /* Test if NAME has a recognized suffix.  */
-
-      len = strlen (name);
-      name += len;             /* Pointer to final \0.  */
-      for (ext = color_ext_list; ext != NULL; ext = ext->next)
-        {
-          if (ext->ext.len <= len
-              && STREQ_LEN (name - ext->ext.len, ext->ext.string,
-                            ext->ext.len))
-            break;
-        }
-    }
+  char *suffix;
+  struct sufco *ext;   /* Color extension */
+  ext = (type == C_FILE && (suffix = strrchr (name, '.'))
+         ? suffix_color_lookup (suffix)
+         : NULL);

So do we now only support suffixes delimited by '.' ?
Previously the delimiter was arbitrary or optional:

   touch star; LS_COLORS="*tar=01;31" /bin/ls --color *tar

Good catch.  I realized that early on, but then forgot to mention it.
Yes, I would have to document that the "." is now required.
It seems like a reasonable restriction, but technically
it could be called a regression.

Also, with these changes, a multiple-"." suffix will no longer work.
I.e., before, if you wanted to give *.tar.xz files a color different
from plain *.xz files, you could.

Does anyone object to that?

It's marginal, though I'd be inclined to keep the existing
support for arbitrary suffixes. We could fall back to the
slower linear scan iff an entension entry in LS_COLORS didn't
contain a single '.'  To be more generally performant and
support a longest suffix match we'd have to use something
like a trie I think.

cheers,
Pádraig.





reply via email to

[Prev in Thread] Current Thread [Next in Thread]