From: | Bruno Haible |
Subject: | optimizing localcharset |
Date: | Sun, 20 May 2018 13:38:06 +0200 |
User-agent: | KMail/5.1.3 (Linux/4.4.0-119-generic; KDE/5.18.0; x86_64; ; ) |
Hi,

Since wcwidth() reportedly has become a bottleneck [1][2], and some of the time the gnulib wcwidth() replacement spends is in localcharset(), let me optimize localcharset().

Patch 1 removes support for Linux libc5 (obsolete since ca. 2001), glibc 2.0.x (last used in Red Hat Linux 5.2, obsolete since ca. 2003 [3][4]) and Mac OS X 10.2 (obsolete since 2003-2005 [5]).

Patch 2 adds a simple manual test, so that I can verify that the results are as expected when making changes to the code.

Patch 3 removes the ability to specify the platform-dependent mapping in an external file. This ability was useful up until ca. 2007. config.charset has not changed for Unix platforms since 2010, therefore it is safe to assume that the current mappings are nearly correct, i.e. not many people will need to adjust them, and those who do can report it here or change the source code locally. At the same time, this patch introduces a binary search for the mapping lookup. The ultimate optimization of the tables would be through gperf, but this comes at the cost of several extra files in the source code tree, and is mostly relevant for old platforms only.

Patch 4 adds missing mappings, found while testing on various platforms.

Patch 5 is a micro-optimization.

I also attempted to replace the binary search that works with strcmp() with one that progresses one character at a time. This has the same overall asymptotic complexity and uses ca. 20% fewer memory accesses, but is slower by a factor of 2. 'perf annotate' told me that this is because strcmp() apparently has a fast implementation in glibc, whereas the "one character at a time" algorithm uses plain x86_64 instructions throughout. And apparently the cost of calling the function strcmp() is negligible.
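To illustrate the kind of lookup described for patch 3, here is a minimal sketch of a binary search with strcmp() over a sorted, compiled-in alias table. It is not the code from the patch: the struct, the function name lookup_charset() and the table entries are hypothetical and chosen only for illustration, and returning the input name unchanged when no alias matches is an assumption of this sketch.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical alias table, for illustration only.  The entries must be
   sorted by 'alias' in strcmp() order (digits < uppercase < lowercase
   in ASCII), otherwise the binary search below does not work.  */
struct charset_alias
{
  const char *alias;      /* name as reported by the platform */
  const char *canonical;  /* portable name to return instead */
};

static const struct charset_alias alias_table[] =
  {
    { "646",        "ASCII" },
    { "ISO8859-1",  "ISO-8859-1" },
    { "ISO8859-15", "ISO-8859-15" },
    { "eucJP",      "EUC-JP" },
    { "utf8",       "UTF-8" }
  };

/* Return the canonical name for NAME, or NAME itself if the table has
   no entry for it (an assumption of this sketch).  */
const char *
lookup_charset (const char *name)
{
  size_t lo = 0;
  size_t hi = sizeof alias_table / sizeof alias_table[0];

  assert (name != NULL);

  /* Invariant: a matching entry, if any, lies in the range [lo, hi).  */
  while (lo < hi)
    {
      size_t mid = lo + (hi - lo) / 2;
      int cmp = strcmp (alias_table[mid].alias, name);

      if (cmp == 0)
        return alias_table[mid].canonical;
      if (cmp < 0)
        lo = mid + 1;   /* table entry sorts before NAME: search upper half */
      else
        hi = mid;       /* table entry sorts after NAME: search lower half */
    }
  return name;
}
```

With this table, lookup_charset ("utf8") yields "UTF-8", while a name without an entry, such as "KOI8-R", is returned as is. Each probe costs one strcmp() call, which is the variant that turned out faster than the character-at-a-time search in the measurement above.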
Bruno

[1] https://lists.gnu.org/archive/html/bug-gnulib/2018-04/msg00059.html
[2] https://lists.gnu.org/archive/html/coreutils/2018-05/msg00013.html
[3] https://distrowatch.com/table.php?distribution=redhat
[4] https://en.wikipedia.org/wiki/Red_Hat_Linux
[5] https://en.wikipedia.org/wiki/Darwin_(operating_system)#Release_history
0001-localcharset-Remove-support-for-obsolete-platforms.patch
Description: Text Data
0002-localcharset-Add-a-manual-test.patch
Description: Text Data
0003-localcharset-Move-mapping-tables-into-the-code.patch
Description: Text Data
0004-localcharset-Map-the-locale-encodings-found-in-newer.patch
Description: Text Data
0005-localcharset-Optimize.patch
Description: Text Data