Re: seq feature: print letters
From: Assaf Gordon
Subject: Re: seq feature: print letters
Date: Thu, 29 Jan 2015 00:03:07 -0500
Hello William,
On Jan 28, 2015, at 19:57, William Bader <address@hidden> wrote:
<...>
> I think that bloat is an important issue. Systems with limited resources
> need to run coreutils. Would a smart watch need to print a sequence of
> letters to run?
<...>
> If you need a sequence with letters, you can always use another filter to
> convert numbers to letters, for example,
>
> seq 1 10 | awk -e '{ printf "%c\n", ($1+64) }'
This example works well for English, but English letters are rarely an
issue, since many shells support the "{A..Z}" syntax.
However, almost every other language has its own specialized sequences in
the Unicode standard, involving non-sequential code points and multi-symbol
"letters".
A visual way to appreciate the complexity is the Unicode CLDR website and its
charts:
http://www.unicode.org/cldr/charts/26/by_type/core_data.alphabetic_information.index.html
Scrolling down to the "Latin languages" chart section, one can see how much
the set of included letters varies from language to language.
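
As a small illustration of the non-sequential code points (a Python sketch,
not part of the suggested patch; the helper name naive_range is hypothetical),
incrementing code points skips the Spanish letter "ñ", which sits between "n"
and "o" in the alphabet, and the last three Danish/Norwegian letters are not
stored in alphabetical order:

```python
def naive_range(first, last):
    """Hypothetical helper: letters from first to last by incrementing code points."""
    return [chr(cp) for cp in range(ord(first), ord(last) + 1)]

# Spanish: the alphabet runs ... m, n, ñ, o, p ..., but ñ is U+00F1,
# far away from n (U+006E) and o (U+006F).
spanish_slice = naive_range("m", "p")
print(spanish_slice)                 # ñ is missing from this list
print("\u00f1" in spanish_slice)     # False

# Danish/Norwegian: the alphabet ends with æ, ø, å (in that order).
danish_tail = ["\u00e6", "\u00f8", "\u00e5"]   # æ, ø, å
danish_tail_points = [ord(ch) for ch in danish_tail]
print([hex(cp) for cp in danish_tail_points])
# Alphabetical order does not match code point order.
print(danish_tail_points == sorted(danish_tail_points))
```

A filter that only adds an offset to a number, like the awk example quoted
above, cannot produce either of these alphabets correctly.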
Another issue is properly supporting all the environments in which coreutils
can operate, including non-UTF-8 locales and even EBCDIC, in which even
English letters are not consecutive (see, e.g., this post from 2005:
http://lists.gnu.org/archive/html/bug-coreutils/2005-04/msg00189.html ).
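
The EBCDIC gaps can be seen with a short Python sketch. It assumes the cp037
codec (one EBCDIC code page shipped with Python's standard codecs) is
representative; other EBCDIC code pages may differ in detail, and the helper
name ebcdic_byte is hypothetical:

```python
def ebcdic_byte(letter):
    """Hypothetical helper: byte value of an ASCII letter in the cp037 EBCDIC code page."""
    return letter.encode("cp037")[0]

# In ASCII each of these pairs is exactly 1 apart; in EBCDIC the alphabet
# is split into three blocks (A-I, J-R, S-Z) with gaps in between.
for a, b in [("H", "I"), ("I", "J"), ("R", "S")]:
    gap = ebcdic_byte(b) - ebcdic_byte(a)
    print(f"{a}=0x{ebcdic_byte(a):02X} {b}=0x{ebcdic_byte(b):02X} gap={gap}")
```

So "add 1 to get the next letter" fails at the I/J and R/S boundaries in such
an environment.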
The currently suggested patch handles all those cases, at the cost of including
the Unicode modules from gnulib.
These are the main reasons for the complexity/size of the feature.
---
This is not to say the feature is worth or not worth the added size (or bloat);
I think by now it's not a technical decision, but more of a strategic one.
I personally like it, but I can understand if others prefer not to include it
in coreutils and put it elsewhere.
- Assaf