Re: [External] : Re: Strange whitespaces.
From: Hongyi Zhao
Subject: Re: [External] : Re: Strange whitespaces.
Date: Fri, 1 Oct 2021 15:26:51 +0800
On Fri, Oct 1, 2021 at 2:36 PM Eli Zaretskii <eliz@gnu.org> wrote:
>
> > From: Hongyi Zhao <hongyi.zhao@gmail.com>
> > Date: Fri, 1 Oct 2021 09:51:22 +0800
> > Cc: help-gnu-emacs <help-gnu-emacs@gnu.org>
> >
> > > We now highlight any non-ASCII character whose Unicode
> > > general category is "Space Separator" (or Zs for short).
> >
> > I fail to see the connection between the abbreviation and the original
> > representation it stands for.
>
> You mean, Zs vs "Space Separator"?
Yes.
> Please complain to the Unicode Consortium about any of that, Emacs just uses
> the names and nomenclature they invented.  See
>
> https://www.unicode.org/reports/tr44/#General_Category_Values
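I think I have basically figured out the logic behind the nomenclature.
The Description given for Zs at the above URL is:

    a space character (of various non-zero widths)

So s <---> space, and I read Z as non-zero, although the same table also
lists Zl (line separator) and Zp (paragraph separator), so Z may simply
mark the Separator group as a whole.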
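
As a quick check of these categories, here is a minimal sketch using Python's standard unicodedata module. The sample characters are my own picks for illustration, not taken from the thread:

```python
import unicodedata

# Sample characters chosen for illustration (an assumption, not from the thread).
samples = {
    "NO-BREAK SPACE": "\u00a0",
    "EM SPACE": "\u2003",
    "LINE SEPARATOR": "\u2028",
    "PARAGRAPH SEPARATOR": "\u2029",
    "ZERO WIDTH SPACE": "\u200b",
}

# Map each sample name to its two-letter Unicode general category.
categories = {name: unicodedata.category(ch) for name, ch in samples.items()}

for name, cat in categories.items():
    print(f"{name}: {cat}")
```

The two visible-width spaces should report Zs and the line and paragraph separators Zl and Zp. ZERO WIDTH SPACE reports Cf, so it falls outside Zs, which fits the "non-zero widths" wording in the description.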
The s <---> space part is like the naming rules used for regular
expression metacharacters, say, in Python [1]:
\s
    For Unicode (str) patterns:
        Matches Unicode whitespace characters (which includes
        [ \t\n\r\f\v], and also many other characters, for example the
        non-breaking spaces mandated by typography rules in many
        languages). If the ASCII flag is used, only [ \t\n\r\f\v] is
        matched.
    For 8-bit (bytes) patterns:
        Matches characters considered whitespace in the ASCII character
        set; this is equivalent to [ \t\n\r\f\v].
\S
    Matches any character which is not a whitespace character. This is
    the opposite of \s. If the ASCII flag is used this becomes the
    equivalent of [^ \t\n\r\f\v].
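
The behaviour quoted above can be tried out with a small sketch like the following. The test string is a made-up example of mine (it contains a NO-BREAK SPACE, an ASCII space, and a tab), and only the standard re module is used:

```python
import re

# Made-up sample: 'a', NO-BREAK SPACE, 'b', ASCII space, 'c', tab, 'd'.
text = "a\u00a0b c\td"

# str pattern, default Unicode matching: the NO-BREAK SPACE counts as whitespace.
unicode_hits = re.findall(r"\s", text)

# str pattern with the ASCII flag: only [ \t\n\r\f\v] is matched.
ascii_hits = re.findall(r"\s", text, flags=re.ASCII)

# bytes pattern: ASCII whitespace only, so byte 0xA0 is not matched.
bytes_hits = re.findall(rb"\s", text.encode("latin-1"))

# \S is the opposite of \s: everything that is not whitespace.
non_space = re.findall(r"\S", text)

print(unicode_hits, ascii_hits, bytes_hits, non_space)
```

With the default Unicode matching, \s should pick up all three whitespace characters, while the ASCII flag and the bytes pattern should each pick up only the plain space and the tab.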
[1] https://docs.python.org/3/library/re.html
HZ