Re: Is there a way to "asciify" a string?
From: Richard Wordingham
Subject: Re: Is there a way to "asciify" a string?
Date: Thu, 31 May 2018 23:52:07 +0100
On Thu, 31 May 2018 17:08:47 +0200 (CEST)
"S. Champailler" <schampaillerspam@skynet.be> wrote:
> I second that, removing accents and other "nationalities" is much
> trickier than one might expect (you can look at the Java example;
> Java's Unicode support is quite complete), especially for languages far
> from English, such as Russian. By "tricky" I mean there are
> *hundreds* of edge cases. Nevertheless, there are ways to sort of do
> what you want by playing with things such as "non-spacing combining
> characters", "normalized strings", etc. If you have the opportunity,
> just try to do it; the great lesson you'll get from that is that human
> languages are super complex (and thus super interesting).
Make sure you transliterate the string first. Remember that stripping
out Indic vowels (many of which are gc=Mn) is no more reasonable than
stripping out ASCII vowels.
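
To illustrate the point about gc=Mn, here is a minimal Python sketch of the naive "decompose, then drop non-spacing marks" approach. The function name strip_nonspacing_marks and the sample strings are my own, chosen only for illustration. It behaves tolerably on accented Latin text but deletes the vowels from a Devanagari word, because the vowel sign U+0941 has general category Mn.

```python
import unicodedata


def strip_nonspacing_marks(text: str) -> str:
    """Naive 'asciify': decompose to NFD, then drop every gc=Mn character.

    Hypothetical helper for illustration only; it does no transliteration.
    """
    decomposed = unicodedata.normalize("NFD", text)
    return "".join(ch for ch in decomposed if unicodedata.category(ch) != "Mn")


# Latin text: the accent is a separate combining mark after NFD, so
# removing it leaves the base letter.
latin = strip_nonspacing_marks("caf\u00e9")

# Devanagari "guru": GA, VOWEL SIGN U, RA, VOWEL SIGN U.
# Both vowel signs (U+0941) are gc=Mn, so they are stripped and only the
# two consonant letters GA and RA remain.
devanagari = strip_nonspacing_marks("\u0917\u0941\u0930\u0941")

print(latin)
print(devanagari)
```

Transliterating to Latin script first, and only then stripping marks, avoids throwing away the vowels.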
> Today, everyone should use Unicode, it's much simpler. Many file
> systems support Unicode.
But be warned that some very different strings may compare equal. The
Unicode Collation Algorithm is highly likely *not* to be the default.
Windows XP used to compare strings of Canadian Aboriginal Syllabics of
the same length as equal. I remember using sort -u to remove duplicates
from a list of words on a Linux distribution, and finding that I only
had one left. I now play safe and do that sort of trick in the C locale.
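
As a rough sketch of the same "play safe" idea outside the shell: Python compares str values by code point, which for UTF-8 data gives the same order as a byte-wise comparison in the C locale. The function name and the sample strings below are made up for illustration (three arbitrary strings from the Canadian Aboriginal Syllabics block, two of them identical); this is not a claim about how any particular sort implementation works.

```python
def unique_sorted_by_code_point(words):
    """Remove exact duplicates and sort by code point.

    Hypothetical helper: no locale collation is involved, so two strings
    are merged only if they are identical code point for code point.
    """
    return sorted(set(words))


# Arbitrary sample data, not real words: two distinct strings of the
# same length, plus an exact duplicate of the first.
samples = [
    "\u1403\u14c4\u1483",
    "\u140a\u14c4\u1483",
    "\u1403\u14c4\u1483",
]

result = unique_sorted_by_code_point(samples)
print(len(result))
```

Only the exact duplicate is dropped; the two distinct strings of equal length both survive.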
Richard.