bug-grep

bug#55331: Improved support for combining diacritics


From: Paul Eggert
Subject: bug#55331: Improved support for combining diacritics
Date: Mon, 9 May 2022 11:30:28 -0700
User-agent: Mozilla/5.0 (X11; Linux x86_64; rv:91.0) Gecko/20100101 Thunderbird/91.8.1

On 5/8/22 23:38, Benson Muite wrote:
> When using
>
> grep -E "\s[a-z\`\'āáàēéèīíìịị̄ị́ị̀ōóòọọ̄ọọ́ọ̀ūúùụ̄ụ́ụ̀n̄ńǹm̄ḿm̀]{4}$"
>
> to extract 4 letter Igbo words

The {4} means "4 characters", not "4 letters", and a combining character counts as a character.
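
To make the character-versus-letter distinction concrete, here is a small sketch in Python rather than grep. It assumes Python's `re` module counts code points in the same way as the behavior described above. The sample strings are made up for illustration and are not real Igbo words.

```python
import re
import unicodedata

# Three ways a single "letter" can appear in text (illustrative values).
precomposed = "\u1ecd"        # o with dot below as one code point
decomposed = "o\u0323"        # 'o' followed by COMBINING DOT BELOW
stacked = "o\u0323\u0304"     # 'o' plus dot below plus macron


def code_points(text):
    """Number of code points, which is what {4} counts."""
    return len(text)


def combining_marks(text):
    """Number of code points that are combining marks."""
    return sum(1 for ch in text if unicodedata.combining(ch) != 0)


# Hypothetical sample: three letters, but four code points.
sample = "abo\u0323"
four_chars = re.compile(r".{4}")


def matches_four(text):
    """True if the whole string is exactly four code points long."""
    return four_chars.fullmatch(text) is not None


print(code_points(precomposed), code_points(decomposed), code_points(stacked))
print(combining_marks(stacked), matches_four(sample))
```

So a three-letter string can satisfy `{4}` when one of its letters carries a combining mark, and a four-letter word written with combining marks can fail it.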

It might be nice for 'grep' to have ways to perform Unicode normalization before matching. In the meantime perhaps you can get what you want by normalizing the text before running it through 'grep'.
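
As a rough sketch of what "normalizing the text first" could look like, the Python below uses the standard `unicodedata` module. The function names `nfc`, `split_letters` and `has_n_letters` are hypothetical helpers, not part of grep or any library. NFC only helps where a precomposed form exists; a letter such as ọ̄ has none, as far as I know, so the sketch also shows a second approach that groups each base character with its trailing combining marks and counts the groups.

```python
import unicodedata


def nfc(text):
    """Compose base characters and marks where a precomposed form exists."""
    return unicodedata.normalize("NFC", text)


def split_letters(text):
    """Group each base character with the combining marks that follow it."""
    clusters = []
    for ch in unicodedata.normalize("NFD", text):
        if unicodedata.combining(ch) and clusters:
            clusters[-1] += ch      # attach the mark to the previous base
        else:
            clusters.append(ch)     # start a new letter
    return clusters


def has_n_letters(word, n=4):
    """True if the word has exactly n base-plus-marks groups."""
    return len(split_letters(word)) == n


# Hypothetical sample strings, not real Igbo words.
print(len(nfc("o\u0323")))               # composes to a single code point
print(len(nfc("o\u0323\u0304")))         # still more than one code point
print(has_n_letters("abo\u0323\u0304d"))
```

The grouping here is deliberately simple: it treats any character with a nonzero combining class as part of the preceding letter, which is only an approximation of full grapheme-cluster segmentation.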




