bug-gnu-emacs

bug#13041: 24.2; diacritic-fold-search


From: martin rudalics
Subject: bug#13041: 24.2; diacritic-fold-search
Date: Fri, 07 Dec 2012 11:37:00 +0100

> This is usable to sort and compare strings, but I don't see
> how ucs-normalize.el could help in the search.  I suppose the
> searched buffer can't be normalized before starting a search.

You can temporarily do either of the following:

- leave the text alone but give each string that should be handled
  specially a text property with the normalized form.  In this case
  searching has to pay attention to these properties, if present.

- normalize the text and give each normalized string a text property
  with the original text.  In this case searching will proceed as usual
  but you have to restore the original text when done.

I don't know how feasible these are for searching.  But I used the
second approach for sorting without problems.
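For the sorting case, the second approach can be sketched roughly as follows. This is Python rather than Emacs Lisp, and the helper name `fold` is made up for illustration. The original strings are kept as they are and only a normalized key is compared, which plays the same role as remembering the original text in a property and restoring it afterwards.

```python
import unicodedata

def fold(s):
    """Hypothetical helper: decompose, drop combining marks, ignore case."""
    decomposed = unicodedata.normalize("NFKD", s)
    stripped = "".join(c for c in decomposed if not unicodedata.combining(c))
    return stripped.casefold()

words = ["zebra", "Émile", "eclair", "éclat", "Zoë"]

# Sort on the normalized key; the list still holds the original strings,
# so nothing has to be restored once sorting is done.
in_order = sorted(words, key=fold)
print(in_order)
```

Since the key is computed on the fly, the text itself is never modified. Whether that carries over to searching a live buffer is a separate question.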

Also I don't know how to handle the return value and/or highlighting
when, for example, finding a match for "suf" within "suﬀer" (written
with the "ﬀ" ligature).  Replacing each occurrence of "suf" with the
empty string should leave us with "fer" here, so in this case we have to
deal with the normalized string anyway.  OTOH replacing a match for
"res" in "résumé" with the empty string should probably leave us with
"umé".
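The difference between the two cases can be made concrete with a small sketch. It is Python, not Emacs Lisp, and it assumes "suffer" is spelled with the "ﬀ" ligature (U+FB00). The function names are invented for illustration. The search runs on the folded text, and each folded character remembers which span of the original it came from, so a match can be mapped back and flagged when it cuts an original character in half.

```python
import unicodedata

def fold_with_spans(text):
    """Return (folded, spans); spans[k] is the original (start, end) of folded[k]."""
    folded, spans = [], []
    for i, ch in enumerate(text):
        kept = [d for d in unicodedata.normalize("NFKD", ch)
                if not unicodedata.combining(d)]
        if not kept and spans:
            # A standalone combining mark belongs to the preceding character.
            spans[-1] = (spans[-1][0], i + 1)
        for d in kept:
            folded.append(d)
            spans.append((i, i + 1))
    return "".join(folded), spans

def find_folded(needle, text):
    """Return (start, end, partial) in the original text, or None if no match.

    partial is True when the match begins or ends inside one original
    character, e.g. in the middle of a ligature.
    """
    folded, spans = fold_with_spans(text)
    key, _ = fold_with_spans(needle)
    pos = folded.find(key) if key else -1
    if pos < 0:
        return None
    last = pos + len(key) - 1
    partial = ((pos > 0 and spans[pos - 1][0] == spans[pos][0]) or
               (last + 1 < len(spans) and spans[last + 1][0] == spans[last][0]))
    return spans[pos][0], spans[last][1], partial

resume = "r\u00e9sum\u00e9"        # "résumé" with precomposed é
ligature = "su\ufb00er"            # "suffer" with the ff ligature
print(find_folded("res", resume))
print(find_folded("suf", ligature))
```

For "résumé" the match maps cleanly onto the first three original characters, so deleting that span leaves "umé". For the ligature spelling the match is flagged as partial: deleting the mapped span of the original would also remove the second "f", so getting "fer" requires working on the normalized text.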

> So the search function somehow should be able to skip combining
> characters in the buffer.  But to do this, the translation table needs
> to contain additional information about certain characters to ignore.
> Also the translation table should be able to map a sequence of
> characters like "ss" to "ß".

I have no idea how many mappings like "ß" -> "ss" exist.  The problem is
that we don't get them from UnicodeData.txt IIUC.
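One way to get a feel for this outside Emacs is the following Python sketch. It relies on the fact that the `unicodedata` module reports no decomposition for "ß", while `str.casefold` applies Unicode full case folding and does expand it. The count it prints depends on the Unicode version bundled with the interpreter, so no figure is given here, and it covers only case-folding expansions, not every mapping one might want for searching.

```python
import unicodedata

sharp_s = "\u00df"  # "ß"

# The decomposition field for this character is empty, so normalization
# alone never turns it into "ss".
print(repr(unicodedata.decomposition(sharp_s)))
print(unicodedata.normalize("NFKD", sharp_s) == sharp_s)

# Full case folding does expand it.
print(sharp_s.casefold())

# Collect every code point whose case folding is longer than one character.
multi = [chr(cp) for cp in range(0x110000) if len(chr(cp).casefold()) > 1]
print(len(multi))
```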

martin





