help-gnu-emacs

Re: How to convert an arbitrary string into a filename


From: Platon Pronko
Subject: Re: How to convert an arbitrary string into a filename
Date: Thu, 27 Apr 2023 16:06:59 +0800
User-agent: Mozilla/5.0 (X11; Linux x86_64; rv:102.0) Gecko/20100101 Thunderbird/102.9.1

On 2023-04-27 13:53, Eli Zaretskii wrote:
>> Date: Thu, 27 Apr 2023 07:52:55 +0300
>> From: Jean Louis <bugs@gnu.support>
>> Cc: help-gnu-emacs@gnu.org
>>
>> * Eli Zaretskii <eliz@gnu.org> [2023-04-26 16:09]:
>>> If you need to convert an accented character to its base character
>>> (i.e. "remove" the accent), Emacs has much more general facilities:
>>>
>>>    (require 'ucs-normalize)
>>>    (substring (ucs-normalize-NFKD-string "Ć") 0 1)
>>>     => "C"
>>
>> Alright, then like this:
>>
>> (defun string-slug (s &optional random)
>>   "Return slug for Website Revision System by using string S.
>>
>> RANDOM number may be added on the end."
>>   (let* ((random (or random nil))
>>          ;; (case-fold-search t)
>>          (s (replace-regexp-in-string "[^[:word:]]" " " s))
>>          (s (replace-regexp-in-string " +" " " s))
>>          (s (substring (ucs-normalize-NFKD-string s) 0 1))
>>          (s (replace-regexp-in-string "^[[:space:]]+" "" s))
>>          (s (replace-regexp-in-string "[[:space:]]+$" "" s))
>>          (s (replace-regexp-in-string " " "-" s))
>>          (s (if random (concat s "-" (number-to-string (random-number))) s)))
>>     s))
>>
>> (string-slug " OK, here, üößčć") ➜ ""
>>
>> It doesn't give good result.
>
> Of course.  Because you didn't understand how to use
> ucs-normalize-NFKD-string for your purposes.  Please read its doc
> string, and try to play with it, starting from the example I've shown.


I think something like this should work better:

(replace-regexp-in-string ucs-normalize-combining-chars-regexp "" 
(ucs-normalize-NFKD-string "Ć"))

The idea here is to replace each precomposed character with its compatibility
decomposition: instead of the single codepoint "Ć" (U+0106) you get "C" (U+0043)
followed by the combining acute accent (U+0301). You can then delete the combining
characters with a regexp replace, which leaves the clean base string.
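
To show how this could fit into a slug function, here is a minimal sketch. The names my-strip-accents and my-string-slug are made up for illustration and are not part of Emacs or of the code posted earlier in the thread; the choice to turn every run of non-alphanumeric characters into a single hyphen is also my assumption about what a slug should look like. Note that the whole decomposed string is kept, rather than only its first character.

```elisp
(require 'ucs-normalize)

;; Hypothetical helper: decompose S with NFKD, then delete the
;; combining characters that the decomposition exposes.
(defun my-strip-accents (s)
  "Return S with combining marks removed after NFKD decomposition."
  (replace-regexp-in-string ucs-normalize-combining-chars-regexp ""
                            (ucs-normalize-NFKD-string s)))

;; Hypothetical slug function built on the helper above.
(defun my-string-slug (s)
  "Return a hyphen-separated slug built from string S."
  (let* ((s (my-strip-accents s))
         ;; Assumption: any run of non-alphanumerics becomes one hyphen.
         (s (replace-regexp-in-string "[^[:alnum:]]+" "-" s))
         ;; Drop hyphens left over at either end of the string.
         (s (replace-regexp-in-string "\\`-+\\|-+\\'" "" s)))
    s))

(my-strip-accents "Ć")                ; => "C"
(my-string-slug " OK, here, üößčć")   ; => "OK-here-uoßcc"
```

Characters without a decomposition, such as "ß", pass through unchanged, so if you need pure ASCII you would have to map those separately.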


