emacs-devel

Re: Emacs i18n


From: Mattias Engdegård
Subject: Re: Emacs i18n
Date: Thu, 28 Mar 2019 12:03:26 +0100

On 27 March 2019 at 22:22, Juri Linkov <address@hidden> wrote:
> 
> I tried ‘regexp-opt’ and it generates a ready-to-use regexp:
> 
>  (replace-regexp-in-string
>   "%d" "\\\\([0-9]+\\\\)"
>   (regexp-opt '("finished with %d match found"
>                 "finished with %d matches found"
>                 "finished with no matches found")))
> 
>  ⇒ "\\(?:finished with \\(?:\\(?:\\([0-9]+\\) match\\(?:es\\)?\\|no matches\\) found\\)\\)"

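Well now. There is no guarantee that regexp-opt won't split the %d: it factors 
out common prefixes character by character, so "%d" and "%s" in otherwise 
identical strings may come out as "%[ds]". Format strings must be parsed 
left-to-right for correctness¹. I'm still skeptical, but if you really want to 
give this a try, then first segment the format string: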

"Today %d little piggies built %03o houses and said '%s'."
"Today %d little piggy built %o house and said '%s'."
=>
("Today " ?d " little piggies built " ?o " houses and said '" ?s "'.")
("Today " ?d " little piggy built " ?o " house and said '" ?s "'.")

leaving the format placeholders as atomic entities (shown here as characters, 
but you may need to keep more information, such as width and flags).
Then run your favourite diff algorithm on the result. Prefix merging matters 
most for performance; anything else just makes the regexp smaller.
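
The segmentation step could be sketched roughly as follows. The names
my-format-spec-regexp, my--add-literal and my-segment-format are hypothetical,
and the pattern is only an approximation in the spirit of footnote ¹ (the flag
characters are an added assumption):

```elisp
;; A minimal sketch, not a complete parser for `format' strings.
;; All names here are hypothetical helpers.
(defconst my-format-spec-regexp
  (concat "%"
          "\\(?:[0-9]+\\$\\)?"      ; optional field number, e.g. "1$"
          "[-+ #0]*"                ; optional flags (assumed, not in the footnote)
          "[0-9]*"                  ; optional width
          "\\(?:\\.[0-9]*\\)?"      ; optional precision
          "\\([%sdioxXefgcS]\\)")   ; group 1: the conversion character
  "Regexp matching one format specification.")

(defun my--add-literal (text segments)
  "Return SEGMENTS (most recent first) with literal TEXT added.
Empty TEXT is dropped; TEXT is appended to a preceding literal if any."
  (cond ((equal text "") segments)
        ((stringp (car segments))
         (cons (concat (car segments) text) (cdr segments)))
        (t (cons text segments))))

(defun my-segment-format (format-string)
  "Split FORMAT-STRING into literal strings and conversion characters.
\"%%\" is folded into the surrounding literal text as a single \"%\"."
  (let ((case-fold-search nil)          ; %x and %X are different conversions
        (pos 0)
        (segments nil))
    ;; Scan left to right, one specification at a time.
    (while (string-match my-format-spec-regexp format-string pos)
      (let ((conv (aref format-string (match-beginning 1))))
        (setq segments
              (my--add-literal
               (substring format-string pos (match-beginning 0))
               segments))
        (setq segments
              (if (eq conv ?%)
                  (my--add-literal "%" segments)
                (cons conv segments)))
        (setq pos (match-end 0))))
    (nreverse (my--add-literal (substring format-string pos) segments))))
```

Calling (my-segment-format "Today %d little piggy built %o house and said '%s'.")
should give the second abstract list shown above. Width and flags are discarded
here; a real implementation would probably keep them with the placeholder.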

Here, prefix and suffix merging would leave you with (still in abstract form):

("Today " ?d " little pigg"
 (("ies built " ?o " houses")
  ("y built " ?o " house"))
 " and said '" ?s "'.")

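As a rough illustration, prefix and suffix merging of two segmented strings
might look like this. It is only a sketch for exactly two variants, with
hypothetical function names; literals are compared character by character
while each placeholder is kept as one indivisible token:

```elisp
;; Sketch only: merges exactly two segment lists; names are hypothetical.
(defun my-tokenize (segments)
  "Expand SEGMENTS into tokens.
Literal characters stay integers; a placeholder character C becomes (C),
so it can never be confused with, or split like, literal text."
  (apply #'append
         (mapcar (lambda (seg)
                   (if (stringp seg)
                       (append seg nil)      ; string -> list of characters
                     (list (list seg))))
                 segments)))

(defun my-untokenize (tokens)
  "Regroup TOKENS into literal strings and placeholder characters."
  (let ((segments nil) (chars nil))
    (dolist (tok tokens)
      (if (integerp tok)
          (push tok chars)
        (when chars
          (push (concat (nreverse chars)) segments)
          (setq chars nil))
        (push (car tok) segments)))
    (when chars (push (concat (nreverse chars)) segments))
    (nreverse segments)))

(defun my-common-prefix-length (a b)
  "Return the number of leading tokens shared by lists A and B."
  (let ((n 0))
    (while (and a b (equal (car a) (car b)))
      (setq n (1+ n) a (cdr a) b (cdr b)))
    n))

(defun my-merge-two (segs-a segs-b)
  "Return (PREFIX (MIDDLE-A MIDDLE-B) SUFFIX) for two segment lists."
  (let* ((a (my-tokenize segs-a))
         (b (my-tokenize segs-b))
         (p (my-common-prefix-length a b))
         (rest-a (nthcdr p a))
         (rest-b (nthcdr p b))
         ;; The suffix is taken from what remains after the prefix,
         ;; so the two can never overlap.
         (s (my-common-prefix-length (reverse rest-a) (reverse rest-b))))
    (list (my-untokenize (butlast a (- (length a) p)))
          (list (my-untokenize (butlast rest-a s))
                (my-untokenize (butlast rest-b s)))
          (my-untokenize (last rest-a s)))))
```

Applied to the two piggy lists above, my-merge-two should reproduce the
abstract form shown, with the two middle parts as the alternatives.
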
From there you can either recursively try to find more common subsequences, or 
call it a day and render it into a regexp:

"Today -?[0-9]+ little pigg\\(?:ies built -?[0-7]+ houses\\|y built -?[0-7]+ house\\) and said '\\(?:.\\|\n\\)*'\\."
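
Rendering the abstract form into a regexp could be sketched like this. The
function names and the placeholder table are hypothetical, and the table covers
only the three conversions used in the example. Literal text goes through
regexp-quote, so the final period is matched literally:

```elisp
;; Sketch only: renders (PREFIX (ALTERNATIVE...) SUFFIX) into a regexp.
(defconst my-placeholder-regexps
  '((?d . "-?[0-9]+")
    (?o . "-?[0-7]+")
    (?s . "\\(?:.\\|\n\\)*"))
  "Hypothetical table mapping conversion characters to regexps.")

(defun my-render-segments (segments)
  "Render SEGMENTS, a list of strings and placeholder characters."
  (mapconcat (lambda (seg)
               (if (stringp seg)
                   (regexp-quote seg)
                 (or (cdr (assq seg my-placeholder-regexps))
                     (error "No regexp for placeholder %c" seg))))
             segments
             ""))

(defun my-render-merged (merged)
  "Render MERGED, a list (PREFIX ALTERNATIVES SUFFIX), into a regexp.
ALTERNATIVES is a list of segment lists joined with \\|."
  (concat (my-render-segments (nth 0 merged))
          "\\(?:"
          (mapconcat #'my-render-segments (nth 1 merged) "\\|")
          "\\)"
          (my-render-segments (nth 2 merged))))

;; Usage with the abstract form from the example:
(defvar my-piggy-regexp
  (my-render-merged
   '(("Today " ?d " little pigg")
     (("ies built " ?o " houses")
      ("y built " ?o " house"))
     (" and said '" ?s "'."))))
```

The result can then be tried with string-match-p against a formatted message,
for instance the output of (format "Today %d little piggy built %o house and
said '%s'." 1 1 "oink").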

All this will need to be done at run-time, since it is run on translated 
strings.

¹ To match format parameters, try something like
  (rx "%"
      (opt (1+ digit) "$")
      (0+ digit)
      (opt "." (0+ digit))
      (any "%sdioxXefgcS"))



