Improve `replace-regexp-in-string' ergonomics?

emacs-devel

From:	Lars Ingebrigtsen
Subject:	Improve `replace-regexp-in-string' ergonomics?
Date:	Wed, 22 Sep 2021 06:36:27 +0200
User-agent:	Gnus/5.13 (Gnus v5.13) Emacs/28.0.50 (gnu/linux)

`replace-regexp-in-string' often leads to pretty awkward code.  I wonder
whether we could improve it somehow.

Here's a real life example:

(defun org-babel-js-read (results)
[...]
       (org-babel-read
        (concat "'"
                (replace-regexp-in-string
                 "\\[" "(" (replace-regexp-in-string
                            "\\]" ")" (replace-regexp-in-string
                                       ",[[:space:]]" " "
                                       (replace-regexp-in-string
                                        "'" "\"" results))))))

That's kinda hard to read, but variations on this are pretty common.
When you have one `replace-regexp-in-string', you often have another.

We introduced `thread-last' in 2014, and there seems to be one (1) place
in the Emacs code base that uses it, so I guess that didn't take off, but
rewriting with that, we get:

       (org-babel-read
        (concat "'"
                (thread-last
                  results
                  (replace-regexp-in-string "'" "\"")
                  (replace-regexp-in-string ",[[:space:]]" " ")
                  (replace-regexp-in-string "\\]" ")")
                  (replace-regexp-in-string "\\[" "("))))

Which is somewhat more readable (but note that this totally breaks down
if you want to mix in LITERAL etc., since `thread-last' always puts the
string last).  But I wonder whether we should consider renaming the
function to something more palatable, and since we have `string-replace',
why not `regexp-replace'?  The length of the name of this common function
is itself off-putting.

       (org-babel-read
        (concat "'"
                (thread-last
                  results
                  (regexp-replace "'" "\"")
                  (regexp-replace ",[[:space:]]" " ")
                  (regexp-replace "\\]" ")")
                  (regexp-replace "\\[" "("))))
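
To make the LITERAL caveat concrete: `thread-last' appends the threaded
value as the last argument, while `replace-regexp-in-string' takes STRING
third, before the optional FIXEDCASE and LITERAL.  A minimal sketch of one
possible workaround, wrapping the call in a lambda; the function name
`my-escape-dots' is made up for illustration and is not part of Emacs:

```elisp
(require 'subr-x)                       ; provides `thread-last'

;; Hypothetical helper, for illustration only.
(defun my-escape-dots (string)
  "Collapse whitespace runs in STRING, then put a backslash before each dot."
  (thread-last
    string
    ;; Fine: no optional arguments, so STRING can go last.
    (replace-regexp-in-string "[ \t]+" " ")
    ;; Writing (replace-regexp-in-string "\\." "\\." nil t) here would
    ;; thread the value into the LITERAL slot and pass nil as STRING,
    ;; so the call is wrapped to put the string where it belongs.
    (funcall (lambda (s)
               (replace-regexp-in-string "\\." "\\." s nil t)))))
```

The wrapper works, but it gives back most of the readability that
`thread-last' was supposed to buy.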

We could also consider making `regexp-replace' take a series of pairs,
since this is so common.  Like:

       (org-babel-read
        (concat "'"
                (regexp-replace "'" "\""
                                ",[[:space:]]" " "
                                "\\]" ")"
                                "\\[" "("
                                results)))

Or some variation thereupon with some more ()s to group pairs.
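
As a rough sketch of the flat-pairs variant, assuming the string comes
last as in the example above: the name `my-regexp-replace-pairs' is
hypothetical (chosen so it doesn't claim the proposed `regexp-replace'
name), and the pairs are applied in the order given.

```elisp
;; Hypothetical sketch; not an existing Emacs function.
(defun my-regexp-replace-pairs (&rest args)
  "Apply REGEXP/REPLACEMENT pairs, in order, to the final argument.
ARGS has the form REGEXP1 REP1 REGEXP2 REP2 ... STRING."
  (let ((string (car (last args)))
        (pairs (butlast args)))
    ;; Everything before the final string must pair up.
    (unless (and (stringp string) (zerop (% (length pairs) 2)))
      (error "Expected REGEXP/REPLACEMENT pairs followed by a string"))
    (while pairs
      (let ((regexp (pop pairs))
            (rep (pop pairs)))
        (setq string (replace-regexp-in-string regexp rep string))))
    string))

;; Usage, mirroring the org-babel example:
(my-regexp-replace-pairs "'" "\""
                         ",[[:space:]]" " "
                         "\\]" ")"
                         "\\[" "("
                         "['a', 'b']")
```

Like the `thread-last' version, a flat argument list leaves no obvious
place for FIXEDCASE or LITERAL, which is one argument for grouping the
pairs with extra ()s instead.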

The most popular way to deal with the awkwardness is to just give up and
go all imperative:

(defun authors-canonical-author-name (author file pos)
[...]
  (when author
    (setq author (replace-regexp-in-string "[ \t]*[(<].*$" "" author))
    (setq author (replace-regexp-in-string "\\`[ \t]+" "" author))
    (setq author (replace-regexp-in-string "[ \t]+$" "" author))
    (setq author (replace-regexp-in-string "[ \t]+" " " author))

Which leads me to my other point -- about a quarter of the usages of the
function in Emacs core have "" as the replacement, so perhaps that should
have its own function?  `regexp-remove'?

Then that could be:

  (when author
    (setq author (regexp-remove "[ \t]*[(<].*$" author))
    (setq author (regexp-remove "\\`[ \t]+" author))
    (setq author (regexp-remove "[ \t]+$" author))
    (setq author (regexp-replace "[ \t]+" " " author))

or

  (when author
    (setq author
          (regexp-replace
           "[ \t]+" " " (regexp-remove
                         "[ \t]*[(<].*$" (regexp-remove
                                          "\\`[ \t]+" (regexp-remove
                                                       "[ \t]+$" author)))))))

or

  (when author
    (setq author
          (thread-last author
                       (regexp-remove "[ \t]*[(<].*$")
                       (regexp-remove "\\`[ \t]+")
                       (regexp-remove "[ \t]+$")
                       (regexp-replace "[ \t]+" " ")))))
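
For experimenting with the variants above, `regexp-remove' could be
sketched as a thin wrapper that passes "" as the replacement.  The names
`my-regexp-remove' and `my-clean-author' below are hypothetical stand-ins,
and the pipeline just mirrors the `thread-last' form with the existing
`replace-regexp-in-string' in place of the proposed `regexp-replace':

```elisp
(require 'subr-x)                       ; provides `thread-last'

;; Hypothetical sketch of the proposed `regexp-remove'.
(defun my-regexp-remove (regexp string)
  "Return STRING with every match of REGEXP deleted."
  (replace-regexp-in-string regexp "" string))

;; Hypothetical helper reproducing the author-name cleanup steps.
(defun my-clean-author (author)
  "Strip a trailing (...) or <...> part and tidy whitespace in AUTHOR."
  (thread-last author
               (my-regexp-remove "[ \t]*[(<].*$")   ; drop "(...)" / "<...>" tail
               (my-regexp-remove "\\`[ \t]+")       ; leading whitespace
               (my-regexp-remove "[ \t]+$")         ; trailing whitespace
               (replace-regexp-in-string "[ \t]+" " "))) ; collapse runs

;; Usage:
(my-clean-author "  Some  Body   <someone@example.org>")
```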


Or...  something else.  I'm sure nobody else has thought about this
issue before.  

-- 
(domestic pets only, the antidote for overdose, milk.)
   bloggy blog: http://lars.ingebrigtsen.no
