Improve `replace-regexp-in-string' ergonomics?
From: Lars Ingebrigtsen
Subject: Improve `replace-regexp-in-string' ergonomics?
Date: Wed, 22 Sep 2021 06:36:27 +0200
User-agent: Gnus/5.13 (Gnus v5.13) Emacs/28.0.50 (gnu/linux)
`replace-regexp-in-string' often leads to pretty awkward code. I wonder
whether we could improve it somehow.
Here's a real-life example:
  (defun org-babel-js-read (results)
    [...]
    (org-babel-read
     (concat "'"
             (replace-regexp-in-string
              "\\[" "(" (replace-regexp-in-string
                         "\\]" ")" (replace-regexp-in-string
                                    ",[[:space:]]" " "
                                    (replace-regexp-in-string
                                     "'" "\"" results))))))
That's kinda hard to read, but variations on this are pretty common.
When you have one `replace-regexp-in-string', you often have another.
We introduced `thread-last' in 2014, and there seems to be one (1) place
in the Emacs code base that uses it, so I guess that didn't take off,
but rewriting with that, we get:
  (org-babel-read
   (concat "'"
           (thread-last
             results
             (replace-regexp-in-string "'" "\"")
             (replace-regexp-in-string ",[[:space:]]" " ")
             (replace-regexp-in-string "\\]" ")")
             (replace-regexp-in-string "\\[" "("))))
Which is somewhat more readable (but note that this totally breaks down
if you want to mix in LITERAL etc). But I wonder whether we should
consider renaming the function to something more palatable, and since we
have `string-replace', why not `regexp-replace'? The length of the name
of this common function is itself off-putting.
  (org-babel-read
   (concat "'"
           (thread-last
             results
             (regexp-replace "'" "\"")
             (regexp-replace ",[[:space:]]" " ")
             (regexp-replace "\\]" ")")
             (regexp-replace "\\[" "("))))
We could also consider making `regexp-replace' take a series of pairs,
since this is so common. Like:
  (org-babel-read
   (concat "'"
           (regexp-replace "'" "\""
                           ",[[:space:]]" " "
                           "\\]" ")"
                           "\\[" "("
                           results)))
Or some variation thereupon with some more ()s to group pairs.
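To make the pairs idea concrete, here is a rough sketch built on the existing function. It is only an illustration, not a proposed implementation: the name `my-regexp-replace-pairs' is made up, and it assumes that the string comes last and that the pairs are applied left to right.

```elisp
;; Hypothetical helper, not part of Emacs.  Applies REGEXP/REPLACEMENT
;; pairs in order, left to right; the final argument is the string.
(defun my-regexp-replace-pairs (&rest args)
  "Apply the REGEXP REPLACEMENT pairs in ARGS to the string ending ARGS."
  (let ((string (car (last args)))
        (pairs (butlast args)))
    ;; Everything before the string must come in pairs.
    (unless (zerop (% (length pairs) 2))
      (error "Odd number of REGEXP/REPLACEMENT arguments"))
    (while pairs
      (let ((regexp (pop pairs))
            (replacement (pop pairs)))
        (setq string
              (replace-regexp-in-string regexp replacement string))))
    string))
```

With that, (my-regexp-replace-pairs "'" "\"" ",[[:space:]]" " " "\\]" ")" "\\[" "(" "['a', 'b']") gives the same result as the nested calls in the first example.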
The most popular way to deal with the awkwardness is to just give up and
go all imperative:
  (defun authors-canonical-author-name (author file pos)
    [...]
    (when author
      (setq author (replace-regexp-in-string "[ \t]*[(<].*$" "" author))
      (setq author (replace-regexp-in-string "\\`[ \t]+" "" author))
      (setq author (replace-regexp-in-string "[ \t]+$" "" author))
      (setq author (replace-regexp-in-string "[ \t]+" " " author))
Which leads me to my other point -- about a quarter of the usages of the
function in Emacs core have "" as the replacement, so perhaps that should
have its own function? `regexp-remove'?
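As a rough sketch of how small such a function would be (the name `my-regexp-remove' is made up for illustration, and the argument order is an assumption that simply mirrors `replace-regexp-in-string' minus the replacement):

```elisp
;; Hypothetical helper, not part of Emacs.
(defun my-regexp-remove (regexp string)
  "Return STRING with every match of REGEXP removed."
  (replace-regexp-in-string regexp "" string))
```

For instance, (my-regexp-remove "[ \t]+$" "Lars  ") strips the trailing whitespace and returns "Lars".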
Then that could be:
  (when author
    (setq author (regexp-remove "[ \t]*[(<].*$" author))
    (setq author (regexp-remove "\\`[ \t]+" author))
    (setq author (regexp-remove "[ \t]+$" author))
    (setq author (regexp-replace "[ \t]+" " " author))
or
  (when author
    (setq author
          (regexp-replace
           "[ \t]+" " " (regexp-remove
                         "[ \t]*[(<].*$" (regexp-remove
                                          "\\`[ \t]+" (regexp-remove
                                                       "[ \t]+$" author))))))
or
  (when author
    (setq author
          (thread-last author
            (regexp-remove "[ \t]*[(<].*$")
            (regexp-remove "\\`[ \t]+")
            (regexp-remove "[ \t]+$")
            (regexp-replace "[ \t]+" " "))))
Or... something else. I'm sure nobody else has thought about this
issue before.
--
(domestic pets only, the antidote for overdose, milk.)
bloggy blog: http://lars.ingebrigtsen.no