[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]
Re: Improve `replace-regexp-in-string' ergonomics?
From: |
Adam Porter |
Subject: |
Re: Improve `replace-regexp-in-string' ergonomics? |
Date: |
Wed, 22 Sep 2021 02:33:03 -0500 |
User-agent: |
Gnus/5.13 (Gnus v5.13) Emacs/26.3 (gnu/linux) |
Lars Ingebrigtsen <larsi@gnus.org> writes:
> `replace-regexp-in-string' often leads to pretty awkward code. I
> wonder whether we could improve it somehow.
That would be appreciated. :)
> Here's a real life example:
>
> (defun org-babel-js-read (results)
> [...]
> (org-babel-read
> (concat "'"
> (replace-regexp-in-string
> "\\[" "(" (replace-regexp-in-string
> "\\]" ")" (replace-regexp-in-string
> ",[[:space:]]" " "
> (replace-regexp-in-string
> "'" "\"" results))))))
>
> That's kinda hard to read, but variations on this is pretty common.
> When you have one `replace-regexp-in-string', you often have another.
>
> We introduced `thread-last' in 2014, and there seems to be one (1) place
> in the Emacs code base, so I guess that didn't take off, but rewriting
> with that,
It doesn't seem that `thread-last' is very popular among Elispers, but
more among Clojurists (e.g. it's also implemented in dash.el as `->>').
But I've found it very useful in some cases, and I'm using `thread-last'
more often, trying to avoid adding dependencies on dash.el unless
necessary.
> we get:
>
> (org-babel-read
> (concat "'"
> (thread-last
> results
> (replace-regexp-in-string "'" "\"")
> (replace-regexp-in-string ",[[:space:]]" " ")
> (replace-regexp-in-string "\\]" ")")
> (replace-regexp-in-string "\\[" "("))))
>
> Which is somewhat more readable (but note that this totally breaks
> down if you want to mix in LITERAL etc). ... The length of the name of
> this common function is itself offputting.
Agreed, the name seems too long, and the function's signature is awkward
(I always have to check the argument list when I use it). Most of the
time, I don't want to replace with automatic case matching, nor do I
want to substitute the original matched text, so I have to add the
FIXEDCASE argument, and then carefully re-read the docstring for LITERAL
and decide whether I need it, too.
The SUBEXP argument, I'm not so sure about. Having it at the end would
break threading. Having it after the replacement would mean having a
"nil" much of the time, which wouldn't be as pretty. I suppose the
third argument could be either a SUBEXP or the string, and if a SUBEXP,
an optional fourth argument could be the string? But since these are
likely called often and in loops, I suppose that might be undesirable.
> But I wonder whether we should consider renaming the function to
> something more palatable, and since we have `string-replace', why not
> `regexp-replace'?
Sounds good to me. (Since it's also string-related, a
`string-replace-regexp' alias might be warranted, but I don't want to
get too bikesheddy now.)
> (org-babel-read
> (concat "'"
> (thread-last
> results
> (regexp-replace "'" "\"")
> (regexp-replace ",[[:space:]]" " ")
> (regexp-replace "\\]" ")")
> (regexp-replace "\\[" "("))))
>
> We could also consider making `regexp-replace' take a series of pairs,
> since this is so common. Like:
>
> (org-babel-read
> (concat "'"
> (regexp-replace "'" "\""
> ",[[:space:]]" " "
> "\\]" ")"
> "\\[" "("
> results)))
>
> Or some variation thereupon with some more ()s to group pairs.
It is common, but IMHO, it would be better to use a separate function
for that case, e.g. maybe `regexp-replace-pairs'. (Alternatively,
`pcase-dolist' makes it easy to call a function in a loop with paired
arguments, so maybe it's not really needed.)
> The most popular way to deal with the awkwardness is to just give up and
> go all imperative:
>
> (defun authors-canonical-author-name (author file pos)
> [...]
> (when author
> (setq author (replace-regexp-in-string "[ \t]*[(<].*$" "" author))
> (setq author (replace-regexp-in-string "\\`[ \t]+" "" author))
> (setq author (replace-regexp-in-string "[ \t]+$" "" author))
> (setq author (replace-regexp-in-string "[ \t]+" " " author))
>
> Which leads me to my other point -- about a quarter of the usages of the
> function in Emacs core has "" as the replacement, so perhaps that should
> have its own function? `regexp-remove'?
IMHO, I'd lean toward not adding this unless it's really needed, but I
won't opine too strongly on it.
Re: Improve `replace-regexp-in-string' ergonomics?,
Adam Porter <=
Re: Improve `replace-regexp-in-string' ergonomics?, Andreas Schwab, 2021/09/22
Re: Improve `replace-regexp-in-string' ergonomics?, Augusto Stoffel, 2021/09/22
Re: Improve `replace-regexp-in-string' ergonomics?, Lars Ingebrigtsen, 2021/09/22
Re: Improve `replace-regexp-in-string' ergonomics?, Dmitry Gutov, 2021/09/22