Re: Multi-byte character as delimiter

From: Assaf Gordon
Subject: Re: Multi-byte character as delimiter
Date: Sat, 7 Mar 2020 16:18:37 -0700


On Tue, Mar 03, 2020 at 01:49:16PM +0100, Haakon Storm Heen wrote:
> ### What I'm trying
> cat example.txt|gsed "s✋$(printf '\t')✋|✋"
> ### Error
> gsed: -e expression #1, char 2: delimiter character is not a single-byte
> character
> ### Workaround? Feature request?
> - Any way around this?
> - Should I add multibyte delimiter characters as a feature request?

Currently, there is no way around it.
Enforcing a single-byte delimiter has been GNU sed's behaviour since at
least version 4.0a from 2003.

You can of course ask for it as a feature, but personally I do not think
the benefits outweigh the costs of such an addition.

> The rationale behind this is:
> - emoji/unicode are (IMHO) better visual indicators (than plain `ascii`)

That is only true if your terminal properly supports Unicode characters.
It would be very easy to assume all terminals behave as nicely as macOS's
terminal, but I suspect many do not.

They are also somewhat harder to type than regular characters.

> - many files I process are scripts that might contain the usual delimiter
> characters `/` `_` `|` ...

Do you mean that you are processing shell scripts using "gsed"?
Scripts containing these characters are only a problem if you need
to replace several of them in one sed command, aren't they?

For example, if you wanted to replace slashes AND pipes in the same sed
command, it would still be easy (and visually clear) to use ";" as the
delimiter, no? e.g.:

        gsed 's;/;FOO;'
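
The same idea applies to the tab-to-pipe replacement from the original
question. Below is a minimal sketch, assuming GNU sed is installed as
"sed" (on macOS it is often "gsed"); the sample input line is made up for
illustration:

```shell
# Hypothetical sample input: fields containing "/" and "|", separated by a tab.
tab=$(printf '\t')

# ";" is the delimiter, so neither "/" nor "|" needs escaping.
out=$(printf 'a/b\tc|d\n' | sed "s;${tab};|;")

printf '%s\n' "$out"   # prints: a/b|c|d
```

Any single-byte character that appears in neither the pattern nor the
replacement would work equally well as the delimiter.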

> - replacing a 🤚 by mistake with something else is not as detrimental as
> replacing `/` or `|` if the file happens to be a shell script.

If you are using sed in a pipe (e.g. "cat | gsed ...") then you still
have the original file at hand if something detrimental happens.
If you are replacing in place, use the backup option to keep a previous
copy.

These two methods can help recover from any mistakes.
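
As a rough sketch of the in-place case, assuming GNU sed's "-i" option
with a backup suffix; the scratch directory, file name, and ".bak" suffix
are arbitrary choices for illustration:

```shell
# Work in a scratch directory so no real file is touched.
dir=$(mktemp -d)
printf 'echo a | grep a\n' > "$dir/script.sh"

# -i.bak edits the file in place and keeps the original as script.sh.bak.
sed -i.bak 's;|;FOO;' "$dir/script.sh"

edited=$(cat "$dir/script.sh")        # echo a FOO grep a
backup=$(cat "$dir/script.sh.bak")    # echo a | grep a

# If the edit turns out to be a mistake, restore from the backup.
cp "$dir/script.sh.bak" "$dir/script.sh"
restored=$(cat "$dir/script.sh")
```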

All in all, I'm not in favor of adding this as an option.
However, if you have other convincing use-cases please do send them.
And, since working code is worth a thousand emails, if you (or others)
want to try to implement this (including unit tests) - this will be a
strong case in favor.

 - assaf

