sed-devel

Re: Feature to add


From: Assaf Gordon
Subject: Re: Feature to add
Date: Thu, 19 Jul 2018 05:44:06 -0600
User-agent: Mozilla/5.0 (X11; Linux x86_64; rv:52.0) Gecko/20100101 Thunderbird/52.9.1

(Adding the sed-devel@ mailing list; please use reply-all to keep the thread public and archived.)


Hello Russell,


On 19/07/18 04:18 AM, Russell Harper wrote:
> I'm not writing specifically about parsing floating point numbers or factoring integers, these are just examples to illustrate. You can substitute anything else instead.
>
> What I'm proposing is an x flag for substitutions to indicate that the substitution is obtained by running an executable and inserting its output.
>
>     's/<reg-exp>/<executable> <argument>*/x'
>
> Some examples:
>
>     's/UUID/uuidgen/gx'                # replaces instances of "UUID" with output from uuidgen
>     's/([0-9]+)/factor \1/gx'          # replaces integers with output from factor <integer>
>     's~(http://[A-Za-z.]+)~wget \1~x'  # replaces URL with output from wget <URL>
>     's~([a-z]+)~./pluralize \1~gix'    # custom utility to pluralize words
>
> Currently there is no easy and robust way to do this in any of the core utilities.

Thank you for expanding on and explaining your request.

This indeed seems like a specialized feature, perhaps a bit out of scope
for sed. GNU sed does have the "s///e" extension ("e" for "eval"),
but that runs a shell command on the entire pattern space once,
and not on every matched group as in your examples.
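
To make the difference concrete, here is a minimal sketch of how the
existing "e" flag behaves. It assumes GNU sed (the flag is a GNU extension)
and uses a made-up input line, "N=230". After the substitution the whole
pattern space reads "expr 230 + 1", and that entire line is what gets run:

```shell
# GNU sed only: after the substitution, the whole pattern space
# ("expr 230 + 1") is executed as a shell command and replaced by its output.
out=$(echo 'N=230' | sed -E 's/N=([0-9]+)/expr \1 + 1/e')
echo "$out"    # 231
```

Any other text on the line around the match would become part of the
command as well, which is why this flag does not fit your examples.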

However, Perl can easily do exactly what you ask for (and in a robust way).

First,
Perl's regex substitution also has an "e" flag, but it is more powerful than sed's: it evaluates the replacement as a Perl expression for every match, so a perl function can be called on each matched group.

In the following example, every number (matching the regex /(\d+)/ )
is transformed using perl's built-in hex() function:

  $ echo 230 19 FOO 40 BAR 50 | perl -np -e 's/(\d+)/hex($1)/ge'
  560 25 FOO 64 BAR 80

(That is: 0x230 is 560 in decimal, 0x19 is 25 in decimal, etc.).

Similarly,
we can define our own perl function to do any transformation we'd like. The following example increments any matched number by 1:

  $ echo 230 19 FOO 40 BAR 50 \
        | perl -np -e 'sub f($) { return $_[0] + 1 ; }' \
                   -e 's/(\d+)/f($1)/ge'
  231 20 FOO 41 BAR 51


Lastly,
Perl excels at text processing and evaluating external commands,
so we modify our function to execute "factor" on any matched
number:

  $ echo 230 19 FOO 40 BAR 50 \
        | perl -np -e 'sub f($) { return `factor $_[0]` ; }' \
                   -e 's/(\d+)/f($1)/ge'
  230: 2 5 23
   19: 19
   FOO 40: 2 2 2 5
   BAR 50: 2 5 5


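And an example with UUID (here chomp strips the trailing newline from the
command's output, which is what broke the lines in the factor example above):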

  $ echo UUID FOO UUID BAR UUID \
      | perl -np -e 'sub f($) { $t = `uuidgen` ; chomp $t ; $t }' \
                 -e 's/(UUID)/f($1)/ge'
  4a64a434-73b2-47f9-985f-2eff776b981d FOO fc7f3796-cfed-4850-a363-a70edfceee1b BAR de65fe02-96fd-436e-ae2b-66127c438702


Of course,
when executing external commands like this through the shell, extra care
must be taken to ensure that malicious input can't cause unintended
consequences through shell-escaping tricks.

=======

As for adding a new feature to sed:

There is always a trade-off between adding more and more specialized
features to sed and using existing solutions, even if they are
a bit more verbose (e.g. my perl examples are much longer than the
hypothetical s///x sed feature).

I don't think we can/should modify sed's existing s///e flag (that would
break existing scripts), but we could perhaps consider adding a new
flag.

What do others think: is it worth it, or is it better to just stick with perl? (Jim?)

The semantics of such a flag must be carefully defined, e.g.
what's the interplay with grouping, with the global flag, with other flags?

regards,
 - assaf


