help-guix

Re: Help needed with substitute* command


From: Tobias Geerinckx-Rice
Subject: Re: Help needed with substitute* command
Date: Thu, 06 Jan 2022 14:50:15 +0100

Hullo Mortimer,

I hope this answer isn't too basic for you.  This input:

Mortimer Cladwell wrote:
  ---input.txt(2)-------
  foo(abc)bar(def)

does not match the extended regular expression:

"foo([a-z]+)bar(.*)$"

This would:

 ---input.txt(3)-------
 foolishbarista

result: bazlishista
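
To check this without running a build, here is a minimal sketch at a plain Guile prompt. It assumes ‘string-match’ from (ice-9 regex) as a stand-in for what ‘substitute*’ does with each line; the names ‘pattern’, ‘m2’ and ‘m3’ are only for illustration.

```scheme
(use-modules (ice-9 regex))

;; The regexp from the manual example.
(define pattern "foo([a-z]+)bar(.*)$")

;; input.txt(2): after "foo" comes "(", which is not a lowercase
;; letter, so string-match returns #f.
(define m2 (string-match pattern "foo(abc)bar(def)"))

;; input.txt(3): this line matches, so we get a match object back.
(define m3 (string-match pattern "foolishbarista"))

(display (if m2 "input 2: match" "input 2: no match"))
(newline)
(display (if m3 "input 3: match" "input 3: no match"))
(newline)
```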

I'm not the one to either write or recommend a tutorial on extended regular expressions, but you'll find plenty on the 'net. There's also ‘info (grep)Regular Expressions’ which might be good. These things aren't specific to Guile, although a few dialects exist, and I think Guile uses the POSIX one. The differences are quite small.

In this specific example

("foo([a-z]+)bar(.*)$" all letters end)

the first string is an extended regular expression.

It will match a literal ‘foo’ anywhere on a line, followed by 1 or more lowercase letters, followed by a literal ‘bar’, followed by anything until the end of the line.

It will NOT match your original input.txt(2). The ‘()’ brackets in the regexp are syntax used for grouping and capturing, so they never match a literal ‘(’ or ‘)’ in the input, and ‘(’ is not a lowercase letter.

If an optional variable name follows the regexp, it will be set to the complete match. Here, that is ‘all’, which in our example will contain "foolishbarista". It's not used here.

In practice, this variable would be named ‘_’ to indicate that it's unimportant:

 (("foo([a-z]+)bar(.*)$" _ letters end)
  (string-append "baz" letters end))

but the author of the manual example thought that ‘all’ would be more clear.

Each subsequent optional variable will be set to the content matched by () groups. Here, ‘letters’ will be set to whatever matched ‘[a-z]+’, and ‘end’ to whatever matched ‘.*’.

In our example ‘letters’ is "lish" and ‘end’ is "ista".
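
The same values can be inspected by hand. This is only a sketch, again assuming ‘string-match’ and ‘match:substring’ from (ice-9 regex) rather than ‘substitute*’ itself; the variable names mirror the ones in the clause above.

```scheme
(use-modules (ice-9 regex))

(define m (string-match "foo([a-z]+)bar(.*)$" "foolishbarista"))

;; Group 0 is the complete match, what the clause calls `all' (or `_').
(define all (match:substring m 0))
;; Groups 1 and 2 are the two () groups, in order of appearance.
(define letters (match:substring m 1))
(define end (match:substring m 2))

(display (list all letters end))
(newline)
```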

This is powerful, because we can construct arbitrary strings at run time that can differ significantly for each line that matches the same regexp:

(string-append "baz" letters end)

is just Scheme code that uses the captured variables above, without hard-coding assumptions about what was matched.

 footbarnacles → baztnacles
 foodiebarmaid → bazdiemaid
 …

Minutes of fun.
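
To play with more lines, a rough imitation of that one clause might look like the following. ‘rewrite-line’ is a hypothetical helper, not part of Guix, and unlike ‘substitute*’ it only rewrites the first match on a line and works on strings instead of files.

```scheme
(use-modules (ice-9 regex))

;; Hypothetical helper: apply the manual's clause to a single line.
(define (rewrite-line line)
  (let ((m (string-match "foo([a-z]+)bar(.*)$" line)))
    (if m
        (let ((letters (match:substring m 1))
              (end (match:substring m 2)))
          ;; Keep whatever surrounds the match, replace the match itself.
          (string-append (match:prefix m)
                         (string-append "baz" letters end)
                         (match:suffix m)))
        ;; Lines that do not match are left alone.
        line)))

(for-each (lambda (line)
            (display (rewrite-line line))
            (newline))
          '("footbarnacles" "foodiebarmaid" "foo(abc)bar(def)"))
```

The last line is printed unchanged, since the regexp does not match it.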

This special meaning of ‘()’ in extended regexps means that if you wanted to match:

 ---input.txt(4)-------
 fo(bizzle)

you'd write:

 "fo\\(bizzle\\)"

Because "\" in a string *also* has special meaning to Guile itself, we have to write "\\(" if we want the regexp engine to see "\(".
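
A small sketch of the difference, once more assuming ‘string-match’ from (ice-9 regex) in place of ‘substitute*’; ‘matched-text’ is a made-up helper that returns the matched text or #f.

```scheme
(use-modules (ice-9 regex))

;; Guile reads each "\\" as a single backslash, so the regexp engine
;; sees  fo\(bizzle\)  which means literal brackets.
(define escaped "fo\\(bizzle\\)")
;; Here the regexp engine sees a group around "bizzle" instead.
(define unescaped "fo(bizzle)")

;; Hypothetical helper: the text that matched, or #f if nothing did.
(define (matched-text pattern line)
  (let ((m (string-match pattern line)))
    (and m (match:substring m 0))))

(for-each (lambda (result) (write result) (newline))
          (list (matched-text escaped "fo(bizzle)")
                (matched-text unescaped "fo(bizzle)")
                (matched-text unescaped "fobizzle")))
```

Only the escaped pattern matches input.txt(4); the unescaped one matches ‘fobizzle’ without brackets.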

Is the letters/letter in the manual a typo? If I use letter I get
"...unbound variable..."

Yes, that was a typo, both names should match. I've fixed it. Thanks for apparently being the first to test this snippet!

Kind regards,

T G-R
