sed-devel

Re: [Question] How the sed deal with the '\0' embedded in string?


From: Assaf Gordon
Subject: Re: [Question] How the sed deal with the '\0' embedded in string?
Date: Tue, 13 Sep 2016 23:06:28 -0400

(re-adding sed-devel@ mailing list, please reply to the mailing list with 
technical discussions)

Hello,

> On Sep 13, 2016, at 22:39, Du Dengke <address@hidden> wrote:
> 
> At the same time, I also check how the sed substitute command do when meet 
> the string that contain NUL characters.
> 
> sed/execute.c: do_subst(sub)
>     sed/regexp.c: match_regex()
>         lib/regexec.c: regexec()
>           lib/regexec.c: re_search_internal()
> 
> The regexec() function deal with null-terminated string, when the string 
> contain NUL characters, how it works? I can't find
> the correct function for that, it's more complicated for me.

Generally speaking, the concept is the same:
when passing a string (char*) with an additional 'length' parameter, embedded 
NULs can be treated like other characters, because the length is known and 
there's no need to use strlen(3). If the length is not known and strlen(3) is 
used, then the first NUL will indicate end-of-string.
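To make the difference concrete, here is a minimal sketch in C. The helper name 'count_byte' is made up for this illustration and is not taken from sed's source; it only shows that a loop bounded by an explicit length walks straight past embedded NULs.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical helper (not from sed): count how often byte 'c' occurs
 * in the first 'len' bytes of 'buf'.  The loop is bounded by 'len',
 * so an embedded NUL is just another byte and does not end the scan. */
size_t count_byte(const char *buf, size_t len, char c)
{
    size_t n = 0;
    for (size_t i = 0; i < len; i++)
        if (buf[i] == c)
            n++;
    return n;
}
```

For a five-byte buffer holding 'a', NUL, 'b', NUL, 'a', calling count_byte(buf, 5, 'a') sees both 'a' bytes, while strlen(buf) stops at the first NUL and reports a length of 1.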

Specifically,
sed does not call regexec(3); it calls 're_search', which in turn calls 
're_search_internal'.
Both of these functions take a 'length' parameter and do not rely on 
'strlen(3)'.
These functions are non-standard (i.e. they are not in POSIX), and are 
implemented by gnulib.
The standard POSIX function regexec(3), by contrast, takes a 
NUL-terminated string
and so cannot deal with embedded NULs.
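The contrast can be sketched with a small C program fragment. This is an illustration only, not sed's actual code: the two wrapper names are hypothetical, and it assumes a libc whose <regex.h> exposes the GNU extensions 're_compile_pattern' and 're_search' when _GNU_SOURCE is defined (glibc does).

```c
#ifndef _GNU_SOURCE
#define _GNU_SOURCE          /* assumption: needed for the GNU regex API */
#endif
#include <regex.h>
#include <string.h>

/* Hypothetical wrapper: search the first 'len' bytes of 'buf' using the
 * GNU length-aware API.  Returns the match offset, -1 if there is no
 * match, or -2 on error. */
int search_with_length(const char *pattern, const char *buf, int len)
{
    struct re_pattern_buffer pb;
    memset(&pb, 0, sizeof pb);   /* no preallocated buffer, fastmap or translate */

    if (re_compile_pattern(pattern, strlen(pattern), &pb) != NULL)
        return -2;               /* a non-NULL return is an error message */

    /* 'len' is passed explicitly, so bytes after a NUL are still searched. */
    int pos = (int) re_search(&pb, buf, len, 0, len, NULL);
    regfree(&pb);
    return pos;
}

/* Hypothetical wrapper: the same search through POSIX regexec(3), which
 * only sees 'buf' up to its first NUL.  Returns the match offset, -1 if
 * there is no match, or -2 if the pattern does not compile. */
int search_nul_terminated(const char *pattern, const char *buf)
{
    regex_t re;
    regmatch_t m;

    if (regcomp(&re, pattern, 0) != 0)
        return -2;
    int rc = regexec(&re, buf, 1, &m, 0);
    regfree(&re);
    return rc == 0 ? (int) m.rm_so : -1;
}
```

Given the five bytes 'a', 'b', NUL, 'c', 'd' and the pattern "c", the length-aware wrapper can find the 'c' at offset 3, whereas the regexec(3) wrapper treats the input as the two-byte string "ab" and finds nothing.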


As a side-note, about 5 weeks ago sed's implementation changed to a different 
regex engine (DFA from GNU grep), and under certain circumstances the code now 
calls the more efficient 'dfaexec', with a fallback to 're_search'.
This code is available in sed's git repository:
  http://git.savannah.gnu.org/cgit/sed.git/log/
However, the principle still holds: dfaexec does not use strlen(3), and so 
embedded NULs are treated like other characters.

regards,
  - assaf




