bug-gawk


From: Andrew J. Schorr
Subject: Re: minor documentation suggestion for FS values and "whitespace" in general
Date: Tue, 24 Mar 2020 08:45:19 -0400
User-agent: Mutt/1.5.21 (2010-09-15)

Hi,

In the code inside field.c, set_FIELDWIDTHS uses is_blank, which tests for
' ' or '\t', but other places (def_parse_field and re_parse_field) test for
'\n' in
addition to those two. Granted, in the normal case where RS is '\n', it doesn't
matter whether FS is checking for '\n', but I suppose it could matter when
RS has an unusual value...
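
To make the difference concrete, here is a minimal sketch in C. It is not
gawk's code: sep_blank_only, sep_blank_or_newline and count_fields are
hypothetical names invented for this illustration, and the two predicates
only mirror the character tests described above. It shows how the same
record yields a different field count once it contains an embedded
newline, which can only happen when RS is something other than '\n'.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for a space-or-TAB test (the is_blank behavior
 * described above). Not taken from gawk's source. */
bool sep_blank_only(char c)
{
    return c == ' ' || c == '\t';
}

/* Hypothetical stand-in for the test described for def_parse_field and
 * re_parse_field: space, TAB, or newline. */
bool sep_blank_or_newline(char c)
{
    return c == ' ' || c == '\t' || c == '\n';
}

/* Count the fields in one record. Runs of separator characters act as a
 * single delimiter; leading and trailing separators are ignored. */
size_t count_fields(const char *record, bool (*is_sep)(char))
{
    size_t nf = 0;
    bool in_field = false;

    assert(record != NULL && is_sep != NULL);
    for (; *record != '\0'; record++) {
        if (is_sep(*record)) {
            in_field = false;       /* separator ends any current field */
        } else if (!in_field) {
            in_field = true;        /* first character of a new field */
            nf++;
        }
    }
    return nf;
}
```

For the record "a\nb c", count_fields returns 2 with sep_blank_only (the
fields are "a\nb" and "c") and 3 with sep_blank_or_newline. For a record
with no embedded newline, the two predicates give the same count.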

Regards,
Andy

On Tue, Mar 24, 2020 at 03:55:07AM -0600, address@hidden wrote:
> Whitespace is ' ' and '\t'.  I will clarify the documentation, but
> likely not in terms of [[:blank:]], since I suspect that in UTF locales
> it can match more than just ' ' and '\t'.
> 
> Thanks,
> 
> Arnold
> 
> Ed Morton <address@hidden> wrote:
> 
> > I was just looking up which exact characters get included in the set of 
> > field separators when FS is " " (the default value) and got confused by 
> > this in the gawk documentation:
> >
> >     Class    Meaning
> >     [:blank:]    Space and TAB characters
> >     [:space:]    Space characters (these are: space, TAB, newline,
> >     carriage return, formfeed and vertical tab)
> >
> >     FS == " "
> >          Fields are separated by runs of *whitespace*. Leading and
> >     trailing whitespace are ignored. This is the default.
> >     /(bold added by me)/
> >
> > I took the last statement above to mean that FS would be the set of 
> > characters defined by the [:space:] character class but it's not since 
> > FS doesn't include carriage return (\r) nor vertical tab (\v) (I didn't 
> > bother checking others) when FS is " ", neither is it the [:blank:] 
> > character class since it includes newlines (\n). Instead it seems to be 
> > [:blank:] plus newline and that's supported by the POSIX spec if we 
> > assume by <blank> they mean [:blank:]:
> >
> >     ...by default, a field is a string of non- <blank> non- <newline>
> >     characters.
> >
> > But what does newline mean in all of the above? Is it always linefeed 
> > (\n) on all platforms or is it LF (\n) on UNIX and CRLF (\r\n) on 
> > Windows or something else? I really don't know.
> >
> > So - maybe you could update the documentation to say "Fields are 
> > separated by runs of the whitespace (i.e. [:blank:] plus linefeed 
> > characters)" or similar? I couldn't find anywhere in the documentation 
> > that states exactly which characters FS includes when assigned " " nor 
> > what exactly is meant by "whitespace" throughout the documentation and I 
> > think that one tweak to provide a clear definition of the term 
> > "whitespace" would clarify all of it.
> >
> >      Ed.
> >
> >
> >
> >

-- 
Andrew Schorr                      e-mail: address@hidden
Telemetry Investments, L.L.C.      phone:  917-305-1748
152 W 36th St, #402                fax:    212-425-5550
New York, NY 10018-8765


