Re: minor documentation suggestion for FS values and "whitespace" in general
From: Andrew J. Schorr
Subject: Re: minor documentation suggestion for FS values and "whitespace" in general
Date: Tue, 24 Mar 2020 08:45:19 -0400
User-agent: Mutt/1.5.21 (2010-09-15)

Hi,
In the code inside field.c, set_FIELDWIDTHS uses is_blank, which tests for
' ' or '\t', but other places (def_parse_field and re_parse_field) test for
'\n' in addition to those two. Granted, in the normal case where RS is '\n',
it doesn't matter whether FS is checking for '\n', but I suppose it could
matter when RS has an unusual value...
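
To make the difference concrete, here is a minimal sketch. It is not the
actual field.c code: blank_only, blank_or_newline and count_fields are
made-up names that only model the two tests described above, on the
assumption that a record can contain an embedded '\n' when RS is something
other than newline.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for the is_blank test: space or TAB only. */
static int blank_only(int c)
{
    return c == ' ' || c == '\t';
}

/* Hypothetical stand-in for the test described for def_parse_field
   and re_parse_field: space, TAB, or newline. */
static int blank_or_newline(int c)
{
    return c == ' ' || c == '\t' || c == '\n';
}

/* Count the fields in buf[0..len). Runs of characters accepted by
   is_sep separate fields; leading and trailing runs are ignored. */
static size_t count_fields(const char *buf, size_t len, int (*is_sep)(int))
{
    size_t i = 0, nf = 0;

    assert(is_sep != NULL);
    while (i < len) {
        while (i < len && is_sep((unsigned char) buf[i]))
            i++;                /* skip a run of separators */
        if (i >= len)
            break;
        nf++;                   /* a new field starts here */
        while (i < len && !is_sep((unsigned char) buf[i]))
            i++;                /* consume the field */
    }
    return nf;
}
```

For the 7-byte record "a b\nc d" (as might be read with RS set to ";"),
count_fields with blank_only gives 3 fields ("a", "b\nc", "d"), while
blank_or_newline gives 4. With RS left at '\n' the record never contains a
newline, so the two tests agree.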
Regards,
Andy
On Tue, Mar 24, 2020 at 03:55:07AM -0600, address@hidden wrote:
> Whitespace is ' ' and '\t'. I will clarify the documentation, but
> likely not in terms of [[:blank:]], since I suspect that in UTF locales
> it can match more than just ' ' and '\t'.
>
> Thanks,
>
> Arnold
>
> Ed Morton <address@hidden> wrote:
>
> > I was just looking up which exact characters get included in the set of
> > field separators when FS is " " (the default value) and got confused by
> > this in the gawk documentation:
> >
> > Class       Meaning
> > [:blank:]   Space and TAB characters
> > [:space:]   Space characters (these are: space, TAB, newline,
> >             carriage return, formfeed and vertical tab)
> >
> > FS == " "
> > Fields are separated by runs of *whitespace*. Leading and
> > trailing whitespace are ignored. This is the default.
> > /(bold added by me)/
> >
> > I took the last statement above to mean that FS would be the set of
> > characters defined by the [:space:] character class but it's not since
> > FS doesn't include carriage return (\r) nor vertical tab (\v) (I didn't
> > bother checking others) when FS is " ", neither is it the [:blank:]
> > character class since it includes newlines (\n). Instead it seems to be
> > [:blank:] plus newline and that's supported by the POSIX spec if we
> > assume by <blank> they mean [:blank:]:
> >
> > ...by default, a field is a string of non- <blank> non- <newline>
> > characters.
> >
> > But what does newline mean in all of the above? Is it always linefeed
> > (\n) on all platforms or is it LF (\n) on UNIX and CRLF (\r\n) on
> > Windows or something else? I really don't know.
> >
> > So - maybe you could update the documentation to say "Fields are
> > separated by runs of the whitespace (i.e. [:blank:] plus linefeed
> > characters)" or similar? I couldn't find anywhere in the documentation
> > that states exactly which characters FS includes when assigned " " nor
> > what exactly is meant by "whitespace" throughout the documentation and I
> > think that one tweak to provide a clear definition of the term
> > "whitespace" would clarify all of it.
> >
> > Ed.
> >
> >
> >
> >
--
Andrew Schorr e-mail: address@hidden
Telemetry Investments, L.L.C. phone: 917-305-1748
152 W 36th St, #402 fax: 212-425-5550
New York, NY 10018-8765