bug-sed

bug#26879: end-of-line issue with cygwin 4.4-1 sed 4.4


From: Eric Blake
Subject: bug#26879: end-of-line issue with cygwin 4.4-1 sed 4.4
Date: Fri, 12 May 2017 15:47:56 -0500
User-agent: Mozilla/5.0 (X11; Linux x86_64; rv:52.0) Gecko/20100101 Thunderbird/52.1.0

On 05/11/2017 04:59 PM, Assaf Gordon wrote:
> On 05/11/2017 04:29 PM, Eric Blake wrote:
>> On 05/11/2017 03:13 PM, Assaf Gordon wrote:
>>> If one wants the old sed behavior on cygwin (automatic
>>> handling of CR/LF),
>>> all that's needed is rebuilding sed from upstream source?
>>
>> No. [...]
>> The default upstream behavior has ALWAYS been to handle files in native
>> mode (i.e. fopen(file, "r"), where the choice of text or binary is determined
>> by the file system).  Downstream Cygwin sed USED to have a patch that
>> overrode upstream behavior to do freopen(NULL, "rt", stdin)
> 
> I see. Thanks for explaining.
> 
> So the only systems where 'sed' does automatically strip CR/LF
> are MingW/MSVC/MSDOS builds (and only there the "-b/--binary" option
> makes a difference) ?

Upstream sed never actively strips CR by itself. Rather, opening a
file in default mode (that is, fopen(file, "r")) on a system where that mode
resolves to text mode (Cygwin, depending on your mount options, and I
think mingw/MSVC by default) causes CR to be stripped by the
underlying libc.  Using -b/--binary causes sed to use fopen(file, "rb")
instead of fopen(file, "r"), at which point you tell libc to use binary
mode no matter what.

Downstream sed on Cygwin used to carry a patch that used fopen(file, "rt")
to force text mode unless you used 'sed -b'; that downstream patch was
removed in Feb 2017.

> 
> If so,
> should we remove the "#ifdef __CYGWIN__" from sed's source code
> since it now behaves exactly like gnu/linux ?

Cygwin behaves like gnu/linux if you use binary mount points. But Cygwin
still supports text mount points, and therefore 'sed -b' is still useful
on Cygwin, and therefore I don't think the #ifdef __CYGWIN__ should be
removed from sed's source code.  (I do, however, think the #ifdef could
be rewritten to '#if O_BINARY', because a non-zero O_BINARY is a more
reliable indicator of the platforms where binary-vs-text actually
matters, without having to be a long list of specific platforms)


> To summarize, IIUC:
> If someone uses new (post feb-2017) cygwin exclusively -
> everything should "just work" and files have only '\n' line endings.

If you manage your data solely through Cygwin programs, then your data
should only have \n line endings, so sed should "just work".  But if you
intermix cygwin programs with data from other sources (a trivial example
being the Windows command shell, whose builtin 'echo' uses \r\n line
endings), then the combination of cygwin's default of treating pipe input
as binary and native Windows applications' default of writing output as
text means that sed will act strangely unless you rewrite your pipeline
to filter out the \r from the native app's output before feeding it to sed.

> 
> Line-Ending problems will occur if someone mixes old/new cygwin
> tools or files (e.g. files created on old cygwin and used with newer
> cygwin programs),

No, cygwin has favored binary files for years now, unless you
specifically configure for a text mount. (Text mounts used to be very
easy to set up by running cygwin's setup.exe, but we removed that
functionality at least 10 years ago because it caused more problems than
it solved, so setting up a text mode mount is now a lot more involved).
So mixing data created with old cygwin with sed from new cygwin is
unlikely to cause problems if you never changed defaults, because the
defaults have been to produce data with \n endings for years now.

> or
> if mixing cygwin/non-cygwin tools.

Correct (I just repeated that above, before reading below).


> Out of curiosity (if anyone knows):
> What does "Windows Subsystem For Linux" do with line-endings?

I have not played with it yet, but my gut feel: \n endings only.  It is
emulating Linux system calls and executing actual Linux userspace
programs (where text mode does not exist).  fopen(file, "rt") is thus an error
(since glibc does not support it).  But note that Windows Subsystem For
Linux is a _distinct_ subsystem (think of it more like a virtual
machine) - you CANNOT make it directly interact with native windows
programs (can't pipe data from one subsystem to another); they can only
see a common filesystem.  So Cygwin still has a niche (where you have a
program specifically compiled to Windows API using the cygwin1.dll, and
therefore operating in your normal windows subsystem).

-- 
Eric Blake, Principal Software Engineer
Red Hat, Inc.           +1-919-301-3266
Virtualization:  qemu.org | libvirt.org
