Re: [RFC] have tail always detect broken pipes
From: Pádraig Brady
Subject: Re: [RFC] have tail always detect broken pipes
Date: Thu, 8 Jun 2017 22:25:51 -0700
User-agent: Mozilla/5.0 (X11; Linux x86_64; rv:45.0) Gecko/20100101 Thunderbird/45.8.0

On 07/06/17 14:27, Bernhard Voelker wrote:
> On 06/07/2017 10:00 AM, Pádraig Brady wrote:
>> The following will hang indefinitely, due to no further data
>> being written through the pipe, and thus no SIGPIPE generated:
>>
>> $ tail -f /etc/hosts | sleep 1
>>
>> A more practical use case might be:
>>
>> tail -f file.log | grep -q trigger &&
>> process_immediately
>>
>> Another case where one might notice this is that sometimes
>> (depending on how slow the reader is) the following will hang,
>> and you may not notice the issue until data arrives:
>>
>> tail -f file.log | grep_typo filter
>
> indeed, annoying.
>
>> Below is a quick hack to support this,
>> and tail now terminates promptly for the above cases.
>
> I didn't have a look at the patch below (no time, sorry), but ...
>
>> The implementation is a proof of concept done in a few minutes
>> (and may change from poll() to select() if possible).
>> A disadvantage is the extra syscalls to do the polling.
>> Actually I may be able to combine the existing select()
>> with this new poll(), and thus wait for changes in the
>> output as well as the inotify descriptor.
>
> ... I think this is related to another aspect: should tail consume
> more input when the reader following in the pipe terminates?
>
> $ seq 100000 | { tail -c +5 | sed -n '1{p;q}'; cat; } | head -n5
> 3
> 505
> 15506
> 15507
> 15508
>
> Although the above is a corner case, POSIX doesn't specify explicitly
> for tail(1) what to do when the reader terminates, but it would be nice
> to have a deterministic behavior, wouldn't it?

I agree it would, but these are async processes,
so I'm not sure we can have such deterministic behavior between them.
We can (and do) flush the output, but the reader may still
be present and processing previous data. I think the only
guarantee one can make between async processes like this
is that sources can exit in a "timely manner" when the sinks terminate.
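
As a rough illustration of how a source could notice a departed sink without writing anything, here is a minimal sketch in Python (not the C patch under discussion). It assumes Linux pipe semantics, where poll() reports POLLERR on the write end once every reader has closed. The helper name reader_gone is hypothetical.

```python
import os
import select

def reader_gone(write_fd, timeout_ms=0):
    """Return True if poll() flags an error or hangup on write_fd."""
    poller = select.poll()
    # Request no events: POLLERR and POLLHUP are reported regardless.
    poller.register(write_fd, 0)
    for _fd, events in poller.poll(timeout_ms):
        if events & (select.POLLERR | select.POLLHUP):
            return True
    return False

r, w = os.pipe()
before = reader_gone(w)   # read end still open
os.close(r)               # the sink terminates
after = reader_gone(w)    # checked without attempting a write
os.close(w)
print(before, after)
```

A real implementation would wait on this descriptor alongside its input descriptors rather than polling it separately, which is the combination mentioned in the quoted text above.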
Another way to look at it is that there are various buffers in play here,
and tail will not get a SIGPIPE even if sed in the example above
has only read a portion of the data that tail wrote.
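
The buffering effect can be reproduced in isolation. The sketch below uses Python with a bare os.pipe() standing in for the tail/sed pair, which is an assumption made for illustration. Python ignores SIGPIPE by default, so the failed write surfaces as BrokenPipeError (EPIPE) instead of a signal.

```python
import os

r, w = os.pipe()
written = os.write(w, b"x" * 4096)   # fits in the pipe buffer, returns at once
consumed = os.read(r, 10)            # the reader takes only a small portion
os.close(r)                          # and then exits

# The first write already succeeded, so the writer learns nothing
# about the reader's exit until it tries to write again.
try:
    os.write(w, b"more")
    second_write_failed = False
except BrokenPipeError:
    second_write_failed = True
os.close(w)
print(written, len(consumed), second_write_failed)
```

The writer sees the full 4096 bytes accepted although only 10 were ever read, and the error appears only on the following write.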

To demonstrate the non-determinism of your example:

$ seq 100000 | { src/tail -c +5 | sed -n '1{p;q}'; cat; } | head -n5
3
505
15506
15507
15508
$ seq 100000 | { src/tail -c +5 | sed -n '1{p;q}'; cat; } | head -n5
3
457
13458
13459
13460

So sed processed and exited a bit more quickly in the second case,
and tail had consumed less of the input by the time it was stopped.

cheers,
Pádraig