coreutils

Re: How tail works on a large file?


From: Eric Blake
Subject: Re: How tail works on a large file?
Date: Sat, 22 Aug 2020 07:37:11 -0500
User-agent: Mozilla/5.0 (X11; Linux x86_64; rv:68.0) Gecko/20100101 Thunderbird/68.11.0

On 8/22/20 5:45 AM, Peng Yu wrote:
Hi,

I tried to tail a large file (2.8GB) to get its last 10 lines. It runs very fast.

How is this achieved? Does tail do it differently between a file
(random disk access) and a pipe (sequential disk access)? Thanks.

Yes. Running tail under 'strace' (on Linux) or a comparable tracer (on other platforms) will let you see the syscalls that it performs. tail first attempts lseek(fd,-bufsize,SEEK_END) so that it can read just one buffer of data from the end of the file. That works on random-access files, where tail only has to search backwards through one or more buffers until it finds the last few lines. It fails on pipes, which aren't seekable, so there tail has to read the entire input and buffer things in memory, although the buffer only has to be as large as the number of lines it is looking for.
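
To make the seekable case concrete, here is a minimal sketch in Python of the "seek to the end and read buffers backwards" idea. It is not a transcription of tail.c: the function name tail_seekable and the parameters n and bufsize are made up for illustration, and the real implementation differs in its details.

```python
import os


def tail_seekable(path, n=10, bufsize=8192):
    """Return the last n lines of a seekable file as bytes.

    Hypothetical helper for illustration: it reads fixed-size blocks
    backwards from the end and stops as soon as enough newlines are seen.
    """
    if n <= 0:
        return b""
    with open(path, "rb") as f:
        pos = f.seek(0, os.SEEK_END)  # seek to end-of-file; returns the size
        data = b""
        newlines = 0
        # More than n newlines guarantees n complete lines in the buffer.
        while pos > 0 and newlines <= n:
            step = min(bufsize, pos)
            pos = f.seek(pos - step)  # move one block closer to the start
            block = f.read(step)
            newlines += block.count(b"\n")
            data = block + data

    # A final newline terminates the last line; it does not start a new one.
    idx = len(data) - 1 if data.endswith(b"\n") else len(data)
    for _ in range(n):
        idx = data.rfind(b"\n", 0, idx)
        if idx == -1:
            return data  # the file holds fewer than n lines
    return data[idx + 1:]
```

Only the blocks near the end of the file are ever read, which is why the file's total size barely matters for the running time.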

You could also read the source code for yourself. In particular, look at all the spots where 'presume_input_pipe' is used to distinguish between seekable files, where lseek works, and pipes, where it doesn't. There is even an undocumented 'tail ---presume-input-pipe' option that lets you force-disable the lseek optimization, so that you get the speed penalty of a non-seekable file even when testing on seekable input.
https://git.savannah.gnu.org/cgit/coreutils.git/tree/src/tail.c#n216
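
The seekable-versus-pipe distinction can be sketched in a few lines of Python. This is an illustration under assumptions, not the logic in tail.c: the helper names is_seekable and tail_stream are hypothetical, and the check relies on lseek failing with ESPIPE on pipes, as it does on Linux and other POSIX systems.

```python
import collections
import errno
import os


def is_seekable(fd):
    """Report whether lseek works on this file descriptor."""
    try:
        os.lseek(fd, 0, os.SEEK_CUR)  # a no-op seek; fails on pipes
        return True
    except OSError as exc:
        if exc.errno == errno.ESPIPE:  # "Illegal seek": pipe, FIFO or socket
            return False
        raise


def tail_stream(stream, n=10):
    """Return the last n lines of a non-seekable binary stream.

    All of the input has to be read, but only the most recent n lines
    are kept in memory at any time.
    """
    last = collections.deque(maxlen=n)
    for line in stream:
        last.append(line)
    return b"".join(last)
```

A caller would test is_seekable(fd) first and fall back to tail_stream only when the seek fails, which is the slow path that an option like ---presume-input-pipe would force.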

On a higher level, over the years you have shown a tendency to ask questions that make others do the research for you, instead of diving into the code yourself. You would do well to remember that such behavior tends to be viewed as anti-social.

--
Eric Blake, Principal Software Engineer
Red Hat, Inc.           +1-919-301-3226
Virtualization:  qemu.org | libvirt.org



