Re: RFC: du reports a 1.2PB file on a 1TB btrfs disk


From: Kaz Kylheku (Coreutils)
Subject: Re: RFC: du reports a 1.2PB file on a 1TB btrfs disk
Date: Wed, 11 Mar 2020 04:29:24 -0700
User-agent: Roundcube Webmail/0.9.2

On 2020-03-10 21:31, Jim Meyering wrote:
> On Tue, Mar 10, 2020 at 12:24 PM Kaz Kylheku (Coreutils)
> <address@hidden> wrote:
>> On 2020-03-10 11:52, Jim Meyering wrote:
>>> Otherwise, du provides no way of seeing how much of the actual disk
>>> space is being used by such FS-compressed files.
>>
>> If you stat the file, what are the values of st_size, st_blksize and
>> st_blocks?
>
> That particular file is long gone, but I've just created a 1.8T file
> on a 700G file system.
> Before I began this experiment, "Avail" was 524G, so it appears to
> occupy about 60G actual space.

Sorry; forget I mentioned st_blksize. I forgot that st_blocks is
measured in 512-byte blocks regardless of st_blksize.

> FTR, I created the file by running this: yes $(printf '%065535d\n' 0) > big
>
> $ stat big
>   File: big
>   Size: 1957123607586 Blocks: 3822507048 IO Block: 4096 regular file

So here, the Blocks value (coming from st_blocks) doesn't tell us
anything the size doesn't; if we multiply it by 512, we get
1957123608576, which is just the size rounded up to the next
4096-byte block.
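
A quick check of that arithmetic, as a sketch: the two numbers are copied
from the stat output above, and the 4096-byte rounding unit is an
assumption taken from the "IO Block" field. It needs a shell with 64-bit
arithmetic.

```shell
# Values copied from the stat output for "big".
size=1957123607586      # st_size
blocks=3822507048       # st_blocks

# st_blocks is counted in 512-byte units.
allocated=$((blocks * 512))

# st_size rounded up to the next multiple of 4096 (assumed block size).
rounded=$(( (size + 4095) / 4096 * 4096 ))

echo "allocated=$allocated rounded=$rounded diff=$((allocated - size))"
```

The two results coincide (1957123608576), 990 bytes above st_size, so
st_blocks carries no information about the real allocation here.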

The underlying FS can use the st_blocks value to indicate the actual
storage. For instance, if I do this on ext4:

   # dd of=file seek=$((1024 * 1024)) count=1 if=/dev/zero

Then:

   # du -h file
   12K     file
   # du --apparent-size -h file
   513M    file

The 12K figure comes from the st_blocks information in the stat structure
(24 blocks of 512 bytes), whereas the apparent size comes from st_size:

   # stat file
     File: `file'
     Size: 536871424 Blocks: 24 IO Block: 4096 regular file
   Device: 902h/2306d      Inode: 1624448     Links: 1
   Access: (0644/-rw-r--r--) Uid: ( 0/ root) Gid: ( 0/ root)
   Access: 2020-03-11 04:22:26.000000000 -0700
   Modify: 2020-03-11 04:22:26.000000000 -0700
   Change: 2020-03-11 04:22:26.000000000 -0700
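
The same comparison can be made for any file without reading the full
stat output. A minimal sketch, assuming GNU stat (its -c option with the
%s, %b and %B format sequences); the function name show_usage is made up
for illustration.

```shell
# show_usage FILE: print st_size next to st_blocks converted to bytes.
# Assumes GNU stat: %s = st_size, %b = number of allocated blocks,
# %B = size in bytes of each block counted by %b (normally 512).
show_usage() {
    fields=$(stat -c '%s %b %B' -- "$1") || return 1
    set -- $fields
    echo "apparent=$1 allocated=$(( $2 * $3 ))"
}
```

For the ext4 file above, "show_usage file" would be expected to report the
536871424-byte apparent size alongside 24 * 512 = 12288 allocated bytes,
provided the block unit is 512 there.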

The issue you are seeing here is that btrfs should probably be
publishing a st_blocks value that matches the actual storage,
accounting for sparseness and compression, and not just a repetition
of the size, rounded up to a block and quoted in 512-byte units.

The fidelity of the du output is only as good as what is in stat.



