On Tue, Aug 10, 2021 at 10:32:55AM +0200, Hanna Reitz wrote:
On 09.08.21 20:41, Vivek Goyal wrote:
On Fri, Jul 30, 2021 at 05:01:33PM +0200, Max Reitz wrote:
When the inode_file_handles option is set, try to generate a file handle
for new inodes instead of opening an O_PATH FD.
Being able to open these again will require CAP_DAC_READ_SEARCH, so the
description text tells the user they will also need to specify
-o modcaps=+dac_read_search.
Generating a file handle returns the mount ID it is valid for. Opening
it will require an FD instead. We have mount_fds to map an ID to an FD.
get_file_handle() fills the hash map by opening the file we have
generated a handle for. To verify that the resulting FD indeed
represents the handle's mount ID, we use statx(). Therefore, using file
handles requires statx() support.
So opening the file and storing that fd in mount_fds table might be
a potential problem with inotify work Ioannis is doing.
So say a file foo.txt was opened O_RDONLY and fd stored in mount_fs. Now
say user unlinks foo.txt. If notifications are enabled, final notification
will not be generated till this mount_fds fd is closed.
Now question is when will this fd be closed? If it closed at some
later point and then notification is generated, that will break
notificaitons.
Currently, it is never closed.
In fact even O_PATH fd is delaying notifications due to same reason.
But its not too bad as we close O_PATH fd pretty quickly after
unlinking. And we were hoping that file handle support will get rid
of this problem because we will not keep O_PATH fd open.
But, IIUC, mount_fds stuff will make it even worse. I did not see
the code which removes this fd from mount_fds. So I am not sure what's
the life time of this fd.
The lifetime is forever. If we wanted to remove it at some point, we’d need
to track how many file handles we have open for the given mount fd and then
remove it from the table once the count reaches 0, so it would still be
delayed.
I think in practice the first thing that is looked up from some mount will
probably be the root directory, which cannot be deleted before everything
else on the mount is gone, so that would work. We track how many handles
are there, if the whole mount were to be deleted, I hope all lo_inodes are
evicted, the count goes to 0, and we can drop the mount fd.
Keeping a reference count on mount_fd object make sense. So we probably
maintain this hash table and lookup using mount_id (as you are already
doing). All subsequent inodes from same filesystem will use same
object. Once all inodes have been flushed out, then mount_fd object
should go away as well (allowing for unmount on host).
I think we can make the assumption that the mount fd is the root directory
certain by, well, looking into mountinfo... That would result in us always
opening the root node of the filesystem, so that first the whole filesystem
needs to disappear before it can be deleted (and our mount fd closed) –
which should work, I guess?
This seems more reasonable. And I think that's what man page seems to
suggest.
The mount_id argument returns an identifier for the filesystem mount
that corresponds to pathname. This corresponds to the first field in
one of the records in /proc/self/mountinfo. Opening the pathname in
the fifth field of that record yields a file descriptor for the mount
point; that file descriptor can be used in a subsequent call to
open_by_handle_at().
Fifth field seems to be the mount point. man proc says.
(5) mount point: the pathname of the mount point relative to
the process's root directory.
So opening mount point and saving as mount_fd (if it is not already
in hash table) and then take a per inode reference count on mount_fd
object looks like will solve the life time issue of mount_fd as
well as the issue of temporary failures arising because we can't
open a device special file.