Message-ID: <87o8i340vo.fsf@collabora.com>
Date: Tue, 05 Jan 2021 16:52:43 -0300
From: Gabriel Krisman Bertazi <krisman@...labora.com>
To: Dave Chinner <david@...morbit.com>
Cc: dhowells@...hat.com, viro@...iv.linux.org.uk, tytso@....edu,
khazhy@...gle.com, adilger.kernel@...ger.ca,
linux-ext4@...r.kernel.org, linux-fsdevel@...r.kernel.org,
kernel@...labora.com
Subject: Re: [PATCH 4/8] vfs: Add superblock notifications
Dave, thanks for the feedback.
Dave Chinner <david@...morbit.com> writes:
> On Fri, Dec 11, 2020 at 05:55:32PM -0300, Gabriel Krisman Bertazi wrote:
>> > Fundamentally, though, I'm struggling to understand what the
>> > difference between watch_mount() and watch_sb() is going to be.
>> > "superblock" watches seem like the wrong abstraction for a path
>> > based watch interface. Superblocks can be shared across multiple
>> > disjoint paths, subvolumes and even filesystems.
>>
>> As far as I understand the original patchset, watch_mount was designed
>> to monitor mountpoint operations (mount, umount, ...) in a sub-tree,
>> while watch_sb monitors filesystem operations and errors. I'm not
>> working with watch_mount, my current interest is in having a
>> notifications mechanism for filesystem errors, which seemed to fit
>> nicely with the watch_sb patchset for watch_queue.
>
> <shrug>
>
> The previous patches are not part of your proposal, and if they are
> not likely to be merged, then we don't really care what they are
> or what they did. The only thing that matters here is what your
> patchset is trying to implement and whether that is appropriate or
> not...
I think the mistake was only mentioning them in the commit message, in
the first place.
>> > The path based user API is really asking to watch a mount, not a
>> > superblock. We don't otherwise expose superblocks to userspace at
>> > all, so this seems like the API is somewhat exposing internal kernel
>> > implementation behind mounts. However, there -is- a watch_mount()
>> > syscall floating around somewhere, so it makes me wonder exactly why
>> > we need a second syscall and interface protocol to expose
>> > essentially the same path-based watch information to userspace.
>>
>> I think these are indeed different syscalls, but maybe a bit misnamed.
>>
>> If not by path, how could we uniquely identify an entire filesystem?
>
> Exactly why do we need to uniquely identify a filesystem based on
> its superblock? Surely it's already been identified by path by the
> application that registered the watch?
I see. In fact, we don't, as that is an internal concept. The patch
abuses the term superblock to refer to the entire filesystem. I should
operate in terms of mounts.
>> Maybe pointing to a block device that has a valid filesystem and in the
>> case of fs spanning multiple devices, consider all of them? But
>> that would not work for some misc filesystems, like tmpfs.
>
> It can't be block device based at all - think NFS, CIFS, etc. We
> can't use UUIDs, because not all filesystems have them, and snapshots
> often have identical UUIDs.
>
> Really, I think "superblock" notifications are extremely problematic
> because the same superblock can be shared across different security
> contexts. I'm not sure what the solution might be, but I really
> don't like the idea of a mechanism that can report errors in objects
> outside the visibility of a namespaced container to that container
> just because it has access to some path inside a much bigger
> filesystem that is mostly out of bounds to that container.
I see. To solve the container visibility problem, would it suffice to
forbid watching partial mounts of a filesystem? For instance, either
the watched path is the root_sb or the API returns EINVAL. This limits
the usability of the API to whoever controls the root of the filesystem,
which seems to cover the use case of the host monitoring an entire
filesystem. Would this limitation be acceptable?
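
The restriction proposed above can be sketched as a small userspace model. This is only an illustration of the intended semantics, not kernel code and not part of the patchset: `Mount`, `fs_id`, `subtree`, and `watch_fs` are hypothetical names, and the model assumes a mount can be described by the subtree of the filesystem it exposes.

```python
import errno
from dataclasses import dataclass


@dataclass(frozen=True)
class Mount:
    """Hypothetical model of a mount (not a kernel structure)."""
    fs_id: str    # stands in for the filesystem (superblock) behind the mount
    subtree: str  # part of the filesystem exposed by the mount; "/" is the fs root


def watch_fs(mount: Mount) -> int:
    """Mimic a syscall: return 0 on success or a negative errno.

    Assumed policy: only a mount that exposes the root of the filesystem
    may be watched; a partial mount (bind mount of a subdirectory,
    subvolume, etc.) is rejected with EINVAL.
    """
    if mount.subtree != "/":
        return -errno.EINVAL
    return 0
```

Under this model, a host that mounted the whole filesystem can register a watch, while a container that only sees `/containers/c1` of the same filesystem gets `-EINVAL`.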
Alternatively, do we want something similar to fanotify FAN_MARK_FILESYSTEM
semantics? I suppose global errors (like an ext4 fs abort) should be
reported individually for every mountpoint, while inode errors are only
reported for each mountpoint for which the object is accessible.
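
The delivery rule suggested above can be modeled roughly as follows. This is a sketch under stated assumptions, not the fanotify API or the patchset's implementation: `WatchedMount`, `is_visible`, and `mounts_to_notify` are hypothetical names, and visibility is approximated by a path-prefix check against the subtree each mount exposes.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass(frozen=True)
class WatchedMount:
    """Hypothetical model of a watched mount of one shared filesystem."""
    name: str
    subtree: str  # part of the filesystem visible through this mount


def is_visible(mount: WatchedMount, inode_path: str) -> bool:
    """True if the object at inode_path lies inside the mount's subtree."""
    root = mount.subtree.rstrip("/")
    # An empty root means the mount exposes the whole filesystem.
    return root == "" or inode_path == root or inode_path.startswith(root + "/")


def mounts_to_notify(mounts: List[WatchedMount],
                     inode_path: Optional[str] = None) -> List[str]:
    """Pick the watchers that receive an error.

    inode_path=None models a global error (e.g. a filesystem abort),
    which goes to every watched mount. Otherwise the error is tied to
    one object and goes only to mounts through which it is accessible.
    """
    if inode_path is None:
        return [m.name for m in mounts]
    return [m.name for m in mounts if is_visible(m, inode_path)]
```

With a host mount at `/` and a container mount at `/containers/c1`, a global error reaches both, while an error on an inode under `/containers/c10` reaches only the host.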
--
Gabriel Krisman Bertazi