[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <CAOQ4uxh7aT+EvWYMa9v=SyRjfdh4Je_FmS0+TNqonHE5Z+_TPw@mail.gmail.com>
Date: Tue, 12 Nov 2024 09:11:32 +0100
From: Amir Goldstein <amir73il@...il.com>
To: Linus Torvalds <torvalds@...ux-foundation.org>
Cc: Josef Bacik <josef@...icpanda.com>, kernel-team@...com, linux-fsdevel@...r.kernel.org,
jack@...e.cz, brauner@...nel.org, linux-xfs@...r.kernel.org,
linux-btrfs@...r.kernel.org, linux-mm@...ck.org, linux-ext4@...r.kernel.org
Subject: Re: [PATCH v6 06/17] fsnotify: generate pre-content permission event
on open
On Tue, Nov 12, 2024 at 1:37 AM Linus Torvalds
<torvalds@...ux-foundation.org> wrote:
>
> On Mon, 11 Nov 2024 at 16:00, Amir Goldstein <amir73il@...il.com> wrote:
> >
> > I think that's a good idea for pre-content events, because it's fine
> > to say that if the sb/mount was not watched by a pre-content event listener
> > at the time of file open, then we do not care.
>
> Right.
>
> > The problem is that legacy inotify/fanotify watches can be added after
> > file is open, so that is allegedly why this optimization was not done for
> > fsnotify hooks in the past.
>
> So honestly, even if the legacy fsnotify hooks can't look at the file
> flag, they could damn well look at an inode flag.
>
Legacy fanotify has a mount watch (FAN_MARK_MOUNT),
which is the common way for Anti-malware to set watches on
filesystems, so I am not sure what you are saying.
> And I'm not even convinced that we couldn't fix them to just look at a
> file flag, and say "tough luck, somebody opened that file before you
> started watching, you don't get to see what they did".
That would specifically break tail -f (for inotify) and probably many other
tools, but as long as we also look at the inode flags (i_fsnotify_mask)
and the dentry flags (DCACHE_FSNOTIFY_PARENT_WATCHED),
then I think we may be able to get away with changing the semantics
for open files on a fanotify mount watch.
Specifically, I would really like to eliminate completely the cost of
FAN_ACCESS_PERM event, which could be gated on file flag, because
this is only for security/Anti-malware and I don't think this event is
practically
useful and it sure does not need to guarantee permission events to mount
watchers on already open files.
>
> So even if we don't look at a file->f_mode flag, the lergacy cases
> should look at i_fsnotify_mask, and do that *first*.
>
> IOW, not do it like fsnotify_object_watched() does now, which is just
> broken. Again, it looks at inode->i_sb->s_fsnotify_mask completely
> pointlessly, but it also does it much too late - it gets called after
> we've already called into the fsnotify() code and have messed up the
> I$ etc.
>
> The "linode->i_sb->s_fsnotify_mask" is not only an extra indirection,
> it should be very *literally* pointless. If some bit isn't set in
> i_sb->s_fsnotify_mask, then there should be no way to set that bit in
> inode->i_fsnotify_mask. So the only time we should access
> i_sb->s_fsnotify_mask is when i_notify_mask gets *modified*, not when
> it gets tested.
>
i_fsnotify_mask is the cumulative mask of all inode watchers
s_fsnotify_mask is *not* the cumulative of all i_fsnotify_mask
s_fsnotify_mask is the cumulative mask of all sb watchers
mnt_fsnotify_marks is the cumulative mask of all mount watchers
> But even if that silly and pointless i_sb->s_fsnotify_mask thing is
> removed, fsnotify_object_watched() is *still* wrong, because it
> requires that mnt_mask as an argument, which means that the caller now
> has to look it up - all this entirely pointless work that should never
> be done if the bit wasn't set in inode->i_fsnotify_mask.
>
> So I really think fsnotify is doing *everything* wrong.
Note the difference between fsnotify_sb_has_watchers()
and fsnotify_object_watched().
The former is an early optimization gate that checks if there are
any inode/mount/sb watchers (per class) on the filesystem, regardless
of specific events and specific target inode/file.
We could possibly further optimize fsnotify_sb_has_watchers() to avoid
access to ->s_fsnotify_info by setting sb flag (for each priority class).
The latter checks if any inode/mount/sb are interested in a specific
event on the said object.
In upstream code, fsnotify_object_watched() is always gated behind
fsnotify_sb_has_watchers(), which tries to prevent the indirect call.
The new fsnotify_file_has_pre_content_watches() helper could
have checked fsnotify_sb_has_priority_watchers(sb,
FSNOTIFY_PRIO_CONTENT);
but it is better to gate by file flag as you suggested.
>
> And I most certainly don't want to add more runtime hooks to
> *critical* code like open/read/write.
>
> Right now, many of the fsnotify things are for "metadata", ie for
> bigger file creation / removal / move etc. And yes, the "don't do this
> if there are no fsnotify watchers AT ALL" does actually end up meaning
> that most of the time I never see any of it in profiles, because the
> fsnotify_sb_has_watchers() culls out that case.
>
> And while the fsnotify_sb_has_watchers() thing is broken garbage and
> does too many indirections and is not testing the right thing, at
> least it's inlined and you don't get the function calls.
>
> That doesn't make fsnotify "right", but at least it's not in my face.
> I see the sb accesses, and I hate them, but it's usually at least
> hidden. Admittedly not as well hidden as it *should* be, since it does
> the access tests in the wrong order, but the old fsnotify_open()
> doesn't strike me as "terminally broken".
>
> It doesn't have a permission test after the open has already done
> things, and it's inlined enough that it isn't actively offensive.
>
> And most of the other fsnotify things have the same pattern - not
> great, but not actively offensive.
>
> These new patches make it in my face.
>
> So I do require that the *new* cases at least get it right. The fact
> that we have old code that is misdesigned and gets it wrong and should
> also be improved isn't an excuse to add *more* badly coded stuff.
>
> And yes, if somebody fixes the old fsnotify stuff to check just the
> i_fsnotify_mask in the inline function, and moves all the other silly
> checks out-of-line, that would be an improvement. I'd very much
> applaud that. But it's a separate thing from adding new hooks.
We will do both.
Thanks,
Amir.
Powered by blists - more mailing lists