linux-ext4 - Re: [PATCH v6 06/17] fsnotify: generate pre-content permission event on open

lists.openwall.net		lists / announce owl-users owl-dev john-users john-dev passwdqc-users yescrypt popa3d-users / oss-security kernel-hardening musl sabotage tlsify passwords / crypt-dev xvendor / Bugtraq Full-Disclosure linux-kernel linux-netdev linux-ext4 linux-hardening linux-cve-announce PHC
Open Source and information security mailing list archives
Hash Suite: Windows password security audit tool. GUI, reports in PDF.
[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <CAOQ4uxgxtQhe_3mj5SwH9568xEFsxtNqexLfw9Wx_53LPmyD=Q@mail.gmail.com>
Date: Tue, 12 Nov 2024 00:59:43 +0100
From: Amir Goldstein <amir73il@...il.com>
To: Linus Torvalds <torvalds@...ux-foundation.org>
Cc: Josef Bacik <josef@...icpanda.com>, kernel-team@...com, linux-fsdevel@...r.kernel.org, 
	jack@...e.cz, brauner@...nel.org, linux-xfs@...r.kernel.org, 
	linux-btrfs@...r.kernel.org, linux-mm@...ck.org, linux-ext4@...r.kernel.org
Subject: Re: [PATCH v6 06/17] fsnotify: generate pre-content permission event
 on open

On Tue, Nov 12, 2024 at 12:22 AM Linus Torvalds
<torvalds@...ux-foundation.org> wrote:
>
> On Mon, 11 Nov 2024 at 14:46, Josef Bacik <josef@...icpanda.com> wrote:
> >
> > Did you see the patch that added the
> > fsnotify_file_has_pre_content_watches() thing?
>
> No, because I had gotten to patch 6/11, and it added this open thing,
> and there was no such thing in any of the patches before it.
>
> It looks like you added FSNOTIFY_PRE_CONTENT_EVENTS in 11/17.
>
> However, at no point does it look like you actually test it at open
> time, so none of this seems to matter.
>
> As far as I can see, even at the end of the series, you will call the
> fsnotify hook at open time even if there are no content watches on the
> file.
>
> So apparently the fsnotify_file_has_pre_content_watches() is not
> called when it should be, and when it *is* called, it's also doing
> completely the wrong thing.
>
> Look, for basic operations THAT DON'T CARE, you now added a function
> call to fsnotify_file_has_pre_content_watches(), that function call
> looks at inode->i_sb->s_iflags (doing two D$ accesses that shouldn't
> be done!), and then after that looks at the i_fsnotify_mask.
>
> THIS IS EXACTLY THE KIND OF GARBAGE I'M TALKING ABOUT.
>
> This code has been written by somebody who NEVER EVER looked at
> profiles. You're following chains of pointers when you never should.
>
> Look, here's a very basic example of the kind of complete mis-design
> I'm talking about:
>
>  - we're doing a basic read() on a file that isn't being watched.
>
>  - we want to maybe do read-ahead
>
>  - the code does
>
>         if (fsnotify_file_has_pre_content_watches(file))
>                 return fpin;
>
>    to say that "don't do read-ahead".
>
> Fine, I understand the concept. But keep in mind that the common case
> is presumably that there are no content watches.
>
> And even ignoring the "common case" issue, that's the one you want to
> OPTIMIZE for. That's the case that matters for performance, because
> clearly if there are content watches, you're going to go into "Go
> Slow" mode anyway and not do pre-fetching. So even if content watches
> are common on some load, they are clearly not the case you should do
> performance optimization for.
>
> With me so far?
>
> So if THAT is the case that matters, then dammit, we shouldn't be
> calling a function at all.
>
> And when calling the function, we shouldn't start out with this
> completely broken logic:
>
>         struct inode *inode = file_inode(file);
>         __u32 mnt_mask = real_mount(file->f_path.mnt)->mnt_fsnotify_mask;
>
>         if (!(inode->i_sb->s_iflags & SB_I_ALLOW_HSM))
>                 return false;
>
> that does random crap and looks up some "mount mask" and looks up the
> superblock flags.
>
> Why shouldn't we do this?
>
> BECAUSE NONE OF THIS MATTERS IF THE FILE HASN'T EVEN BEEN MARKED FOR
> CONTENT MATCHES!
>
> See why I'm shouting? You're doing insane things, and you're doing
> them for all the cases that DO NOT MATTER. You're doing all of this
> for the common case that doesn't want to see that kind of mindless
> overhead.
>
> You literally check for the "do I even care" *last*, when you finally
> do that fsnotify_object_watched() check that looks at the inode. But
> by then you have already wasted all that time and effort, and
> fsnotify_object_watched() is broken anyway, because it's stupidly
> designed to require that mnt_mask that isn't needed if you have
> properly marked each object individually.
>
> So what *should* you have?
>
> You should have had a per-file flag saying "Do I need to even call
> this crud at all", and have it in a location where you don't need to
> look at anything else.
>
> And fsnotify already basically has that flag, except it's mis-designed
> too. We have FMODE_NONOTIFY, which is the wrong way around (saying
> "don't notify", when that should just be the *default*), and the
> fsnotify layer uses it only to mark its own internal files so that it
> doesn't get called recursively. So that flag that *looks* sane and is
> in the right location is actually doing the wrong thing, because it's
> dealing with a rare special case, not the important cases that
> actually matter.
>
> So all of this readahead logic - and all of the read and write hooks -
> should be behind a simple "oh, this file doesn't have any notification
> stuff, so don't bother calling any fsnotify functions".
>
> So I think the pattern should be
>
>     static inline bool fsnotify_file_has_pre_content_watches(struct file *file)
>     {
>         if (unlikely(file->f_mode & FMODE_NOTIFY))
>                 return out_of_line_crud(file);
>         return false;
>     }
>

I think that's a good idea for pre-content events, because it's fine
to say that if the sb/mount was not watched by a pre-content event listener
at the time of file open, then we do not care.

The problem is that legacy inotify/fanotify watches can be added after
file is open,
so that is allegedly why this optimization was not done for fsnotify
hooks in the past.

Thanks,
Amir.