linux-kernel - Re: [PATCH] vfs: avoid spurious dentry ref/unref cycle on open

lists.openwall.net		lists / announce owl-users owl-dev john-users john-dev passwdqc-users yescrypt popa3d-users / oss-security kernel-hardening musl sabotage tlsify passwords / crypt-dev xvendor / Bugtraq Full-Disclosure linux-kernel linux-netdev linux-ext4 linux-hardening linux-cve-announce PHC
Open Source and information security mailing list archives
Hash Suite: Windows password security audit tool. GUI, reports in PDF.
[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <CAGudoHFoVHk1ZZOa=Bbb1MyGgSxeAK1bMvLPFKnagzuLz7PBGw@mail.gmail.com>
Date: Wed, 7 Aug 2024 07:46:02 +0200
From: Mateusz Guzik <mjguzik@...il.com>
To: Al Viro <viro@...iv.linux.org.uk>
Cc: brauner@...nel.org, jack@...e.cz, linux-kernel@...r.kernel.org, 
	linux-fsdevel@...r.kernel.org
Subject: Re: [PATCH] vfs: avoid spurious dentry ref/unref cycle on open

On Wed, Aug 7, 2024 at 7:32 AM Al Viro <viro@...iv.linux.org.uk> wrote:
>
> On Wed, Aug 07, 2024 at 05:57:07AM +0200, Mateusz Guzik wrote:
>
> [there'll be a separate reply with what I hope might be a usable
> approach]
>

At the moment I'm trying to get the spurious lockref cycle out of the
way because it is interfering with the actual thing I'm trying to do.

This entire pointer poisoning business is optional and the v2 I posted
does not do any of it.

With this in mind can you review v2? Here is the link for reference:
https://lore.kernel.org/linux-fsdevel/20240806163256.882140-1-mjguzik@gmail.com/T/#u

As for the feedback below, I'm going to circle back to it when I take
a stab at adding debug macros, but no ETA.

> > Yes, this is my understanding of the code and part of my compliant. :)
> >
> > Things just work(tm) as is with NULLified pointers, but this is error-prone.
>
> And carrying the arseloads of information (which ones do and which do not
> need to be dropped) is *less* error-prone?  Are you serious?
>
> > As a hypothetical suppose there is code executing some time after
> > vfs_open which looks at nd->path.dentry and by finding the pointer is
> > NULL it concludes the lookup did not work out.
> >
> > If such code exists *and* the pointer is poisoned in the above sense
> > (notably merely branching on it with kasan already traps), then the
> > consumer will be caught immediately during coverage testing by
> > syzkaller.
>
> You are much too optimistic about the quality of test coverage in this
> particular area.
>
> > If such code exists but the pointer is only nullified, one is only
> > going to find out the hard way when some functionality weirdly breaks.
>
> To do _useful_ asserts, one needs invariants to check.  And "we got
> to this check after having passed through that assignment at some
> earlier point" is not it.  That's why I'm asking questions about
> the state.
>
> The thing is, suppose I (or you, or somebody else) is trying to modify
> the whole thing.  There's a magical mystery assert in the way; what
> should be done with it?  Move it/split it/remove it/do something
> random and hope syzkaller won't catch anything?  If I can reason
> about the predicate being checked, I can at least start figuring out
> what should be done.  If not, it's bloody guaranteed to rot.
>
> This particular area (pathwalk machinery) has a nasty history of
> growing complexity once in a while, with following cleanups and
> massage to get it back into more or less tolerable shape.
> And refactoring that had been _painful_ - I'd done more than
> a few there.
>
> As far as I can tell, at the moment this flag (and yes, I've seen its
> removal in the next version) is "we'd called vfs_open_consume() at
> some point, then found ourselves still in RCU mode or we'd called
> vfs_open_consume() more than once".
>
> This is *NOT* a property of state; it's a property of execution
> history.  The first part is checked in the wrong place - one of
> the invariants (trivially verified by code examination) is that
> LOOKUP_RCU is never regained after it had been dropped.  The
> only place where it can be set is path_init() and calling _that_
> between path_init() and terminate_walk() would be
>         a) a hard and very visible bug
>         b) would've wiped your flag anyway.
> So that part of the check is basically "we are not calling
> vfs_open_consume() under rcu_read_lock()".  Which is definitely
> a desirable property, since ->open() can block.  So can
> mnt_want_write() several lines prior.  Invariant here is
> "the places where we set FMODE_OPENED or FMODE_CREATED may
> not have LOOKUP_RCU".  Having
>         if (!(file->f_mode & (FMODE_OPENED | FMODE_CREATED))) {
>                 error = complete_walk(nd);
>                 if (error)
>                         return error;
>         }
> in the beginning of do_open() guarantees that for vfs_open()
> call there.  All other places where that can happen are in
> lookup_open() or called from it (via ->atomic_open() to
> finish_open()).  And *that* definitely should not be done
> in RCU mode, due to
>         if (open_flag & O_CREAT)
>                 inode_lock(dir->d_inode);
>         else
>                 inode_lock_shared(dir->d_inode);
>         dentry = lookup_open(nd, file, op, got_write);
> in the sole caller of that thing.  Again, can't grab a blocking
> lock under rcu_read_lock().  Which is why we have this
>                 if (WARN_ON_ONCE(nd->flags & LOOKUP_RCU))
>                         return ERR_PTR(-ECHILD);
>         } else {
>                 /* create side of things */
>                 if (nd->flags & LOOKUP_RCU) {
>                         if (!try_to_unlazy(nd))
>                                 return ERR_PTR(-ECHILD);
>                 }
> slightly prior to that call.  WARN_ON_ONCE is basically "lookup_fast()
> has returned NULL and stayed in RCU mode", which should never happen.
> try_to_unlazy() is straight "either switch to non-RCU mode or return an
> error" - that's what this function is for.  No WARN_ON after that - it
> would only obfuscate things.
>
> *IF* you want to add debugging checks for that kind of stuff, just call
> that assert_nonrcu(nd), make it check and whine and feel free to slap
> them in reasonable amount of places (anything that makes a reader go
> "for fuck sake, hadn't we (a) done that on the entry to this function
> and (b) done IO since then, anyway?" is obviously not reasonable, etc. -
> no more than common sense limitations).
>
> Another common sense thing: extra asserts won't confuse syzkaller, but
> they very much can confuse a human reader.  And any rewrites are done
> by humans...
>
> As for the double call of vfs_open_consume()...  You do realize that
> the damn thing wouldn't have reached that check if it would ever have
> cause to be triggered, right?  Seeing that we call
> static inline struct mnt_idmap *mnt_idmap(const struct vfsmount *mnt)
> {
>         /* Pairs with smp_store_release() in do_idmap_mount(). */
>         return smp_load_acquire(&mnt->mnt_idmap);
> }
> near the beginning of do_open(), ~20 lines before the place where
> you added that check...
>
> I'm not sure it makes sense to defend against a weird loop appearing
> out of nowhere near the top of call chain, but if you want to do that,
> this is not the right place for that.



-- 
Mateusz Guzik <mjguzik gmail.com>