Message-ID: <20250421004419.GU2023217@ZenIV>
Date: Mon, 21 Apr 2025 01:44:19 +0100
From: Al Viro <viro@...iv.linux.org.uk>
To: Ian Kent <ikent@...hat.com>
Cc: Christian Brauner <brauner@...nel.org>, Ian Kent <raven@...maw.net>,
	Sebastian Andrzej Siewior <bigeasy@...utronix.de>,
	Mark Brown <broonie@...nel.org>,
	Eric Chanudet <echanude@...hat.com>, Jan Kara <jack@...e.cz>,
	Clark Williams <clrkwllms@...nel.org>,
	Steven Rostedt <rostedt@...dmis.org>, linux-fsdevel@...r.kernel.org,
	linux-kernel@...r.kernel.org, linux-rt-devel@...ts.linux.dev,
	Alexander Larsson <alexl@...hat.com>,
	Lucas Karpinski <lkarpins@...hat.com>, Aishwarya.TCV@....com
Subject: Re: [PATCH v4] fs/namespace: defer RCU sync for MNT_DETACH umount

On Mon, Apr 21, 2025 at 08:12:25AM +0800, Ian Kent wrote:

> > One issue is of course that we need to guarantee that someone will
> > always put the last reference.
> > 
> > Another is that any scheme that elides the grace period in
> > namespace_unlock() will also mess up the fastpath in mntput_no_expire()
> > such that we either have to take lock_mount_hash() on each
> > mntput_no_expire() which is definitely a no-no or we have to come up
> > with an elaborate scheme where we do a read_seqbegin() and
> > read_seqcount_retry() dance based on mount_lock. Then we still need to
> > fall back on lock_mount_hash() if the sequence count has changed. It's
> > ugly and it will likely be noticeable during RCU lookup.
> 
> But the mounts are unhashed before we get to the rcu sync, doesn't that
> buy us an opportunity for the seqlock dance to be simpler?

What does being unhashed have to do with anything?  Both unhashing and
(a lot more relevant) clearing ->mnt_ns happen before the delay - the
whole point of the mntput_no_expire() fastpath is that the absolute
majority of calls happen while the damn thing is still mounted and we are
definitely not dropping the last reference.

We use "has non-NULL ->mnt_ns" as a cheap way to check that the reference
that pins the mounted stuff is still there.  The race we need to cope with
is

A: mntput_no_expire():
	sees ->mnt_ns != NULL, decides that we can go for fast path
B: umount(2):
	detaches the sucker from the mount tree, clears ->mnt_ns, drops the
	reference that used to be there for the sucker being mounted.
	The only remaining reference is the one being dropped by A.
A: OK, so we are on the fast path; a plain decrement is all we need

... and now mount has zero refcount and no one who would destroy it.
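
To spell the window out, here is a minimal userspace sketch of that broken
ordering - the struct layout and the function names are made up for
illustration, this is not the actual fs/namespace.c code:

/* Minimal model of the race (illustrative only, not kernel code). */
#include <stdatomic.h>
#include <stdlib.h>

struct mount {
	_Atomic(void *) mnt_ns;	/* non-NULL while mounted */
	atomic_int count;	/* reference count */
};

static void destroy_mount(struct mount *m) { free(m); }

/* A: fast path with nothing protecting the test + decrement pair */
static void mntput_fast_broken(struct mount *m)
{
	if (atomic_load(&m->mnt_ns)) {		/* "still mounted" */
		/* <-- window: B can clear ->mnt_ns and drop its
		 *     reference right here */
		atomic_fetch_sub(&m->count, 1);	/* may drop the last ref */
		return;				/* ... and nobody frees m */
	}
	if (atomic_fetch_sub(&m->count, 1) == 1)
		destroy_mount(m);
}

/* B: umount(2) side - detach, clear ->mnt_ns, drop the "mounted"
 * reference, with no grace period anywhere in between. */
static void umount_side_broken(struct mount *m)
{
	atomic_store(&m->mnt_ns, NULL);
	if (atomic_fetch_sub(&m->count, 1) == 1)
		destroy_mount(m);
}

In the interleaving above, B's decrement sees 2 -> 1 and does not destroy,
while A's plain decrement then takes the count to 0 with nobody left to
free the mount.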

An RCU delay between the removal from the mount tree (no matter which
test you are using to detect that) and dropping the reference held for
being mounted avoids that - we have rcu_read_lock() held over the entire
fast path of mntput_no_expire(), covering both the test (whichever test
we want to use) and the decrement.  We get
A: rcu_read_lock()
A: sees that the victim is still in the tree (again, in whatever way you
want - doesn't matter)
A: OK, we know that the grace period between the removal from the tree and
dropping the reference couldn't have started before our rcu_read_lock().
Therefore, even if we are seeing stale data and it's being unmounted,
we are still guaranteed that the refcount is at least 2 - the one we are
dropping and the one umount() couldn't have gotten around to dropping
yet.  So the last reference will not be gone until we drop rcu_read_lock(),
and we can safely decrement the refcount without checking for zero.
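
Spelled out as a sketch, again with made-up names and with liburcu's
synchronize_rcu() standing in for the kernel-side grace period (illustrative
only, not the real mntput_no_expire(); callers would also need
rcu_register_thread(), omitted for brevity):

/* Sketch of the RCU-protected scheme.  Build with: cc demo.c -lurcu */
#include <urcu.h>		/* rcu_read_lock(), synchronize_rcu(), ... */
#include <stdatomic.h>
#include <stdlib.h>

struct mount {
	_Atomic(void *) mnt_ns;	/* non-NULL while mounted */
	atomic_int count;	/* reference count */
};

static void destroy_mount(struct mount *m) { free(m); }

/* A: fast path - test and decrement both inside one RCU read section */
static void mntput_fast_rcu(struct mount *m)
{
	rcu_read_lock();
	if (atomic_load(&m->mnt_ns)) {
		/* Still mounted as far as we can see: the reference held
		 * for "being mounted" cannot be dropped until a grace
		 * period that starts after our rcu_read_lock() ends, so
		 * count >= 2 and a plain decrement is safe. */
		atomic_fetch_sub(&m->count, 1);
		rcu_read_unlock();
		return;
	}
	rcu_read_unlock();
	if (atomic_fetch_sub(&m->count, 1) == 1)	/* slow path */
		destroy_mount(m);
}

/* B: umount side - the grace period sits between clearing ->mnt_ns and
 * the final reference drop, which is what defeats the race above. */
static void umount_side_rcu(struct mount *m)
{
	atomic_store(&m->mnt_ns, NULL);	/* detach / clear ->mnt_ns */
	synchronize_rcu();		/* wait out every fast path in flight */
	if (atomic_fetch_sub(&m->count, 1) == 1)
		destroy_mount(m);
}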

And yes, that case of mntput() is _MUCH_ hotter than any umount-related
load might ever be.  Many orders of magnitude hotter...
