Message-ID: <20250409142510.PIlMaZhX@linutronix.de>
Date: Wed, 9 Apr 2025 16:25:10 +0200
From: Sebastian Andrzej Siewior <bigeasy@...utronix.de>
To: Mateusz Guzik <mjguzik@...il.com>
Cc: Christian Brauner <brauner@...nel.org>,
	Eric Chanudet <echanude@...hat.com>,
	Alexander Viro <viro@...iv.linux.org.uk>, Jan Kara <jack@...e.cz>,
	Clark Williams <clrkwllms@...nel.org>,
	Steven Rostedt <rostedt@...dmis.org>, Ian Kent <ikent@...hat.com>,
	linux-fsdevel@...r.kernel.org, linux-kernel@...r.kernel.org,
	linux-rt-devel@...ts.linux.dev,
	Alexander Larsson <alexl@...hat.com>,
	Lucas Karpinski <lkarpins@...hat.com>
Subject: Re: [PATCH v4] fs/namespace: defer RCU sync for MNT_DETACH umount

On 2025-04-09 16:02:29 [+0200], Mateusz Guzik wrote:
> On Wed, Apr 09, 2025 at 03:14:44PM +0200, Sebastian Andrzej Siewior wrote:
> > One question: Do we need this lazy/MNT_DETACH case? Couldn't we handle
> > them all via queue_rcu_work()?
> > If so, couldn't we make deferred_free_mounts global and have two
> > release lists, say release_list and release_list_next_gp? The first one
> > would be used if queue_rcu_work() returns true, otherwise the second.
> > Then once defer_free_mounts() is done and release_list_next_gp is not
> > empty, it would move release_list_next_gp -> release_list and invoke
> > queue_rcu_work().
> > This would avoid the kmalloc, synchronize_rcu_expedited() and the
> > special sauce.
> > 
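
A minimal sketch of that two-list scheme, for illustration only; the
mnt_defer list field and most of the names here are assumptions, not
taken from the posted patch:

#include <linux/list.h>
#include <linux/spinlock.h>
#include <linux/workqueue.h>

static void defer_free_mounts(struct work_struct *work);

static LIST_HEAD(release_list);		/* covered by the pending grace period */
static LIST_HEAD(release_list_next_gp);	/* arrived after the GP started */
static DEFINE_SPINLOCK(release_lock);
/* INIT_RCU_WORK(&deferred_free_mounts, defer_free_mounts) at init time */
static struct rcu_work deferred_free_mounts;

static void queue_release(struct mount *mnt)
{
	spin_lock(&release_lock);
	if (queue_rcu_work(system_wq, &deferred_free_mounts))
		/* work was idle: a fresh grace period covers this entry */
		list_add_tail(&mnt->mnt_defer, &release_list);
	else
		/* a grace period is already pending: park until it ends */
		list_add_tail(&mnt->mnt_defer, &release_list_next_gp);
	spin_unlock(&release_lock);
}

static void defer_free_mounts(struct work_struct *work)
{
	LIST_HEAD(todo);

	spin_lock(&release_lock);
	list_splice_init(&release_list, &todo);
	if (!list_empty(&release_list_next_gp)) {
		/* promote the parked entries and start the next GP */
		list_splice_init(&release_list_next_gp, &release_list);
		queue_rcu_work(system_wq, &deferred_free_mounts);
	}
	spin_unlock(&release_lock);

	/* ... mntput()/free every mount on the todo list ... */
}
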
> 
> To my understanding it was preferred for non-lazy unmount consumers to
> wait until the mntput before returning.
>
> That aside, if I understood your approach correctly, it would de facto
> serialize all of these?
>
> As in, with the posted patches you can have different worker threads
> progressing in parallel, as they each get a private list to iterate.
>
> With your proposal only one of them can do any work.
>
> One has to assume that with sufficient mount/unmount traffic this can
> eventually get into trouble.
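
For contrast, the rough shape of the per-call variant described above,
again with illustrative names and an assumed fallback path rather than
the patch verbatim:

#include <linux/rcupdate.h>
#include <linux/slab.h>
#include <linux/workqueue.h>

struct deferred_free_mounts {
	struct rcu_work rwork;
	struct list_head release_list;
};

static void defer_free_mounts(struct work_struct *work)
{
	struct deferred_free_mounts *d =
		container_of(to_rcu_work(work),
			     struct deferred_free_mounts, rwork);

	/* ... drain d->release_list, mntput()/free each entry ... */
	kfree(d);
}

static void defer_release(struct list_head *mounts)
{
	struct deferred_free_mounts *d = kmalloc(sizeof(*d), GFP_KERNEL);

	if (!d) {
		/* no memory: wait for the grace period synchronously */
		synchronize_rcu_expedited();
		/* ... free the list in place ... */
		return;
	}
	INIT_LIST_HEAD(&d->release_list);
	list_splice_init(mounts, &d->release_list);
	INIT_RCU_WORK(&d->rwork, defer_free_mounts);
	queue_rcu_work(system_wq, &d->rwork);
}

Each call allocates its own rcu_work with a private list, so the
resulting workers can drain their lists in parallel.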

Right, it would serialize them within the same worker thread. With one
worker for each put you would schedule multiple workers from the RCU
callback. Given system_wq, they would all be queued on the CPU which
invokes the RCU callback, so this kind of serializes it, too.

The mntput() callback uses spinlock_t for locking and then frees
resources. It does not look like it waits for anything or takes ages,
so it might not be necessary to split each put into its own worker on a
different CPU… One busy bee might be enough ;)
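
A hedged sketch of such a single drain loop (same hypothetical
mnt_defer field; cleanup_mnt() is the existing fs/namespace.c helper):

static void free_mounts(struct list_head *todo)
{
	struct mount *mnt, *next;

	list_for_each_entry_safe(mnt, next, todo, mnt_defer) {
		list_del(&mnt->mnt_defer);
		/* per the reasoning above, this teardown is expected
		 * to be short, spinlock-protected, non-blocking work */
		cleanup_mnt(mnt);
	}
}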

Sebastian
