linux-kernel - Re: [PATCH 08/19] VFS: introduce lookup_and

lists.openwall.net		lists / announce owl-users owl-dev john-users john-dev passwdqc-users yescrypt popa3d-users / oss-security kernel-hardening musl sabotage tlsify passwords / crypt-dev xvendor / Bugtraq Full-Disclosure linux-kernel linux-netdev linux-ext4 linux-hardening linux-cve-announce PHC
Open Source and information security mailing list archives

Hash Suite: Windows password security audit tool. GUI, reports in PDF.

[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]

Message-ID: <20250212155132.GQ1977892@ZenIV>
Date: Wed, 12 Feb 2025 15:51:32 +0000
From: Al Viro <viro@...iv.linux.org.uk>
To: NeilBrown <neilb@...e.de>
Cc: Christian Brauner <brauner@...nel.org>, Jan Kara <jack@...e.cz>,
	Linus Torvalds <torvalds@...ux-foundation.org>,
	Jeff Layton <jlayton@...nel.org>,
	Dave Chinner <david@...morbit.com>, linux-fsdevel@...r.kernel.org,
	linux-kernel@...r.kernel.org
Subject: Re: [PATCH 08/19] VFS: introduce lookup_and_lock() and friends

On Wed, Feb 12, 2025 at 04:22:16PM +1100, NeilBrown wrote:

> lookup_for_removal() etc would only be temporarily needed.  Eventually
> (I hope) we would get to a place where all filesystems support all
> operations with only a shared lock.  When we get there,
> lookup_for_remove() and lookup_for_create() would be identical again.
> 
> And the difference wouldn't be that one takes a shared lock and the
> other takes an exclusive lock.  It would be that one takes a shared or
> exclusive lock based on flag X stored somewhere (inode, inode_operations,
> ...) while the other takes a shared or exclusive lock based on flag Y.
> 
> It would be nice to be able to accelerate that and push the locking down
> into the filesystems call at once as Linus suggested last time:
> 
> https://lore.kernel.org/all/CAHk-=whz69y=98udgGB5ujH6bapYuapwfHS2esWaFrKEoi9-Ow@mail.gmail.com/
> 
> That would require either adding a new rwsem to each inode, possibly in
> the filesystem-private part of the inode, or changing VFS to not lock
> the inode at all.  The first would be unwelcome by fs developers I
> expect, the second would be a serious challenge.  I started thinking
> about and quickly decided I had enough challenges already.

I think it's the wrong way to go.

Your "in-update" state does make sense, but it doesn't go far enough
and it's not really about parallel anything - it's simply "this
dentry is nailed down <here> with <this> name for now".

And _that_ is really useful, provided that it's reliable.  What we
need to avoid is d_drop()/d_rehash() windows, when that "operated
upon" dentry ceases to be visible.

Currently we can do that, provided that parent is held exclusive.
Any lookup will hit dcache miss and proceed to lookup_slow()
path, which will block on attempt to get the parent shared.

As soon as you switch to holding parent shared, that pattern becomes
a source of problems.

And if we deal with that, there's not much reason to nest this
dentry lock inside ->i_rwsem.  Then ->i_rwsem would become easy
to push inside the methods.

Right now the fundamental problem with your locking is that you
get dentry locks sandwiched between ->i_rwsem on parents and that
on children.  We can try to be clever with how we acquire them
(have ->d_parent rechecked before going to sleep, etc.), but
that's rather brittle.

_IF_ we push them outside of ->i_rwsem, the role of ->i_rwsem
would shrink to protecting (1) the directory internal representation,
(2) emptiness checks and (3) link counts.

What goes away is "we are holding it exclusive, so anything that
comes here with dcache miss won't get around to doing anything
until we unlock".