Message-ID: <CAHk-=wi_wwTxPTnFXsG8zdaem5YDnSd4OsCeP78yJgueQCb-1g@mail.gmail.com>
Date:   Fri, 26 Aug 2022 12:06:55 -0700
From:   Linus Torvalds <torvalds@...ux-foundation.org>
To:     NeilBrown <neilb@...e.de>
Cc:     Al Viro <viro@...iv.linux.org.uk>, Daire Byrne <daire@...g.com>,
        Trond Myklebust <trond.myklebust@...merspace.com>,
        Chuck Lever <chuck.lever@...cle.com>,
        Linux NFS Mailing List <linux-nfs@...r.kernel.org>,
        linux-fsdevel@...r.kernel.org, LKML <linux-kernel@...r.kernel.org>
Subject: Re: [PATCH 01/10] VFS: support parallel updates in the one directory.

On Thu, Aug 25, 2022 at 7:16 PM NeilBrown <neilb@...e.de> wrote:
>
> If a filesystem supports parallel modification in directories, it sets
> FS_PAR_DIR_UPDATE on the file_system_type.  lookup_open() and the new
> lookup_hash_update() notice the flag and take a shared lock on the
> directory, and rely on a lock-bit in d_flags, much like parallel lookup
> relies on DCACHE_PAR_LOOKUP.

Ugh.

I absolutely believe in the DCACHE_PAR_LOOKUP model, and in "parallel
updates" being important, but I *despise* locking code like this

+       if (wq && IS_PAR_UPDATE(dir))
+               inode_lock_shared_nested(dir, I_MUTEX_PARENT);
+       else
+               inode_lock_nested(dir, I_MUTEX_PARENT);

and I really really hope there's some better model for this.

That "wq" test in particular is just absolutely disgusting. So now it
doesn't just depend on whether the directory supports parallel
updates, now the caller can choose whether to do the parallel thing or
not, and that's how "create" is different from "rename".

And that last difference is, I think, the one that makes me go "No. HELL NO".

Instead of it being up to the filesystem to say "I can do parallel
creates, but I need to serialize renames", this whole thing has been
set up to be about the caller making that decision.
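To make that concrete - a purely hypothetical sketch, nothing from the
series, with the bit values made up - the capability could be declared
per operation in the fs_flags of struct file_system_type, so the call
site asks the filesystem instead of taking a hint from the caller:

        /* Hypothetical per-operation capability bits: */
        #define FS_PAR_DIR_CREATE       0x1000  /* parallel create/unlink ok */
        #define FS_PAR_DIR_RENAME       0x2000  /* parallel rename ok */

        if (dir->i_sb->s_type->fs_flags & FS_PAR_DIR_CREATE)
                inode_lock_shared_nested(dir, I_MUTEX_PARENT);
        else
                inode_lock_nested(dir, I_MUTEX_PARENT);

That way "creates can be parallel, renames can't" is something the
filesystem declares once, not something each caller gets to decide.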

That just feels horribly, horribly wrong.

Yes, I realize that to you that's just a "later patches will do
renames", but what if it really is a filesystem issue where the
filesystem can easily handle new names, but needs something else for
renames because it has various cross-directory issues, perhaps?

So I feel this is fundamentally wrong, and this ugliness is a small
effect of that wrongness.

I think we should strive for

 (a) make that 'wq' and DCACHE_PAR_LOOKUP bit be unconditional

 (b) aim for the inode lock being taken *after* the _lookup_hash(),
since the VFS layer side has to be able to handle the concurrency on
the dcache side anyway

 (c) at that point, the serialization really ends up being about the
call into the filesystem, and aim to simply move the
inode_lock{_shared}_nested() into the filesystem so that there's no
need for a flag and related conditional locking at all (sketch below).
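As a rough sketch of what (c) could look like - hypothetical code, with
"examplefs" and examplefs_do_create() made up, and assuming the VFS no
longer takes the parent lock around the call at all - a filesystem that
still needs full exclusion would simply take the lock itself:

        /* Hypothetical: the VFS has stopped locking the parent around
         * ->create, so a filesystem that cannot handle parallel
         * creates takes the exclusive lock itself. */
        static int examplefs_create(struct user_namespace *mnt_userns,
                                    struct inode *dir, struct dentry *dentry,
                                    umode_t mode, bool excl)
        {
                int err;

                inode_lock_nested(dir, I_MUTEX_PARENT);
                err = examplefs_do_create(dir, dentry, mode);
                inode_unlock(dir);
                return err;
        }

A filesystem that *can* handle parallel creates just doesn't take the
lock, and the VFS needs no FS_PAR_DIR_UPDATE flag and no conditional
locking anywhere.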

Because right now I think the main reason we cannot move the lock into
the filesystem is literally that we've made the lock cover not just
the filesystem part, but the "lookup and create dentry" part too.

But once you have that "DCACHE_PAR_LOOKUP" bit and the
d_alloc_parallel() logic to serialize a _particular_ dentry being
created (as opposed to serializing all the sleeping ops to that
directory), I really think we should strive to move the locking - that
no longer helps the VFS dcache layer - closer to the filesystem call
and eventually into it.
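(For reference, that per-dentry serialization is roughly the pattern
lookup_open() already uses - a simplified sketch of the fs/namei.c
code, with error handling trimmed:)

        DECLARE_WAIT_QUEUE_HEAD_ONSTACK(wq);
        struct dentry *dentry;

        dentry = d_alloc_parallel(dir, &nd->last, &wq);
        if (IS_ERR(dentry))
                return ERR_CAST(dentry);
        if (d_in_lookup(dentry)) {
                /* We own the in-lookup dentry: DCACHE_PAR_LOOKUP makes
                 * any other lookup of the same name wait on wq, without
                 * the parent inode lock being involved on the dcache
                 * side. */
                struct dentry *res;

                res = dir->d_inode->i_op->lookup(dir->d_inode, dentry,
                                                 nd->flags);
                d_lookup_done(dentry);  /* clear the bit, wake waiters */
                if (unlikely(res)) {
                        dput(dentry);
                        dentry = res;
                }
        }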

Please? I think these patches are "mostly going in the right
direction", but at the same time I feel like there's some serious
mis-design going on.

                Linus
