linux-kernel - Re: [PATCH/RFC 00/10 v5] Improve scalability of directory operations


Message-ID: <25352.56248.283092.213037@quad.stoffel.home>
Date:   Fri, 26 Aug 2022 10:42:00 -0400
From:   "John Stoffel" <john@...ffel.org>
To:     NeilBrown <neilb@...e.de>
Cc:     Al Viro <viro@...iv.linux.org.uk>,
        Linus Torvalds <torvalds@...ux-foundation.org>,
        Daire Byrne <daire@...g.com>,
        Trond Myklebust <trond.myklebust@...merspace.com>,
        Chuck Lever <chuck.lever@...cle.com>,
        Linux NFS Mailing List <linux-nfs@...r.kernel.org>,
        linux-fsdevel@...r.kernel.org, LKML <linux-kernel@...r.kernel.org>
Subject: Re: [PATCH/RFC 00/10 v5] Improve scalability of directory operations

>>>>> "NeilBrown" == NeilBrown  <neilb@...e.de> writes:

NeilBrown> [I made up "v5" - I haven't been counting]

My first comments, but I'm not a serious developer...

NeilBrown> VFS currently holds an exclusive lock on the directory while making
NeilBrown> changes: add, remove, rename.
NeilBrown> When multiple threads make changes in the one directory, the contention
NeilBrown> can be noticeable.
NeilBrown> In the case of NFS with a high latency link, this can easily be
NeilBrown> demonstrated.  NFS doesn't really need VFS locking as the server ensures
NeilBrown> correctness.

NeilBrown> Lustre uses a single(?) directory for object storage, and has patches
NeilBrown> for ext4 to support concurrent updates (Lustre accesses ext4 directly,
NeilBrown> not via the VFS).

NeilBrown> XFS (it is claimed) doesn't its own locking and doesn't need the VFS to
NeilBrown> help at all.

This sentence makes no sense to me... I assume you meant to say "...does
its own locking..."

NeilBrown> This patch series allows filesystems to request a shared lock on
NeilBrown> directories and provides serialisation on just the affected name, not the
NeilBrown> whole directory.  It changes both the VFS and NFSD to use shared locks
NeilBrown> when appropriate, and changes NFS to request shared locks.

Are there any performance results?  Why wouldn't we just use a shared
lock across all VFS-based filesystems?

NeilBrown> The central enabling feature is a new dentry flag DCACHE_PAR_UPDATE
NeilBrown> which acts as a bit-lock.  The ->d_lock spinlock is taken to set/clear
NeilBrown> it, and wait_var_event() is used for waiting.  This flag is set on all
NeilBrown> dentries that are part of a directory update, not just when a shared
NeilBrown> lock is taken.
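
As a rough userspace illustration of the bit-lock described above (this
is not code from the patch series): a flag bit is set and cleared under
a per-dentry lock, and anyone who finds it set sleeps until it is
cleared.  Everything named model_* and PAR_UPDATE is hypothetical; a
pthread mutex stands in for the ->d_lock spinlock and a condition
variable stands in for wait_var_event() and its wake-up side.

```c
/* Userspace model only: pthread primitives stand in for kernel ones. */
#include <assert.h>
#include <pthread.h>
#include <stdbool.h>

#define PAR_UPDATE 0x1u /* hypothetical stand-in for DCACHE_PAR_UPDATE */

struct model_dentry {
	pthread_mutex_t d_lock;   /* stand-in for the ->d_lock spinlock */
	pthread_cond_t  var_wait; /* stand-in for wait_var_event() waiters */
	unsigned int    d_flags;
};

static void model_dentry_init(struct model_dentry *d)
{
	pthread_mutex_init(&d->d_lock, NULL);
	pthread_cond_init(&d->var_wait, NULL);
	d->d_flags = 0;
}

/* Non-blocking attempt: set the bit under d_lock, fail if already set. */
static bool model_par_trylock(struct model_dentry *d)
{
	bool got = false;

	pthread_mutex_lock(&d->d_lock);
	if (!(d->d_flags & PAR_UPDATE)) {
		d->d_flags |= PAR_UPDATE;
		got = true;
	}
	pthread_mutex_unlock(&d->d_lock);
	return got;
}

/* Blocking variant: sleep until the bit is clear, then set it. */
static void model_par_lock(struct model_dentry *d)
{
	pthread_mutex_lock(&d->d_lock);
	while (d->d_flags & PAR_UPDATE)
		pthread_cond_wait(&d->var_wait, &d->d_lock);
	d->d_flags |= PAR_UPDATE;
	pthread_mutex_unlock(&d->d_lock);
}

/* Clear the bit under d_lock and wake every waiter so they can retry. */
static void model_par_unlock(struct model_dentry *d)
{
	pthread_mutex_lock(&d->d_lock);
	d->d_flags &= ~PAR_UPDATE;
	pthread_cond_broadcast(&d->var_wait);
	pthread_mutex_unlock(&d->d_lock);
}
```

In this model two updaters of the same name serialise on that one
dentry, while updaters of different names in the same directory never
touch each other's flag.
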

NeilBrown> When a shared lock is taken we must use alloc_dentry_parallel() which
NeilBrown> needs a wq which must remain until the update is completed.  To make use
NeilBrown> of concurrent create, kern_path_create() would need to be passed a wq.
NeilBrown> Rather than the churn required for that, we use exclusive locking when
NeilBrown> no wq is provided.

Is this a per-operation wq or a per-directory wq?  Can there be issues
if someone does something silly like having 1,000 directories, all of
which have multiple processes making parallel changes?  

Does it degrade gracefully if a wq can't be allocated?  

NeilBrown> One interesting consequence of this is that silly-rename becomes a
NeilBrown> little more complex.  As the directory may not be exclusively locked,
NeilBrown> the new silly-name needs to be locked (DCACHE_PAR_UPDATE) as well.
NeilBrown> A new LOOKUP_SILLY_RENAME is added which helps implement this using
NeilBrown> common code.

NeilBrown> While testing I found some odd behaviour that was caused by
NeilBrown> d_revalidate() racing with rename().  To resolve this I used
NeilBrown> DCACHE_PAR_UPDATE to ensure they cannot race any more.

NeilBrown> Testing, review, or other comments would be most welcome,