[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <26532.55014.864941.551897@quad.stoffel.home>
Date: Thu, 6 Feb 2025 10:36:06 -0500
From: "John Stoffel" <john@...ffel.org>
To: NeilBrown <neilb@...e.de>
Cc: Alexander Viro <viro@...iv.linux.org.uk>,
Christian Brauner <brauner@...nel.org>,
Jan Kara <jack@...e.cz>,
Linus Torvalds <torvalds@...ux-foundation.org>,
Jeff Layton <jlayton@...nel.org>,
Dave Chinner <david@...morbit.com>,
linux-fsdevel@...r.kernel.org,
linux-kernel@...r.kernel.org
Subject: Re: [PATCH 00/19 v7?] RFC: Allow concurrent and async changes in a directory
>>>>> "NeilBrown" == NeilBrown <neilb@...e.de> writes:
> This is my latest attempt at removing the requirement for an exclusive
> lock on a directory which performing updates in this. This version,
> inspired by Dave Chinner, goes a step further and allow async updates.
This initial sentence reads poorly to me. I think you maybe are
trying to say:
This is my latest attempt to removing the requirement for writers to
have an exclusive lock on a directory when performing updates on
entries in that directory. This allows for parallel updates by
multiple processes (connections? hosts? clients?) to improve scaling
of large filesystems.
I get what you're trying to do here, and I applaud it! I just
struggled over the intro here.
> The inode operation still requires the inode lock, at least a shared
> lock, but may return -EINPROGRES and then continue asynchronously
> without needing any ongoing lock on the directory.
> An exclusive lock on the dentry is held across the entire operation.
> This change requires various extra checks. rmdir must ensure there is
> no async creation still happening. rename between directories must
> ensure non of the relevant ancestors are undergoing async rename. There
> may be or checks that I need to consider - mounting?
> One other important change since my previous posting is that I've
> dropped the idea of taking a separate exclusive lock on the directory
> when the fs doesn't support shared locking. This cannot work as it
> doeesn't prevent lookups and filesystems don't expect a lookup while
> they are changing a directory. So instead we need to choose between
> exclusive or shared for the inode on a case-by-case basis.
> To make this choice we divide all ops into four groups: create, remove,
> rename, open/create. If an inode has no operations in the group that
> require an exclusive lock, then a flag is set on the inode so that
> various code knows that a shared lock is sufficient. If the flag is not
> set, an exclusive lock is obtained.
> I've also added rename handling and converted NFS to use all _async ops.
> The motivation for this comes from the general increase in scale of
> systems. We can support very large directories and many-core systems
> and applications that choose to use large directories can hit
> unnecessary contention.
> NFS can easily hit this when used over a high-latency link.
> Lustre already has code to allow concurrent directory updates in the
> back-end filesystem (ldiskfs - a slightly modified ext4).
> Lustre developers believe this would also benefit the client-side
> filesystem with large core counts.
> The idea behind the async support is to eventually connect this to
> io_uring so that one process can launch several concurrent directory
> operations. I have not looked deeply into io_uring and cannot be
> certain that the interface I've provided will be able to be used. I
> would welcome any advice on that matter, though I hope to find time to
> explore myself. For now if any _async op returns -EINPROGRESS we simply
> wait for the callback to indicate completion.
> Test status: only light testing. It doesn't easily blow up, but lockdep
> complains that repeated calls to d_update_wait() are bad, even though
> it has balanced acquire and release calls. Weird?
> Thanks,
> NeilBrown
> [PATCH 01/19] VFS: introduce vfs_mkdir_return()
> [PATCH 02/19] VFS: use global wait-queue table for d_alloc_parallel()
> [PATCH 03/19] VFS: use d_alloc_parallel() in lookup_one_qstr_excl()
> [PATCH 04/19] VFS: change kern_path_locked() and
> [PATCH 05/19] VFS: add common error checks to lookup_one_qstr()
> [PATCH 06/19] VFS: repack DENTRY_ flags.
> [PATCH 07/19] VFS: repack LOOKUP_ bit flags.
> [PATCH 08/19] VFS: introduce lookup_and_lock() and friends
> [PATCH 09/19] VFS: add _async versions of the various directory
> [PATCH 10/19] VFS: introduce inode flags to report locking needs for
> [PATCH 11/19] VFS: Add ability to exclusively lock a dentry and use
> [PATCH 12/19] VFS: enhance d_splice_alias to accommodate shared-lock
> [PATCH 13/19] VFS: lock dentry for ->revalidate to avoid races with
> [PATCH 14/19] VFS: Ensure no async updates happening in directory
> [PATCH 15/19] VFS: Change lookup_and_lock() to use shared lock when
> [PATCH 16/19] VFS: add lookup_and_lock_rename()
> [PATCH 17/19] nfsd: use lookup_and_lock_one() and
> [PATCH 18/19] nfs: change mkdir inode_operation to mkdir_async
> [PATCH 19/19] nfs: switch to _async for all directory ops.
Powered by blists - more mailing lists