lists.openwall.net   lists  /  announce  owl-users  owl-dev  john-users  john-dev  passwdqc-users  yescrypt  popa3d-users  /  oss-security  kernel-hardening  musl  sabotage  tlsify  passwords  /  crypt-dev  xvendor  /  Bugtraq  Full-Disclosure  linux-kernel  linux-netdev  linux-ext4  linux-hardening  linux-cve-announce  PHC 
Open Source and information security mailing list archives
 
Hash Suite: Windows password security audit tool. GUI, reports in PDF.
[<prev] [next>] [<thread-prev] [day] [month] [year] [list]
Message-id: <173889472120.22054.9081398929242818418@noble.neil.brown.name>
Date: Fri, 07 Feb 2025 13:18:41 +1100
From: "NeilBrown" <neilb@...e.de>
To: "John Stoffel" <john@...ffel.org>
Cc: "Alexander Viro" <viro@...iv.linux.org.uk>,
 "Christian Brauner" <brauner@...nel.org>, "Jan Kara" <jack@...e.cz>,
 "Linus Torvalds" <torvalds@...ux-foundation.org>,
 "Jeff Layton" <jlayton@...nel.org>, "Dave Chinner" <david@...morbit.com>,
 linux-fsdevel@...r.kernel.org, linux-kernel@...r.kernel.org
Subject:
 Re: [PATCH 00/19 v7?] RFC: Allow concurrent and async changes in a directory

On Fri, 07 Feb 2025, John Stoffel wrote:
> >>>>> "NeilBrown" == NeilBrown  <neilb@...e.de> writes:
> 
> > This is my latest attempt at removing the requirement for an exclusive
> > lock on a directory which performing updates in this.  This version,
> > inspired by Dave Chinner, goes a step further and allow async updates.
> 
> This initial sentence reads poorly to me.  I think you maybe are
> trying to say:
> 
>   This is my latest attempt to removing the requirement for writers to
>   have an exclusive lock on a directory when performing updates on
>   entries in that directory.  This allows for parallel updates by
>   multiple processes (connections? hosts? clients?) to improve scaling
>   of large filesystems. 
> 
> I get what you're trying to do here, and I applaud it!  I just
> struggled over the intro here.  

Yes, my intro was rather poorly worded.  I think your version is much
better.  Thanks.

NeilBrown

> 
> 
> > The inode operation still requires the inode lock, at least a shared
> > lock, but may return -EINPROGRES and then continue asynchronously
> > without needing any ongoing lock on the directory.
> 
> > An exclusive lock on the dentry is held across the entire operation.
> 
> > This change requires various extra checks.  rmdir must ensure there is
> > no async creation still happening.  rename between directories must
> > ensure non of the relevant ancestors are undergoing async rename.  There
> > may be or checks that I need to consider - mounting?
> 
> > One other important change since my previous posting is that I've
> > dropped the idea of taking a separate exclusive lock on the directory
> > when the fs doesn't support shared locking.  This cannot work as it
> > doeesn't prevent lookups and filesystems don't expect a lookup while
> > they are changing a directory.  So instead we need to choose between
> > exclusive or shared for the inode on a case-by-case basis.
> 
> > To make this choice we divide all ops into four groups: create, remove,
> > rename, open/create.  If an inode has no operations in the group that
> > require an exclusive lock, then a flag is set on the inode so that
> > various code knows that a shared lock is sufficient.  If the flag is not
> > set, an exclusive lock is obtained.
> 
> > I've also added rename handling and converted NFS to use all _async ops.
> 
> > The motivation for this comes from the general increase in scale of
> > systems.  We can support very large directories and many-core systems
> > and applications that choose to use large directories can hit
> > unnecessary contention.
> 
> > NFS can easily hit this when used over a high-latency link.
> > Lustre already has code to allow concurrent directory updates in the
> > back-end filesystem (ldiskfs - a slightly modified ext4).
> > Lustre developers believe this would also benefit the client-side
> > filesystem with large core counts.
> 
> > The idea behind the async support is to eventually connect this to
> > io_uring so that one process can launch several concurrent directory
> > operations.  I have not looked deeply into io_uring and cannot be
> > certain that the interface I've provided will be able to be used.  I
> > would welcome any advice on that matter, though I hope to find time to
> > explore myself.  For now if any _async op returns -EINPROGRESS we simply
> > wait for the callback to indicate completion.
> 
> > Test status:  only light testing.  It doesn't easily blow up, but lockdep
> > complains that repeated calls to d_update_wait() are bad, even though
> > it has balanced acquire and release calls. Weird?
> 
> > Thanks,
> > NeilBrown
> 
> >  [PATCH 01/19] VFS: introduce vfs_mkdir_return()
> >  [PATCH 02/19] VFS: use global wait-queue table for d_alloc_parallel()
> >  [PATCH 03/19] VFS: use d_alloc_parallel() in lookup_one_qstr_excl()
> >  [PATCH 04/19] VFS: change kern_path_locked() and
> >  [PATCH 05/19] VFS: add common error checks to lookup_one_qstr()
> >  [PATCH 06/19] VFS: repack DENTRY_ flags.
> >  [PATCH 07/19] VFS: repack LOOKUP_ bit flags.
> >  [PATCH 08/19] VFS: introduce lookup_and_lock() and friends
> >  [PATCH 09/19] VFS: add _async versions of the various directory
> >  [PATCH 10/19] VFS: introduce inode flags to report locking needs for
> >  [PATCH 11/19] VFS: Add ability to exclusively lock a dentry and use
> >  [PATCH 12/19] VFS: enhance d_splice_alias to accommodate shared-lock
> >  [PATCH 13/19] VFS: lock dentry for ->revalidate to avoid races with
> >  [PATCH 14/19] VFS: Ensure no async updates happening in directory
> >  [PATCH 15/19] VFS: Change lookup_and_lock() to use shared lock when
> >  [PATCH 16/19] VFS: add lookup_and_lock_rename()
> >  [PATCH 17/19] nfsd: use lookup_and_lock_one() and
> >  [PATCH 18/19] nfs: change mkdir inode_operation to mkdir_async
> >  [PATCH 19/19] nfs: switch to _async for all directory ops.
> 
> 


Powered by blists - more mailing lists

Powered by Openwall GNU/*/Linux Powered by OpenVZ