Date:   Tue, 25 May 2021 16:58:48 -0600
From:   Andreas Dilger <adilger@...ger.ca>
To:     David Howells <dhowells@...hat.com>
Cc:     Theodore Ts'o <tytso@....edu>,
        "Darrick J. Wong" <djwong@...nel.org>, Chris Mason <clm@...com>,
        Ext4 Developers List <linux-ext4@...r.kernel.org>,
        xfs <linux-xfs@...r.kernel.org>,
        linux-btrfs <linux-btrfs@...r.kernel.org>,
        linux-cachefs@...hat.com,
        linux-fsdevel <linux-fsdevel@...r.kernel.org>,
        NeilBrown <neilb@...e.com>
Subject: Re: How capacious and well-indexed are ext4, xfs and btrfs
 directories?

On May 25, 2021, at 4:31 PM, David Howells <dhowells@...hat.com> wrote:
> 
> Andreas Dilger <adilger@...ger.ca> wrote:
> 
>> As described elsewhere in the thread, allowing concurrent create and unlink
>> in a directory (rename probably not needed) would be invaluable for scaling
>> multi-threaded workloads.  Neil Brown posted a prototype patch to add this
>> to the VFS for NFS:
> 
> Actually, one thing I'm looking at is using vfs_tmpfile() to create a new file
> (or a replacement file when invalidation is required) and then using
> vfs_link() to attach directory entries in the background (possibly using
> vfs_link() with AT_LINK_REPLACE[1] instead of unlink+link).
> 
> Any thoughts on how that might scale?  vfs_tmpfile() doesn't appear to require
> the directory inode lock.  I presume the directory is required for security
> purposes in addition to being a way to specify the target filesystem.

I don't see how that would help much.  Yes, the tmpfile allocation would
happen outside the directory lock, so this may reduce the lock hold time
by some fraction, but linking the tmpfile into the directory would still
need to hold the directory lock, in the same way that create and unlink
are serialized against other threads working in the same dir.
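
To illustrate the point, here is a minimal userspace sketch in Python. It is
a toy model, not kernel code: ToyDirectory, alloc_tmpfile(), link() and
run_demo() are hypothetical names, and a plain mutex stands in for the
directory inode lock. Allocation runs outside the directory lock, yet every
link step still passes through that one lock, so at most one thread is ever
inside it.

```python
import threading


class ToyDirectory:
    """Toy model of a directory guarded by a single lock (assumption:
    this mutex stands in for the per-directory inode lock)."""

    def __init__(self):
        self.dir_lock = threading.Lock()    # stands in for the directory lock
        self.alloc_lock = threading.Lock()  # protects the inode counter only
        self.next_ino = 1
        self.entries = {}                   # name -> inode number
        self.linkers = 0                    # threads currently in the link step
        self.max_linkers = 0                # highest concurrency seen there

    def alloc_tmpfile(self):
        # Analogous to the tmpfile step: the directory lock is not taken.
        with self.alloc_lock:
            ino = self.next_ino
            self.next_ino += 1
        return ino

    def link(self, name, ino):
        # Analogous to the link step: serialized by the directory lock.
        with self.dir_lock:
            self.linkers += 1
            self.max_linkers = max(self.max_linkers, self.linkers)
            self.entries[name] = ino
            self.linkers -= 1


def create_many(directory, prefix, count):
    """Create `count` entries: allocate out-of-line, then link under the lock."""
    for i in range(count):
        ino = directory.alloc_tmpfile()
        directory.link(f"{prefix}-{i}", ino)


def run_demo(threads=8, per_thread=200):
    """Run several creator threads against one directory and return it."""
    d = ToyDirectory()
    workers = [
        threading.Thread(target=create_many, args=(d, f"t{n}", per_thread))
        for n in range(threads)
    ]
    for w in workers:
        w.start()
    for w in workers:
        w.join()
    return d
```

However many threads run, max_linkers stays at 1 in this model: moving the
allocation out of the lock shortens the critical section but does not let
two links into the same directory proceed in parallel.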

Having the directory locking scale with the size of the directory is what
will get orders of magnitude speedups for large concurrent workloads.
In ext4 this means write locking the directory leaf blocks independently,
with read locks for the interior index blocks unless new leaf blocks are
added (they are currently never removed).
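
A rough sketch of per-leaf locking, again as a toy Python model rather than
ext4 code. ToyHashedDirectory and fill_concurrently() are hypothetical names,
CRC32 is only a stand-in for the directory hash, and the number of leaves is
fixed, so the read lock on the interior index blocks (needed only when leaves
are added) is left out entirely.

```python
import threading
import zlib


class ToyHashedDirectory:
    """Toy directory whose entries are spread over independently locked
    leaves (assumption: a fixed leaf count, so no index lock is modeled)."""

    def __init__(self, num_leaves=16):
        self.leaves = [dict() for _ in range(num_leaves)]
        self.leaf_locks = [threading.Lock() for _ in range(num_leaves)]

    def leaf_index(self, name):
        # Deterministic hash of the name picks the leaf that holds it.
        return zlib.crc32(name.encode()) % len(self.leaves)

    def create(self, name, ino):
        i = self.leaf_index(name)
        with self.leaf_locks[i]:            # only this leaf is locked
            if name in self.leaves[i]:
                raise FileExistsError(name)
            self.leaves[i][name] = ino

    def unlink(self, name):
        i = self.leaf_index(name)
        with self.leaf_locks[i]:
            del self.leaves[i][name]        # raises KeyError if absent

    def lookup(self, name):
        i = self.leaf_index(name)
        with self.leaf_locks[i]:
            return self.leaves[i].get(name)


def fill_concurrently(directory, threads=4, per_thread=100):
    """Have several threads create distinct names in the same directory."""

    def worker(n):
        for i in range(per_thread):
            directory.create(f"t{n}-{i}", n * 1000 + i)

    workers = [threading.Thread(target=worker, args=(n,)) for n in range(threads)]
    for w in workers:
        w.start()
    for w in workers:
        w.join()
```

In this model two threads block each other only when their names hash to the
same leaf, so the number of operations that can proceed at once grows with
the number of leaves, that is, with the size of the directory.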

It's the same situation as back when the BKL (Big Kernel Lock) serialized
the entire kernel, before we got fine-grained locking throughout the kernel.

> 
> David
> 
> [1] https://lore.kernel.org/linux-fsdevel/cover.1580251857.git.osandov@fb.com/
> 


Cheers, Andreas





