Date:   Mon, 21 Sep 2020 11:20:49 +1000
From:   Dave Chinner <david@...morbit.com>
To:     Linus Torvalds <torvalds@...ux-foundation.org>
Cc:     Matthew Wilcox <willy@...radead.org>,
        Michael Larabel <Michael@...haellarabel.com>,
        Matthieu Baerts <matthieu.baerts@...sares.net>,
        Amir Goldstein <amir73il@...il.com>,
        Ted Ts'o <tytso@...gle.com>,
        Andreas Dilger <adilger.kernel@...ger.ca>,
        Ext4 Developers List <linux-ext4@...r.kernel.org>,
        Jan Kara <jack@...e.cz>,
        linux-fsdevel <linux-fsdevel@...r.kernel.org>
Subject: Re: Kernel Benchmarking

On Sun, Sep 20, 2020 at 04:31:57PM -0700, Linus Torvalds wrote:
> On Sun, Sep 20, 2020 at 4:23 PM Dave Chinner <david@...morbit.com> wrote:
> >
> > FWIW, if the fs layer is already providing this level of IO
> > exclusion w.r.t. address space access, does it need to be replicated
> > at the address space level?
> 
> Honestly, I'd rather do it the other way, and go "if the vfs layer
> were to provide the IO exclusion, maybe the filesystems can drop it?

I'm not sure it can because of the diversity of filesystems and
their locking requirements. XFS spent many, many years ignoring the
VFS inode locking because it was a mutex, and instead did all its
own locking with rwsems internally.
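
The practical difference: a mutex serialises every access, while an
rwsem lets readers run concurrently and only takes writers
exclusively. A rough userspace analogy using pthreads (not XFS code -
the names here are invented purely for illustration):

#include <pthread.h>

static pthread_rwlock_t ilock = PTHREAD_RWLOCK_INITIALIZER;

/* read path: any number of these can hold the lock at once */
static void read_op(void)
{
        pthread_rwlock_rdlock(&ilock);          /* shared */
        /* ... copy data out of the page cache ... */
        pthread_rwlock_unlock(&ilock);
}

/* truncate path: excludes readers and other writers */
static void truncate_op(void)
{
        pthread_rwlock_wrlock(&ilock);          /* exclusive */
        /* ... invalidate pages, update the inode size ... */
        pthread_rwlock_unlock(&ilock);
}

int main(void)
{
        read_op();
        truncate_op();
        return 0;
}

With a plain mutex the read path would serialise against other reads
as well, which is exactly the behaviour XFS was avoiding.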

> Because we end up having something like 60 different filesystems. It's
> *really* hard to know that "Yeah, this filesystem does it right".

Agreed, but I don't think moving the serialisation up to the VFS
will fix that because many filesystems will still have to do their
own thing. e.g. cluster filesystems require cluster locks first,
not local inode rwsems....

> And if we do end up doing it at both levels, and end up having some of
> the locking duplicated, that's still better than "sometimes we don't
> do it at all", and have odd problems on the less usual (and often less
> well maintained) filesystems..

Sure.

However, the problem I'm trying to avoid arises when the filesystem
is able to do things concurrently that the new locking in the
VFS/address space can't do concurrently.

e.g. the range locking I'm working on for XFS allows truncate to run
concurrently with reads, writes, fallocate(), etc., and it all just
works right now. Adding address-space wide exclusive locking to the
page cache will likely defeat this - we can no longer run fallocate()
concurrently, and so we cannot solve the performance problems we have
with qemu+qcow2, where qemu uses fallocate() to initialise sparse
clusters and hence fallocate() on a single uninitialised cluster
serialises all IO to the image file.
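
To make the qcow2 case concrete, the allocation pattern is roughly
the sketch below (a standalone illustration, not qemu code; the image
name and 64k cluster size are invented for the example):

#define _GNU_SOURCE
#include <fcntl.h>
#include <stdio.h>
#include <unistd.h>

#define CLUSTER_SIZE    (64 * 1024)     /* illustrative qcow2 cluster size */

int main(void)
{
        int fd = open("image.qcow2", O_RDWR);
        if (fd < 0) {
                perror("open");
                return 1;
        }

        off_t off = 10 * CLUSTER_SIZE;  /* some as-yet-unwritten cluster */

        /*
         * Allocate backing space for the cluster before the guest write
         * is issued. With inode-wide exclusive locking, this one call
         * blocks all other reads and writes to the image file while it
         * runs; with range locking, only IO overlapping this cluster
         * has to wait.
         */
        if (fallocate(fd, 0, off, CLUSTER_SIZE) < 0) {
                perror("fallocate");
                return 1;
        }

        close(fd);
        return 0;
}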

Hence I'd like to avoid introducing address-space wide serialisation
for invalidation at the page cache level just as we are about to
enable concurrency for operations that do page cache invalidation.
Range locking in the page cache would be fine, but global locking
will be .... problematic.
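
To illustrate the distinction (this is not the XFS implementation,
just the concept): a minimal range lock can be built from a mutex, a
condition variable and a list of held ranges, so that operations on
non-overlapping byte ranges never wait on each other:

#include <pthread.h>
#include <stdbool.h>
#include <stdlib.h>

/* one held byte range [start, end) */
struct range {
        unsigned long   start, end;
        struct range    *next;
};

struct range_lock {
        pthread_mutex_t mutex;
        pthread_cond_t  cond;
        struct range    *held;          /* currently locked ranges */
};

static struct range_lock file_lock = {
        .mutex  = PTHREAD_MUTEX_INITIALIZER,
        .cond   = PTHREAD_COND_INITIALIZER,
        .held   = NULL,
};

static bool range_overlaps(struct range_lock *rl, unsigned long start,
                           unsigned long end)
{
        for (struct range *r = rl->held; r; r = r->next)
                if (start < r->end && r->start < end)
                        return true;
        return false;
}

/* block until [start, end) overlaps no held range, then claim it */
static void range_lock_range(struct range_lock *rl, unsigned long start,
                             unsigned long end)
{
        struct range *r = malloc(sizeof(*r));   /* error handling omitted */

        r->start = start;
        r->end = end;

        pthread_mutex_lock(&rl->mutex);
        while (range_overlaps(rl, start, end))
                pthread_cond_wait(&rl->cond, &rl->mutex);
        r->next = rl->held;
        rl->held = r;
        pthread_mutex_unlock(&rl->mutex);
}

static void range_unlock_range(struct range_lock *rl, unsigned long start,
                               unsigned long end)
{
        pthread_mutex_lock(&rl->mutex);
        for (struct range **p = &rl->held; *p; p = &(*p)->next) {
                if ((*p)->start == start && (*p)->end == end) {
                        struct range *r = *p;
                        *p = r->next;
                        free(r);
                        break;
                }
        }
        pthread_cond_broadcast(&rl->cond);
        pthread_mutex_unlock(&rl->mutex);
}

int main(void)
{
        /*
         * truncate past 1MiB and a read of the first 1MiB: disjoint
         * ranges, so neither ever waits for the other
         */
        range_lock_range(&file_lock, 1UL << 20, ~0UL);
        range_lock_range(&file_lock, 0, 1UL << 20);
        range_unlock_range(&file_lock, 0, 1UL << 20);
        range_unlock_range(&file_lock, 1UL << 20, ~0UL);
        return 0;
}

An invalidation under this sort of scheme only excludes IO that
overlaps the range being invalidated; an address-space wide lock
excludes everything.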

Cheers,

Dave.
-- 
Dave Chinner
david@...morbit.com
