linux-ext4 - Re: Kernel Benchmarking

lists.openwall.net		lists / announce owl-users owl-dev john-users john-dev passwdqc-users yescrypt popa3d-users / oss-security kernel-hardening musl sabotage tlsify passwords / crypt-dev xvendor / Bugtraq Full-Disclosure linux-kernel linux-netdev linux-ext4 linux-hardening linux-cve-announce PHC
Open Source and information security mailing list archives

Hash Suite: Windows password security audit tool. GUI, reports in PDF.

[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]

Message-ID: <CAHk-=whc5CnTUWoeeCDj640Rng4nH8HdLsHgEdnz3NtPSRqqhQ@mail.gmail.com>
Date:   Thu, 17 Sep 2020 10:51:41 -0700
From:   Linus Torvalds <torvalds@...ux-foundation.org>
To:     Michael Larabel <Michael@...haellarabel.com>,
        Matthieu Baerts <matthieu.baerts@...sares.net>
Cc:     Matthew Wilcox <willy@...radead.org>,
        Amir Goldstein <amir73il@...il.com>,
        "Ted Ts'o" <tytso@...gle.com>,
        Andreas Dilger <adilger.kernel@...ger.ca>,
        Ext4 Developers List <linux-ext4@...r.kernel.org>,
        Jan Kara <jack@...e.cz>,
        linux-fsdevel <linux-fsdevel@...r.kernel.org>
Subject: Re: Kernel Benchmarking

On Mon, Sep 14, 2020 at 10:47 AM Linus Torvalds
<torvalds@...ux-foundation.org> wrote:
>
> (Note that it's a commit and has a SHA1, but it's from my "throw-away
> tree for testing", so it doesn't have my sign-off or any real commit
> message yet: I'll do that once it gets actual testing and comments).

Just to keep the list and people who were on this thread informed:
Michal ended up doing more benchmarking, and everything seems to line
up and yes, that patch continues to work fine with a 'unfairness'
value of 5.

So I've committed it to my git tree (not pushed out yet, I have other
pull requests etc I'm handling too), and we'll see if anybody can come
up with a better model for how to avoid the page locking being such a
pain. Or if somebody can figure out why fair locking causes problems
for that packetdrill load that Matthieu reported.

It does strike me that if the main source of contention comes from
that "we need to check that the mapping is still valid as we insert
the page into the page tables", then the page lock really isn't the
obvious lock to use.

It would be much more natural to use the mapping->i_mmap_rwsem, I feel.

Willy? Your "just check for uptodate without any lock" patch itself
feels wrong. That's what we do for plain reads, but the difference is
that a read is a one-time event and a race is fine: we get valid data,
it's just that it's only valid *concurrently* with the truncate or
hole-punching event (ie either all zeroes or old data is fine).

The reason faulting a page in is different from a read is that if you
then map in a stale page, it might have had the correct contents at
the time of the fault, but it will not have the correct contents going
forward.

So a page-in requires fundamentally stronger locking than a read()
does, because of how the page-in causes that "future lifetime" of the
page, in ways a read() event does not.

But truncation that does page cache removal already requires that
i_mmap_rwsem, and in fact the VM already very much uses that (ie when
walking the page mapping).

The other alternative might be just the mapping->private_lock. It's
not a reader-writer lock, but if we don't need to sleep (and I don't
think the final "check ->mapping" can sleep anyway since it has to be
done together with the page table lock), a spinlock would be fine.

                   Linus