linux-kernel - Re: [RFC PATCH 24/37] mm: implement speculative handling in __do

Message-ID: <20210408071343.GJ2531743@casper.infradead.org>
Date:   Thu, 8 Apr 2021 08:13:43 +0100
From:   Matthew Wilcox <willy@...radead.org>
To:     Peter Zijlstra <peterz@...radead.org>
Cc:     Michel Lespinasse <michel@...pinasse.org>,
        Linux-MM <linux-mm@...ck.org>,
        Laurent Dufour <ldufour@...ux.ibm.com>,
        Michal Hocko <mhocko@...e.com>,
        Rik van Riel <riel@...riel.com>,
        Paul McKenney <paulmck@...nel.org>,
        Andrew Morton <akpm@...ux-foundation.org>,
        Suren Baghdasaryan <surenb@...gle.com>,
        Joel Fernandes <joelaf@...gle.com>,
        Rom Lemarchand <romlem@...gle.com>,
        Linux-Kernel <linux-kernel@...r.kernel.org>
Subject: Re: [RFC PATCH 24/37] mm: implement speculative handling in
 __do_fault()

On Thu, Apr 08, 2021 at 09:00:26AM +0200, Peter Zijlstra wrote:
> On Wed, Apr 07, 2021 at 10:27:12PM +0100, Matthew Wilcox wrote:
> > Doing I/O without any lock held already works; it just uses the file
> > refcount.  It would be better to use a vma refcount, as I already said.
> 
> The original workload that I developed SPF for (waaaay back when) was
> prefaulting a single huge vma. Using a vma refcount was a total loss
> because it resulted in the same cacheline contention that down_read()
> was having.
> 
> As such, I'm always incredibly sad to see mention of vma refcounts.
> They're fundamentally not solving the problem :/

OK, let me outline my locking scheme because I think it's rather better
than Michel's.  The vma refcount is the slow path.

1. take the RCU read lock
2. walk the pgd/p4d/pud/pmd
3. allocate page tables if necessary.  *handwave GFP flags*.
4. walk the vma tree
5. call ->map_pages
6. take ptlock
7. insert page(s)
8. drop ptlock
If this all worked out, we're done: drop the RCU read lock and return.
9. increment vma refcount
10. drop RCU read lock
11. call ->fault
12. decrement vma refcount
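A userspace model of the outline above, for illustration only. Everything in it is hypothetical: `struct vma_model`, `map_pages_model()`, `fault_model()` and the `rcu_*_stub()` helpers stand in for kernel machinery (userspace has no RCU, so the stubs only track nesting depth), steps 2-4 are elided, and a spin on an atomic int stands in for the ptlock. It shows that only a fault that falls through to ->fault touches the vma refcount.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical RCU stand-ins: they only track whether we are inside
 * the read-side critical section. */
static int rcu_depth;
static void rcu_read_lock_stub(void) { rcu_depth++; }
static void rcu_read_unlock_stub(void) { rcu_depth--; }

/* Hypothetical model of a vma; not the kernel's vm_area_struct. */
struct vma_model {
    atomic_int refcount;  /* pinned only on the slow path (steps 9-12) */
    atomic_int ptlock;    /* 0 = unlocked; stands in for the page-table lock */
    bool page_in_cache;   /* would ->map_pages find the page? */
    int pages_mapped;
};

enum fault_path { FAULT_FAST, FAULT_SLOW };

static void vma_model_init(struct vma_model *vma, bool cached)
{
    atomic_init(&vma->refcount, 0);
    atomic_init(&vma->ptlock, 0);
    vma->page_in_cache = cached;
    vma->pages_mapped = 0;
}

/* Steps 6-8: take ptlock, insert the page, drop ptlock. */
static void insert_page(struct vma_model *vma)
{
    while (atomic_exchange_explicit(&vma->ptlock, 1, memory_order_acquire))
        ;  /* spin until the lock is free */
    vma->pages_mapped++;
    atomic_store_explicit(&vma->ptlock, 0, memory_order_release);
}

/* Hypothetical ->map_pages: succeeds only if the page is already cached. */
static bool map_pages_model(struct vma_model *vma)
{
    return vma->page_in_cache;
}

/* Hypothetical ->fault: may sleep for I/O, so it must run outside RCU.
 * Installing the page is folded in here for brevity. */
static void fault_model(struct vma_model *vma)
{
    assert(rcu_depth == 0);
    vma->page_in_cache = true;  /* pretend the read completed */
    insert_page(vma);
}

static enum fault_path handle_fault(struct vma_model *vma)
{
    rcu_read_lock_stub();                        /* step 1 */
    /* steps 2-4 (page-table walk/allocation, vma lookup) elided */
    if (map_pages_model(vma)) {                  /* step 5 */
        insert_page(vma);                        /* steps 6-8 */
        rcu_read_unlock_stub();
        return FAULT_FAST;                       /* refcount never touched */
    }
    atomic_fetch_add(&vma->refcount, 1);         /* step 9 */
    rcu_read_unlock_stub();                      /* step 10 */
    fault_model(vma);                            /* step 11 */
    int old = atomic_fetch_sub(&vma->refcount, 1);  /* step 12 */
    assert(old > 0);
    return FAULT_SLOW;
}
```

Faulting an uncached page takes the slow path once; a second fault on the now-cached page stays entirely under RCU and never touches the refcount.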

Compared to today, where we bump the refcount on the file underlying the
vma, this gives _better_ scalability -- different mappings of the same file
will not contend on the file's refcount.
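To make the scalability comparison concrete, this sketch asks which counter a slow-path fault pins. The types and helpers (`struct file_model`, `struct vma_model`, `pin_target_today()`, `pin_target_proposed()`, `slow_fault()`) are hypothetical models, not kernel structures.

```c
#include <assert.h>
#include <stdatomic.h>

/* Hypothetical models; not kernel structures. */
struct file_model { atomic_int refcount; };
struct vma_model {
    struct file_model *file;  /* file backing this mapping */
    atomic_int refcount;      /* per-vma refcount */
};

/* Today: the slow path pins the file, shared by every mapping of it. */
static atomic_int *pin_target_today(struct vma_model *vma)
{
    return &vma->file->refcount;
}

/* Outlined scheme: the slow path pins only this vma. */
static atomic_int *pin_target_proposed(struct vma_model *vma)
{
    return &vma->refcount;
}

/* Bracket the (elided) ->fault call with a get/put on the chosen counter. */
static void slow_fault(atomic_int *pin)
{
    atomic_fetch_add(pin, 1);
    /* ->fault would run here, possibly sleeping on I/O */
    int old = atomic_fetch_sub(pin, 1);
    assert(old > 0);
}
```

Two vmas mapping the same file resolve to the same counter, and so the same cacheline, under today's scheme, but to distinct counters under the outlined one. A single huge vma faulted from many CPUs would still share its one counter, which is Peter's objection; the outline only touches it on the ->fault path.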

I suspect your huge VMA was anon, and that wouldn't need a vma refcount:
faulting in new anonymous pages doesn't need to do I/O, so the fault path
can just drop the RCU lock, allocate, and retry.
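A minimal sketch of the anonymous-fault retry, assuming the page allocation may sleep and therefore cannot happen under the RCU read lock. The types, the `rcu_*_stub()` helpers and `alloc_page_may_sleep()` are hypothetical; `malloc()` stands in for the page allocator, and the ptlock and page-table walk are elided.

```c
#include <assert.h>
#include <stdlib.h>

#define MAX_PAGES 4

/* Hypothetical RCU stand-ins that only track nesting depth. */
static int rcu_depth;
static void rcu_read_lock_stub(void) { rcu_depth++; }
static void rcu_read_unlock_stub(void) { rcu_depth--; }

/* Hypothetical sleeping allocation; must not be called under RCU. */
static void *alloc_page_may_sleep(void)
{
    assert(rcu_depth == 0);
    return malloc(4096);
}

/* Hypothetical anonymous vma: just an array of installed pages. */
struct anon_vma_model {
    void *pages[MAX_PAGES];
    int nr_pages;
};

/* Returns the number of attempts taken, or -1 if allocation failed. */
static int anon_fault(struct anon_vma_model *vma, void **prealloc)
{
    for (int attempt = 1;; attempt++) {
        rcu_read_lock_stub();
        /* re-walk page tables and the vma tree here (elided): the vma
         * may have changed while the lock was dropped */
        if (*prealloc) {
            assert(vma->nr_pages < MAX_PAGES);
            vma->pages[vma->nr_pages++] = *prealloc;  /* ptlock elided */
            *prealloc = NULL;
            rcu_read_unlock_stub();
            return attempt;
        }
        rcu_read_unlock_stub();              /* drop the RCU lock ... */
        *prealloc = alloc_page_may_sleep();  /* ... allocate ... */
        if (!*prealloc)
            return -1;
        /* ... and retry from the top */
    }
}
```

The first attempt finds no page, drops the lock, allocates and loops; the second installs the page. No vma refcount is taken, because nothing needs the vma pinned across the sleep: the retry simply looks it up again.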