Date:   Thu, 8 Apr 2021 12:28:08 +0100
From:   Matthew Wilcox <willy@...radead.org>
To:     Michel Lespinasse <michel@...pinasse.org>
Cc:     Peter Zijlstra <peterz@...radead.org>,
        Linux-MM <linux-mm@...ck.org>,
        Laurent Dufour <ldufour@...ux.ibm.com>,
        Michal Hocko <mhocko@...e.com>,
        Rik van Riel <riel@...riel.com>,
        Paul McKenney <paulmck@...nel.org>,
        Andrew Morton <akpm@...ux-foundation.org>,
        Suren Baghdasaryan <surenb@...gle.com>,
        Joel Fernandes <joelaf@...gle.com>,
        Rom Lemarchand <romlem@...gle.com>,
        Linux-Kernel <linux-kernel@...r.kernel.org>
Subject: Re: [RFC PATCH 24/37] mm: implement speculative handling in
 __do_fault()

On Thu, Apr 08, 2021 at 01:37:34AM -0700, Michel Lespinasse wrote:
> On Thu, Apr 08, 2021 at 08:13:43AM +0100, Matthew Wilcox wrote:
> > On Thu, Apr 08, 2021 at 09:00:26AM +0200, Peter Zijlstra wrote:
> > > On Wed, Apr 07, 2021 at 10:27:12PM +0100, Matthew Wilcox wrote:
> > > > Doing I/O without any lock held already works; it just uses the file
> > > > refcount.  It would be better to use a vma refcount, as I already said.
> > > 
> > > The original workload that I developed SPF for (waaaay back when) was
> > > prefaulting a single huge vma. Using a vma refcount was a total loss
> > > because it resulted in the same cacheline contention that down_read()
> > > was having.
> > > 
> > > As such, I'm always incredibly sad to see mention of vma refcounts.
> > > They're fundamentally not solving the problem :/
> > 
> > OK, let me outline my locking scheme because I think it's rather better
> > than Michel's.  The vma refcount is the slow path.
> > 
> > 1. take the RCU read lock
> > 2. walk the pgd/p4d/pud/pmd
> > 3. allocate page tables if necessary.  *handwave GFP flags*.
> > 4. walk the vma tree
> > 5. call ->map_pages
> > 6. take ptlock
> > 7. insert page(s)
> > 8. drop ptlock
> > If this all worked out, we're done: drop the RCU read lock and return.
> > Otherwise, fall back to the slow path:
> > 9. increment vma refcount
> > 10. drop RCU read lock
> > 11. call ->fault
> > 12. decrement vma refcount
> 
> Note that most of your proposed steps seem similar in principle to mine.
> Looking at the fast path (steps 1-8):
> - step 2 sounds like the speculative part of __handle_mm_fault()
> - (step 3 not included in my proposal)
> - step 4 is basically the lookup I currently have in the arch fault handler
> - step 6 sounds like the speculative part of map_pte_lock()
> 
> I have working implementations for each step, while your proposal
> summarizes each as a point item. It's not clear to me what to make of it;
> presumably you would be "filling in the blanks" in a different way
> than I have, but you are not explaining how. Are you suggesting that
> the precautions taken in each step to avoid races with mmap writers
> would not be necessary in your proposal? If that is the case, what
> alternative mechanism would you use to handle such races?

I don't know if you noticed, but I've been a little busy with memory folios.
I did tell you that on the call, but you don't seem to retain anything
I tell you on the call, so maybe I shouldn't bother calling in any more.

> Going back to the source of this, you suggested not copying the VMA;
> what is your proposed alternative? Do you suggest that fault handlers
> should deal with the vma potentially mutating under them? Or should
> mmap writers consider vmas as immutable and copy them whenever they
> want to change them? Or are you implying a locking mechanism that would
> prevent mmap writers from executing while the fault is running?

The VMA should be immutable, as I explained to you before.
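
For readers following the thread, here is a rough user-space sketch of how
the twelve steps quoted above could fit together with an immutable VMA. It
is one possible reading, not kernel code and not anything posted in this
thread. Every name in it (vma_snap, lookup_vma, try_map_pages, slow_fault,
handle_fault, publish_vma, the *_stub helpers) is hypothetical. The RCU
calls are empty stubs, the page table steps 2-3 are left out, and a single
atomic pointer stands in for the vma tree. The assumption is that a writer
never edits a published VMA; it builds a new one and swaps the pointer.

```c
/* Editorial sketch only: all names here are hypothetical, not kernel APIs. */
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stddef.h>

/* Descriptive fields are never modified after publish_vma(). */
struct vma_snap {
	unsigned long start, end;	/* covers [start, end) */
	bool page_cached;		/* stand-in for "->map_pages can succeed" */
	atomic_int refcount;		/* touched only on the slow path */
};

enum fault_path { FAULT_FAST, FAULT_SLOW, FAULT_NO_VMA };

/* One pointer stands in for the vma tree (assumption of this sketch). */
static _Atomic(struct vma_snap *) current_vma;

static void rcu_read_lock_stub(void) { }	/* placeholder for step 1 */
static void rcu_read_unlock_stub(void) { }

static struct vma_snap *lookup_vma(unsigned long addr)
{
	struct vma_snap *v =
		atomic_load_explicit(&current_vma, memory_order_acquire);

	return (v && addr >= v->start && addr < v->end) ? v : NULL;
}

/* Steps 5-8 collapsed into one predicate; ptlock is not modelled. */
static bool try_map_pages(const struct vma_snap *v)
{
	return v->page_cached;
}

/* Step 11: the part that may sleep (I/O); a no-op in this sketch. */
static void slow_fault(struct vma_snap *v)
{
	assert(atomic_load(&v->refcount) > 0);	/* pinned by the caller */
}

enum fault_path handle_fault(unsigned long addr)
{
	struct vma_snap *v;

	rcu_read_lock_stub();			/* step 1 */
	v = lookup_vma(addr);			/* step 4 */
	if (!v) {
		rcu_read_unlock_stub();
		return FAULT_NO_VMA;
	}
	if (try_map_pages(v)) {			/* steps 5-8 */
		rcu_read_unlock_stub();
		return FAULT_FAST;		/* refcount never touched */
	}
	atomic_fetch_add(&v->refcount, 1);	/* step 9 */
	rcu_read_unlock_stub();			/* step 10 */
	slow_fault(v);				/* step 11 */
	atomic_fetch_sub(&v->refcount, 1);	/* step 12 */
	return FAULT_SLOW;
}

/*
 * Writer side: publish a fresh copy instead of editing in place.
 * Returns the old snapshot; a real implementation would free it only
 * after a grace period and once its refcount has dropped to zero.
 */
struct vma_snap *publish_vma(struct vma_snap *new_snap)
{
	return atomic_exchange_explicit(&current_vma, new_snap,
					memory_order_acq_rel);
}
```

In this sketch the fast path never writes to the VMA, so it avoids the
shared-cacheline contention Peter describes; the refcount is only written
when ->fault has to be called.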
