linux-kernel - Re: [QUESTION] about the maple tree and current status of mmap

Message-ID: <Y7LsDgMxHh8NHzDY@casper.infradead.org>
Date:   Mon, 2 Jan 2023 14:37:02 +0000
From:   Matthew Wilcox <willy@...radead.org>
To:     Hyeonggon Yoo <42.hyeyoo@...il.com>
Cc:     linux-mm@...ck.org, liam.howlett@...cle.com, surenb@...gle.com,
        ldufour@...ux.ibm.com, michel@...pinasse.org, vbabka@...e.cz,
        linux-kernel@...r.kernel.org
Subject: Re: [QUESTION] about the maple tree and current status of mmap_lock
 scalability

On Mon, Jan 02, 2023 at 09:04:12PM +0900, Hyeonggon Yoo wrote:
> > https://www.infradead.org/~willy/linux/store-free-page-faults.html
> > outlines how I intend to proceed from Suren's current scheme (where
> > RCU is only used to protect the tree walk) to using RCU for the
> > entire page fault.
> 
> Thank you for sharing your outline.
> Okay, so the planned scheme is:
> 
> 	1. Try to process entire page fault under RCU protection
> 		- if failed, goto 2. if succeeded, goto 4.
> 
> 	2. Fall back to Suren's scheme (try to take VMA rwsem)
> 		- if failed, goto 3. if succeeded, goto 4.

Right.  The question is whether to restart the page fault under Suren's
scheme, or just grab the VMA rwsem and continue.  Experimentation
needed.

It's also worth noting that Michel has an alternative proposal, which
is to drop out of RCU protection before trying to allocate memory, then
re-enter RCU mode and check the sequence count hasn't changed on the
entire MM.  His proposal has the advantage of not trying to allocate
memory while holding the RCU read lock, but the disadvantage of having
to retry the page fault if anyone has called mmap() or munmap().  Which
alternative is better is going to depend on the workload; do we see more
calls to mmap()/munmap(), or do we need to enter page reclaim more often?
I think they're largely equivalent performance-wise in the fast path.
Another metric to consider is code complexity; he thinks his method
is easier to understand and I think mine is easier.  To be expected,
I suppose ;-)
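
To make the retry cost of the sequence-count approach concrete, here is a
minimal userspace sketch. It is not kernel code and not taken from Michel's
patches: struct toy_mm, toy_mm_changed(), toy_fault_attempt() and toy_fault()
are hypothetical names, malloc() stands in for a sleeping page allocation,
and the "RCU read side" is only marked by comments.

```c
#include <stdatomic.h>
#include <stdbool.h>
#include <stdlib.h>

/* Hypothetical stand-in for an mm-wide sequence count (not a kernel type). */
struct toy_mm {
	atomic_uint seq;	/* bumped by every simulated mmap()/munmap() */
};

enum toy_result { TOY_DONE, TOY_RETRY, TOY_NOMEM };

/* Simulated writer: any mapping change invalidates in-flight faults. */
static void toy_mm_changed(struct toy_mm *mm)
{
	atomic_fetch_add(&mm->seq, 1);
}

/*
 * One speculative fault attempt.  'racing_writer' simulates an
 * mmap()/munmap() landing while we are outside the read-side section.
 */
static enum toy_result toy_fault_attempt(struct toy_mm *mm, bool racing_writer)
{
	/* "Read side": look up the VMA and sample the sequence count. */
	unsigned int snap = atomic_load(&mm->seq);

	/* Leave the read side: the allocation is now allowed to sleep. */
	void *page = malloc(4096);
	if (!page)
		return TOY_NOMEM;
	if (racing_writer)
		toy_mm_changed(mm);

	/* Re-enter and validate: any change to the mm means start over. */
	if (atomic_load(&mm->seq) != snap) {
		free(page);
		return TOY_RETRY;
	}

	free(page);	/* a real handler would install the page here */
	return TOY_DONE;
}

/*
 * Retry until an attempt validates.  Returns the number of attempts used,
 * or -1 on allocation failure.  The first 'racing_attempts' attempts each
 * race with a simulated writer.
 */
static int toy_fault(struct toy_mm *mm, int racing_attempts)
{
	int attempts = 0;

	for (;;) {
		enum toy_result r;

		attempts++;
		r = toy_fault_attempt(mm, attempts <= racing_attempts);
		if (r == TOY_DONE)
			return attempts;
		if (r == TOY_NOMEM)
			return -1;
	}
}
```

With no concurrent writer the fault completes in one attempt; every
simulated mmap()/munmap() that lands during the allocation window costs one
extra pass, which is the workload dependence described above.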

> 	3. Fall back to mmap_lock
> 		- goto 4.
> 
> 	4. Finish page fault.
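
The four steps in the quoted list can be sketched as a plain fallback chain.
This is an illustration only, under the assumption that each stage either
completes the fault or reports that the next stage must be tried; struct
toy_fault_env, enum toy_path and toy_handle_fault() are hypothetical names
and none of this is code from the thread.

```c
#include <stdbool.h>

/* Which path ended up finishing the fault (step 4 in the list). */
enum toy_path { TOY_PATH_RCU, TOY_PATH_VMA_LOCK, TOY_PATH_MMAP_LOCK };

/* Hypothetical knobs describing whether each optimistic stage succeeds. */
struct toy_fault_env {
	bool rcu_ok;		/* step 1: whole fault completes under RCU */
	bool vma_lock_ok;	/* step 2: per-VMA lock could be taken */
};

static enum toy_path toy_handle_fault(const struct toy_fault_env *env)
{
	/* Step 1: try to process the entire fault under RCU protection. */
	if (env->rcu_ok)
		return TOY_PATH_RCU;

	/* Step 2: fall back to taking the per-VMA lock. */
	if (env->vma_lock_ok)
		return TOY_PATH_VMA_LOCK;

	/* Step 3: last resort, take mmap_lock; modeled as always succeeding. */
	return TOY_PATH_MMAP_LOCK;
}
```

The sketch deliberately leaves open whether step 2 restarts the fault or
continues from where step 1 stopped; it only shows the order of the
fallbacks.
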
> 
> To implement 1, __p*d_alloc() needs to take gfp flags
> so that it does not sleep in an RCU read-side critical section.
> 
> What about introducing a PF_MEMALLOC_NOWAIT process flag forcing
> GFP_NOWAIT | __GFP_NOWARN
> 
> similar to PF_MEMALLOC_NO{FS,IO}, looking like this?
> 
> Will be less churn.

Certainly less churn, but also far more risky.  All of a sudden,
codepaths which used to always succeed will now start failing, and
either there aren't checks for memory allocation failures or those
paths have never been tested before.
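
A small userspace model of that risk, as a sketch only: toy_nowait stands in
for the proposed per-task flag, toy_pressure simulates a situation where
only a sleeping allocation could succeed, and toy_alloc() and
toy_alloc_table() are hypothetical helpers, not kernel functions.

```c
#include <errno.h>
#include <stdbool.h>
#include <stdlib.h>

/* Hypothetical stand-in for the proposed flag (per-task in the kernel). */
static bool toy_nowait;

/* Simulated memory pressure: only an allocation that may sleep succeeds. */
static bool toy_pressure;

static void *toy_alloc(size_t size)
{
	/* With the flag set, the allocator may not wait for reclaim. */
	if (toy_nowait && toy_pressure)
		return NULL;
	return malloc(size);	/* "blocking" path, modeled by plain malloc() */
}

/*
 * A callee far down the call chain.  Its signature never changed, but
 * whether its error branch is reachable now depends on a flag set by a
 * distant caller.
 */
static int toy_alloc_table(void **out)
{
	void *p = toy_alloc(64);

	if (!p)
		return -ENOMEM;	/* previously never exercised in this model */
	*out = p;
	return 0;
}
```

In this model the same call to toy_alloc_table() succeeds with the flag
clear and returns -ENOMEM with it set, without any change at the call site.
A callee that lacked the NULL check would fail in a far less controlled way.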