Message-ID: <YfiY9zRm8BhSp7eA@casper.infradead.org>
Date: Tue, 1 Feb 2022 02:20:39 +0000
From: Matthew Wilcox <willy@...radead.org>
To: Andrew Morton <akpm@...ux-foundation.org>
Cc: Michel Lespinasse <michel@...pinasse.org>,
Linux-MM <linux-mm@...ck.org>, linux-kernel@...r.kernel.org,
kernel-team@...com, Laurent Dufour <ldufour@...ux.ibm.com>,
Jerome Glisse <jglisse@...gle.com>,
Peter Zijlstra <peterz@...radead.org>,
Michal Hocko <mhocko@...e.com>,
Vlastimil Babka <vbabka@...e.cz>,
Davidlohr Bueso <dave@...olabs.net>,
Liam Howlett <liam.howlett@...cle.com>,
Rik van Riel <riel@...riel.com>,
Paul McKenney <paulmck@...nel.org>,
Song Liu <songliubraving@...com>,
Suren Baghdasaryan <surenb@...gle.com>,
Minchan Kim <minchan@...gle.com>,
Joel Fernandes <joelaf@...gle.com>,
David Rientjes <rientjes@...gle.com>,
Axel Rasmussen <axelrasmussen@...gle.com>,
Andy Lutomirski <luto@...nel.org>,
Sebastian Andrzej Siewior <bigeasy@...utronix.de>
Subject: Re: [PATCH v2 00/35] Speculative page faults

On Mon, Jan 31, 2022 at 05:14:34PM -0800, Andrew Morton wrote:
> On Fri, 28 Jan 2022 05:09:31 -0800 Michel Lespinasse <michel@...pinasse.org> wrote:
> > The first step of a speculative page fault is to look up the vma and
> > read its contents (currently by making a copy of the vma, though in
> > principle it would be sufficient to only read the vma attributes that
> > are used in page faults). The mmap sequence count is used to verify
> > that there were no mmap writers concurrent to the lookup and copy steps.
> > Note that walking rbtrees while there may potentially be concurrent
> > writers is not an entirely new idea in linux, as latched rbtrees
> > are already doing this. This is safe as long as the lookup is
> > followed by a sequence check to verify that concurrency did not
> > actually occur (and abort the speculative fault if it did).
>
> I'm surprised that descending the rbtree locklessly doesn't flat-out
> oops the kernel. How are we assured that every pointer which is
> encountered actually points at the right thing? Against things
> which tear that tree down?

It doesn't necessarily point at the _right_ thing. You may get
entirely the wrong node in the tree if you race with a modification,
but, as Michel says, you check the seqcount before you even look at
the VMA (and if the seqcount indicates a modification, you throw away
the result and fall back to the locked version). The rbtree always
points to other rbtree nodes, so you aren't going to walk into some
completely wrong data structure.
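
To make the lookup-then-validate pattern concrete, here is a minimal
userspace sketch. It is not code from the patch set: struct range,
struct space, write_begin()/write_end() and speculative_lookup() are
hypothetical names, a flat array stands in for the rbtree, and C11
atomics stand in for the kernel's seqcount primitives. It only shows
the control flow (sample the count, look up and copy, re-check the
count, discard on mismatch); a genuinely concurrent version would also
need the data reads themselves to be race-safe.

```c
#include <stdatomic.h>
#include <stddef.h>

/* Hypothetical stand-ins for illustration; not names from the patch set. */
struct range {
	unsigned long start, end;	/* covers [start, end) */
	int flags;
};

struct space {
	atomic_uint seq;		/* odd while a writer is modifying ranges */
	struct range ranges[4];
	size_t nr;
};

enum spec_result { SPEC_OK, SPEC_NONE, SPEC_RETRY };

/* Writers bump the count on entry and on exit. */
static void write_begin(struct space *s) { atomic_fetch_add(&s->seq, 1); }
static void write_end(struct space *s)   { atomic_fetch_add(&s->seq, 1); }

static enum spec_result speculative_lookup(struct space *s, unsigned long addr,
					   struct range *out)
{
	unsigned int seq = atomic_load(&s->seq);
	struct range copy;
	int found = 0;
	size_t i;

	if (seq & 1)
		return SPEC_RETRY;	/* writer active: fall back to locking */

	for (i = 0; i < s->nr; i++) {
		if (addr >= s->ranges[i].start && addr < s->ranges[i].end) {
			copy = s->ranges[i];	/* may be the wrong entry if we raced */
			found = 1;
			break;
		}
	}

	/* Validate before trusting anything that was read above. */
	if (atomic_load(&s->seq) != seq)
		return SPEC_RETRY;
	if (!found)
		return SPEC_NONE;

	*out = copy;
	return SPEC_OK;
}
```

A caller that gets SPEC_RETRY throws the result away and takes the
locked path, which is the fallback described above.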
> > The next step is to walk down the existing page table tree to find the
> > current pte entry. This is done with interrupts disabled to avoid
> > races with munmap().
>
> Sebastian, could you please comment on this from the CONFIG_PREEMPT_RT
> point of view?

I am not a fan of this approach. For other reasons, I think we want to
switch to RCU-freed page tables, and then we can walk the page tables
with the RCU lock held. Some architectures already RCU-free the page
tables, so I think it's just a matter of converting the rest.