Message-Id: <6C548A9A-3AF3-4EC1-B1E5-47A7FFBEB761@gmail.com>
Date: Thu, 27 Oct 2022 13:15:22 -0700
From: Nadav Amit <nadav.amit@...il.com>
To: Linus Torvalds <torvalds@...ux-foundation.org>
Cc: Peter Zijlstra <peterz@...radead.org>,
Jann Horn <jannh@...gle.com>,
John Hubbard <jhubbard@...dia.com>, x86@...nel.org,
willy@...radead.org, Andrew Morton <akpm@...ux-foundation.org>,
linux-kernel@...r.kernel.org, linux-mm@...ck.org,
Andrea Arcangeli <aarcange@...hat.com>,
kirill.shutemov@...ux.intel.com, jroedel@...e.de, ubizjak@...il.com
Subject: Re: [PATCH 01/13] mm: Update ptep_get_lockless()s comment
On Oct 27, 2022, at 11:13 AM, Linus Torvalds <torvalds@...ux-foundation.org> wrote:
> Anybody willing to try to write up the rules (and have each rule
> document *why* it's a rule - not just "by fiat", but an actual "these
> are the rules and this is *why* they are the rules").
>
> Because right now I think all of our rules are almost entirely just
> encoded in the code, with a couple of comments, and a few people who
> just remember why we do what we do.
I think it might be easier to come up with new rules than to write down the
existing ones.
The approach I suggested before [1] is something like:
1. Make x86’s TLB-generation mechanism generic. Treat the existing
TLB-generation as the “pending TLB-generation”.
2. For each mm, track a “completed TLB-generation”, updated whenever an
actual flush takes place.
3. When you defer a TLB-flush, while holding the PTL:
a. Increase the TLB-generation.
b. Save the updated generation as the “table generation” in a new field in
the page-table’s page-struct.
4. When you are about to rely on a PTE value that is read from a page-table,
first check if a TLB flush is needed. The check is performed by comparing
the “table generation” with the “completed generation”. If the “completed
generation” is behind the “table generation”, a TLB flush is needed.
[ You rely on the PTE value when you install new PTEs or change them ]
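The four steps above could be sketched roughly as follows. This is a
single-threaded userspace model, not kernel code and not the code from [1]:
every struct, field, and function name here is hypothetical, the PTL is
reduced to a flag, and the atomics and memory ordering a real implementation
would need are left out.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical stand-in for the per-mm state (steps 1 and 2). */
struct mm_sketch {
	uint64_t pending_tlb_gen;	/* bumped whenever a flush is deferred */
	uint64_t completed_tlb_gen;	/* advanced whenever a flush actually runs */
};

/* Hypothetical stand-in for the page-table's page-struct (step 3b). */
struct pt_page_sketch {
	uint64_t table_gen;	/* generation of the last flush deferred on this table */
	bool ptl_held;		/* models "the caller holds the PTL" */
};

/* Step 3: defer a TLB flush. Must be called with the PTL held. */
static void defer_tlb_flush(struct mm_sketch *mm, struct pt_page_sketch *pt)
{
	assert(pt->ptl_held);
	mm->pending_tlb_gen++;			/* 3a */
	pt->table_gen = mm->pending_tlb_gen;	/* 3b */
}

/* Step 2: an actual flush completes everything that was pending when it began. */
static void do_tlb_flush(struct mm_sketch *mm)
{
	uint64_t gen = mm->pending_tlb_gen;

	/* The real TLB invalidation would happen here. */
	if (gen > mm->completed_tlb_gen)
		mm->completed_tlb_gen = gen;
}

/* Step 4: a flush is needed if the table's generation has not been completed. */
static bool pte_needs_flush(const struct mm_sketch *mm,
			    const struct pt_page_sketch *pt)
{
	return pt->table_gen > mm->completed_tlb_gen;
}

/* Call before relying on a PTE value read from this page-table. */
static void flush_before_relying_on_pte(struct mm_sketch *mm,
					const struct pt_page_sketch *pt)
{
	if (pte_needs_flush(mm, pt))
		do_tlb_flush(mm);
}
```

In this model the only state the check consults is the two generation
counters, which is the property the scheme is after.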
That’s about it. I might not have covered some issues with fast-GUP. But in
general I think it is a simple scheme. The thing I like about this scheme
the most is that it avoids relying on almost all the OS data-structures
(e.g., PageAnon()), making it much easier to grasp.
I can revive the patch-set if the overall approach is agreeable.
[1] https://lore.kernel.org/lkml/20210131001132.3368247-1-namit@vmware.com/