linux-kernel - Re: [PATCH 01/13] mm: Update ptep_get

lists.openwall.net		lists / announce owl-users owl-dev john-users john-dev passwdqc-users yescrypt popa3d-users / oss-security kernel-hardening musl sabotage tlsify passwords / crypt-dev xvendor / Bugtraq Full-Disclosure linux-kernel linux-netdev linux-ext4 linux-hardening linux-cve-announce PHC
Open Source and information security mailing list archives

Hash Suite: Windows password security audit tool. GUI, reports in PDF.

[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]

Message-Id: <6C548A9A-3AF3-4EC1-B1E5-47A7FFBEB761@gmail.com>
Date:   Thu, 27 Oct 2022 13:15:22 -0700
From:   Nadav Amit <nadav.amit@...il.com>
To:     Linus Torvalds <torvalds@...ux-foundation.org>
Cc:     Peter Zijlstra <peterz@...radead.org>,
        Jann Horn <jannh@...gle.com>,
        John Hubbard <jhubbard@...dia.com>, x86@...nel.org,
        willy@...radead.org, Andrew Morton <akpm@...ux-foundation.org>,
        linux-kernel@...r.kernel.org, linux-mm@...ck.org,
        Andrea Arcangeli <aarcange@...hat.com>,
        kirill.shutemov@...ux.intel.com, jroedel@...e.de, ubizjak@...il.com
Subject: Re: [PATCH 01/13] mm: Update ptep_get_lockless()s comment

On Oct 27, 2022, at 11:13 AM, Linus Torvalds <torvalds@...ux-foundation.org> wrote:

> Anybody willing to try to write up the rules (and have each rule
> document *why* it's a rule - not just "by fiat", but an actual "these
> are the rules and this is *why* they are the rules").
> 
> Because right now I think all of our rules are almost entirely just
> encoded in the code, with a couple of comments, and a few people who
> just remember why we do what we do.

I think it might be easier to come up with new rules instead of phrasing the
existing ones.

The approach I suggested before [1] is something like:

1. Turn x86’s TLB-generation mechanism to be generic. Turn the
   TLB-generation into “pending TLB-generation”.

2. For each mm track “completed TLB-generation”, whenever an actual flush
   takes place.

3. When you defer a TLB-flush, while holding the PTL:
  a. Increase the TLB-generation.
  b. Save the updated “table generation" in a new field in the
     page-table’s page-struct.

4. When you are about to rely on a PTE value that is read from a page-table,
   first check if a TLB flush is needed. The check is performed by comparing
   the “table generation” with the “completed generation”. If the “table
   generation” is behind, a TLB flush is needed.

   [ You rely on the PTE value when you install new PTEs or change them ]

That’s about it. I might have not covered some issues with fast-GUP. But in
general I think it is a simple scheme. The thing I like about this scheme
the most is that it avoids relying on almost all the OS data-structures
(e.g., PageAnon()), making it much easier to grasp.

I can revive the patch-set if the overall approach is agreeable.

[1] https://lore.kernel.org/lkml/20210131001132.3368247-1-namit@vmware.com/