linux-kernel - Re: [PATCH 01/13] mm: Update ptep_get

lists.openwall.net		lists / announce owl-users owl-dev john-users john-dev passwdqc-users yescrypt popa3d-users / oss-security kernel-hardening musl sabotage tlsify passwords / crypt-dev xvendor / Bugtraq Full-Disclosure linux-kernel linux-netdev linux-ext4 linux-hardening linux-cve-announce PHC
Open Source and information security mailing list archives

Hash Suite: Windows password security audit tool. GUI, reports in PDF.

[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]

Message-Id: <F9E42822-DA1D-4192-8410-3BAE42E9E4A9@gmail.com>
Date:   Thu, 27 Oct 2022 14:44:46 -0700
From:   Nadav Amit <nadav.amit@...il.com>
To:     Linus Torvalds <torvalds@...ux-foundation.org>
Cc:     Peter Zijlstra <peterz@...radead.org>,
        Jann Horn <jannh@...gle.com>,
        John Hubbard <jhubbard@...dia.com>, x86@...nel.org,
        willy@...radead.org, Andrew Morton <akpm@...ux-foundation.org>,
        linux-kernel@...r.kernel.org, linux-mm@...ck.org,
        Andrea Arcangeli <aarcange@...hat.com>,
        kirill.shutemov@...ux.intel.com, jroedel@...e.de, ubizjak@...il.com
Subject: Re: [PATCH 01/13] mm: Update ptep_get_lockless()s comment

On Oct 27, 2022, at 1:31 PM, Linus Torvalds <torvalds@...ux-foundation.org> wrote:

> So there are two levels of tlb flush optimizations
> 
> (a) avoiding them entirely in the first place
> 
> (b) the whole "once you have to flush, keep track of lazy modes and
> TLB generations, and flush ranges"
> 
> And honestly, I think you ignored (a), and that's where we do exactly
> those kinds of "this case doesn't need to flush AT ALL" things.

I did try to avoid TLB flushes by introducing pte_needs_flush() and avoiding
flushes based on the architectural PTE changes. There are even more x86
arch-based opportunities to further avoid TLB flushes (and then only flush
the TLB if spurious #PF occurs).

Personally, I still think that making decisions on flushes based on (mostly)
only the arch state makes the code more robust against misuse (e.g., see
various confusions between mmu_gather’s fullmm and need_flush_all).

Having said that, I will follow your feedback that the extra complexity
worth the extra performance.

Anyhow, admittedly, I need to give it more thought. For instance, in respect
to the code that you mentioned (in zap_pte_range), after reading it again,
it seems strange: how is ok to defer the TLB flush after the rmap was
removed, even if it is done while the PTL is held.
folio_clear_dirty_for_io() would not sync on the PTL afterwards, so the page
might be later re-dirtied using a stale cached PTE. Oh well.