Message-ID: <CAHk-=wh8oi0qQtYDFTfm7d1s5C8mG7ig=NfzGWt4zbjXMzcdqQ@mail.gmail.com>
Date: Thu, 27 Oct 2022 13:31:22 -0700
From: Linus Torvalds <torvalds@...ux-foundation.org>
To: Nadav Amit <nadav.amit@...il.com>
Cc: Peter Zijlstra <peterz@...radead.org>,
Jann Horn <jannh@...gle.com>,
John Hubbard <jhubbard@...dia.com>, x86@...nel.org,
willy@...radead.org, Andrew Morton <akpm@...ux-foundation.org>,
linux-kernel@...r.kernel.org, linux-mm@...ck.org,
Andrea Arcangeli <aarcange@...hat.com>,
kirill.shutemov@...ux.intel.com, jroedel@...e.de, ubizjak@...il.com
Subject: Re: [PATCH 01/13] mm: Update ptep_get_lockless()s comment

On Thu, Oct 27, 2022 at 1:15 PM Nadav Amit <nadav.amit@...il.com> wrote:
>
> I think it might be easier to come up with new rules instead of phrasing the
> existing ones.

I'm ok with that, but I think you are missing a very important issue:
all the cases where we can short-circuit TLB invalidations *entirely*.
You don't mention those at all.

Those optimizations are *very* important. Process exit is one of the
most performance-critical pieces of code in the kernel on some loads,
because a lot of traditional unix loads have a *ton* of small
fork/exec/exit sequences, and the whole "do just one TLB flush" was at
least historically quite a big deal.
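
As a rough illustration of why "do just one TLB flush" matters at exit time, here is a toy Python model. It is not kernel code, and the names `ToyMM`, `exit_flush_per_page` and `exit_flush_once` are hypothetical. It counts simulated TLB flushes when an address space is torn down one PTE at a time versus cleared completely and flushed once, before the pages are handed back.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class ToyMM:
    """Hypothetical toy address space: virtual page number -> physical page."""
    ptes: Dict[int, int] = field(default_factory=dict)
    flushes: int = 0  # simulated TLB flushes issued so far

    def flush_tlb(self) -> None:
        self.flushes += 1


def exit_flush_per_page(mm: ToyMM) -> None:
    """Naive teardown: flush after clearing every single PTE."""
    for vpn in list(mm.ptes):
        del mm.ptes[vpn]
        mm.flush_tlb()


def exit_flush_once(mm: ToyMM) -> List[int]:
    """Batched teardown: clear every PTE, flush once, then free the pages."""
    to_free = [mm.ptes.pop(vpn) for vpn in list(mm.ptes)]
    if to_free:
        mm.flush_tlb()  # a single flush covers every cleared entry
    return to_free      # only after the flush may these pages be reused
```

In this model, tearing down 512 mapped pages costs 512 simulated flushes with the per-page version and 1 with the batched version, and an empty address space costs none.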

So one very big issue here is when zap_page_tables() can end up
skipping TLB flushes entirely, because nobody cares.

And no, the fix is not to turn it into some "just increment a
generation number".

We want to avoid *even that* cost for the whole "we don't actually
need a TLB flush at all, until we actually free the pages".

So there are two levels of TLB flush optimizations:
(a) avoiding them entirely in the first place
(b) the whole "once you have to flush, keep track of lazy modes and
TLB generations, and flush ranges"
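
The two levels can be sketched with another toy model, again hypothetical: `ToyGather`, `clear_pte` and `finish` are illustrative names, not the kernel's `mmu_gather` interface. For level (a), the model assumes that clearing a PTE that was never present creates no flush work, so a batch made only of such entries finishes without flushing and without even bumping the generation counter. That is just one convenient trigger for (a); the email's point is that real paths such as process exit have others. For level (b), once a flush is needed, the model widens a single pending range and issues one ranged flush.

```python
from dataclasses import dataclass, field
from typing import List, Optional, Tuple

PAGE_SIZE = 4096  # assumed page size for the toy model


@dataclass
class ToyGather:
    """Hypothetical two-level flush tracker; not the kernel's mmu_gather."""
    start: Optional[int] = None  # lowest address that needs flushing
    end: Optional[int] = None    # one past the highest such address
    tlb_gen: int = 0             # bumped only when a flush is really issued
    flushed: List[Tuple[int, int]] = field(default_factory=list)

    def clear_pte(self, addr: int, was_present: bool) -> None:
        # Level (a), under this model's assumption: an entry that was never
        # present cannot be cached in the TLB, so it adds no flush work.
        if not was_present:
            return
        # Level (b): a flush is needed, so just widen the pending range.
        lo, hi = addr, addr + PAGE_SIZE
        self.start = lo if self.start is None else min(self.start, lo)
        self.end = hi if self.end is None else max(self.end, hi)

    def finish(self) -> None:
        # Nothing recorded: no flush and no generation bump at all.
        if self.start is None:
            return
        self.tlb_gen += 1
        self.flushed.append((self.start, self.end))  # one ranged flush
        self.start = self.end = None
```

In this model, clearing present pages at 0x3000 and 0x1000 produces a single flush of [0x1000, 0x4000). The range also covers 0x2000, trading a slightly wider flush for fewer flush operations.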

And honestly, I think you ignored (a), and that's where we do exactly
those kinds of "this case doesn't need to flush AT ALL" things.

So when you say
> The thing I like about this scheme
> the most is that it avoids relying on almost all the OS data-structures
> (e.g., PageAnon()), making it much easier to grasp.

I think it's because you've ignored a big part of the whole issue.

Linus