[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-Id: <A2252098-2EE1-4F8D-A1A2-A49EE8FD18A7@gmail.com>
Date: Fri, 11 Mar 2022 12:53:30 -0800
From: Nadav Amit <nadav.amit@...il.com>
To: Dave Hansen <dave.hansen@...el.com>
Cc: Linux-MM <linux-mm@...ck.org>,
Linux Kernel Mailing List <linux-kernel@...r.kernel.org>,
Andrew Morton <akpm@...ux-foundation.org>,
Andrea Arcangeli <aarcange@...hat.com>,
Andrew Cooper <andrew.cooper3@...rix.com>,
Andy Lutomirski <luto@...nel.org>,
Dave Hansen <dave.hansen@...ux.intel.com>,
Peter Xu <peterx@...hat.com>,
Peter Zijlstra <peterz@...radead.org>,
Thomas Gleixner <tglx@...utronix.de>,
Will Deacon <will@...nel.org>, Yu Zhao <yuzhao@...gle.com>,
Nick Piggin <npiggin@...il.com>, x86@...nel.org
Subject: Re: [RESEND PATCH v3 5/5] mm: avoid unnecessary flush on
change_huge_pmd()
> On Mar 11, 2022, at 12:41 PM, Dave Hansen <dave.hansen@...el.com> wrote:
>
> On 3/11/22 11:07, Nadav Amit wrote:
>> From: Nadav Amit <namit@...are.com>
>>
>> Calls to change_protection_range() on THP can trigger, at least on x86,
>> two TLB flushes for one page: one immediately, when pmdp_invalidate() is
>> called by change_huge_pmd(), and then another one later (that can be
>> batched) when change_protection_range() finishes.
>>
>> The first TLB flush is only necessary to prevent the dirty bit (and with
>> a lesser importance the access bit) from changing while the PTE is
>> modified. However, this is not necessary as the x86 CPUs set the
>> dirty-bit atomically with an additional check that the PTE is (still)
>> present. One caveat is Intel's Knights Landing that has a bug and does
>> not do so.
>
> First of all, thank you for your diligence here. This is a super
> obscure issue. I think I put handling for it in the kernel and I'm not
> sure I would have even thought about this angle.
>
> That said, I'm not sure this is all necessary.
>
> Yes, the Dirty bit can get set unexpectedly in some PTEs. But, the
> question is whether it is *VALUABLE* and needs to be preserved. The
> current kernel code pretty much just lets the hardware set the Dirty bit
> and then ignores it. If it were valuable, ignoring it would have been a
> bad thing. We'd be losing data on today's kernels because the hardware
> told us about a write that happened but that the kernel ignored.
>
> My mental model of what the microcode responsible for the erratum does
> is something along these lines:
>
> if (write)
> pte |= _PAGE_DIRTY;
> if (!pte_present(pte))
> #PF
>
> The PTE is marked dirty, but the write never actually executes. The
> thread that triggered the A/D setting *also* gets a fault.
>
This makes perfect sense. I guess I misunderstood or forgot the erratum.
But feel free to recheck. It would allow to remove the KNL check, and
probably the first patch in this series. But I don’t think it would
allow to get rid of pmdp_invalidate_ad() since I do not fell comfortable
just to use pmdp_establish() directly: I do not know about other
architectures well enough to say that they have the same atomicity
guarantees when it comes to A/D bits.
> I'll double-check with some Intel folks to make sure I'm not missing
> something. But, either way, I don't think we should be going to this
> much trouble for the good ol' Xeon Phi. I doubt there are many still
> around and I *REALLY* doubt they're running new kernels.
>
> *If* we need this (and I'm not convinced we do), my first instinct would
> be to just do this instead:
>
> clear_cpu_cap(c, X86_FEATURE_PSE);
>
> on KNL systems. If anyone cares, they know where to find us.
I think that it is not necessary and your understanding of the erratum
is the right one. Let me know if you find it is not the case.
BTW: Thanks for the quick response, and sorry for the time it took me
to send v3.
Powered by blists - more mailing lists