[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <CAMj1kXGwYXXpjPgDwjKMEZJkuGJ8ZuCpMpc7fTvo58PNtu-czA@mail.gmail.com>
Date: Tue, 6 May 2025 19:25:18 +0200
From: Ard Biesheuvel <ardb@...nel.org>
To: Linus Torvalds <torvalds@...ux-foundation.org>
Cc: Ard Biesheuvel <ardb+git@...gle.com>, linux-kernel@...r.kernel.org, x86@...nel.org,
Ingo Molnar <mingo@...nel.org>
Subject: Re: [RFC PATCH 3/3] x86/boot: Use alternatives based selector for
5-level paging constants
On Tue, 6 May 2025 at 19:03, Linus Torvalds
<torvalds@...ux-foundation.org> wrote:
>
> On Tue, 6 May 2025 at 09:35, Ard Biesheuvel <ardb@...nel.org> wrote:
> >
> > I think the first two patches are important, though, as they are about
> > robustness/consistency rather than optimization.
>
> That first patch already has another copy of that insane inline asm
> optimization.
>
> Let's fix this properly, not this disgusting way.
>
> Let's get rid of that USE_EARLY_PGTABLE_L5 crazy case entirely, and
> fix the few places where it is currently used.
>
> That code is bogus *anyway*, because your argument for these patches
> is "one single truth", but the fact is, there's at least *SIX*
> different values that get set depending on this value: pgdir_shift,
> ptrs_per_p4d, vmalloc_base, etc.
Yes, there are many. But that is not what I meant by source of truth.
Currently, we are conflating two things
- whether the CPU has the LA57 capability
- whether the CPU is using the LA57 capability to support 5-level paging.
The CPU feature gets set too late to be useful, and then needs to be
cleared again if we happen to run with 4 levels of paging.
> All of those change depending on
> whether we do 5-level page tables or not, so the whole argument that
> "pgtable_l5_enabled() is special" is just wrong to begin with.
>
In my original patch, which is the one Ingo objected to,
pgtable_l5_enabled() is unambiguously based on whether CR4.LA57 is
set.
> Any code that makes pgtable_l5_enabled() will fundamentally then just
> have *another* inconsistency, namely the inconsistency between that
> "is_enabled" and all the other values that L5 paging actually
> modifies.
>
Not sure I follow you here.
> So I don't think your patches even fix anything. They only paper over
> one very particular issue.
>
> For example, as far as I can tell, the only real reason for it in
> arch/x86/kernel/cpu/common.c is *one* single use where we do that
>
> if (!pgtable_l5_enabled())
> setup_clear_cpu_cap(X86_FEATURE_LA57);
>
> in early_identify_cpu().
>
Yes, this is needed because we conflate the CPU capability with
whether or not we are running with 5 levels of paging.
> And then we have __early_make_pgtable(), but that's the SAME FILE that
> has all the magical special __pgtable_l5_enabled logic anyway. So that
> damn well could just write out the actual real logic, instead of using
> that "is L5 enabled" helper function THAT IT IS ITSELF INITIALIZING.
>
> So I reall ythink this whole issue goes much deeper, and is much more
> broken than your patches imply. And I think your patches in many ways
> make it *worse*, because they may make that pgtable_l5_enabled() be
> set up early, but that just hides all the *other* issues that aren't.,
>
> As a very real example of that, just look at what happens in
> arch/x86/mm/mem_encrypt_identity.c
>
> It does all that page table setup, but if __pgtable_l5_enabled hassn't
> been set up yet, then all those *othger* values that go with it also
> haven't been set up yet, so now it uses the *wrong* value for
> ptrs_per_p4d etc.
>
That code runs after __startup_64() so it should observe the correct
value. But it's flaky as hell, which is why I am trying to fix this.
I'd much rather get rid of those variables entirely, which is what I
proposed initially.
> And I just checked: that code very much does use ptrs_per_p4d,
> although it's hidden by
>
> memset(p4d, 0, sizeof(*p4d) * PTRS_PER_P4D);
>
> so I really think this is all papering over things. That code CANNOT
> WORK before __pgtable_l5_enabled hass been initialized in its
> *current* location.
>
> IOW, I think your patches only make things *less* consistent, not more.
>
In the light of the above, care to comment on the previous approach?
https://lore.kernel.org/all/20250504095230.2932860-28-ardb+git@google.com/
That also uses the ALTERNATIVE_TERNARY() so the CR4 access gets
patched away, and I'm happy to take suggestions how to improve that.
But it replaces those global variables with expressions that are
directly derived from CR4.LA57, which I think is likely the most
robust approach.
I'd still like to split off the LA57 CPU capability from the 5-level
paging state, but that needs careful consideration of the existing
users: for instance, KVM might need the capability in some cases, but
the current state in others. There is also some IOMMU driver code that
refers to the feature.
Powered by blists - more mailing lists