[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <ZHfD88N3PhqReu2z@google.com>
Date: Wed, 31 May 2023 15:02:27 -0700
From: Sean Christopherson <seanjc@...gle.com>
To: Borislav Petkov <bp@...en8.de>
Cc: Jon Kohler <jon@...anix.com>, Dave Hansen <dave.hansen@...el.com>,
Thomas Gleixner <tglx@...utronix.de>,
Ingo Molnar <mingo@...hat.com>,
Dave Hansen <dave.hansen@...ux.intel.com>,
"x86@...nel.org" <x86@...nel.org>,
"H. Peter Anvin" <hpa@...or.com>,
"Chang S. Bae" <chang.seok.bae@...el.com>,
Kyle Huey <me@...ehuey.com>,
"neelnatu@...gle.com" <neelnatu@...gle.com>,
Andrew Cooper <andrew.cooper3@...rix.com>,
"linux-kernel@...r.kernel.org" <linux-kernel@...r.kernel.org>,
"kvm@...r.kernel.org" <kvm@...r.kernel.org>,
Paolo Bonzini <pbonzini@...hat.com>
Subject: Re: [PATCH] x86/fpu/xstate: clear XSAVE features if DISABLED_MASK set
On Wed, May 31, 2023, Borislav Petkov wrote:
> On Wed, May 31, 2023 at 08:18:34PM +0000, Sean Christopherson wrote:
> > Assert that the to-be-checked bit passed to cpu_feature_enabled() is a
> > compile-time constant instead of applying the DISABLED_MASK_BIT_SET()
> > logic if and only if the bit is a constant. Conditioning the check on
> > the bit being constant instead of requiring the bit to be constant could
> > result in compiler specific kernel behavior, e.g. running on hardware that
> > supports a disabled feature would return %false if the compiler resolved
> > the bit to a constant, but %true if not.
>
> Uff, more mirroring CPUID inconsistencies.
>
> So *actually*, we should clear all those build-time disabled bits from
> x86_capability so that this doesn't happen.
Heh, I almost suggested that, but there is a non-zero amount of code that wants
to ignore the disabled bits and query the "raw" CPUID information. In quotes
because the kernel still massages x86_capability. Changing that behavior will
require auditing a lot of code, because in most cases any breakage will be mostly
silent, e.g. loss of features/performance and not explosions.
E.g. KVM emulates UMIP when it's not supported in hardware, and so advertises UMIP
support irrespective of hardware/host support. But emulating UMIP is imperfect
and suboptimal (requires intercepting L*DT instructions), so KVM intercepts L*DT
instructions iff UMIP is not supported in hardware, as detected by
boot_cpu_has(X86_FEATURE_UMIP).
The comment for cpu_feature_enabled() even calls out this type of use case:
Use the cpu_has() family if you want true runtime testing of CPU features, like
in hypervisor code where you are supporting a possible guest feature where host
support for it is not relevant.
That said, the behavior of cpu_has() is wildly inconsistent, e.g. LA57 is
indirectly cleared in x86_capability if it's a disabled bit because of this code
in early_identify_cpu().
if (!pgtable_l5_enabled())
setup_clear_cpu_cap(X86_FEATURE_LA57);
KVM works around that by manually doing CPUID to query hardware directly:
/* Set LA57 based on hardware capability. */
if (cpuid_ecx(7) & F(LA57))
kvm_cpu_cap_set(X86_FEATURE_LA57);
So yeah, I 100% agree the current state is messy and would love to have
cpu_feature_enabled() be a pure optimization with respect to boot_cpu_has(), but
it's not as trivial at it looks.
Powered by blists - more mailing lists