linux-kernel - Re: [PATCH] x86/fpu/xstate: clear XSAVE features if DISABLED

lists.openwall.net		lists / announce owl-users owl-dev john-users john-dev passwdqc-users yescrypt popa3d-users / oss-security kernel-hardening musl sabotage tlsify passwords / crypt-dev xvendor / Bugtraq Full-Disclosure linux-kernel linux-netdev linux-ext4 linux-hardening linux-cve-announce PHC
Open Source and information security mailing list archives

Hash Suite: Windows password security audit tool. GUI, reports in PDF.

[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]

Message-ID: <ZHfD88N3PhqReu2z@google.com>
Date:   Wed, 31 May 2023 15:02:27 -0700
From:   Sean Christopherson <seanjc@...gle.com>
To:     Borislav Petkov <bp@...en8.de>
Cc:     Jon Kohler <jon@...anix.com>, Dave Hansen <dave.hansen@...el.com>,
        Thomas Gleixner <tglx@...utronix.de>,
        Ingo Molnar <mingo@...hat.com>,
        Dave Hansen <dave.hansen@...ux.intel.com>,
        "x86@...nel.org" <x86@...nel.org>,
        "H. Peter Anvin" <hpa@...or.com>,
        "Chang S. Bae" <chang.seok.bae@...el.com>,
        Kyle Huey <me@...ehuey.com>,
        "neelnatu@...gle.com" <neelnatu@...gle.com>,
        Andrew Cooper <andrew.cooper3@...rix.com>,
        "linux-kernel@...r.kernel.org" <linux-kernel@...r.kernel.org>,
        "kvm@...r.kernel.org" <kvm@...r.kernel.org>,
        Paolo Bonzini <pbonzini@...hat.com>
Subject: Re: [PATCH] x86/fpu/xstate: clear XSAVE features if DISABLED_MASK set

On Wed, May 31, 2023, Borislav Petkov wrote:
> On Wed, May 31, 2023 at 08:18:34PM +0000, Sean Christopherson wrote:
> > Assert that the to-be-checked bit passed to cpu_feature_enabled() is a
> > compile-time constant instead of applying the DISABLED_MASK_BIT_SET()
> > logic if and only if the bit is a constant.  Conditioning the check on
> > the bit being constant instead of requiring the bit to be constant could
> > result in compiler specific kernel behavior, e.g. running on hardware that
> > supports a disabled feature would return %false if the compiler resolved
> > the bit to a constant, but %true if not.
> 
> Uff, more mirroring CPUID inconsistencies.
> 
> So *actually*, we should clear all those build-time disabled bits from
> x86_capability so that this doesn't happen.

Heh, I almost suggested that, but there is a non-zero amount of code that wants
to ignore the disabled bits and query the "raw" CPUID information.  In quotes
because the kernel still massages x86_capability.  Changing that behavior will
require auditing a lot of code, because in most cases any breakage will be mostly
silent, e.g. loss of features/performance and not explosions.

E.g. KVM emulates UMIP when it's not supported in hardware, and so advertises UMIP
support irrespective of hardware/host support.  But emulating UMIP is imperfect
and suboptimal (requires intercepting L*DT instructions), so KVM intercepts L*DT
instructions iff UMIP is not supported in hardware, as detected by
boot_cpu_has(X86_FEATURE_UMIP).

The comment for cpu_feature_enabled() even calls out this type of use case:

  Use the cpu_has() family if you want true runtime testing of CPU features, like
  in hypervisor code where you are supporting a possible guest feature where host
  support for it is not relevant.

That said, the behavior of cpu_has() is wildly inconsistent, e.g. LA57 is
indirectly cleared in x86_capability if it's a disabled bit because of this code
in early_identify_cpu().

	if (!pgtable_l5_enabled())
		setup_clear_cpu_cap(X86_FEATURE_LA57);

KVM works around that by manually doing CPUID to query hardware directly:

	/* Set LA57 based on hardware capability. */
	if (cpuid_ecx(7) & F(LA57))
		kvm_cpu_cap_set(X86_FEATURE_LA57);

So yeah, I 100% agree the current state is messy and would love to have
cpu_feature_enabled() be a pure optimization with respect to boot_cpu_has(), but
it's not as trivial at it looks.