linux-kernel - Re: [patch 13/31] x86/fpu: Move KVMs FPU swapping to FPU core

lists.openwall.net		lists / announce owl-users owl-dev john-users john-dev passwdqc-users yescrypt popa3d-users / oss-security kernel-hardening musl sabotage tlsify passwords / crypt-dev xvendor / Bugtraq Full-Disclosure linux-kernel linux-netdev linux-ext4 linux-hardening linux-cve-announce PHC
Open Source and information security mailing list archives

Hash Suite: Windows password security audit tool. GUI, reports in PDF.

[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]

Message-ID: <3997787e-402d-4b2b-0f90-4a672c77703f@redhat.com>
Date:   Thu, 14 Oct 2021 17:01:07 +0200
From:   Paolo Bonzini <pbonzini@...hat.com>
To:     Thomas Gleixner <tglx@...utronix.de>,
        "Liu, Jing2" <jing2.liu@...el.com>,
        LKML <linux-kernel@...r.kernel.org>
Cc:     "x86@...nel.org" <x86@...nel.org>,
        "Bae, Chang Seok" <chang.seok.bae@...el.com>,
        Dave Hansen <dave.hansen@...ux.intel.com>,
        Arjan van de Ven <arjan@...ux.intel.com>,
        "kvm@...r.kernel.org" <kvm@...r.kernel.org>,
        "Nakajima, Jun" <jun.nakajima@...el.com>,
        Jing Liu <jing2.liu@...ux.intel.com>,
        "seanjc@...gle.com" <seanjc@...gle.com>,
        "Cooper, Andrew" <andrew.cooper3@...rix.com>
Subject: Re: [patch 13/31] x86/fpu: Move KVMs FPU swapping to FPU core

On 14/10/21 16:09, Thomas Gleixner wrote:
> On Thu, Oct 14 2021 at 11:01, Paolo Bonzini wrote:
>> On 14/10/21 10:02, Liu, Jing2 wrote:
>> Based on the input from Andy and Thomas, the new way would be like this:
>>
>> 1) host_fpu must always be checked for reallocation in
>> kvm_load_guest_fpu (or in the FPU functions that it calls, that depends
>> on the rest of Thomas's patches).  That's because arch_prctl can enable
>> AMX for QEMU at any point after KVM_CREATE_VCPU.
> 
> No.
> 
>     1) QEMU starts
>     2) QEMU requests permissions via prctl()
>     3) QEMU creates vCPU threads
> 
> Doing it the other way around makes no sense at all and wont work.

Sure, but KVM needs to do something that makes sense even for userspaces 
that are not QEMU.

For example, there could be a program that uses AMX *itself* and does 
not expose it to the guest.  In that case, the arch_prctl can come at 
the point AMX is needed, which can be after the program creates vCPU 
threads.  That's for host_fpu.

For the guest_fpu, I agree that the arch_prctl must come before creating 
vCPUs.

>> 2) every use of vcpu->arch.guest_supported_xcr0 is changed to only
>> include those dynamic-feature bits that were enabled via arch_prctl.
>> That is, something like:
>>
>> static u64 kvm_guest_supported_cr0(struct kvm_vcpu *vcpu)
>> {
>> 	return vcpu->arch.guest_supported_xcr0 &
>> 		(~xfeatures_mask_user_dynamic | \
>> 		 current->thread.fpu.dynamic_state_perm);
> 
> Bah. You can't get enough from poking in internals, right?
> 
> vcpu_create()
> 
>    fpu_init_fpstate_user(guest_fpu, supported_xcr0)
> 
> That will (it does not today) do:
> 
>       guest_fpu::__state_perm = supported_xcr0 & xstate_get_group_perm();
> 
> The you have the information you need right in the guest FPU.

Good, I wasn't aware of the APIs that will be there.

>> int kvm_alloc_fpu_dynamic_features(struct kvm_vcpu *vcpu)
>> {
>> 	u64 allowed_dynamic = current->thread.fpu.dynamic_state_perm;
> 
> That's not a valid assumption.
> 
>> 	u64 enabled_dynamic =
>> 		vcpu->arch.xcr0 & xfeatures_mask_user_dynamic;
>>
>> 	/* All dynamic features have to be arch_prctl'd first.  */
>> 	WARN_ON_ONCE(enabled_dynamic & ~allowed_dynamic);
>>
>> 	if (!vcpu->arch.xfd_passthrough) {
>> 		/* All dynamic states will #NM?  Wait and see.  */
>> 		if ((enabled_dynamic & ~vcpu->arch.xfd) == 0)
>> 			return 0;
>>
>> 		kvm_x86_ops.enable_xfd_passthrough(vcpu);
>> 	}
>>
>> 	/* current->thread.fpu was already handled by arch_prctl.  */
> 
> No. current->thread.fpu has the default buffer unless QEMU used AMX or
> something forced it to allocate a larger buffer.
> 
>> 	return fpu_alloc_features(vcpu->guest_fpu,
>> 		vcpu->guest_fpu.dynamic_state_perm | enabled_dynamic);
> 
> This unconditionally calls into that allocation for every XCR0/XFD
> trap ?

Calls into the function, but doesn't necessarily allocate anything. 
What you wrote below looks correct to me, thanks.

Paolo

> Also you really should not wait until _all_ dynamic states are cleared
> in guest XFD.  Because a guest which has bit 18 and 19 available but only > uses one of them is going to trap on every other context switch due to
> XFD writes.