Message-ID: <BN9PR11MB52767789D59239DF5DD524758C469@BN9PR11MB5276.namprd11.prod.outlook.com>
Date: Fri, 31 Dec 2021 09:42:58 +0000
From: "Tian, Kevin" <kevin.tian@...el.com>
To: "Christopherson,, Sean" <seanjc@...gle.com>
CC: "Liu, Jing2" <jing2.liu@...el.com>,
"x86@...nel.org" <x86@...nel.org>,
"kvm@...r.kernel.org" <kvm@...r.kernel.org>,
"linux-kernel@...r.kernel.org" <linux-kernel@...r.kernel.org>,
"linux-doc@...r.kernel.org" <linux-doc@...r.kernel.org>,
"linux-kselftest@...r.kernel.org" <linux-kselftest@...r.kernel.org>,
"tglx@...utronix.de" <tglx@...utronix.de>,
"mingo@...hat.com" <mingo@...hat.com>,
"bp@...en8.de" <bp@...en8.de>,
"dave.hansen@...ux.intel.com" <dave.hansen@...ux.intel.com>,
"pbonzini@...hat.com" <pbonzini@...hat.com>,
"corbet@....net" <corbet@....net>,
"shuah@...nel.org" <shuah@...nel.org>,
"Nakajima, Jun" <jun.nakajima@...el.com>,
"jing2.liu@...ux.intel.com" <jing2.liu@...ux.intel.com>,
"Zeng, Guang" <guang.zeng@...el.com>,
"Wang, Wei W" <wei.w.wang@...el.com>,
"Zhong, Yang" <yang.zhong@...el.com>
Subject: RE: [PATCH v3 22/22] kvm: x86: Disable interception for IA32_XFD on
demand
> From: Tian, Kevin
> Sent: Thursday, December 30, 2021 3:05 PM
>
> the new change is like below.
>
> static void handle_nm_fault_irqoff(struct kvm_vcpu *vcpu)
> {
> 	/*
> 	 * Save xfd_err to guest_fpu before interrupts are enabled, so the
> 	 * guest value is not clobbered by host activity before the guest
> 	 * has a chance to consume it.
> 	 *
> 	 * Since trapping of #NM starts only when XFD write interception is
> 	 * disabled, use that flag to guard the save operation. This also
> 	 * makes it a no-op for a non-XFD #NM caused by L1 interception.
> 	 *
> 	 * Queuing the exception is done in vmx_handle_exit().
> 	 */
> 	if (vcpu->arch.xfd_no_write_intercept)
> 		rdmsrl(MSR_IA32_XFD_ERR, vcpu->arch.guest_fpu.xfd_err);
> }
>
> In the final series it will first check vcpu->arch.guest_fpu.fpstate->xfd
> before the disable-interception patch is applied, and will then take
> the above form, similar to your suggestion on
> vmx_update_exception_bitmap().
>
> Whether to check the msr_bitmap vs. an extra flag is an orthogonal open
> question.
>
> Then:
>
> static int handle_exception_nmi(struct kvm_vcpu *vcpu)
> {
> 	...
> 	if (is_machine_check(intr_info) || is_nmi(intr_info))
> 		return 1; /* handled by handle_exception_nmi_irqoff() */
>
> 	/*
> 	 * Queue the exception here instead of in handle_nm_fault_irqoff().
> 	 * This ensures the nested_vmx check is not skipped, so the vmexit
> 	 * can be reflected to L1 (when it intercepts #NM) before reaching
> 	 * this point.
> 	 */
> 	if (is_nm_fault(intr_info)) {
> 		kvm_queue_exception(vcpu, NM_VECTOR);
> 		return 1;
> 	}
>
> 	...
> }
>
> Then, regarding testing the non-AMX nested #NM usage, it might be difficult
> to trigger it from a modern OS. As noted in the Linux #NM handler, it is
> expected only for XFD or for math emulation when the FPU is missing. So we
> plan to run a selftest in L1 which sets CR0.TS and then touches FPU
> registers, and for the L1 kernel we will run two binaries, one trapping #NM
> and the other not.
>
We have verified this scenario and didn't find any problems.
Basically, the selftest looks like below:
guest_code()
{
	cr0 = read_cr0();
	cr0 |= X86_CR0_TS;
	write_cr0(cr0);
	asm volatile("fnop");
}

guest_nm_handler()
{
	cr0 = read_cr0();
	cr0 &= ~X86_CR0_TS;
	write_cr0(cr0);
}
We run the selftest in L1 to create a nested scenario.

When L1 intercepts #NM:
	(L2) fnop
	(L0) #NM vmexit
	(L0) reflect a virtual vmexit (reason #NM) to L1
	(L1) #NM vmexit
	(L1) queue #NM exception to L2
	(L2) guest_nm_handler()
	(L2) fnop (succeeds)

When L1 doesn't intercept #NM:
	(L2) fnop
	(L0) #NM vmexit
	(L0) queue #NM exception to L2
	(L2) guest_nm_handler()
	(L2) fnop (succeeds)
Please suggest if any more tests are necessary here.
Thanks
Kevin