linux-kernel - RE: [PATCH 15/19] kvm: x86: Save and restore guest XFD

lists.openwall.net		lists / announce owl-users owl-dev john-users john-dev passwdqc-users yescrypt popa3d-users / oss-security kernel-hardening musl sabotage tlsify passwords / crypt-dev xvendor / Bugtraq Full-Disclosure linux-kernel linux-netdev linux-ext4 linux-hardening linux-cve-announce PHC
Open Source and information security mailing list archives

Hash Suite: Windows password security audit tool. GUI, reports in PDF.

[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]

Date:   Sun, 12 Dec 2021 01:50:21 +0000
From:   "Tian, Kevin" <kevin.tian@...el.com>
To:     Thomas Gleixner <tglx@...utronix.de>,
        "Zhong, Yang" <yang.zhong@...el.com>,
        "x86@...nel.org" <x86@...nel.org>,
        "kvm@...r.kernel.org" <kvm@...r.kernel.org>,
        "linux-kernel@...r.kernel.org" <linux-kernel@...r.kernel.org>,
        "mingo@...hat.com" <mingo@...hat.com>,
        "bp@...en8.de" <bp@...en8.de>,
        "dave.hansen@...ux.intel.com" <dave.hansen@...ux.intel.com>,
        "pbonzini@...hat.com" <pbonzini@...hat.com>
CC:     "Christopherson,, Sean" <seanjc@...gle.com>,
        "Nakajima, Jun" <jun.nakajima@...el.com>,
        "jing2.liu@...ux.intel.com" <jing2.liu@...ux.intel.com>,
        "Liu, Jing2" <jing2.liu@...el.com>,
        "Zhong, Yang" <yang.zhong@...el.com>
Subject: RE: [PATCH 15/19] kvm: x86: Save and restore guest XFD_ERR properly

> From: Thomas Gleixner <tglx@...utronix.de>
> Sent: Saturday, December 11, 2021 9:29 PM
> 
> Kevin,
> 
> On Sat, Dec 11 2021 at 03:07, Kevin Tian wrote:
> >> From: Thomas Gleixner <tglx@...utronix.de>
> >> #NM in the guest is slow path, right? So why are you trying to optimize
> >> for it?
> >
> > This is really good information. The current logic is obviously
> > based on the assumption that #NM is frequently triggered.
> 
> More context.
> 
> When an application want's to use AMX, it invokes the prctl() which
> grants permission. If permission is granted then still the kernel FPU
> state buffers are default size and XFD is armed.
> 
> When a thread of that process issues the first AMX (tile) instruction,
> then #NM is raised.
> 
> The #NM handler does:
> 
>     1) Read MSR_XFD_ERR. If 0, goto regular #NM
> 
>     2) Write MSR_XFD_ERR to 0
> 
>     3) Check whether the process has permission granted. If not,
>        raise SIGILL and return.
> 
>     4) Allocate and install a larger FPU state buffer for the task.
>        If allocation fails, raise SIGSEGV and return.
> 
>     5) Disarm XFD for that task
> 
> That means one thread takes at max. one AMX/XFD related #NM during its
> lifetime, which means two VMEXITs.
> 
> If there are other XFD controlled facilities in the future, then it will
> be NR_USED_XFD_CONTROLLED_FACILITIES * 2 VMEXITs per thread which
> uses
> them. Not the end of the world either.
> 
> Looking at the targeted application space it's pretty unlikely that
> tasks which utilize AMX are going to be so short lived that the overhead
> of these VMEXITs really matters.
> 
> This of course can be revisited when there is a sane use case, but
> optimizing for it prematurely does not buy us anything else than
> pointless complexity.

I get all above.

I guess the original open is also about the frequency of #NM not due 
to XFD. For Linux guest looks it's not a problem since CR0.TS is not set 
now when math emulation is not required:

DEFINE_IDTENTRY(exc_device_not_available)
{
	...
	/* This should not happen. */
	if (WARN(cr0 & X86_CR0_TS, "CR0.TS was set")) {
		/* Try to fix it up and carry on. */
		write_cr0(cr0 & ~X86_CR0_TS);
	} else {
		/*
		 * Something terrible happened, and we're better off trying
		 * to kill the task than getting stuck in a never-ending
		 * loop of #NM faults.
		 */
		die("unexpected #NM exception", regs, 0);
	}
}

It may affect guest which still uses CR0.TS to do lazy save. But likely
modern OSes all move to eager save approach so always trapping #NM
should be fine.

Is this understanding correct?

Thanks
Kevin