Date: Sun, 12 Dec 2021 01:50:21 +0000
From: "Tian, Kevin" <kevin.tian@...el.com>
To: Thomas Gleixner <tglx@...utronix.de>,
"Zhong, Yang" <yang.zhong@...el.com>,
"x86@...nel.org" <x86@...nel.org>,
"kvm@...r.kernel.org" <kvm@...r.kernel.org>,
"linux-kernel@...r.kernel.org" <linux-kernel@...r.kernel.org>,
"mingo@...hat.com" <mingo@...hat.com>,
"bp@...en8.de" <bp@...en8.de>,
"dave.hansen@...ux.intel.com" <dave.hansen@...ux.intel.com>,
"pbonzini@...hat.com" <pbonzini@...hat.com>
CC: "Christopherson,, Sean" <seanjc@...gle.com>,
"Nakajima, Jun" <jun.nakajima@...el.com>,
"jing2.liu@...ux.intel.com" <jing2.liu@...ux.intel.com>,
"Liu, Jing2" <jing2.liu@...el.com>,
"Zhong, Yang" <yang.zhong@...el.com>
Subject: RE: [PATCH 15/19] kvm: x86: Save and restore guest XFD_ERR properly
> From: Thomas Gleixner <tglx@...utronix.de>
> Sent: Saturday, December 11, 2021 9:29 PM
>
> Kevin,
>
> On Sat, Dec 11 2021 at 03:07, Kevin Tian wrote:
> >> From: Thomas Gleixner <tglx@...utronix.de>
> >> #NM in the guest is slow path, right? So why are you trying to optimize
> >> for it?
> >
> > This is really good information. The current logic is obviously
> > based on the assumption that #NM is frequently triggered.
>
> More context.
>
> When an application wants to use AMX, it invokes the prctl() which
> grants permission. Even after permission is granted, the kernel FPU
> state buffers are still the default size and XFD is armed.
>
> When a thread of that process issues the first AMX (tile) instruction,
> then #NM is raised.
>
> The #NM handler does:
>
> 1) Read MSR_XFD_ERR. If 0, goto regular #NM
>
> 2) Write MSR_XFD_ERR to 0
>
> 3) Check whether the process has permission granted. If not,
> raise SIGILL and return.
>
> 4) Allocate and install a larger FPU state buffer for the task.
> If allocation fails, raise SIGSEGV and return.
>
> 5) Disarm XFD for that task
>
> That means one thread takes at max. one AMX/XFD related #NM during its
> lifetime, which means two VMEXITs.
>
> If there are other XFD controlled facilities in the future, then it will
> be NR_USED_XFD_CONTROLLED_FACILITIES * 2 VMEXITs per thread which uses
> them. Not the end of the world either.
>
> Looking at the targeted application space it's pretty unlikely that
> tasks which utilize AMX are going to be so short lived that the overhead
> of these VMEXITs really matters.
>
> This of course can be revisited when there is a sane use case, but
> optimizing for it prematurely does not buy us anything else than
> pointless complexity.
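
To make the five handler steps quoted above concrete, here is a minimal
user-space model in C. It is only a sketch, not the kernel implementation:
struct task_model, enum nm_result, model_nm_handler() and model_vmexits()
are hypothetical names invented for illustration, and the MSRs are modelled
as plain struct fields rather than real MSR accesses.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical outcomes of the model handler; these are not kernel names. */
enum nm_result { NM_REGULAR, NM_SIGILL, NM_SIGSEGV, NM_HANDLED };

/* Hypothetical per-task state standing in for the real task/FPU structures. */
struct task_model {
	uint64_t xfd_err;      /* models MSR_XFD_ERR */
	uint64_t xfd;          /* models the XFD bits armed for this task */
	uint64_t permitted;    /* features granted via prctl() */
	bool     alloc_fails;  /* forces the buffer allocation to fail */
	bool     large_buffer; /* set once the larger FPU buffer is installed */
};

static enum nm_result model_nm_handler(struct task_model *t)
{
	uint64_t err = t->xfd_err;        /* 1) read MSR_XFD_ERR */

	if (!err)
		return NM_REGULAR;        /*    0 means a regular #NM */

	t->xfd_err = 0;                   /* 2) write MSR_XFD_ERR to 0 */

	if ((err & t->permitted) != err)  /* 3) permission not granted */
		return NM_SIGILL;

	if (t->alloc_fails)               /* 4) allocation failed */
		return NM_SIGSEGV;
	t->large_buffer = true;           /*    larger buffer installed */

	t->xfd &= ~err;                   /* 5) disarm XFD for this task */
	return NM_HANDLED;
}

/* VMEXIT estimate from the mail: two per XFD-controlled facility used. */
static unsigned int model_vmexits(unsigned int nr_used_facilities)
{
	return nr_used_facilities * 2;
}
```

In this model, a second call for the same task returns NM_REGULAR because
XFD_ERR was cleared and XFD disarmed on the first pass, which mirrors the
"at most one AMX/XFD related #NM per thread" argument.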

I get all of the above.

I guess the original open question is also about the frequency of #NM
faults that are not due to XFD. For a Linux guest this does not look like
a problem, since CR0.TS is no longer set when math emulation is not
required:

DEFINE_IDTENTRY(exc_device_not_available)
{
	...
	/* This should not happen. */
	if (WARN(cr0 & X86_CR0_TS, "CR0.TS was set")) {
		/* Try to fix it up and carry on. */
		write_cr0(cr0 & ~X86_CR0_TS);
	} else {
		/*
		 * Something terrible happened, and we're better off trying
		 * to kill the task than getting stuck in a never-ending
		 * loop of #NM faults.
		 */
		die("unexpected #NM exception", regs, 0);
	}
}

It may affect a guest that still uses CR0.TS for lazy FPU state saving.
But modern OSes have likely all moved to the eager save approach, so
always trapping #NM should be fine.

Is this understanding correct?

Thanks
Kevin