Message-ID: <8cc8ab19-7438-8521-9bc3-d3f6d6e0d5c4@linux.intel.com>
Date:   Mon, 22 Feb 2021 16:36:10 +0800
From:   "Liu, Jing2" <jing2.liu@...ux.intel.com>
To:     Paolo Bonzini <pbonzini@...hat.com>,
        Sean Christopherson <seanjc@...gle.com>
Cc:     kvm@...r.kernel.org, linux-kernel@...r.kernel.org,
        jing2.liu@...el.com, "x86@...nel.org" <x86@...nel.org>
Subject: Re: [PATCH RFC 3/7] kvm: x86: XSAVE state and XFD MSRs context switch


On 2/9/2021 2:12 AM, Paolo Bonzini wrote:
> On 08/02/21 19:04, Sean Christopherson wrote:
>>> That said, the case where we saw MSR autoload as faster involved 
>>> EFER, and
>>> we decided that it was due to TLB flushes (commit f6577a5fa15d, 
>>> "x86, kvm,
>>> vmx: Always use LOAD_IA32_EFER if available", 2014-11-12). Do you 
>>> know if
>>> RDMSR/WRMSR is always slower than MSR autoload?
>> RDMSR/WRMSR may be marginally slower, but only because the autoload 
>> stuff avoids
>> serializing the pipeline after every MSR.
>
> That's probably adding up quickly...
>
>> The autoload paths are effectively
>> just wrappers around the WRMSR ucode, plus some extra VM-Enter 
>> specific checks,
>> as ucode needs to perform all the normal fault checks on the index 
>> and value.
>> On the flip side, if the load lists are dynamically constructed, I 
>> suspect the
>> code overhead of walking the lists negates any advantages of the load 
>> lists.
>
> ... but yeah this is not very encouraging.
Thanks for reviewing the patches.

> Context switch time is a problem for XFD.  In a VM that uses AMX, most 
> threads in the guest will have nonzero XFD but the vCPU thread itself 
> will have zero XFD.  So as soon as one thread in the VM forces the 
> vCPU thread to clear XFD, you pay a price on all vmexits and vmentries.
>

The spec says:
"If XSAVE, XSAVEC, XSAVEOPT, or XSAVES is saving the state component i,
the instruction does not generate #NM when XCR0[i] = IA32_XFD[i] = 1;
instead, it saves bit i of XSTATE_BV field of the XSAVE header as 0
(indicating that the state component is in its initialized state).
With the exception of XSAVE, no data is saved for the state
component (XSAVE saves the initial value of the state component..."
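
In other words (a tiny illustrative helper, not from this series; the
offset and field layout are from the SDM's XSAVE area format), when
XCR0[i] = IA32_XFD[i] = 1 the saved image simply reports component i
as being in its init state via XSTATE_BV:

#include <stdbool.h>
#include <stdint.h>
#include <string.h>

#define XSAVE_HDR_OFFSET 512	/* XSAVE header starts at byte 512 of the area */

/* True if the saved image reports state component i as init (XSTATE_BV[i]=0). */
static bool xsave_component_is_init(const void *xsave_area, int i)
{
	const uint8_t *hdr = (const uint8_t *)xsave_area + XSAVE_HDR_OFFSET;
	uint64_t xstate_bv;

	memcpy(&xstate_bv, hdr, sizeof(xstate_bv));	/* XSTATE_BV: first 8 bytes */
	return !(xstate_bv & (1ULL << i));
}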

Thus, the key point is not to lose non-initial AMX state across vmexit
and vmentry. If the AMX state is in its initialized state, nothing is
lost; otherwise, XFD[i] should not be armed with a nonzero value.

If we don't want to unconditionally set XFD=0 on every vmexit, it would
be useful to first detect whether the guest AMX state is in its initial
configuration. How about using the XINUSE tracking here?
(Details in SDM vol. 1, section 13.6 "PROCESSOR TRACKING OF
XSAVE-MANAGED STATE", and the XRSTOR/XRSTORS instruction operation
sections in vol. 2.)
The main idea is that the processor tracks the status of the various
state components via XINUSE, which indicates whether each state
component is in use. When XINUSE[i]=0, state component i is in its
initial configuration; otherwise, kvm should take care of XFD on vmexit.
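
A minimal sketch of that check (not from the patches; xgetbv() is
open-coded here, and the AMX feature bit numbers follow the AMX
architecture spec):

#include <stdbool.h>
#include <stdint.h>

#define XFEATURE_XTILECFG	17	/* AMX TILECFG state component */
#define XFEATURE_XTILEDATA	18	/* AMX TILEDATA state component */

/* XGETBV with ECX=1 returns XINUSE; support for this form is
 * enumerated by CPUID.(EAX=0DH,ECX=1):EAX[2]. */
static uint64_t xgetbv(uint32_t index)
{
	uint32_t eax, edx;

	__asm__ volatile("xgetbv" : "=a"(eax), "=d"(edx) : "c"(index));
	return eax | ((uint64_t)edx << 32);
}

static bool amx_state_is_init(void)
{
	uint64_t amx_bits = (1ULL << XFEATURE_XTILECFG) |
			    (1ULL << XFEATURE_XTILEDATA);

	/* XINUSE[i]=0 => component i is in its initial configuration */
	return !(xgetbv(1) & amx_bits);
}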


> However, running the host with _more_ bits set than necessary in XFD 
> should not be a problem as long as the host doesn't use the AMX 
> instructions. 
Does "running the host" mean running in kvm? why need more bits 
(host_XFD|guest_XFD),
I'm trying to think about the case that guest_XFD is not enough? e.g.
In guest, it only need bit i when guest supports it and guest uses
the passthru XFD[i] for detecting dynamic usage;
In kvm, kvm doesn't use AMX instructions; and "system software should not
use XFD to implement a 'lazy restore' approach to management of the 
XTILEDATA
state component."
Out of kvm, kernel ensures setting correct XFD for threads when scheduling;
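
For reference, my reading of the suggestion quoted below, as a rough
sketch with hypothetical bookkeeping fields (guest_xfd/host_xfd) and
the kernel's wrmsrl() declared as a stub:

#include <stdint.h>

#define MSR_IA32_XFD	0x000001c4	/* per the SDM; the kernel's MSR_IA32_XFD */

/* Stand-in for the kernel's wrmsrl(); stubbed for this sketch. */
static void wrmsrl(uint32_t msr, uint64_t val);

struct vcpu_xfd {		/* hypothetical bookkeeping */
	uint64_t guest_xfd;	/* last value the guest wrote to IA32_XFD */
	uint64_t host_xfd;	/* value the host thread runs with */
};

/* vmentry: the guest must observe exactly its own XFD value. */
static void xfd_switch_to_guest(struct vcpu_xfd *v)
{
	wrmsrl(MSR_IA32_XFD, v->guest_xfd);
}

/* vmexit: re-arm as many bits as possible; host_XFD | guest_XFD is safe
 * as long as host code between exit and entry executes no AMX
 * instructions. */
static void xfd_switch_to_host(struct vcpu_xfd *v)
{
	wrmsrl(MSR_IA32_XFD, v->host_xfd | v->guest_xfd);
}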

Thanks,
Jing

> So perhaps Jing can look into keeping XFD=0 for as little time as 
> possible, and XFD=host_XFD|guest_XFD as much as possible.
>
> Paolo
>
