linux-kernel - Re: Question about AMD SVM's virtual NMI support in Linux kernel mainline

lists.openwall.net		lists / announce owl-users owl-dev john-users john-dev passwdqc-users yescrypt popa3d-users / oss-security kernel-hardening musl sabotage tlsify passwords / crypt-dev xvendor / Bugtraq Full-Disclosure linux-kernel linux-netdev linux-ext4 linux-hardening linux-cve-announce PHC
Open Source and information security mailing list archives
Hash Suite: Windows password security audit tool. GUI, reports in PDF.
[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <cdad0a8b-0994-d2f3-7c68-e632ce4facf7@amd.com>
Date:   Fri, 25 Aug 2023 14:36:17 +0530
From:   Santosh Shukla <santosh.shukla@....com>
To:     Sean Christopherson <seanjc@...gle.com>
Cc:     陈培鸿(乘鸿) 
        <chenpeihong.cph@...baba-inc.com>, mlevitsk <mlevitsk@...hat.com>,
        kvm@...r.kernel.org, linux-kernel@...r.kernel.org
Subject: Re: Question about AMD SVM's virtual NMI support in Linux kernel
 mainline

Hi Sean,


On 8/24/2023 7:52 PM, Sean Christopherson wrote:
> +kvm and lkml, didn't realize they weren't Cc'd in the original mail.
> 
> On Thu, Aug 24, 2023, Santosh Shukla wrote:
>> Hi Sean,
>>
>> On 8/23/2023 8:33 PM, Sean Christopherson wrote:
>>> On Fri, Aug 18, 2023, 陈培鸿(乘鸿) wrote:
>>>> According to the results, I found that in the case of concurrent NMIs, some
>>>> NMIs are still injected through eventinj instead of vNMI.
>>>
>>> Key word is "some", having two NMIs arrive "simultaneously" is uncommon.  In quotes
>>> because if a vCPU is scheduled out or otherwise delayed, two NMIs can be recognized
>>> by KVM at the same time even if there was a sizeable delay between when they were
>>> sent.
>>>
>>>> Based on the above explanations, I summarize my questions as follows:
>>>> 1. According to the specification of AMD SVM vNMI, with vNMI enabled, will
>>>> some NMIs be injected through eventinj under the condition of concurrent
>>>> NMIs?
>>>
>>> Yes.
>>>
>>>> 2. If yes, what is the role of vNMI? Is it just an addition to eventinj? What
>>>> benefits is it designed to expect? Is there any benchmark data support?
>>>
>>> Manually injecting NMIs isn't problematic from a performance perspective.  KVM
>>> takes control of the vCPU, i.e. forces a VM-Exit, to pend a virtual NMI, so there's
>>> no extra VM-Exit.
>>>
>>> The value added by vNMI support is that KVM doesn't need to manually track/detect
>>> when NMIs become unblocked in the guest.  SVM doesn't provide a hardware-supported
>>> NMI-window exiting, so KVM has to intercept _and_ single-step IRET, which adds two
>>> VM-Exits for _every_ NMI when vNMI isn't available (and it's a complete mess for
>>> things like SEV-ES).
>>>
>>>> 3. If not, does it mean that the current SVM's vNMI support scheme in the
>>>> Linux mainline code is flawed? How should it be fixed?
>>>
>>> The approach as a whole isn't flawed, it's the best KVM can do given SVM's vNMI
>>> architecture and KVM's ABI with respect to "concurrent" NMIs.
>>>
>>> Hrm, though there does appear to be a bug in the injecting path.  KVM doesn't
>>> manually set V_NMI_BLOCKING_MASK, and will unnecessarily enable IRET interception
>>> when manually injecting a vNMI.  Intercepting IRET should be unnecessary because
>>> hardware will automatically accept the pending vNMI when NMIs become unblocked.
>>> And I don't see anything in the APM that suggests hardware will set V_NMI_BLOCKING_MASK
>>> when software directly injects an NMI.
>>>
>>> So I think we need:
>>>
>>> diff --git a/arch/x86/kvm/svm/svm.c b/arch/x86/kvm/svm/svm.c
>>> index d381ad424554..c956a9f500a2 100644
>>> --- a/arch/x86/kvm/svm/svm.c
>>> +++ b/arch/x86/kvm/svm/svm.c
>>> @@ -3476,6 +3476,11 @@ static void svm_inject_nmi(struct kvm_vcpu *vcpu)
>>>         if (svm->nmi_l1_to_l2)
>>>                 return;
>>>  
>>> +       if (is_vnmi_enabled(svm)) {
>>> +               svm->vmcb->control.int_ctl |= V_NMI_BLOCKING_MASK;
>>> +               return;
>>> +       }
>>> +
>>>         svm->nmi_masked = true;
>>>         svm_set_iret_intercept(svm);
>>>         ++vcpu->stat.nmi_injections;
>>> --
>>>
>>> or if hardware does set V_NMI_BLOCKING_MASK in this case, just:
>>>
>>
>> Yes, HW does set BLOCKING_MASK when HW takes the pending vNMI event.
> 
> I'm not asking about the pending vNMI case, which is clearly spelled out in the
> APM.  I'm asking about directly injecting an NMI via:
> 
> 	svm->vmcb->control.event_inj = SVM_EVTINJ_VALID | SVM_EVTINJ_TYPE_NMI;
>

Yes. This is documented in APM as well.
https://www.amd.com/system/files/TechDocs/24593.pdf : "15.21.10 NMI Virtualization"

"
If Event Injection is used to inject an NMI when NMI Virtualization is enabled,
VMRUN sets V_NMI_MASK in the guest state.
"
 
>>> diff --git a/arch/x86/kvm/svm/svm.c b/arch/x86/kvm/svm/svm.c
>>> index d381ad424554..201a1a33ecd2 100644
>>> --- a/arch/x86/kvm/svm/svm.c
>>> +++ b/arch/x86/kvm/svm/svm.c
>>> @@ -3473,7 +3473,7 @@ static void svm_inject_nmi(struct kvm_vcpu *vcpu)
>>>  
>>>         svm->vmcb->control.event_inj = SVM_EVTINJ_VALID | SVM_EVTINJ_TYPE_NMI;
>>>  
>>> -       if (svm->nmi_l1_to_l2)
>>> +       if (svm->nmi_l1_to_l2 || is_vnmi_enabled(svm))
>>>                 return;
>>>  
>>>         svm->nmi_masked = true;
>>> --
>>>
>>
>> Above proposal make sense to me, I was reviewing source code flow
>> for a scenarios when two consecutive need to me delivered to Guest.
>> Example, process_nmi will pend the first NMI and then second NMI will
>> be injected through EVTINJ, as because (kvm_x86_inject_nmi)
>> will get called and that will set the _iret_intercept. 
>>
>> With your proposal inject_nmi will be set the env_inj NMI w/o the IRET,
>> I think we could check for "is_vnmi_enabled" before the programming
>> the "evt_inj"?
> 
> No, because the whole point of this path is to directly inject an NMI when NMIs
> are NOT blocked in the guest AND there is already a pending vNMI.

I agree, Your patch looks fine.

I'll do the testing on above patch and share the test feedback.

Thanks,
Santosh