lists.openwall.net   lists  /  announce  owl-users  owl-dev  john-users  john-dev  passwdqc-users  yescrypt  popa3d-users  /  oss-security  kernel-hardening  musl  sabotage  tlsify  passwords  /  crypt-dev  xvendor  /  Bugtraq  Full-Disclosure  linux-kernel  linux-netdev  linux-ext4  linux-hardening  linux-cve-announce  PHC 
Open Source and information security mailing list archives
 
Hash Suite: Windows password security audit tool. GUI, reports in PDF.
[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <ec578526-989d-0913-e40e-9e463fb85a8f@intel.com>
Date:   Fri, 14 Jan 2022 15:52:35 +0800
From:   Zeng Guang <guang.zeng@...el.com>
To:     Sean Christopherson <seanjc@...gle.com>
Cc:     Paolo Bonzini <pbonzini@...hat.com>,
        Vitaly Kuznetsov <vkuznets@...hat.com>,
        Wanpeng Li <wanpengli@...cent.com>,
        Jim Mattson <jmattson@...gle.com>,
        Joerg Roedel <joro@...tes.org>,
        "kvm@...r.kernel.org" <kvm@...r.kernel.org>,
        Dave Hansen <dave.hansen@...ux.intel.com>,
        "Luck, Tony" <tony.luck@...el.com>,
        Kan Liang <kan.liang@...ux.intel.com>,
        Thomas Gleixner <tglx@...utronix.de>,
        Ingo Molnar <mingo@...hat.com>, Borislav Petkov <bp@...en8.de>,
        "H. Peter Anvin" <hpa@...or.com>,
        Kim Phillips <kim.phillips@....com>,
        Jarkko Sakkinen <jarkko@...nel.org>,
        Jethro Beekman <jethro@...tanix.com>,
        "Huang, Kai" <kai.huang@...el.com>,
        "x86@...nel.org" <x86@...nel.org>,
        "linux-kernel@...r.kernel.org" <linux-kernel@...r.kernel.org>,
        "Hu, Robert" <robert.hu@...el.com>,
        "Gao, Chao" <chao.gao@...el.com>
Subject: Re: [PATCH v5 5/8] KVM: x86: Support interrupt dispatch in x2APIC
 mode with APIC-write VM exit

On 1/14/2022 5:29 AM, Sean Christopherson wrote:
> On Fri, Dec 31, 2021, Zeng Guang wrote:
>> In VMX non-root operation, new behavior applies to
> "new behavior" is ambiguous, it's not clear if it refers to new hardware behavior,
> new KVM behavior, etc...
>
>> virtualize WRMSR to vICR in x2APIC mode. Depending
> Please wrap at ~75 chars, this is too narrow.
>
>> on settings of the VM-execution controls, CPU would
>> produce APIC-write VM-exit following the 64-bit value
>> written to offset 300H on the virtual-APIC page(vICR).
>> KVM needs to retrieve the value written by CPU and
>> emulate the vICR write to deliver an interrupt.
>>
>> Current KVM doesn't consider to handle the 64-bit setting
>> on vICR in trap-like APIC-write VM-exit. Because using
>> kvm_lapic_reg_write() to emulate writes to APIC_ICR requires
>> the APIC_ICR2 is already programmed correctly. But in the
>> above APIC-write VM-exit, CPU writes the whole 64 bits to
>> APIC_ICR rather than program higher 32 bits and lower 32
>> bits to APIC_ICR2 and APIC_ICR respectively. So, KVM needs
>> to retrieve the whole 64-bit value and program higher 32 bits
>> to APIC_ICR2 first.
> I think this is simply saying:
>
>    Upcoming Intel CPUs will support virtual x2APIC MSR writes to the vICR,
>    i.e. will trap and generate an APIC-write VM-Exit instead of intercepting
>    the WRMSR.  Add support for handling "nodecode" x2APIC writes, which were
>    previously impossible.
>
>    Note, x2APIC MSR writes are 64 bits wide.
>
> and then the shortlog can be:
>
>    KVM: x86: Add support for vICR APIC-write VM-Exits in x2APIC mode
>
> The "interrupt dispatch" part is quite confusing because it's not really germane
> to the change; yes, the vICR write does (eventually) dispatch an IRQ, but that
> has nothing to do with the code being modified.

I would take commit message as you suggested. Thanks.

>> Signed-off-by: Zeng Guang <guang.zeng@...el.com>
>> ---
>>   arch/x86/kvm/lapic.c | 12 +++++++++---
>>   arch/x86/kvm/lapic.h |  5 +++++
>>   2 files changed, 14 insertions(+), 3 deletions(-)
>>
>> diff --git a/arch/x86/kvm/lapic.c b/arch/x86/kvm/lapic.c
>> index f206fc35deff..3ce7142ba00e 100644
>> --- a/arch/x86/kvm/lapic.c
>> +++ b/arch/x86/kvm/lapic.c
>> @@ -2186,15 +2186,21 @@ EXPORT_SYMBOL_GPL(kvm_lapic_set_eoi);
>>   /* emulate APIC access in a trap manner */
>>   void kvm_apic_write_nodecode(struct kvm_vcpu *vcpu, u32 offset)
>>   {
>> -	u32 val = 0;
>> +	struct kvm_lapic *apic = vcpu->arch.apic;
>> +	u64 val = 0;
>>   
>>   	/* hw has done the conditional check and inst decode */
>>   	offset &= 0xff0;
>>   
>> -	kvm_lapic_reg_read(vcpu->arch.apic, offset, 4, &val);
>> +	/* exception dealing with 64bit data on vICR in x2apic mode */
>> +	if ((offset == APIC_ICR) && apic_x2apic_mode(apic)) {
> Sorry, I failed to reply to your response in the previous version.  I suggested
> a WARN_ON(offset != APIC_ICR), but you were concerned that apic_x2apic_mode()
> would be expensive to check before @offset.  I don't think that's a valid concern
> as apic_x2apic_mode() is simply:
>
> 	apic->vcpu->arch.apic_base & X2APIC_ENABLE
>
> And is likely well-predicted by the CPU, especially in single tenant or pinned
> scenarios where the pCPU is running a single VM/vCPU, i.e. will amost never see
> X2APIC_ENABLE toggling.
>
> So I stand behind my previous feedback[*] that we should split on x2APIC.
>
>> +		val = kvm_lapic_get_reg64(apic, offset);
>> +		kvm_lapic_reg_write(apic, APIC_ICR2, (u32)(val>>32));
>> +	} else
>> +		kvm_lapic_reg_read(apic, offset, 4, &val);
> Needs curly braces.  But again, I stand behind my previous feedback that this
> would be better written as:
>
>          if (apic_x2apic_mode(apic)) {
>                  if (WARN_ON_ONCE(offset != APIC_ICR))
>                          return 1;
>
>                  kvm_lapic_reg_read(apic, offset, 8, &val);
>                  kvm_lapic_reg_write64(apic, offset, val);
>          } else {
>                  kvm_lapic_reg_read(apic, offset, 4, &val);
>                  kvm_lapic_reg_write(apic, offset, val);
>          }
>
> after a patch (provided in earlier feedback) to introduce kvm_lapic_reg_write64().
>
> [*] https://lore.kernel.org/all/YTvcJZSd1KQvNmaz@google.com

kvm_lapic_reg_read() is limited to read up to 4 bytes. It needs extension to support 64bit
read. And another concern is here getting reg value only specific from vICR(no other regs
need take care), going through whole path on kvm_lapic_reg_read() could be time-consuming
unnecessarily. Is it proper that calling kvm_lapic_get_reg64() to retrieve vICR value directly?

The change could be like follows:

         if (apic_x2apic_mode(apic)) {
                 if (WARN_ON_ONCE(offset != APIC_ICR))
                         return 1;

                 val = kvm_lapic_get_reg64(apic, offset);
                 kvm_lapic_reg_write64(apic, offset, val);
         } else {
                 kvm_lapic_reg_read(apic, offset, 4, &val);
                 kvm_lapic_reg_write(apic, offset, val);
         }

  

>>   	/* TODO: optimize to just emulate side effect w/o one more write */
>> -	kvm_lapic_reg_write(vcpu->arch.apic, offset, val);
>> +	kvm_lapic_reg_write(apic, offset, (u32)val);
>>   }
>>   EXPORT_SYMBOL_GPL(kvm_apic_write_nodecode);
>>   

Powered by blists - more mailing lists

Powered by Openwall GNU/*/Linux Powered by OpenVZ