lists.openwall.net   lists  /  announce  owl-users  owl-dev  john-users  john-dev  passwdqc-users  yescrypt  popa3d-users  /  oss-security  kernel-hardening  musl  sabotage  tlsify  passwords  /  crypt-dev  xvendor  /  Bugtraq  Full-Disclosure  linux-kernel  linux-netdev  linux-ext4  linux-hardening  linux-cve-announce  PHC 
Open Source and information security mailing list archives
 
Hash Suite: Windows password security audit tool. GUI, reports in PDF.
[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Date:   Sun, 31 Dec 2017 09:08:11 +0100
From:   Paolo Bonzini <pbonzini@...hat.com>
To:     Wanpeng Li <kernellwp@...il.com>
Cc:     linux-kernel@...r.kernel.org, kvm <kvm@...r.kernel.org>
Subject: Re: [PATCH 2/4] KVM: nVMX: track dirty state of non-shadowed VMCS
 fields

On 25/12/2017 04:03, Wanpeng Li wrote:
> 2017-12-21 20:43 GMT+08:00 Paolo Bonzini <pbonzini@...hat.com>:
>> VMCS12 fields that are not handled through shadow VMCS are rarely
>> written, and thus they are also almost constant in the vmcs02.  We can
>> thus optimize prepare_vmcs02 by skipping all the work for non-shadowed
>> fields in the common case.
>>
>> This patch introduces the (pretty simple) tracking infrastructure; the
>> next patches will move work to prepare_vmcs02_full and save a few hundred
>> clock cycles per VMRESUME on a Haswell Xeon E5 system:
>>
>>                                         before  after
>>         cpuid                           14159   13869
>>         vmcall                          15290   14951
>>         inl_from_kernel                 17703   17447
>>         outl_to_kernel                  16011   14692
>>         self_ipi_sti_nop                16763   15825
>>         self_ipi_tpr_sti_nop            17341   15935
>>         wr_tsc_adjust_msr               14510   14264
>>         rd_tsc_adjust_msr               15018   14311
>>         mmio-wildcard-eventfd:pci-mem   16381   14947
>>         mmio-datamatch-eventfd:pci-mem  18620   17858
>>         portio-wildcard-eventfd:pci-io  15121   14769
>>         portio-datamatch-eventfd:pci-io 15761   14831
>>
>> (average savings 748, stdev 460).
>>
>> Signed-off-by: Paolo Bonzini <pbonzini@...hat.com>
>> ---
>>  arch/x86/kvm/vmx.c | 29 ++++++++++++++++++++++++++++-
>>  1 file changed, 28 insertions(+), 1 deletion(-)
>>
>> diff --git a/arch/x86/kvm/vmx.c b/arch/x86/kvm/vmx.c
>> index 2ee842990976..8b6013b529b3 100644
>> --- a/arch/x86/kvm/vmx.c
>> +++ b/arch/x86/kvm/vmx.c
>> @@ -441,6 +441,7 @@ struct nested_vmx {
>>          * data hold by vmcs12
>>          */
>>         bool sync_shadow_vmcs;
>> +       bool dirty_vmcs12;
>>
>>         bool change_vmcs01_virtual_x2apic_mode;
>>         /* L2 must run next, and mustn't decide to exit to L1. */
>> @@ -7879,8 +7880,10 @@ static int handle_vmwrite(struct kvm_vcpu *vcpu)
>>  {
>>         unsigned long field;
>>         gva_t gva;
>> +       struct vcpu_vmx *vmx = to_vmx(vcpu);
>>         unsigned long exit_qualification = vmcs_readl(EXIT_QUALIFICATION);
>>         u32 vmx_instruction_info = vmcs_read32(VMX_INSTRUCTION_INFO);
>> +
>>         /* The value to write might be 32 or 64 bits, depending on L1's long
>>          * mode, and eventually we need to write that into a field of several
>>          * possible lengths. The code below first zero-extends the value to 64
>> @@ -7923,6 +7926,20 @@ static int handle_vmwrite(struct kvm_vcpu *vcpu)
>>                 return kvm_skip_emulated_instruction(vcpu);
>>         }
>>
>> +       switch (field) {
>> +#define SHADOW_FIELD_RW(x) case x:
>> +#include "vmx_shadow_fields.h"
> 
> What's will happen here if enable_shadow_vmcs == false?

If enable_shadow_vmcs == true, these fields never even go through
handle_vmwrite, so they have to be written always by prepare_vmcs02.
Other fields go through handle_vmwrite and set dirty_vmcs12.

If enable_shadow_vmcs == false, or if the fields do not exist in the
hardware VMCS (e.g. PML index on Haswell systems), writes go through
handle_vmwrite, but they still don't need to go through
prepare_vmcs02_full.  So the switch statement recognizes these flags, so
that they don't set dirty_vmcs12.

Thanks,

Paolo
> 
> Regards,
> Wanpeng Li
> 
>> +               /*
>> +                * The fields that can be updated by L1 without a vmexit are
>> +                * always updated in the vmcs02, the others go down the slow
>> +                * path of prepare_vmcs02.
>> +                */
>> +               break;
>> +       default:
>> +               vmx->nested.dirty_vmcs12 = true;
>> +               break;
>> +       }
>> +
>>         nested_vmx_succeed(vcpu);
>>         return kvm_skip_emulated_instruction(vcpu);
>>  }
>> @@ -7937,6 +7954,7 @@ static void set_current_vmptr(struct vcpu_vmx *vmx, gpa_t vmptr)
>>                              __pa(vmx->vmcs01.shadow_vmcs));
>>                 vmx->nested.sync_shadow_vmcs = true;
>>         }
>> +       vmx->nested.dirty_vmcs12 = true;
>>  }
>>
>>  /* Emulate the VMPTRLD instruction */
>> @@ -10569,6 +10587,11 @@ static int nested_vmx_load_cr3(struct kvm_vcpu *vcpu, unsigned long cr3, bool ne
>>         return 0;
>>  }
>>
>> +static void prepare_vmcs02_full(struct kvm_vcpu *vcpu, struct vmcs12 *vmcs12,
>> +                              bool from_vmentry)
>> +{
>> +}
>> +
>>  /*
>>   * prepare_vmcs02 is called when the L1 guest hypervisor runs its nested
>>   * L2 guest. L1 has a vmcs for L2 (vmcs12), and this function "merges" it
>> @@ -10864,7 +10887,6 @@ static int prepare_vmcs02(struct kvm_vcpu *vcpu, struct vmcs12 *vmcs12,
>>                         vmcs_write16(VIRTUAL_PROCESSOR_ID, vmx->vpid);
>>                         vmx_flush_tlb(vcpu, true);
>>                 }
>> -
>>         }
>>
>>         if (enable_pml) {
>> @@ -10913,6 +10935,11 @@ static int prepare_vmcs02(struct kvm_vcpu *vcpu, struct vmcs12 *vmcs12,
>>         /* Note: modifies VM_ENTRY/EXIT_CONTROLS and GUEST/HOST_IA32_EFER */
>>         vmx_set_efer(vcpu, vcpu->arch.efer);
>>
>> +       if (vmx->nested.dirty_vmcs12) {
>> +               prepare_vmcs02_full(vcpu, vmcs12, from_vmentry);
>> +               vmx->nested.dirty_vmcs12 = false;
>> +       }
>> +
>>         /* Shadow page tables on either EPT or shadow page tables. */
>>         if (nested_vmx_load_cr3(vcpu, vmcs12->guest_cr3, nested_cpu_has_ept(vmcs12),
>>                                 entry_failure_code))
>> --
>> 1.8.3.1
>>
>>
> 

Powered by blists - more mailing lists

Powered by Openwall GNU/*/Linux Powered by OpenVZ