Message-ID: <CANRm+Cwurxx5cYZWFc8K1-zqk+srxRSkiFKhQ1im3Wm8sFeQjQ@mail.gmail.com>
Date:   Fri, 28 Jul 2017 09:28:51 +0800
From:   Wanpeng Li <kernellwp@...il.com>
To:     David Matlack <dmatlack@...gle.com>
Cc:     Paolo Bonzini <pbonzini@...hat.com>,
        "linux-kernel@...r.kernel.org" <linux-kernel@...r.kernel.org>,
        kvm list <kvm@...r.kernel.org>,
        Jim Mattson <jmattson@...gle.com>
Subject: Re: [PATCH] KVM: nVMX: do not pin the VMCS12

2017-07-28 1:20 GMT+08:00 David Matlack <dmatlack@...gle.com>:
> On Thu, Jul 27, 2017 at 6:54 AM, Paolo Bonzini <pbonzini@...hat.com> wrote:
>> Since the current implementation of VMCS12 does a memcpy in and out
>> of guest memory, we do not need current_vmcs12 and current_vmcs12_page
>> anymore.  current_vmptr is enough to read and write the VMCS12.
>
> This patch also fixes dirty tracking (memslot->dirty_bitmap) of the
> VMCS12 page by using kvm_write_guest. nested_release_page() only marks
> the struct page dirty.
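
For reference, the difference boils down to roughly the following (a simplified
sketch of the v4.13-era helpers, not the exact code): the kvm_write_guest
family resolves the gfn to a memslot and records the write in that memslot's
dirty_bitmap, whereas nested_release_page() only dirties the struct page.

	/* write path: copy into guest memory and update memslot->dirty_bitmap */
	static int __kvm_write_guest_page(struct kvm_memory_slot *memslot, gfn_t gfn,
					  const void *data, int offset, int len)
	{
		unsigned long addr = gfn_to_hva_memslot(memslot, gfn);

		if (kvm_is_error_hva(addr))
			return -EFAULT;
		if (__copy_to_user((void __user *)addr + offset, data, len))
			return -EFAULT;
		mark_page_dirty_in_slot(memslot, gfn);	/* sets the bit in dirty_bitmap */
		return 0;
	}

	/* release path: only marks the struct page dirty, dirty_bitmap untouched */
	static void nested_release_page(struct page *page)
	{
		kvm_release_page_dirty(page);
	}

With the patch, the VMCS12 flush goes through the first path and is therefore
visible to dirty logging, which the old memcpy + nested_release_page()
combination was not.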
>
>>
>> Signed-off-by: Paolo Bonzini <pbonzini@...hat.com>
>> ---
>>  arch/x86/kvm/vmx.c | 23 ++++++-----------------
>>  1 file changed, 6 insertions(+), 17 deletions(-)
>>
>> diff --git a/arch/x86/kvm/vmx.c b/arch/x86/kvm/vmx.c
>> index b37161808352..142f16ebdca2 100644
>> --- a/arch/x86/kvm/vmx.c
>> +++ b/arch/x86/kvm/vmx.c
>> @@ -416,9 +416,6 @@ struct nested_vmx {
>>
>>         /* The guest-physical address of the current VMCS L1 keeps for L2 */
>>         gpa_t current_vmptr;
>> -       /* The host-usable pointer to the above */
>> -       struct page *current_vmcs12_page;
>> -       struct vmcs12 *current_vmcs12;
>>         /*
>>          * Cache of the guest's VMCS, existing outside of guest memory.
>>          * Loaded from guest memory during VMPTRLD. Flushed to guest
>> @@ -7183,10 +7180,6 @@ static inline void nested_release_vmcs12(struct vcpu_vmx *vmx)
>>         if (vmx->nested.current_vmptr == -1ull)
>>                 return;
>>
>> -       /* current_vmptr and current_vmcs12 are always set/reset together */
>> -       if (WARN_ON(vmx->nested.current_vmcs12 == NULL))
>> -               return;
>> -
>>         if (enable_shadow_vmcs) {
>>                 /* copy to memory all shadowed fields in case
>>                    they were modified */
>> @@ -7199,13 +7192,11 @@ static inline void nested_release_vmcs12(struct vcpu_vmx *vmx)
>>         vmx->nested.posted_intr_nv = -1;
>>
>>         /* Flush VMCS12 to guest memory */
>> -       memcpy(vmx->nested.current_vmcs12, vmx->nested.cached_vmcs12,
>> -              VMCS12_SIZE);
>> +       kvm_vcpu_write_guest_page(&vmx->vcpu,
>> +                                 vmx->nested.current_vmptr >> PAGE_SHIFT,
>> +                                 vmx->nested.cached_vmcs12, 0, VMCS12_SIZE);
>
> Have you hit any "suspicious RCU usage" error messages during VM

Yeah, I observed this splat when testing Paolo's patch today.

[87214.855344] =============================
[87214.855346] WARNING: suspicious RCU usage
[87214.855348] 4.13.0-rc2+ #2 Tainted: G           OE
[87214.855350] -----------------------------
[87214.855352] ./include/linux/kvm_host.h:573 suspicious rcu_dereference_check() usage!
[87214.855353]
other info that might help us debug this:

[87214.855355]
rcu_scheduler_active = 2, debug_locks = 1
[87214.855357] 1 lock held by qemu-system-x86/17059:
[87214.855359]  #0:  (&vcpu->mutex){+.+.+.}, at: [<ffffffffc051bb12>] vcpu_load+0x22/0x80 [kvm]
[87214.855396]
stack backtrace:
[87214.855399] CPU: 3 PID: 17059 Comm: qemu-system-x86 Tainted: G           OE   4.13.0-rc2+ #2
[87214.855401] Hardware name: LENOVO ThinkCentre M8500t-N000/SHARKBAY, BIOS FBKTC1AUS 02/16/2016
[87214.855403] Call Trace:
[87214.855408]  dump_stack+0x99/0xce
[87214.855413]  lockdep_rcu_suspicious+0xc5/0x100
[87214.855423]  kvm_vcpu_gfn_to_memslot+0x166/0x180 [kvm]
[87214.855432]  kvm_vcpu_write_guest_page+0x24/0x50 [kvm]
[87214.855438]  free_nested.part.76+0x76/0x270 [kvm_intel]
[87214.855443]  vmx_free_vcpu+0x7a/0xc0 [kvm_intel]
[87214.855454]  kvm_arch_destroy_vm+0x104/0x1d0 [kvm]
[87214.855463]  kvm_put_kvm+0x17a/0x2b0 [kvm]
[87214.855473]  kvm_vm_release+0x21/0x30 [kvm]
[87214.855477]  __fput+0xfb/0x240
[87214.855482]  ____fput+0xe/0x10
[87214.855485]  task_work_run+0x7e/0xb0
[87214.855490]  do_exit+0x323/0xcf0
[87214.855494]  ? get_signal+0x318/0x930
[87214.855498]  ? _raw_spin_unlock_irq+0x2c/0x60
[87214.855503]  do_group_exit+0x50/0xd0
[87214.855507]  get_signal+0x24f/0x930
[87214.855514]  do_signal+0x37/0x750
[87214.855518]  ? __might_fault+0x3e/0x90
[87214.855523]  ? __might_fault+0x85/0x90
[87214.855527]  ? exit_to_usermode_loop+0x2b/0x100
[87214.855531]  ? __this_cpu_preempt_check+0x13/0x20
[87214.855535]  exit_to_usermode_loop+0xab/0x100
[87214.855539]  syscall_return_slowpath+0x153/0x160
[87214.855542]  entry_SYSCALL_64_fastpath+0xc0/0xc2
[87214.855545] RIP: 0033:0x7ff40d24a26d
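
The path in the splat is vmx_free_vcpu -> free_nested -> (the inlined)
nested_release_vmcs12, which now reaches kvm_vcpu_gfn_to_memslot() without
kvm->srcu held. The memslot accessors expect the SRCU read side to be held,
roughly like this (illustrative sketch only, reusing the arguments from the
patch):

	int idx = srcu_read_lock(&vmx->vcpu.kvm->srcu);

	/* the memslot lookup inside is done under srcu_dereference() */
	kvm_vcpu_write_guest_page(&vmx->vcpu,
				  vmx->nested.current_vmptr >> PAGE_SHIFT,
				  vmx->nested.cached_vmcs12, 0, VMCS12_SIZE);

	srcu_read_unlock(&vmx->vcpu.kvm->srcu, idx);

The KVM_RUN/vmexit paths already run with kvm->srcu held, but this
fd-release/teardown path does not take it, hence the lockdep complaint.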


Regards,
Wanpeng Li

> teardown with this patch? We did when we replaced memcpy with
> kvm_write_guest a while back. IIRC it was due to kvm->srcu not being
> held in one of the teardown paths. kvm_write_guest() expects it to be
> held in order to access memslots.
>
> We fixed this by skipping the VMCS12 flush during VMXOFF. I'll send
> that patch along with a few other nVMX dirty tracking related patches
> I've been meaning to get upstreamed.
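
(Purely as an illustration of that idea, not the actual patch: the write-back
in nested_release_vmcs12() would become conditional, e.g. on a hypothetical
flag passed down from the VMXOFF/teardown callers:

	/* hypothetical: skip the flush when nested state is being torn down */
	if (!vmxoff)
		kvm_vcpu_write_guest_page(&vmx->vcpu,
					  vmx->nested.current_vmptr >> PAGE_SHIFT,
					  vmx->nested.cached_vmcs12, 0, VMCS12_SIZE);

so the teardown path never has to touch the memslots.)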
>
>>
>> -       kunmap(vmx->nested.current_vmcs12_page);
>> -       nested_release_page(vmx->nested.current_vmcs12_page);
>>         vmx->nested.current_vmptr = -1ull;
>> -       vmx->nested.current_vmcs12 = NULL;
>>  }
>>
>>  /*
>> @@ -7623,14 +7614,13 @@ static int handle_vmptrld(struct kvm_vcpu *vcpu)
>>                 }
>>
>>                 nested_release_vmcs12(vmx);
>> -               vmx->nested.current_vmcs12 = new_vmcs12;
>> -               vmx->nested.current_vmcs12_page = page;
>>                 /*
>>                  * Load VMCS12 from guest memory since it is not already
>>                  * cached.
>>                  */
>> -               memcpy(vmx->nested.cached_vmcs12,
>> -                      vmx->nested.current_vmcs12, VMCS12_SIZE);
>> +               memcpy(vmx->nested.cached_vmcs12, new_vmcs12, VMCS12_SIZE);
>> +               kunmap(page);
>
> + nested_release_page_clean(page);
>
>> +
>>                 set_current_vmptr(vmx, vmptr);
>>         }
>>
>> @@ -9354,7 +9344,6 @@ static struct kvm_vcpu *vmx_create_vcpu(struct kvm *kvm, unsigned int id)
>>
>>         vmx->nested.posted_intr_nv = -1;
>>         vmx->nested.current_vmptr = -1ull;
>> -       vmx->nested.current_vmcs12 = NULL;
>>
>>         vmx->msr_ia32_feature_control_valid_bits = FEATURE_CONTROL_LOCKED;
>>
>> --
>> 1.8.3.1
>>
