Date:   Fri, 25 Feb 2022 16:13:27 +0000
From:   Sean Christopherson <seanjc@...gle.com>
To:     "Woodhouse, David" <dwmw@...zon.co.uk>
Cc:     "borntraeger@...ux.ibm.com" <borntraeger@...ux.ibm.com>,
        "pbonzini@...hat.com" <pbonzini@...hat.com>,
        "frankja@...ux.ibm.com" <frankja@...ux.ibm.com>,
        "kvm@...r.kernel.org" <kvm@...r.kernel.org>,
        "imbrenda@...ux.ibm.com" <imbrenda@...ux.ibm.com>,
        "david@...hat.com" <david@...hat.com>,
        "linux-kernel@...r.kernel.org" <linux-kernel@...r.kernel.org>
Subject: Re: [PATCH v2] KVM: Don't actually set a request when evicting vCPUs
 for GFN cache invd

On Fri, Feb 25, 2022, Woodhouse, David wrote:
> On Wed, 2022-02-23 at 16:53 +0000, Sean Christopherson wrote:
> > Don't actually set a request bit in vcpu->requests when making a request
> > purely to force a vCPU to exit the guest.  Logging a request but not
> > actually consuming it would cause the vCPU to get stuck in an infinite
> > loop during KVM_RUN because KVM would see the pending request and bail
> > from VM-Enter to service the request.
> 
> Hm, it might be that we *do* want to do some work.
> 
> I think there's a problem with the existing kvm_host_map that we
> haven't yet resolved with the new gfn_to_pfn_cache.
> 
> Look for the calls to 'kvm_vcpu_unmap(…, true)' in e.g. vmx/nested.c
> 
> Now, what if a vCPU is in guest mode, doesn't vmexit back to the L1,
> and its userspace thread takes a signal and returns to userspace?
> 
> The pages referenced by those maps may have been written, but because
> the cache is still valid, they haven't been marked as dirty in the KVM
> dirty logs yet.
> 
> So, a traditional live migration workflow, once it reaches
> convergence, would pause the vCPUs, copy the final batch of dirty
> pages to the destination, then destroy the VM on the source.
> 
> And AFAICT those mapped pages don't actually get marked dirty until
> nested_vmx_free_vcpu() calls vmx_leave_nested(). Which will probably
> trigger the dirty log WARN now, since there's no active vCPU context
> for logging, right?
> 
> And the latest copy of those pages never does get copied to the
> destination.
> 
> Since I didn't spot that problem until today, the gfn_to_pfn_cache
> design inherited it too. The 'dirty' flag remains set in the GPC until
> a subsequent revalidate or explicit unmap.
> 
> Since we need an active vCPU context to do dirty logging (thanks, dirty
> ring)... and since any time vcpu_run exits to userspace for any reason
> might be the last time we ever get an active vCPU context... I think
> that kind of fundamentally means that we must flush dirty state to the
> log on *every* return to userspace, doesn't it?

I would rather add a variant of mark_page_dirty_in_slot() that takes a vCPU, which
we should have in all cases.  I see no reason to require use of kvm_get_running_vcpu().
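
Something along those lines might look roughly like this (a sketch only;
the function name is made up, and the body simply mirrors the existing
mark_page_dirty_in_slot() logic with the kvm_get_running_vcpu() lookup
replaced by the explicit vCPU):

        static void mark_page_dirty_in_slot_vcpu(struct kvm_vcpu *vcpu,
                                                 const struct kvm_memory_slot *memslot,
                                                 gfn_t gfn)
        {
                unsigned long rel_gfn;
                u32 slot;

                if (!memslot || !kvm_slot_dirty_track_enabled(memslot))
                        return;

                rel_gfn = gfn - memslot->base_gfn;
                slot = (memslot->as_id << 16) | memslot->id;

                /* Dirty ring vs. dirty bitmap handling, as in the existing helper. */
                if (vcpu->kvm->dirty_ring_size)
                        kvm_dirty_ring_push(&vcpu->dirty_ring, slot, rel_gfn);
                else
                        set_bit_le(rel_gfn, memslot->dirty_bitmap);
        }

With the vCPU passed in explicitly, callers that still have a vCPU on
hand (e.g. the unmap/invalidate paths above) wouldn't depend on there
being a "running" vCPU at the moment the page is finally marked dirty.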
