Date:   Fri, 25 Feb 2022 18:52:28 +0000
From:   Sean Christopherson <seanjc@...gle.com>
To:     David Woodhouse <dwmw2@...radead.org>
Cc:     "borntraeger@...ux.ibm.com" <borntraeger@...ux.ibm.com>,
        "pbonzini@...hat.com" <pbonzini@...hat.com>,
        "frankja@...ux.ibm.com" <frankja@...ux.ibm.com>,
        "kvm@...r.kernel.org" <kvm@...r.kernel.org>,
        "imbrenda@...ux.ibm.com" <imbrenda@...ux.ibm.com>,
        "david@...hat.com" <david@...hat.com>,
        "linux-kernel@...r.kernel.org" <linux-kernel@...r.kernel.org>
Subject: Re: [EXTERNAL] [PATCH v2] KVM: Don't actually set a request when
 evicting vCPUs for GFN cache invd

On Fri, Feb 25, 2022, David Woodhouse wrote:
> On Fri, 2022-02-25 at 17:27 +0000, Sean Christopherson wrote:
> > On Fri, Feb 25, 2022, David Woodhouse wrote:
> > > On Fri, 2022-02-25 at 16:13 +0000, Sean Christopherson wrote:
> > > > On Fri, Feb 25, 2022, Woodhouse, David wrote:
> > > > > Since we need an active vCPU context to do dirty logging (thanks, dirty
> > > > > ring)... and since any time vcpu_run exits to userspace for any reason
> > > > > might be the last time we ever get an active vCPU context... I think
> > > > > that kind of fundamentally means that we must flush dirty state to the
> > > > > log on *every* return to userspace, doesn't it?
> > > > 
> > > > I would rather add a variant of mark_page_dirty_in_slot() that takes a vCPU, which
> > > > we would have in all cases.  I see no reason to require use of kvm_get_running_vcpu().
> > > 
> > > We already have kvm_vcpu_mark_page_dirty(), but it can't use just 'some
> > > vcpu' because the dirty ring is lockless. So if you're ever going to
> > > use anything other than kvm_get_running_vcpu() we need to add locks.
> > 
> > Heh, actually, scratch my previous comment.  I was going to respond that
> > kvm_get_running_vcpu() is mutually exclusive with all other ioctls() on the same
> > vCPU by virtue of vcpu->mutex, but I had forgotten that kvm_get_running_vcpu()
> > really should be "kvm_get_loaded_vcpu()".  I.e. as long as KVM is in a vCPU-ioctl
> > path, kvm_get_running_vcpu() will be non-null.
> > 
> > > And while we *could* do that, I don't think it would negate the
> > > fundamental observation that *any* time we return from vcpu_run to
> > > userspace, that could be the last time. Userspace might read the dirty
> > > log for the *last* time, and any internally-cached "oh, at some point
> > > we need to mark <this> page dirty" is lost because by the time the vCPU
> > > is finally destroyed, it's too late.
> > 
> > Hmm, isn't that an existing bug?  I think the correct fix would be to flush all
> > dirty vmcs12 pages to the memslot in vmx_get_nested_state().  Userspace _must_
> > invoke that if it wants to migrate a nested vCPU.
> 
> Yes, AFAICT it's an existing bug in the way the kvm_host_map code works
> today. Your suggestion makes sense as *long* as we consider it OK to
> retrospectively document that userspace must extract the nested state
> *before* doing the final read of the dirty log.
> 
> I am not aware that we have a clearly documented "the dirty log may
> keep changing until XXX" rule anyway. But you're proposing that we change
> it, I think. There may well be VMMs which assume that no pages will be
> dirtied unless they are actually *running* a vCPU.
> 
> Which is why I was proposing that we flush the dirty status to the log
> *every* time we leave vcpu_run back to userspace. But I'll not die on
> that hill, if you make a good case for your proposal being OK.

Drat, I didn't consider the ABI aspect.  Flushing on every exit to userspace would
indeed be more robust.
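
For the sake of discussion, flushing on exit could look something like the
below.  This is purely a sketch: the hook point and helper name are made up,
and it assumes the cache still tracks its own dirty state and holds a valid
memslot pointer.

#include <linux/kvm_host.h>

/*
 * Hypothetical helper, called from the tail of the vcpu_run path before
 * returning to userspace, i.e. while kvm_get_running_vcpu() is still valid
 * and the per-vCPU dirty ring can be used.
 */
static void kvm_gpc_flush_dirty(struct kvm_vcpu *vcpu,
				struct gfn_to_pfn_cache *gpc)
{
	if (!gpc->valid || !gpc->dirty)
		return;

	/* The dirty ring is per-vCPU, hence the need for a vCPU context. */
	mark_page_dirty_in_slot(vcpu->kvm, gpc->memslot,
				gpa_to_gfn(gpc->gpa));
	gpc->dirty = false;
}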

> > > I think I'm going to rip out the 'dirty' flag from the gfn_to_pfn_cache
> > > completely and add a function (to be called with an active vCPU
> > > context) which marks the page dirty *now*.
> > 
> > Hrm, something like?
> > 
> >   1. Drop @dirty from kvm_gfn_to_pfn_cache_init()
> >   2. Rename @dirty => @old_dirty in kvm_gfn_to_pfn_cache_refresh()
> >   3. Add an API to mark the associated slot dirty without unmapping
> > 
> > I think that makes sense.
> 
> Except I'll drop 'dirty' from kvm_gfn_to_pfn_cache_refresh() too.
> There's no scope for a deferred "oh, I meant to tell you that was
> dirty" even in that case, is there? Use the API we add in your #3.

But won't we end up with a bunch of call sites that attempt to determine whether or
not the dirty status needs to be flushed?  I'm specifically thinking of scenarios
where the status needs to be conditionally flushed, e.g. if the backing pfn doesn't
change, then it's ok to not mark the page dirty.  Not handling that in the refresh
helper will either lead to unnecessary dirtying or duplicate code/work.
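
To make the worry concrete, the check I'd want to live in one place rather
than at every call site is roughly the below (helper name and the @new_pfn
plumbing are made up, and it assumes the refresh path runs with a vCPU
context so the dirty ring can be used):

/*
 * Hypothetical helper for the refresh path: flush the old mapping's dirty
 * status only if the backing pfn is about to change.  @old_dirty is the
 * renamed parameter from #2 above, @new_pfn is whatever pfn refresh is
 * about to install.
 */
static void gpc_maybe_flush_dirty(struct kvm *kvm,
				  struct gfn_to_pfn_cache *gpc,
				  kvm_pfn_t new_pfn, bool old_dirty)
{
	/*
	 * If the pfn isn't changing, the cache keeps writing to the same
	 * page and deferring the dirty marking is fine; only when switching
	 * pages must the old page hit the dirty log/ring now.
	 */
	if (old_dirty && new_pfn != gpc->pfn)
		mark_page_dirty_in_slot(kvm, gpc->memslot,
					gpa_to_gfn(gpc->gpa));
}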
