linux-kernel - Re: [PATCH 04/12] KVM: MMU: log dirty page after marking spte writable

lists.openwall.net		lists / announce owl-users owl-dev john-users john-dev passwdqc-users yescrypt popa3d-users / oss-security kernel-hardening musl sabotage tlsify passwords / crypt-dev xvendor / Bugtraq Full-Disclosure linux-kernel linux-netdev linux-ext4 linux-hardening linux-cve-announce PHC
Open Source and information security mailing list archives

Hash Suite: Windows password security audit tool. GUI, reports in PDF.

[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]

Message-ID: <20131120194727.GA3353@amt.cnet>
Date:	Wed, 20 Nov 2013 17:47:27 -0200
From:	Marcelo Tosatti <mtosatti@...hat.com>
To:	Xiao Guangrong <xiaoguangrong.eric@...il.com>
Cc:	Gleb Natapov <gleb@...hat.com>, avi.kivity@...il.com,
	"pbonzini@...hat.com Bonzini" <pbonzini@...hat.com>,
	linux-kernel@...r.kernel.org, kvm@...r.kernel.org
Subject: Re: [PATCH 04/12] KVM: MMU: log dirty page after marking spte
 writable

On Wed, Nov 20, 2013 at 10:20:09PM +0800, Xiao Guangrong wrote:
> > But what guarantee does userspace require, from GET_DIRTY_LOG, while vcpus are
> > executing? 
> 
> Aha. Single calling GET_DIRTY_LOG is useless since new dirty page can be generated
> when GET_DIRTY_LOG is being returned. If user wants to get exact dirty pages the vcpus
> should be stopped. 
> 
> > 
> > With fast page fault:
> > 
> >      if (cmpxchg64(sptep, spte, spte | PT_WRITABLE_MASK) == spte)
> >              /* The window in here... */
> >              mark_page_dirty(vcpu->kvm, gfn);
> > 
> > And the $SUBJECT set_spte reordering, the rule becomes
> > 
> > A call to GET_DIRTY_LOG guarantees to return correct information about 
> > dirty pages before invocation of the previous GET_DIRTY_LOG call.
> > 
> > (see example 1: the next GET_DIRTY_LOG will return the dirty information
> > there).
> > 
> 
> It seems no.
> 
> The first GET_DIRTY_LOG can happen before fast-page-fault，
> the second GET_DIRTY_LOG happens in the window between cmpxchg()
> and mark_page_dirty(), for the second one, the information is still “incorrect”.

Its correct for the previous GET_DIRTY_LOG call.

> > The rule for sptes that is, because kvm_write_guest does not match the
> > documentation at all.
> 
> You mean the case of “kvm_write_guest” is valid (I do not know why it is)?
> Or anything else?
> 
> > 
> > So before example 1 and this patch, the rule (well for sptes at least) was
> > 
> > "Given a memory slot, return a bitmap containing any pages dirtied
> > since the last call to this ioctl.  Bit 0 is the first page in the
> > memory slot.  Ensure the entire structure is cleared to avoid padding
> > issues."
> > 
> > Can you explain why it is OK to relax this rule?
> 
> It’s because:
> 1) it doesn’t break current use cases, i.e. Live migration and FB-flushing.
> 2) the current code, like kvm_write_guest  has already broken the documentation
>    (the guest page has been written but missed in the dirty bitmap).
> 3) it’s needless to implement a exact get-dirty-pages since the dirty pages can
>    no be exactly got except stopping vcpus. 
> 
> So i think we'd document this case instead. No?

Lets figure out the requirements, then. I don't understand why
FB-flushing is safe (think kvm-autotest: one pixel off the entire
test fails).

Before fast page fault: Pages are either write protected or the
corresponding dirty bitmap bit is set. Any write faults to dirty logged
sptes while GET_DIRTY log is executing in the protected section are
allowed to instantiate writeable spte after GET_DIRTY log is finished.

After fast page fault: Pages can be writeable and the dirty bitmap not
set. Therefore data can be dirty before GET_DIRTY executes and still
fail to be returned in the bitmap.

Since this patchset does not introduce change in behaviour (fast pf
did), no reason to avoid merging this.

BTW, since GET_DIRTY log does not require to report concurrent (to
GET_DIRTY_LOG) write faults, it is not necessary to rewalk the spte
list, is it?



--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@...r.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/