linux-kernel - Re: [PATCH 04/12] KVM: MMU: log dirty page after marking spte writable

lists.openwall.net		lists / announce owl-users owl-dev john-users john-dev passwdqc-users yescrypt popa3d-users / oss-security kernel-hardening musl sabotage tlsify passwords / crypt-dev xvendor / Bugtraq Full-Disclosure linux-kernel linux-netdev linux-ext4 linux-hardening linux-cve-announce PHC
Open Source and information security mailing list archives

Hash Suite: Windows password security audit tool. GUI, reports in PDF.

[<prev] [next>] [<thread-prev] [day] [month] [year] [list]

Message-Id: <ECBA1CAA-823A-4332-A35D-F0A64CD35C6A@gmail.com>
Date:	Thu, 21 Nov 2013 12:26:48 +0800
From:	Xiao Guangrong <xiaoguangrong.eric@...il.com>
To:	Marcelo Tosatti <mtosatti@...hat.com>
Cc:	Gleb Natapov <gleb@...hat.com>, avi.kivity@...il.com,
	"pbonzini@...hat.com Bonzini" <pbonzini@...hat.com>,
	linux-kernel@...r.kernel.org, kvm@...r.kernel.org
Subject: Re: [PATCH 04/12] KVM: MMU: log dirty page after marking spte writable


On Nov 21, 2013, at 3:47 AM, Marcelo Tosatti <mtosatti@...hat.com> wrote:

> On Wed, Nov 20, 2013 at 10:20:09PM +0800, Xiao Guangrong wrote:
>>> But what guarantee does userspace require, from GET_DIRTY_LOG, while vcpus are
>>> executing? 
>> 
>> Aha. Single calling GET_DIRTY_LOG is useless since new dirty page can be generated
>> when GET_DIRTY_LOG is being returned. If user wants to get exact dirty pages the vcpus
>> should be stopped. 
>> 
>>> 
>>> With fast page fault:
>>> 
>>>     if (cmpxchg64(sptep, spte, spte | PT_WRITABLE_MASK) == spte)
>>>             /* The window in here... */
>>>             mark_page_dirty(vcpu->kvm, gfn);
>>> 
>>> And the $SUBJECT set_spte reordering, the rule becomes
>>> 
>>> A call to GET_DIRTY_LOG guarantees to return correct information about 
>>> dirty pages before invocation of the previous GET_DIRTY_LOG call.
>>> 
>>> (see example 1: the next GET_DIRTY_LOG will return the dirty information
>>> there).
>>> 
>> 
>> It seems no.
>> 
>> The first GET_DIRTY_LOG can happen before fast-page-fault，
>> the second GET_DIRTY_LOG happens in the window between cmpxchg()
>> and mark_page_dirty(), for the second one, the information is still “incorrect”.
> 
> Its correct for the previous GET_DIRTY_LOG call.

Oh, yes.

> 
>>> The rule for sptes that is, because kvm_write_guest does not match the
>>> documentation at all.
>> 
>> You mean the case of “kvm_write_guest” is valid (I do not know why it is)?
>> Or anything else?
>> 
>>> 
>>> So before example 1 and this patch, the rule (well for sptes at least) was
>>> 
>>> "Given a memory slot, return a bitmap containing any pages dirtied
>>> since the last call to this ioctl.  Bit 0 is the first page in the
>>> memory slot.  Ensure the entire structure is cleared to avoid padding
>>> issues."
>>> 
>>> Can you explain why it is OK to relax this rule?
>> 
>> It’s because:
>> 1) it doesn’t break current use cases, i.e. Live migration and FB-flushing.
>> 2) the current code, like kvm_write_guest  has already broken the documentation
>>   (the guest page has been written but missed in the dirty bitmap).
>> 3) it’s needless to implement a exact get-dirty-pages since the dirty pages can
>>   no be exactly got except stopping vcpus. 
>> 
>> So i think we'd document this case instead. No?
> 
> Lets figure out the requirements, then. I don't understand why
> FB-flushing is safe (think kvm-autotest: one pixel off the entire
> test fails).

I did not read FB-flushing code, i guess the reason why it can work is:
FB-flushing do periodicity get-dirty-page and flush it.  After guest writes
the page, the page will be flushed in the next GET_DIRTY_LOG.

> 
> Before fast page fault: Pages are either write protected or the
> corresponding dirty bitmap bit is set. Any write faults to dirty logged
> sptes while GET_DIRTY log is executing in the protected section are
> allowed to instantiate writeable spte after GET_DIRTY log is finished.
> 
> After fast page fault: Pages can be writeable and the dirty bitmap not
> set. Therefore data can be dirty before GET_DIRTY executes and still
> fail to be returned in the bitmap.

It’s right. The current GET_DIRTY fail to get the dirty page but the
next GET_DIRTY can get it properly since the current GET_DIRTY
need to flush all TLBs that waits for fast-page-fault finish.

I do not think it’s a big problem since single GET_DIRTY is useless as
i explained in the previous mail.

> 
> Since this patchset does not introduce change in behaviour (fast pf
> did), no reason to avoid merging this.

Yes, thank you, Marcelo! :)

> 
> BTW, since GET_DIRTY log does not require to report concurrent (to
> GET_DIRTY_LOG) write faults, it is not necessary to rewalk the spte
> list, is it?

You mean the “rewalk” we introduced in pte_list_walk_lockless() in this patchset?
I think this rewalk is needed because it’s caused by meet a unexpected nulls that
means we’re walking on the unexpected rmap. If we do not do this, some writable
sptes will be missed.




--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@...r.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/