[<prev] [next>] [<thread-prev] [day] [month] [year] [list]
Message-Id: <ECBA1CAA-823A-4332-A35D-F0A64CD35C6A@gmail.com>
Date: Thu, 21 Nov 2013 12:26:48 +0800
From: Xiao Guangrong <xiaoguangrong.eric@...il.com>
To: Marcelo Tosatti <mtosatti@...hat.com>
Cc: Gleb Natapov <gleb@...hat.com>, avi.kivity@...il.com,
"pbonzini@...hat.com Bonzini" <pbonzini@...hat.com>,
linux-kernel@...r.kernel.org, kvm@...r.kernel.org
Subject: Re: [PATCH 04/12] KVM: MMU: log dirty page after marking spte writable
On Nov 21, 2013, at 3:47 AM, Marcelo Tosatti <mtosatti@...hat.com> wrote:
> On Wed, Nov 20, 2013 at 10:20:09PM +0800, Xiao Guangrong wrote:
>>> But what guarantee does userspace require, from GET_DIRTY_LOG, while vcpus are
>>> executing?
>>
>> Aha. Single calling GET_DIRTY_LOG is useless since new dirty page can be generated
>> when GET_DIRTY_LOG is being returned. If user wants to get exact dirty pages the vcpus
>> should be stopped.
>>
>>>
>>> With fast page fault:
>>>
>>> if (cmpxchg64(sptep, spte, spte | PT_WRITABLE_MASK) == spte)
>>> /* The window in here... */
>>> mark_page_dirty(vcpu->kvm, gfn);
>>>
>>> And the $SUBJECT set_spte reordering, the rule becomes
>>>
>>> A call to GET_DIRTY_LOG guarantees to return correct information about
>>> dirty pages before invocation of the previous GET_DIRTY_LOG call.
>>>
>>> (see example 1: the next GET_DIRTY_LOG will return the dirty information
>>> there).
>>>
>>
>> It seems no.
>>
>> The first GET_DIRTY_LOG can happen before fast-page-fault,
>> the second GET_DIRTY_LOG happens in the window between cmpxchg()
>> and mark_page_dirty(), for the second one, the information is still “incorrect”.
>
> Its correct for the previous GET_DIRTY_LOG call.
Oh, yes.
>
>>> The rule for sptes that is, because kvm_write_guest does not match the
>>> documentation at all.
>>
>> You mean the case of “kvm_write_guest” is valid (I do not know why it is)?
>> Or anything else?
>>
>>>
>>> So before example 1 and this patch, the rule (well for sptes at least) was
>>>
>>> "Given a memory slot, return a bitmap containing any pages dirtied
>>> since the last call to this ioctl. Bit 0 is the first page in the
>>> memory slot. Ensure the entire structure is cleared to avoid padding
>>> issues."
>>>
>>> Can you explain why it is OK to relax this rule?
>>
>> It’s because:
>> 1) it doesn’t break current use cases, i.e. Live migration and FB-flushing.
>> 2) the current code, like kvm_write_guest has already broken the documentation
>> (the guest page has been written but missed in the dirty bitmap).
>> 3) it’s needless to implement a exact get-dirty-pages since the dirty pages can
>> no be exactly got except stopping vcpus.
>>
>> So i think we'd document this case instead. No?
>
> Lets figure out the requirements, then. I don't understand why
> FB-flushing is safe (think kvm-autotest: one pixel off the entire
> test fails).
I did not read FB-flushing code, i guess the reason why it can work is:
FB-flushing do periodicity get-dirty-page and flush it. After guest writes
the page, the page will be flushed in the next GET_DIRTY_LOG.
>
> Before fast page fault: Pages are either write protected or the
> corresponding dirty bitmap bit is set. Any write faults to dirty logged
> sptes while GET_DIRTY log is executing in the protected section are
> allowed to instantiate writeable spte after GET_DIRTY log is finished.
>
> After fast page fault: Pages can be writeable and the dirty bitmap not
> set. Therefore data can be dirty before GET_DIRTY executes and still
> fail to be returned in the bitmap.
It’s right. The current GET_DIRTY fail to get the dirty page but the
next GET_DIRTY can get it properly since the current GET_DIRTY
need to flush all TLBs that waits for fast-page-fault finish.
I do not think it’s a big problem since single GET_DIRTY is useless as
i explained in the previous mail.
>
> Since this patchset does not introduce change in behaviour (fast pf
> did), no reason to avoid merging this.
Yes, thank you, Marcelo! :)
>
> BTW, since GET_DIRTY log does not require to report concurrent (to
> GET_DIRTY_LOG) write faults, it is not necessary to rewalk the spte
> list, is it?
You mean the “rewalk” we introduced in pte_list_walk_lockless() in this patchset?
I think this rewalk is needed because it’s caused by meet a unexpected nulls that
means we’re walking on the unexpected rmap. If we do not do this, some writable
sptes will be missed.
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@...r.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/
Powered by blists - more mailing lists