linux-kernel - Re: [PATCH 04/12] KVM: MMU: log dirty page after marking spte writable

lists.openwall.net		lists / announce owl-users owl-dev john-users john-dev passwdqc-users yescrypt popa3d-users / oss-security kernel-hardening musl sabotage tlsify passwords / crypt-dev xvendor / Bugtraq Full-Disclosure linux-kernel linux-netdev linux-ext4 linux-hardening linux-cve-announce PHC
Open Source and information security mailing list archives
Hash Suite: Windows password security audit tool. GUI, reports in PDF.
[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-Id: <B166B522-2CDB-44CA-B594-687B45A1CF01@linux.vnet.ibm.com>
Date:	Wed, 20 Nov 2013 22:20:09 +0800
From:	Xiao Guangrong <xiaoguangrong.eric@...il.com>
To:	Marcelo Tosatti <mtosatti@...hat.com>
Cc:	Gleb Natapov <gleb@...hat.com>, avi.kivity@...il.com,
	"pbonzini@...hat.com Bonzini" <pbonzini@...hat.com>,
	linux-kernel@...r.kernel.org, kvm@...r.kernel.org
Subject: Re: [PATCH 04/12] KVM: MMU: log dirty page after marking spte writable


On Nov 20, 2013, at 8:29 AM, Marcelo Tosatti <mtosatti@...hat.com> wrote:

> On Wed, Aug 07, 2013 at 12:06:49PM +0800, Xiao Guangrong wrote:
>> On 08/07/2013 09:48 AM, Marcelo Tosatti wrote:
>>> On Tue, Jul 30, 2013 at 09:02:02PM +0800, Xiao Guangrong wrote:
>>>> Make sure we can see the writable spte before the dirt bitmap is visible
>>>> 
>>>> We do this is for kvm_vm_ioctl_get_dirty_log() write-protects the spte based
>>>> on the dirty bitmap, we should ensure the writable spte can be found in rmap
>>>> before the dirty bitmap is visible. Otherwise, we cleared the dirty bitmap and
>>>> failed to write-protect the page
>>>> 
>>>> Signed-off-by: Xiao Guangrong <xiaoguangrong@...ux.vnet.ibm.com>
>>>> ---
>>>> arch/x86/kvm/mmu.c | 6 +++---
>>>> 1 file changed, 3 insertions(+), 3 deletions(-)
>>> 
>>> Can you explain why this is safe, with regard to the rule 
>>> at edde99ce05290e50 ?
>> 
>> BTW, this log fixed this case:
>> 
>>    VCPU 0                KVM migration control
>> 
>>                          write-protects all pages
>> #Pf happen then the page
>> become writable, set dirty
>> bit on the bitmap
>> 
>>                         swap the bitmap, current bitmap is empty
>> 
>> write the page (no dirty log)
>> 
>>                         stop the guest and push
>>                         the remaining dirty pages
>> Stopped
>>                         See current bitmap is empty that means
>>                         no page is dirty.
>>> 
>>> "The rule is that all pages are either dirty in the current bitmap,
>>> or write-protected, which is violated here."
>> 
>> Actually, this rule is not complete true, there's the 3th case:
>> the window between write guest page and set dirty bitmap is valid.
>> In that window, page is write-free and not dirty logged.
>> 
>> This case is based on the fact that at the final step of live migration,
>> kvm should stop the guest and push the remaining dirty pages to the
>> destination.
>> 
>> They're some examples in the current code:
>> example 1, in fast_pf_fix_direct_spte():
>> 	if (cmpxchg64(sptep, spte, spte | PT_WRITABLE_MASK) == spte)
>> 		/* The window in here... */
>> 		mark_page_dirty(vcpu->kvm, gfn);
>> 
>> example 2, in kvm_write_guest_page():
>> 	r = __copy_to_user((void __user *)addr + offset, data, len);
>> 	if (r)
>> 		return -EFAULT;
>> 	/*
>> 	 * The window is here, the page is dirty but not logged in
>>        * The bitmap.
>> 	 */
>> 	mark_page_dirty(kvm, gfn);
>> 	return 0;
> 

Hi Marcelo,

> Why is this valid ? That is, the obviously correct rule is
> 
> "that all pages are either dirty in the current bitmap,
> or write-protected, which is violated here."
> 
> With the window above, GET_DIRTY_LOG can be called 100 times while the 
> page is dirty, but the corresponding bit not set in the dirty bitmap.
> 
> It violates the documentation:
> 
> /* for KVM_GET_DIRTY_LOG */
> struct kvm_dirty_log {
>       __u32 slot;
>       __u32 padding;
>       union {
>               void __user *dirty_bitmap; /* one bit per page */
>               __u64 padding;
>       };
> };
> 
> Given a memory slot, return a bitmap containing any pages dirtied
> since the last call to this ioctl.  Bit 0 is the first page in the
> memory slot.  Ensure the entire structure is cleared to avoid padding
> issues.
> 
> The point about migration, is that GET_DIRTY_LOG is strictly correct
> because it stops vcpus.
> 
> But what guarantee does userspace require, from GET_DIRTY_LOG, while vcpus are
> executing? 

Aha. Single calling GET_DIRTY_LOG is useless since new dirty page can be generated
when GET_DIRTY_LOG is being returned. If user wants to get exact dirty pages the vcpus
should be stopped. 

> 
> With fast page fault:
> 
>      if (cmpxchg64(sptep, spte, spte | PT_WRITABLE_MASK) == spte)
>              /* The window in here... */
>              mark_page_dirty(vcpu->kvm, gfn);
> 
> And the $SUBJECT set_spte reordering, the rule becomes
> 
> A call to GET_DIRTY_LOG guarantees to return correct information about 
> dirty pages before invocation of the previous GET_DIRTY_LOG call.
> 
> (see example 1: the next GET_DIRTY_LOG will return the dirty information
> there).
> 

It seems no.

The first GET_DIRTY_LOG can happen before fast-page-fault，
the second GET_DIRTY_LOG happens in the window between cmpxchg()
and mark_page_dirty(), for the second one, the information is still “incorrect”.

> The rule for sptes that is, because kvm_write_guest does not match the
> documentation at all.

You mean the case of “kvm_write_guest” is valid (I do not know why it is)?
Or anything else?

> 
> So before example 1 and this patch, the rule (well for sptes at least) was
> 
> "Given a memory slot, return a bitmap containing any pages dirtied
> since the last call to this ioctl.  Bit 0 is the first page in the
> memory slot.  Ensure the entire structure is cleared to avoid padding
> issues."
> 
> Can you explain why it is OK to relax this rule?

It’s because:
1) it doesn’t break current use cases, i.e. Live migration and FB-flushing.
2) the current code, like kvm_write_guest  has already broken the documentation
   (the guest page has been written but missed in the dirty bitmap).
3) it’s needless to implement a exact get-dirty-pages since the dirty pages can
   no be exactly got except stopping vcpus. 

So i think we'd document this case instead. No?


--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@...r.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/