linux-kernel - Re: [PATCH 00/13] KVM: MMU: fast page fault

Message-ID: <4F8C4284.9080201@redhat.com>
Date:	Mon, 16 Apr 2012 19:02:12 +0300
From:	Avi Kivity <avi@...hat.com>
To:	Takuya Yoshikawa <takuya.yoshikawa@...il.com>
CC:	kvm-ppc@...r.kernel.org,
	Xiao Guangrong <xiaoguangrong@...ux.vnet.ibm.com>,
	Marcelo Tosatti <mtosatti@...hat.com>,
	Xiao Guangrong <xiaoguangrong.eric@...il.com>,
	LKML <linux-kernel@...r.kernel.org>, KVM <kvm@...r.kernel.org>
Subject: Re: [PATCH 00/13] KVM: MMU: fast page fault

On 04/16/2012 06:49 PM, Takuya Yoshikawa wrote:
> > This doesn't work for EPT, which lacks a dirty bit.  But we can emulate
> > it: take a free bit and call it spte.NOTDIRTY; when it is set, we also
> > clear spte.WRITE, and teach the mmu that if it sees spte.NOTDIRTY it
> > can just set spte.WRITE and clear spte.NOTDIRTY.  Now that looks exactly
> > like Xiao's lockless write enabling.
>
> How do we sync with dirty_bitmap?

In Xiao's patch we call mark_page_dirty() at fault time.  With the
write-protect-less approach, we look at spte.DIRTY (or spte.NOTDIRTY)
during GET_DIRTY_LOG, or when the spte is torn down.
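
A minimal sketch of the emulated dirty bit described above, written as
stand-alone C11 rather than kernel code.  The bit positions and the function
names (spte_test_and_rearm, spte_fast_write_fault) are assumptions made up
for illustration; they are not KVM's real spte layout or API, and a real
implementation would also need a TLB flush after clearing the write bit.

```c
#include <stdatomic.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical bit layout, for illustration only. */
#define SPTE_WRITE    (UINT64_C(1) << 1)
#define SPTE_NOTDIRTY (UINT64_C(1) << 62)   /* a "free" software bit */

/*
 * GET_DIRTY_LOG (or teardown) side: report whether the page may have been
 * written since it was last armed, and re-arm it by setting NOTDIRTY and
 * clearing WRITE in one atomic step.  Sptes without WRITE are left alone.
 */
static bool spte_test_and_rearm(_Atomic uint64_t *sptep)
{
    uint64_t old = atomic_load(sptep);

    while (old & SPTE_WRITE) {
        uint64_t next = (old | SPTE_NOTDIRTY) & ~SPTE_WRITE;

        /* On failure, old is reloaded with the current value. */
        if (atomic_compare_exchange_weak(sptep, &old, next))
            return true;
    }
    return false;
}

/*
 * Write-fault side, no lock taken: if the spte is only write-protected
 * because of NOTDIRTY, set WRITE and clear NOTDIRTY.  Returns false when
 * the spte is write-protected for some other reason (slow path needed).
 */
static bool spte_fast_write_fault(_Atomic uint64_t *sptep)
{
    uint64_t old = atomic_load(sptep);

    while (old & SPTE_NOTDIRTY) {
        uint64_t next = (old | SPTE_WRITE) & ~SPTE_NOTDIRTY;

        if (atomic_compare_exchange_weak(sptep, &old, next))
            return true;
    }
    return false;
}
```

In this sketch a writable spte that was never armed is conservatively
reported as dirty the first time it is harvested.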

> > Another note: O(1) write protection is not mutually exclusive with rmap
> > based write protection.  In GET_DIRTY_LOG, you write protect everything,
> > and proceed to write enable on faults.  When you reach the page table
> > level, you perform the rmap check to see if you should write protect or
> > not.  With role.direct=1 the check is very cheap (and sometimes you can
> > drop the entire page table and replace it with a large spte).
>
> I understand that there are many possible combinations.
>
> But the question is whether the complexity is really worth it.

We don't know yet.  I'm just throwing ideas around.
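
To make the combination concrete, here is a toy two-level model of the idea
in the quoted paragraph: GET_DIRTY_LOG touches only the top level, and a
write fault pushes the protection one level down before deciding per page.
Everything in it is an assumption for illustration: the struct layout, the
FANOUT of 4 (real x86 page tables have 512 entries per level), and the
must_protect flag standing in for the rmap check.  It is not how the KVM
MMU is structured.

```c
#include <stdbool.h>
#include <stddef.h>

#define FANOUT 4   /* toy value; x86 page tables have 512 entries */

struct leaf_table {
    bool writable[FANOUT];
    bool must_protect[FANOUT];  /* stand-in for the rmap check */
    bool dirty[FANOUT];
};

struct root_table {
    bool writable[FANOUT];
    struct leaf_table *child[FANOUT];
};

/* A page is writable only if every level on the walk allows it. */
static bool can_write(const struct root_table *root, size_t ri, size_t li)
{
    return root->writable[ri] && root->child[ri]->writable[li];
}

/* GET_DIRTY_LOG side: cost depends on FANOUT, not on guest memory size. */
static void write_protect_all(struct root_table *root)
{
    for (size_t i = 0; i < FANOUT; i++)
        root->writable[i] = false;
}

/* Fault side.  Returns true if the faulting write may now proceed. */
static bool handle_write_fault(struct root_table *root, size_t ri, size_t li)
{
    struct leaf_table *leaf = root->child[ri];

    if (!root->writable[ri]) {
        /* Push the protection one level down, then open this level. */
        for (size_t i = 0; i < FANOUT; i++)
            leaf->writable[i] = false;
        root->writable[ri] = true;
    }

    if (leaf->must_protect[li])
        return false;           /* page must stay write-protected */

    leaf->writable[li] = true;
    leaf->dirty[li] = true;     /* dirty is recorded at fault time */
    return true;
}
```

The push-down loop is where the per-fault cost comes from: with a real
fanout, one fault on a protected upper-level entry write-protects 512
children before the single faulting page is re-enabled.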

> Once, when we were searching for a way to do an atomic bitmap switch, you
> said to me that we should do our best not to add overhead to VCPU threads.
>
> Since then, I have tried my best to mitigate the latency problem without
> adding code to VCPU thread paths: if we add the cond_resched patch, we will
> get a simple solution to the current known problem -- probably 64GB guests
> will work well without big latencies, once QEMU gets improved.

Sure, I'm not advocating doing the most nifty idea.  After all I'm the
one that suffers most from it.  Everything should be proven to improve,
and the improvement should be material, not just a random measurement
that doesn't matter to anyone.

>
> 	I also surveyed other known hypervisors internally.  We can easily see
> 	hundreds of ms latency during migration.  But people rarely complain
> 	about that if they are stable and usable in most situations.

There is also the unavoidable latency during the final stop-and-copy
phase, at least without post-copy.  And the migration thread (when we
have one) is hardly latency sensitive.

> Although O(1) is actually O(1) for the GET_DIRTY_LOG thread, it adds some
> overhead to page fault handling.  We may need to hold mmu_lock to properly
> handle O(1)'s write protection, and ~500 write protections will not be so
> cheap.  And there is no answer to the question of how to achieve slot-wise
> write protection.
>
> Of course, we may need such a tree-wide write protection when we want to
> support guests with hundreds of GB, or TB, of memory.  Sadly it's not now.
>
>
> Well, if you need the best answer now, we should discuss the whole design:
> KVM Forum may be a good place for that.

We don't need the best answer now, I'm satisfied with incremental
improvements.  But it's good to have the ideas out in the open, maybe
some of them will be adopted, or maybe they'll trigger a better idea.

(btw O(1) write protection is equally applicable to ordinary fork())

-- 
error compiling committee.c: too many arguments to function
