[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <4C319699.9000104@redhat.com>
Date: Mon, 05 Jul 2010 11:23:53 +0300
From: Avi Kivity <avi@...hat.com>
To: Xiao Guangrong <xiaoguangrong@...fujitsu.com>
CC: Marcelo Tosatti <mtosatti@...hat.com>,
LKML <linux-kernel@...r.kernel.org>,
KVM list <kvm@...r.kernel.org>
Subject: Re: [PATCH v4 5/6] KVM: MMU: combine guest pte read between walk
and pte prefetch
On 07/05/2010 05:52 AM, Xiao Guangrong wrote:
>
> Avi Kivity wrote:
>
>
>>> Umm, if we move the check before the judgment, it'll check every level,
>>> actually, only the opened mapping and the laster level need checked, so
>>> for the performance reason, maybe it's better to keep two check-point.
>>>
>>>
>> What exactly are the conditions when you want to check?
>>
>> Perhaps we do need to check every level. A write to a PDE (or higher
>> level) will clear the corresponding spte, but a fault immediately
>> afterwards can re-establish the spte to point to the new page.
>>
>>
> Looks into the code more carefully, maybe this code is wrong:
>
>
> if (!direct) {
> r = kvm_read_guest_atomic(vcpu->kvm,
> - gw->pte_gpa[level - 2],
> + gw->pte_gpa[level - 1],
> &curr_pte, sizeof(curr_pte));
> - if (r || curr_pte != gw->ptes[level - 2]) {
> + if (r || curr_pte != gw->ptes[level - 1]) {
> kvm_mmu_put_page(sp, sptep);
> kvm_release_pfn_clean(pfn);
> sptep = NULL;
>
> It should check the 'level' mapping not 'level - 1', in the later description
> i'll explain it.
>
Right, this fixes the check for the top level, but it removes a check
from the bottom level.
We need to move this to the top of the loop so we check all levels. I
guess this is why you needed to add a new check point. But since we
loop at least glevels times, we don't need two check points.
> Since the 'walk_addr' functions is out of mmu_lock protection, we should
> handle the guest modify its mapping between 'walk_addr' and 'fetch'.
> Now, let's consider every case may be happened while handle guest #PF.
>
> One case is host handle guest written after 'fetch', just like this order:
>
> VCPU 0 VCPU 1
> walk_addr
> guest modify its mapping
> fetch
> host handle this written(pte_write or invlpg)
>
> This case is not broken anything, even if 'fetch' setup the wrong mapping, the
> later written handler will fix it.
>
Ok.
> Another case is host handle guest written before 'fetch', like this:
>
> CPU 0 VCPU 1
> walk_addr
> guest modify its mapping
> host handle this written(pte_write or invlpg)
> fetch
>
> We should notice this case, since the later 'fetch' will setup the wrong mapping.
>
> For example, the guest mapping which 'walk_addr' got is:
>
> GPML4E -> GPDPE -> GPDE -> GPTE -> GFNA
> (Just take small page for example, other mapping way is also applied)
>
> And, for good to describe, we abstract 'fetch''s work:
>
> for_each_shadow_entry(vcpu, addr, iterator) {
> if (iterator.level == hlevel)
> Mapping the later level
>
> if (is_shadow_present_pte(*sptep)&&
> !is_large_pte(*sptep)) <------ Point A
> continue;
>
> /* handle the non-present/wrong middle level */
>
> find/create the corresponding sp<----- Point B
>
> if (the guest mapping is modified)<----- Point C
> break;
> setup this level's mapping<----- Point D
>
> }
>
> [
> Note: the later level means PTE, PDE if 2M page size, PDPE if 1G page size
> the middle level means PDE, PDPE if not using large page size / PML4E
> ]
>
> There are two cases:
>
> 1: Guest modify the middle level, for example, guest modify the GPDE.
> a: the GPDE has corresponding sp entry, after VCPU1 handle this written,
> the corresponding host mapping is like this:
> HPML4E -> HPDPE -> HPDE
> [ HPDE.P = 0 since VCPU1 written handler zapped it in pte_wirte ]
>
> Under this case, it can broke Point A's judgment and Point C can detect
> the guest mapping is modified, so it exits this function without setup the
> mapping.
>
> And, we should check the guest mapping at GPDE not GPTE since it's GPDE
> modified not GPTE, it's the explanation for the front fix.
>
Agree.
> b: the GPDE not has corresponding sp entry(the area is firstly accessed),
> corresponding host mapping is like this:
> HPML4E -> HPDPE
> [ HPDPE.P = 0]
>
> under this case, VCPU1 happily write GPDE without #PF since the GPDE not has shadow
> page, it's not write-protected.
>
> while we handle HPDPE, we will create the shadow page for GPDE
> at Point B, then the host mapping is like this:
> HPML4E -> HPDPE -> HPDE
> [ HPDE.P = 0 ]
> it's the same as 1.a, so do the same work as 1.a
>
> Note: there is a trick: we should put 'Point C' behind of 'Point B', since after we
> create sp for GPDE, the later modify GPDE will cause #PF, it can ensure later
> check is valid
>
> 2: Guest modify the later level.
> Form 'fetch''s abstract, we can see the late level is not checked by 'Point C', if
> guest modified the GPTE's mapping, the wrong-mapping will be setup by 'fetch', this
> is just this path does
>
>
Ok. So moving the check to before point A, and s/level - 2/level - 1/
should work, yes?
Should be slightly simpler since we don't need to kvm_mmu_put_page(sp,
sptep) any more.
Thanks for the detailed explanation.
--
error compiling committee.c: too many arguments to function
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@...r.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/
Powered by blists - more mailing lists