linux-kernel - Re: [PATCH 00/13] KVM: MMU: fast page fault

lists.openwall.net		lists / announce owl-users owl-dev john-users john-dev passwdqc-users yescrypt popa3d-users / oss-security kernel-hardening musl sabotage tlsify passwords / crypt-dev xvendor / Bugtraq Full-Disclosure linux-kernel linux-netdev linux-ext4 linux-hardening linux-cve-announce PHC
Open Source and information security mailing list archives

Hash Suite: Windows password security audit tool. GUI, reports in PDF.

[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]

Message-ID: <4F745C4F.4060404@redhat.com>
Date:	Thu, 29 Mar 2012 14:57:51 +0200
From:	Avi Kivity <avi@...hat.com>
To:	Xiao Guangrong <xiaoguangrong@...ux.vnet.ibm.com>
CC:	Marcelo Tosatti <mtosatti@...hat.com>,
	LKML <linux-kernel@...r.kernel.org>, KVM <kvm@...r.kernel.org>
Subject: Re: [PATCH 00/13] KVM: MMU: fast page fault

On 03/29/2012 01:40 PM, Xiao Guangrong wrote:
> >> * Implementation
> >> We can freely walk the page between walk_shadow_page_lockless_begin and
> >> walk_shadow_page_lockless_end, it can ensure all the shadow page is valid.
> >>
> >> In the most case, cmpxchg is fair enough to change the access bit of spte,
> >> but the write-protect path on softmmu/nested mmu is a especial case: it is
> >> a read-check-modify path: read spte, check W bit, then clear W bit.
> > 
> > We also set gpte.D and gpte.A, no? How do you handle that?
> > 
>
>
> We still need walk gust page table before fast page fault to check
> whether the access is valid.

Ok.  But that's outside the mmu lock.

We can also skip that: if !sp->unsync, copy access rights and gpte.A/D
into spare bits in the spte, and use that to check.

>
> >>  In order
> >> to avoid marking spte writable after/during page write-protect, we do the
> >> trick like below:
> >>
> >>       fast page fault path:
> >>             lock RCU
> >>             set identification in the spte
> > 
> > What if you can't (already taken)?  Spin?  Slow path?
>
>
> In this patch, it allows to concurrently access on the same spte:
> it freely set its identification on the spte, because i did not
> want to introduce other atomic operations.
>
> You remind me that there may be a risk: if many vcpu fault on the
> same spte, it will retry the spte forever. Hmm, how about fix it
> like this:
>
> if ( spte.identification = 0) {
> 	set spte.identification = vcpu.id
> 	goto cmpxchg-path
> }
>
> if (spte.identification == vcpu.id)
> 	goto cmpxchg-path
>
> return to guest and retry the address again;
>
> cmpxchg-path:
> 	do checks and cmpxchg
>
> It can ensure the spte can be updated.

What's the difference between this and


  if test_and_set_bit(spte.lock)
       return_to_guest
  else
       do checks and cmpxchg

?

>
> >> The identification should be unique to avoid the below race:
> >>
> >>      VCPU 0                VCPU 1            VCPU 2
> >>       lock RCU
> >>    spte + identification
> >>    check conditions
> >>                        do write-protect, clear
> >>                           identification
> >>                                               lock RCU
> >>                                         set identification
> >>      cmpxchg + w - identification
> >>         OOPS!!!
> > 
> > Is it not sufficient to use just two bits?
> > 
> > pf_lock - taken by page fault path
> > wp_lock - taken by write protect path
> > 
> > pf cmpxchg checks both bits.
> > 
>
>
> If we just use two byte as identification, it has to use atomic
> operations to maintain these bits? or i misunderstood?

Ah, you're using byte writes to set the identification? remember I
didn't read the patches yet.

Please check the optimization manual, I remember there are warnings
there about using different sized accesses for the same location,
particularly when locked ops are involved.  Maybe bitops are cheaper. 
If the cost is similar, they're at least more similar to the rest of the
kernel.

-- 
error compiling committee.c: too many arguments to function

--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@...r.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/