linux-kernel - Re: [PATCH v4 06/10] KVM: MMU: fast path of handling guest page fault

lists.openwall.net		lists / announce owl-users owl-dev john-users john-dev passwdqc-users yescrypt popa3d-users / oss-security kernel-hardening musl sabotage tlsify passwords / crypt-dev xvendor / Bugtraq Full-Disclosure linux-kernel linux-netdev linux-ext4 linux-hardening linux-cve-announce PHC
Open Source and information security mailing list archives

Hash Suite: Windows password security audit tool. GUI, reports in PDF.

[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]

Date:	Fri, 27 Apr 2012 13:53:09 +0800
From:	Xiao Guangrong <xiaoguangrong@...ux.vnet.ibm.com>
To:	Marcelo Tosatti <mtosatti@...hat.com>
CC:	Avi Kivity <avi@...hat.com>, LKML <linux-kernel@...r.kernel.org>,
	KVM <kvm@...r.kernel.org>
Subject: Re: [PATCH v4 06/10] KVM: MMU: fast path of handling guest page fault

On 04/27/2012 07:45 AM, Marcelo Tosatti wrote:


>> +static bool
>> +fast_pf_fix_spte(struct kvm_vcpu *vcpu, struct kvm_mmu_page *sp,
>> +		  u64 *sptep, u64 spte)
>> +{
>> +	gfn_t gfn;
>> +
>> +	spin_lock(&vcpu->kvm->mmu_lock);
>> +
>> +	/* The spte has been changed. */
>> +	if (*sptep != spte)
>> +		goto exit;
>> +
>> +	gfn = kvm_mmu_page_get_gfn(sp, sptep - sp->spt);
>> +
>> +	*sptep = spte | PT_WRITABLE_MASK;
>> +	mark_page_dirty(vcpu->kvm, gfn);
>> +
>> +exit:
>> +	spin_unlock(&vcpu->kvm->mmu_lock);
> 
> There was a misunderstanding. 


Sorry, Marcelo! It is still not clear for me now. :(

> The suggestion is to change codepaths that
> today assume that a side effect of holding mmu_lock is that sptes are
> not updated or read before attempting to introduce any lockless spte
> updates.
> 
> example)
> 
>         u64 mask, old_spte = *sptep;
> 
>         if (!spte_has_volatile_bits(old_spte) || (new_spte & mask) == mask)
>                 __update_clear_spte_fast(sptep, new_spte);
>         else
>                 old_spte = __update_clear_spte_slow(sptep, new_spte);
> 
> The local old_spte copy might be stale by the
> time "spte_has_volatile_bits(old_spte)" reads the writable bit.
> 
> example)
> 
> 
> VCPU0                                       VCPU1
> set_spte
> mmu_spte_update decides fast write 
> mov newspte to sptep
> (non-locked write instruction)
> newspte in store buffer
> 
>                                             lockless spte read path reads stale value
> 
> spin_unlock(mmu_lock) 
> 
> Depending on the what stale value CPU1 read and what decision it took
> this can be a problem, say the new bits (and we do not want to verify
> every single case). The spte write from inside mmu_lock should always be
> atomic now?
> 
> So these details must be exposed to the reader, they are hidden now
> (note mmu_spte_update is already a maze, its getting too deep).
> 


Actually, in this patch, all the spte update is under mmu-lock, and we
lockless-ly read spte , but the spte will be verified again after holding
mmu-lock.

+	spin_lock(&vcpu->kvm->mmu_lock);
+
+	/* The spte has been changed. */
+	if (*sptep != spte)
+		goto exit;
+
+	gfn = kvm_mmu_page_get_gfn(sp, sptep - sp->spt);
+
+	*sptep = spte | PT_WRITABLE_MASK;
+	mark_page_dirty(vcpu->kvm, gfn);
+
+exit:
+	spin_unlock(&vcpu->kvm->mmu_lock);

Is not the same as both read/update spte under mmu-lock?

Hmm, this is what you want?

+static bool
+fast_pf_fix_spte(struct kvm_vcpu *vcpu, struct kvm_mmu_page *sp,
+		  u64 *sptep, u64 spte)
+{
+	gfn_t gfn;
+
+	gfn = kvm_mmu_page_get_gfn(sp, sptep - sp->spt);
+	mmu_spte_update(sptep, spte | PT_WRITABLE_MASK);
+	mark_page_dirty(vcpu->kvm, gfn);
+
+	return true;
+}
+
+/*
+ * Return value:
+ * - true: let the vcpu to access on the same address again.
+ * - false: let the real page fault path to fix it.
+ */
+static bool fast_page_fault(struct kvm_vcpu *vcpu, gva_t gva, gfn_t gfn,
+			    int level, u32 error_code)
+{
+	struct kvm_shadow_walk_iterator iterator;
+	struct kvm_mmu_page *sp;
+	bool ret = false;
+	u64 spte = 0ull;
+
+	if (!page_fault_can_be_fast(vcpu, gfn, error_code))
+		return false;
+
+	spin_lock(&vcpu->kvm->mmu_lock);
+
+	for_each_shadow_entry(vcpu, gva, iterator) {
+		spte = *iterator.sptep;
+
+		if (!is_shadow_present_pte(spte) || iterator.level < level)
+			break;
+	}
+
+	/*
+	 * If the mapping has been changed, let the vcpu fault on the
+	 * same address again.
+	 */
+	if (!is_rmap_spte(spte)) {
+		ret = true;
+		goto exit;
+	}
+
+	if (!is_last_spte(spte, level))
+		goto exit;
+
+	/*
+	 * Check if it is a spurious fault caused by TLB lazily flushed.
+	 *
+	 * Need not check the access of upper level table entries since
+	 * they are always ACC_ALL.
+	 */
+	 if (is_writable_pte(spte)) {
+		ret = true;
+		goto exit;
+	}
+
+	/*
+	 * Currently, to simplify the code, only the spte write-protected
+	 * by dirty-log can be fast fixed.
+	 */
+	if (!spte_wp_by_dirty_log(spte))
+		goto exit;
+
+	sp = page_header(__pa(iterator.sptep));
+
+	ret = fast_pf_fix_spte(vcpu, sp, iterator.sptep, spte);
+
+exit:
+	spin_unlock(&vcpu->kvm->mmu_lock);
+
+	return ret;
+}

--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@...r.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/