Date:	Fri, 2 Aug 2013 11:55:24 -0300
From:	Marcelo Tosatti <mtosatti@...hat.com>
To:	Xiao Guangrong <xiaoguangrong@...ux.vnet.ibm.com>
Cc:	gleb@...hat.com, avi.kivity@...il.com, pbonzini@...hat.com,
	linux-kernel@...r.kernel.org, kvm@...r.kernel.org
Subject: Re: [PATCH 03/12] KVM: MMU: lazily drop large spte

On Tue, Jul 30, 2013 at 09:02:01PM +0800, Xiao Guangrong wrote:
> Currently, kvm zaps the large spte if write protection is needed, so a
> later read can fault on that spte. Instead, we can make the large spte
> read-only rather than non-present, so the page fault caused by a read
> access can be avoided.
> 
> The idea is from Avi:
> | As I mentioned before, write-protecting a large spte is a good idea,
> | since it moves some work from protect-time to fault-time, so it reduces
> | jitter.  This removes the need for the return value.
> 
> [
>   It fixes the issue reported in 6b73a9606 by stopping the fast page
>   fault path from marking the large spte writable
> ]

Xiao,

Can you please write a comment explaining what the problems with
shadow vs. large read-only sptes are (I can't recall anymore),
and why it is now safe to do this?

Comments below.
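
For reference, the difference can be illustrated with a minimal
user-space sketch (sptes modeled as plain u64 bitmasks; the mask names
echo the kernel's, but this is purely illustrative, not kernel code):

#include <stdio.h>
#include <stdint.h>

#define PT_PRESENT_MASK   (1ULL << 0)
#define PT_WRITABLE_MASK  (1ULL << 1)

/* Old behaviour: zap (drop) the large spte; both reads and writes fault. */
static uint64_t zap_spte(uint64_t spte)
{
	(void)spte;
	return 0;		/* non-present */
}

/* New behaviour: clear only the writable bit; reads keep hitting. */
static uint64_t write_protect_spte(uint64_t spte)
{
	return spte & ~PT_WRITABLE_MASK;
}

int main(void)
{
	uint64_t spte = PT_PRESENT_MASK | PT_WRITABLE_MASK;
	uint64_t zapped = zap_spte(spte);
	uint64_t wprot  = write_protect_spte(spte);

	printf("zapped:          read faults: %s\n",
	       (zapped & PT_PRESENT_MASK) ? "no" : "yes");
	printf("write-protected: read faults: %s\n",
	       (wprot & PT_PRESENT_MASK) ? "no" : "yes");
	return 0;
}

Write-protecting instead of zapping keeps the mapping present, so only
writes fault: that is the protect-time vs. fault-time trade-off Avi
describes above.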

> Signed-off-by: Xiao Guangrong <xiaoguangrong@...ux.vnet.ibm.com>
> ---
>  arch/x86/kvm/mmu.c | 36 +++++++++++++++++-------------------
>  1 file changed, 17 insertions(+), 19 deletions(-)
> 
> diff --git a/arch/x86/kvm/mmu.c b/arch/x86/kvm/mmu.c
> index cf163ca..35d4b50 100644
> --- a/arch/x86/kvm/mmu.c
> +++ b/arch/x86/kvm/mmu.c
> @@ -1181,8 +1181,7 @@ static void drop_large_spte(struct kvm_vcpu *vcpu, u64 *sptep)
>  
>  /*
>   * Write-protect on the specified @sptep, @pt_protect indicates whether
> - * spte writ-protection is caused by protecting shadow page table.
> - * @flush indicates whether tlb need be flushed.
> + * spte write-protection is caused by protecting shadow page table.
>   *
>   * Note: write protection is difference between drity logging and spte
>   * protection:
> @@ -1191,10 +1190,9 @@ static void drop_large_spte(struct kvm_vcpu *vcpu, u64 *sptep)
>   * - for spte protection, the spte can be writable only after unsync-ing
>   *   shadow page.
>   *
> - * Return true if the spte is dropped.
> + * Return true if tlb need be flushed.
>   */
> -static bool
> -spte_write_protect(struct kvm *kvm, u64 *sptep, bool *flush, bool pt_protect)
> +static bool spte_write_protect(struct kvm *kvm, u64 *sptep, bool pt_protect)
>  {
>  	u64 spte = *sptep;
>  
> @@ -1204,17 +1202,11 @@ spte_write_protect(struct kvm *kvm, u64 *sptep, bool *flush, bool pt_protect)
>  
>  	rmap_printk("rmap_write_protect: spte %p %llx\n", sptep, *sptep);
>  
> -	if (__drop_large_spte(kvm, sptep)) {
> -		*flush |= true;
> -		return true;
> -	}
> -
>  	if (pt_protect)
>  		spte &= ~SPTE_MMU_WRITEABLE;
>  	spte = spte & ~PT_WRITABLE_MASK;
>  
> -	*flush |= mmu_spte_update(sptep, spte);
> -	return false;
> +	return mmu_spte_update(sptep, spte);
>  }
>  
>  static bool __rmap_write_protect(struct kvm *kvm, unsigned long *rmapp,
> @@ -1226,11 +1218,8 @@ static bool __rmap_write_protect(struct kvm *kvm, unsigned long *rmapp,
>  
>  	for (sptep = rmap_get_first(*rmapp, &iter); sptep;) {
>  		BUG_ON(!(*sptep & PT_PRESENT_MASK));
> -		if (spte_write_protect(kvm, sptep, &flush, pt_protect)) {
> -			sptep = rmap_get_first(*rmapp, &iter);
> -			continue;
> -		}
>  
> +		flush |= spte_write_protect(kvm, sptep, pt_protect);
>  		sptep = rmap_get_next(&iter);
>  	}
>  
> @@ -2701,6 +2690,8 @@ static int __direct_map(struct kvm_vcpu *vcpu, gpa_t v, int write,
>  			break;
>  		}
>  
> +		drop_large_spte(vcpu, iterator.sptep);
> +
>  		if (!is_shadow_present_pte(*iterator.sptep)) {
>  			u64 base_addr = iterator.addr;
>  
> @@ -2855,7 +2846,7 @@ fast_pf_fix_direct_spte(struct kvm_vcpu *vcpu, struct kvm_mmu_page *sp,
>   * - false: let the real page fault path to fix it.
>   */
>  static bool fast_page_fault(struct kvm_vcpu *vcpu, gva_t gva, int level,
> -			    u32 error_code)
> +			    u32 error_code, bool force_pt_level)
>  {
>  	struct kvm_shadow_walk_iterator iterator;
>  	struct kvm_mmu_page *sp;
> @@ -2884,6 +2875,13 @@ static bool fast_page_fault(struct kvm_vcpu *vcpu, gva_t gva, int level,
>  		goto exit;
>  
>  	/*
> +	 * Can not map the large spte to writable if the page is dirty
> +	 * logged.
> +	 */
> +	if (sp->role.level > PT_PAGE_TABLE_LEVEL && force_pt_level)
> +		goto exit;
> +

It is not safe to derive slot->dirty_bitmap like this:
since dirty logging is enabled via an RCU update, the "is dirty bitmap
enabled" information could be stale by the time you check it here via
the parameter, so you could instantiate a large spte (because
force_pt_level == false) when you should not.
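
A sequential sketch of that time-of-check/time-of-use window
(hypothetical names, single-threaded for clarity; the real race is an
RCU memslot update racing with the fault path):

#include <stdio.h>
#include <stdbool.h>

struct memslot {
	bool dirty_log_enabled;	/* stands in for slot->dirty_bitmap */
};

int main(void)
{
	struct memslot slot = { .dirty_log_enabled = false };

	/* Time of check: force_pt_level snapshotted while logging is off. */
	bool force_pt_level = slot.dirty_log_enabled;

	/* Meanwhile, dirty logging is enabled (via an RCU slot update). */
	slot.dirty_log_enabled = true;

	/* Time of use: the stale snapshot lets a large spte through. */
	if (!force_pt_level)
		printf("large spte instantiated despite dirty logging\n");

	return 0;
}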

