lists.openwall.net   lists  /  announce  owl-users  owl-dev  john-users  john-dev  passwdqc-users  yescrypt  popa3d-users  /  oss-security  kernel-hardening  musl  sabotage  tlsify  passwords  /  crypt-dev  xvendor  /  Bugtraq  Full-Disclosure  linux-kernel  linux-netdev  linux-ext4  linux-hardening  linux-cve-announce  PHC 
Open Source and information security mailing list archives
 
Hash Suite: Windows password security audit tool. GUI, reports in PDF.
[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Date:   Thu, 15 Jun 2023 11:26:51 -0700
From:   Sean Christopherson <seanjc@...gle.com>
To:     Yu Zhao <yuzhao@...gle.com>
Cc:     Andrew Morton <akpm@...ux-foundation.org>,
        Paolo Bonzini <pbonzini@...hat.com>,
        Alistair Popple <apopple@...dia.com>,
        Anup Patel <anup@...infault.org>,
        Ben Gardon <bgardon@...gle.com>,
        Borislav Petkov <bp@...en8.de>,
        Catalin Marinas <catalin.marinas@....com>,
        Chao Peng <chao.p.peng@...ux.intel.com>,
        Christophe Leroy <christophe.leroy@...roup.eu>,
        Dave Hansen <dave.hansen@...ux.intel.com>,
        Fabiano Rosas <farosas@...ux.ibm.com>,
        Gaosheng Cui <cuigaosheng1@...wei.com>,
        Gavin Shan <gshan@...hat.com>,
        "H. Peter Anvin" <hpa@...or.com>, Ingo Molnar <mingo@...hat.com>,
        James Morse <james.morse@....com>,
        "Jason A. Donenfeld" <Jason@...c4.com>,
        Jason Gunthorpe <jgg@...pe.ca>,
        Jonathan Corbet <corbet@....net>,
        Marc Zyngier <maz@...nel.org>,
        Masami Hiramatsu <mhiramat@...nel.org>,
        Michael Ellerman <mpe@...erman.id.au>,
        Michael Larabel <michael@...haellarabel.com>,
        Mike Rapoport <rppt@...nel.org>,
        Nicholas Piggin <npiggin@...il.com>,
        Oliver Upton <oliver.upton@...ux.dev>,
        Paul Mackerras <paulus@...abs.org>,
        Peter Xu <peterx@...hat.com>,
        Steven Rostedt <rostedt@...dmis.org>,
        Suzuki K Poulose <suzuki.poulose@....com>,
        Thomas Gleixner <tglx@...utronix.de>,
        Thomas Huth <thuth@...hat.com>, Will Deacon <will@...nel.org>,
        Zenghui Yu <yuzenghui@...wei.com>, kvmarm@...ts.linux.dev,
        kvm@...r.kernel.org, linux-arm-kernel@...ts.infradead.org,
        linux-doc@...r.kernel.org, linux-kernel@...r.kernel.org,
        linux-mm@...ck.org, linuxppc-dev@...ts.ozlabs.org,
        linux-trace-kernel@...r.kernel.org, x86@...nel.org,
        linux-mm@...gle.com
Subject: Re: [PATCH mm-unstable v2 09/10] kvm/x86: add kvm_arch_test_clear_young()

On Fri, May 26, 2023, Yu Zhao wrote:
> diff --git a/arch/x86/kvm/mmu/tdp_mmu.c b/arch/x86/kvm/mmu/tdp_mmu.c
> index 08340219c35a..6875a819e007 100644
> --- a/arch/x86/kvm/mmu/tdp_mmu.c
> +++ b/arch/x86/kvm/mmu/tdp_mmu.c
> @@ -1232,6 +1232,40 @@ bool kvm_tdp_mmu_test_age_gfn(struct kvm *kvm, struct kvm_gfn_range *range)
>  	return kvm_tdp_mmu_handle_gfn(kvm, range, test_age_gfn);
>  }
>  
> +bool kvm_arch_test_clear_young(struct kvm *kvm, struct kvm_gfn_range *range)
> +{
> +	struct kvm_mmu_page *root;
> +	int offset = ffs(shadow_accessed_mask) - 1;
> +
> +	if (kvm_shadow_root_allocated(kvm))

This needs a comment.

> +		return true;
> +
> +	rcu_read_lock();
> +
> +	list_for_each_entry_rcu(root, &kvm->arch.tdp_mmu_roots, link) {

As requested in v1[1], please add a macro for a lockless walk.

[1] https://lkml.kernel.org/r/Y%2Fed0XYAPx%2B7pukA%40google.com

> +		struct tdp_iter iter;
> +
> +		if (kvm_mmu_page_as_id(root) != range->slot->as_id)
> +			continue;
> +
> +		tdp_root_for_each_leaf_pte(iter, root, range->start, range->end) {
> +			u64 *sptep = rcu_dereference(iter.sptep);
> +
> +			VM_WARN_ON_ONCE(!page_count(virt_to_page(sptep)));

Hrm, I don't like adding this in KVM.  The primary MMU might guarantee that this
callback is invoked if and only if the SPTE is backed by struct page memory, but
there's no reason to assume that's true in KVM.  If we want the sanity check, then
this needs to use kvm_pfn_to_refcounted_page().

And it should use KVM's MMU_WARN_ON(), which is a mess and effectively dead code,
but I'm working on changing that[*], i.e. by the time this gets to Linus' tree,
the sanity check should have a much cleaner implementation.

[2] https://lore.kernel.org/all/20230511235917.639770-8-seanjc@google.com

> +
> +			if (!(iter.old_spte & shadow_accessed_mask))
> +				continue;
> +
> +			if (kvm_should_clear_young(range, iter.gfn))
> +				clear_bit(offset, (unsigned long *)sptep);

If/when you rebase on https://github.com/kvm-x86/linux/tree/next, can you pull
out the atomic bits of tdp_mmu_clear_spte_bits() and use that new helper? E.g.

diff --git a/arch/x86/kvm/mmu/tdp_iter.h b/arch/x86/kvm/mmu/tdp_iter.h
index fae559559a80..914c34518829 100644
--- a/arch/x86/kvm/mmu/tdp_iter.h
+++ b/arch/x86/kvm/mmu/tdp_iter.h
@@ -58,15 +58,18 @@ static inline u64 kvm_tdp_mmu_write_spte(tdp_ptep_t sptep, u64 old_spte,
        return old_spte;
 }
 
+static inline u64 tdp_mmu_clear_spte_bits_atomic(tdp_ptep_t sptep, u64 mask)
+{
+       atomic64_t *sptep_atomic = (atomic64_t *)rcu_dereference(sptep);
+
+       return (u64)atomic64_fetch_and(~mask, sptep_atomic);
+}
+
 static inline u64 tdp_mmu_clear_spte_bits(tdp_ptep_t sptep, u64 old_spte,
                                          u64 mask, int level)
 {
-       atomic64_t *sptep_atomic;
-
-       if (kvm_tdp_mmu_spte_need_atomic_write(old_spte, level)) {
-               sptep_atomic = (atomic64_t *)rcu_dereference(sptep);
-               return (u64)atomic64_fetch_and(~mask, sptep_atomic);
-       }
+       if (kvm_tdp_mmu_spte_need_atomic_write(old_spte, level))
+               return tdp_mmu_clear_spte_bits_atomic(sptep, mask);
 
        __kvm_tdp_mmu_write_spte(sptep, old_spte & ~mask);
        return old_spte;

Powered by blists - more mailing lists

Powered by Openwall GNU/*/Linux Powered by OpenVZ