[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <CAOUHufa8+NZcctoYXOZQXNdvKTUxAMDF3sC=X+3mv4juMUyrLA@mail.gmail.com>
Date: Wed, 31 May 2023 15:12:39 -0600
From: Yu Zhao <yuzhao@...gle.com>
To: Oliver Upton <oliver.upton@...ux.dev>
Cc: Andrew Morton <akpm@...ux-foundation.org>,
Paolo Bonzini <pbonzini@...hat.com>,
Alistair Popple <apopple@...dia.com>,
Anup Patel <anup@...infault.org>,
Ben Gardon <bgardon@...gle.com>,
Borislav Petkov <bp@...en8.de>,
Catalin Marinas <catalin.marinas@....com>,
Chao Peng <chao.p.peng@...ux.intel.com>,
Christophe Leroy <christophe.leroy@...roup.eu>,
Dave Hansen <dave.hansen@...ux.intel.com>,
Fabiano Rosas <farosas@...ux.ibm.com>,
Gaosheng Cui <cuigaosheng1@...wei.com>,
Gavin Shan <gshan@...hat.com>,
"H. Peter Anvin" <hpa@...or.com>, Ingo Molnar <mingo@...hat.com>,
James Morse <james.morse@....com>,
"Jason A. Donenfeld" <Jason@...c4.com>,
Jason Gunthorpe <jgg@...pe.ca>,
Jonathan Corbet <corbet@....net>,
Marc Zyngier <maz@...nel.org>,
Masami Hiramatsu <mhiramat@...nel.org>,
Michael Ellerman <mpe@...erman.id.au>,
Michael Larabel <michael@...haellarabel.com>,
Mike Rapoport <rppt@...nel.org>,
Nicholas Piggin <npiggin@...il.com>,
Paul Mackerras <paulus@...abs.org>,
Peter Xu <peterx@...hat.com>,
Sean Christopherson <seanjc@...gle.com>,
Steven Rostedt <rostedt@...dmis.org>,
Suzuki K Poulose <suzuki.poulose@....com>,
Thomas Gleixner <tglx@...utronix.de>,
Thomas Huth <thuth@...hat.com>, Will Deacon <will@...nel.org>,
Zenghui Yu <yuzenghui@...wei.com>, kvmarm@...ts.linux.dev,
kvm@...r.kernel.org, linux-arm-kernel@...ts.infradead.org,
linux-doc@...r.kernel.org, linux-kernel@...r.kernel.org,
linux-mm@...ck.org, linuxppc-dev@...ts.ozlabs.org,
linux-trace-kernel@...r.kernel.org, x86@...nel.org,
linux-mm@...gle.com
Subject: Re: [PATCH mm-unstable v2 05/10] kvm/arm64: add kvm_arch_test_clear_young()
On Wed, May 31, 2023 at 1:56 PM Oliver Upton <oliver.upton@...ux.dev> wrote:
>
> Hi Yu,
>
> On Fri, May 26, 2023 at 05:44:30PM -0600, Yu Zhao wrote:
> > Implement kvm_arch_test_clear_young() to support the fast path in
> > mmu_notifier_ops->test_clear_young().
> >
> > It focuses on a simple case, i.e., hardware sets the accessed bit in
> > KVM PTEs and VMs are not protected, where it can rely on RCU and
> > cmpxchg to safely clear the accessed bit without taking
> > kvm->mmu_lock. Complex cases fall back to the existing slow path
> > where kvm->mmu_lock is then taken.
> >
> > Signed-off-by: Yu Zhao <yuzhao@...gle.com>
> > ---
> > arch/arm64/include/asm/kvm_host.h | 6 ++++++
> > arch/arm64/kvm/mmu.c | 36 +++++++++++++++++++++++++++++++
> > 2 files changed, 42 insertions(+)
> >
> > diff --git a/arch/arm64/include/asm/kvm_host.h b/arch/arm64/include/asm/kvm_host.h
> > index 7e7e19ef6993..da32b0890716 100644
> > --- a/arch/arm64/include/asm/kvm_host.h
> > +++ b/arch/arm64/include/asm/kvm_host.h
> > @@ -1113,4 +1113,10 @@ static inline void kvm_hyp_reserve(void) { }
> > void kvm_arm_vcpu_power_off(struct kvm_vcpu *vcpu);
> > bool kvm_arm_vcpu_stopped(struct kvm_vcpu *vcpu);
> >
> > +#define kvm_arch_has_test_clear_young kvm_arch_has_test_clear_young
> > +static inline bool kvm_arch_has_test_clear_young(void)
> > +{
> > + return cpu_has_hw_af() && !is_protected_kvm_enabled();
> > +}
>
> I would *strongly* suggest you consider supporting test_clear_young on
> systems that do software Access Flag management. FEAT_HAFDBS is an
> *optional* extension to the architecture, so we're going to support
> software AF management for a very long time in KVM. It is also a valid
> fallback option in the case of hardware errata which render HAFDBS
> broken.
Hi Oliver,
It's not about willingness but resources. Ideally we want to make
everything perfect, but in reality, we can only move forward one step
a time.
If I looked at your request from ARM's POV, I would agree with you.
But my goal is to lay the foundation for all architectures that could
benefit, so I may not be able to cover a lot for each architecture.
Specifically, I don't have the bandwidth to test the !FEAT_HAFDBS case
for ARM.
So here are some options I could offer, ordered by my preferences:
1. We proceed as it is for now. I *will* find someone from my team (or
yours) to follow up -- this way we can make sure !FEAT_HAFDBS is well
tested.
2. I drop the cpu_has_hw_af() check above. Not that I think there is
much risk, I'm just trying to be cautious.
3. I drop the entire ARM support (and include the RISC-V support which
I previously deprioritized). We revisit after the test is done.
Sounds reasonable?
> So, we should expect (and support) systems of all shapes and sizes that
> do software AF. I'm sure we'll hear about more in the not-too-distant
> future...
>
> For future reference (even though I'm suggesting you support software
> AF), decisions such of these need an extremely verbose comment
> describing the rationale behind the decision.
>
> > +
> > #endif /* __ARM64_KVM_HOST_H__ */
> > diff --git a/arch/arm64/kvm/mmu.c b/arch/arm64/kvm/mmu.c
> > index c3b3e2afe26f..26a8d955b49c 100644
> > --- a/arch/arm64/kvm/mmu.c
> > +++ b/arch/arm64/kvm/mmu.c
>
> Please do not implement page table walkers outside of hyp/pgtable.c
>
> > @@ -1678,6 +1678,42 @@ bool kvm_test_age_gfn(struct kvm *kvm, struct kvm_gfn_range *range)
> > range->start << PAGE_SHIFT);
> > }
> >
> > +static int stage2_test_clear_young(const struct kvm_pgtable_visit_ctx *ctx,
> > + enum kvm_pgtable_walk_flags flags)
> > +{
> > + kvm_pte_t new = ctx->old & ~KVM_PTE_LEAF_ATTR_LO_S2_AF;
> > +
> > + VM_WARN_ON_ONCE(!page_count(virt_to_page(ctx->ptep)));
>
> This sort of sanity checking is a bit excessive. Isn't there a risk of
> false negatives here too? IOW, if we tragically mess up RCU in the page
> table code, what's stopping a prematurely freed page from being
> allocated to another user?
Yes, but from my aforementioned POV (the breadth I'm focusing on),
this is a good practice. I can live without this assertion if you feel
strongly about it.
> > + if (!kvm_pte_valid(new))
> > + return 0;
> > +
> > + if (new == ctx->old)
> > + return 0;
> > +
> > + if (kvm_should_clear_young(ctx->arg, ctx->addr / PAGE_SIZE))
> > + stage2_try_set_pte(ctx, new);
> > +
> > + return 0;
> > +}
> > +
> > +bool kvm_arch_test_clear_young(struct kvm *kvm, struct kvm_gfn_range *range)
> > +{
> > + u64 start = range->start * PAGE_SIZE;
> > + u64 end = range->end * PAGE_SIZE;
> > + struct kvm_pgtable_walker walker = {
> > + .cb = stage2_test_clear_young,
> > + .arg = range,
> > + .flags = KVM_PGTABLE_WALK_LEAF | KVM_PGTABLE_WALK_SHARED,
> > + };
> > +
> > + BUILD_BUG_ON(is_hyp_code());
>
> Delete this assertion.
Will do.
Powered by blists - more mailing lists