lists.openwall.net   lists  /  announce  owl-users  owl-dev  john-users  john-dev  passwdqc-users  yescrypt  popa3d-users  /  oss-security  kernel-hardening  musl  sabotage  tlsify  passwords  /  crypt-dev  xvendor  /  Bugtraq  Full-Disclosure  linux-kernel  linux-netdev  linux-ext4  linux-hardening  linux-cve-announce  PHC 
Open Source and information security mailing list archives
 
Hash Suite: Windows password security audit tool. GUI, reports in PDF.
[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <CAOUHufa8+NZcctoYXOZQXNdvKTUxAMDF3sC=X+3mv4juMUyrLA@mail.gmail.com>
Date:   Wed, 31 May 2023 15:12:39 -0600
From:   Yu Zhao <yuzhao@...gle.com>
To:     Oliver Upton <oliver.upton@...ux.dev>
Cc:     Andrew Morton <akpm@...ux-foundation.org>,
        Paolo Bonzini <pbonzini@...hat.com>,
        Alistair Popple <apopple@...dia.com>,
        Anup Patel <anup@...infault.org>,
        Ben Gardon <bgardon@...gle.com>,
        Borislav Petkov <bp@...en8.de>,
        Catalin Marinas <catalin.marinas@....com>,
        Chao Peng <chao.p.peng@...ux.intel.com>,
        Christophe Leroy <christophe.leroy@...roup.eu>,
        Dave Hansen <dave.hansen@...ux.intel.com>,
        Fabiano Rosas <farosas@...ux.ibm.com>,
        Gaosheng Cui <cuigaosheng1@...wei.com>,
        Gavin Shan <gshan@...hat.com>,
        "H. Peter Anvin" <hpa@...or.com>, Ingo Molnar <mingo@...hat.com>,
        James Morse <james.morse@....com>,
        "Jason A. Donenfeld" <Jason@...c4.com>,
        Jason Gunthorpe <jgg@...pe.ca>,
        Jonathan Corbet <corbet@....net>,
        Marc Zyngier <maz@...nel.org>,
        Masami Hiramatsu <mhiramat@...nel.org>,
        Michael Ellerman <mpe@...erman.id.au>,
        Michael Larabel <michael@...haellarabel.com>,
        Mike Rapoport <rppt@...nel.org>,
        Nicholas Piggin <npiggin@...il.com>,
        Paul Mackerras <paulus@...abs.org>,
        Peter Xu <peterx@...hat.com>,
        Sean Christopherson <seanjc@...gle.com>,
        Steven Rostedt <rostedt@...dmis.org>,
        Suzuki K Poulose <suzuki.poulose@....com>,
        Thomas Gleixner <tglx@...utronix.de>,
        Thomas Huth <thuth@...hat.com>, Will Deacon <will@...nel.org>,
        Zenghui Yu <yuzenghui@...wei.com>, kvmarm@...ts.linux.dev,
        kvm@...r.kernel.org, linux-arm-kernel@...ts.infradead.org,
        linux-doc@...r.kernel.org, linux-kernel@...r.kernel.org,
        linux-mm@...ck.org, linuxppc-dev@...ts.ozlabs.org,
        linux-trace-kernel@...r.kernel.org, x86@...nel.org,
        linux-mm@...gle.com
Subject: Re: [PATCH mm-unstable v2 05/10] kvm/arm64: add kvm_arch_test_clear_young()

On Wed, May 31, 2023 at 1:56 PM Oliver Upton <oliver.upton@...ux.dev> wrote:
>
> Hi Yu,
>
> On Fri, May 26, 2023 at 05:44:30PM -0600, Yu Zhao wrote:
> > Implement kvm_arch_test_clear_young() to support the fast path in
> > mmu_notifier_ops->test_clear_young().
> >
> > It focuses on a simple case, i.e., hardware sets the accessed bit in
> > KVM PTEs and VMs are not protected, where it can rely on RCU and
> > cmpxchg to safely clear the accessed bit without taking
> > kvm->mmu_lock. Complex cases fall back to the existing slow path
> > where kvm->mmu_lock is then taken.
> >
> > Signed-off-by: Yu Zhao <yuzhao@...gle.com>
> > ---
> >  arch/arm64/include/asm/kvm_host.h |  6 ++++++
> >  arch/arm64/kvm/mmu.c              | 36 +++++++++++++++++++++++++++++++
> >  2 files changed, 42 insertions(+)
> >
> > diff --git a/arch/arm64/include/asm/kvm_host.h b/arch/arm64/include/asm/kvm_host.h
> > index 7e7e19ef6993..da32b0890716 100644
> > --- a/arch/arm64/include/asm/kvm_host.h
> > +++ b/arch/arm64/include/asm/kvm_host.h
> > @@ -1113,4 +1113,10 @@ static inline void kvm_hyp_reserve(void) { }
> >  void kvm_arm_vcpu_power_off(struct kvm_vcpu *vcpu);
> >  bool kvm_arm_vcpu_stopped(struct kvm_vcpu *vcpu);
> >
> > +#define kvm_arch_has_test_clear_young kvm_arch_has_test_clear_young
> > +static inline bool kvm_arch_has_test_clear_young(void)
> > +{
> > +     return cpu_has_hw_af() && !is_protected_kvm_enabled();
> > +}
>
> I would *strongly* suggest you consider supporting test_clear_young on
> systems that do software Access Flag management. FEAT_HAFDBS is an
> *optional* extension to the architecture, so we're going to support
> software AF management for a very long time in KVM. It is also a valid
> fallback option in the case of hardware errata which render HAFDBS
> broken.

Hi Oliver,

It's not about willingness but resources. Ideally we want to make
everything perfect, but in reality, we can only move forward one step
a time.

If I looked at your request from ARM's POV, I would agree with you.
But my goal is to lay the foundation for all architectures that could
benefit, so I may not be able to cover a lot for each architecture.
Specifically, I don't have the bandwidth to test the !FEAT_HAFDBS case
for ARM.

So here are some options I could offer, ordered by my preferences:
1. We proceed as it is for now. I *will* find someone from my team (or
yours) to follow up -- this way we can make sure !FEAT_HAFDBS is well
tested.
2. I drop the cpu_has_hw_af() check above. Not that I think there is
much risk, I'm just trying to be cautious.
3. I drop the entire ARM support (and include the RISC-V support which
I previously deprioritized). We revisit after the test is done.

Sounds reasonable?

> So, we should expect (and support) systems of all shapes and sizes that
> do software AF. I'm sure we'll hear about more in the not-too-distant
> future...
>
> For future reference (even though I'm suggesting you support software
> AF), decisions such of these need an extremely verbose comment
> describing the rationale behind the decision.
>
> > +
> >  #endif /* __ARM64_KVM_HOST_H__ */
> > diff --git a/arch/arm64/kvm/mmu.c b/arch/arm64/kvm/mmu.c
> > index c3b3e2afe26f..26a8d955b49c 100644
> > --- a/arch/arm64/kvm/mmu.c
> > +++ b/arch/arm64/kvm/mmu.c
>
> Please do not implement page table walkers outside of hyp/pgtable.c
>
> > @@ -1678,6 +1678,42 @@ bool kvm_test_age_gfn(struct kvm *kvm, struct kvm_gfn_range *range)
> >                                          range->start << PAGE_SHIFT);
> >  }
> >
> > +static int stage2_test_clear_young(const struct kvm_pgtable_visit_ctx *ctx,
> > +                                enum kvm_pgtable_walk_flags flags)
> > +{
> > +     kvm_pte_t new = ctx->old & ~KVM_PTE_LEAF_ATTR_LO_S2_AF;
> > +
> > +     VM_WARN_ON_ONCE(!page_count(virt_to_page(ctx->ptep)));
>
> This sort of sanity checking is a bit excessive. Isn't there a risk of
> false negatives here too? IOW, if we tragically mess up RCU in the page
> table code, what's stopping a prematurely freed page from being
> allocated to another user?

Yes, but from my aforementioned POV (the breadth I'm focusing on),
this is a good practice. I can live without this assertion if you feel
strongly about it.

> > +     if (!kvm_pte_valid(new))
> > +             return 0;
> > +
> > +     if (new == ctx->old)
> > +             return 0;
> > +
> > +     if (kvm_should_clear_young(ctx->arg, ctx->addr / PAGE_SIZE))
> > +             stage2_try_set_pte(ctx, new);
> > +
> > +     return 0;
> > +}
> > +
> > +bool kvm_arch_test_clear_young(struct kvm *kvm, struct kvm_gfn_range *range)
> > +{
> > +     u64 start = range->start * PAGE_SIZE;
> > +     u64 end = range->end * PAGE_SIZE;
> > +     struct kvm_pgtable_walker walker = {
> > +             .cb     = stage2_test_clear_young,
> > +             .arg    = range,
> > +             .flags  = KVM_PGTABLE_WALK_LEAF | KVM_PGTABLE_WALK_SHARED,
> > +     };
> > +
> > +     BUILD_BUG_ON(is_hyp_code());
>
> Delete this assertion.

Will do.

Powered by blists - more mailing lists

Powered by Openwall GNU/*/Linux Powered by OpenVZ