lists.openwall.net   lists  /  announce  owl-users  owl-dev  john-users  john-dev  passwdqc-users  yescrypt  popa3d-users  /  oss-security  kernel-hardening  musl  sabotage  tlsify  passwords  /  crypt-dev  xvendor  /  Bugtraq  Full-Disclosure  linux-kernel  linux-netdev  linux-ext4  linux-hardening  linux-cve-announce  PHC 
Open Source and information security mailing list archives
 
Hash Suite: Windows password security audit tool. GUI, reports in PDF.
[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Date:   Tue, 20 Jun 2023 20:49:56 +1000
From:   "Nicholas Piggin" <npiggin@...il.com>
To:     "Yu Zhao" <yuzhao@...gle.com>
Cc:     "Andrew Morton" <akpm@...ux-foundation.org>,
        "Paolo Bonzini" <pbonzini@...hat.com>,
        "Alistair Popple" <apopple@...dia.com>,
        "Anup Patel" <anup@...infault.org>,
        "Ben Gardon" <bgardon@...gle.com>,
        "Borislav Petkov" <bp@...en8.de>,
        "Catalin Marinas" <catalin.marinas@....com>,
        "Chao Peng" <chao.p.peng@...ux.intel.com>,
        "Christophe Leroy" <christophe.leroy@...roup.eu>,
        "Dave Hansen" <dave.hansen@...ux.intel.com>,
        "Fabiano Rosas" <farosas@...ux.ibm.com>,
        "Gaosheng Cui" <cuigaosheng1@...wei.com>,
        "Gavin Shan" <gshan@...hat.com>, "H. Peter Anvin" <hpa@...or.com>,
        "Ingo Molnar" <mingo@...hat.com>,
        "James Morse" <james.morse@....com>,
        "Jason A. Donenfeld" <Jason@...c4.com>,
        "Jason Gunthorpe" <jgg@...pe.ca>,
        "Jonathan Corbet" <corbet@....net>,
        "Marc Zyngier" <maz@...nel.org>,
        "Masami Hiramatsu" <mhiramat@...nel.org>,
        "Michael Ellerman" <mpe@...erman.id.au>,
        "Michael Larabel" <michael@...haellarabel.com>,
        "Mike Rapoport" <rppt@...nel.org>,
        "Oliver Upton" <oliver.upton@...ux.dev>,
        "Paul Mackerras" <paulus@...abs.org>,
        "Peter Xu" <peterx@...hat.com>,
        "Sean Christopherson" <seanjc@...gle.com>,
        "Steven Rostedt" <rostedt@...dmis.org>,
        "Suzuki K Poulose" <suzuki.poulose@....com>,
        "Thomas Gleixner" <tglx@...utronix.de>,
        "Thomas Huth" <thuth@...hat.com>, "Will Deacon" <will@...nel.org>,
        "Zenghui Yu" <yuzenghui@...wei.com>, <kvmarm@...ts.linux.dev>,
        <kvm@...r.kernel.org>, <linux-arm-kernel@...ts.infradead.org>,
        <linux-doc@...r.kernel.org>, <linux-kernel@...r.kernel.org>,
        <linux-mm@...ck.org>, <linuxppc-dev@...ts.ozlabs.org>,
        <linux-trace-kernel@...r.kernel.org>, <x86@...nel.org>,
        <linux-mm@...gle.com>
Subject: Re: [PATCH mm-unstable v2 06/10] kvm/powerpc: make radix page
 tables RCU safe

On Tue Jun 20, 2023 at 6:00 PM AEST, Yu Zhao wrote:
> On Tue, Jun 20, 2023 at 12:33 AM Nicholas Piggin <npiggin@...il.com> wrote:
> >
> > On Sat May 27, 2023 at 9:44 AM AEST, Yu Zhao wrote:
> > > KVM page tables are currently not RCU safe against remapping, i.e.,
> > > kvmppc_unmap_free_pmd_entry_table() et al. The previous
> >
> > Minor nit but the "page table" is not RCU-safe against something. It
> > is RCU-freed, and therefore some algorithm that accesses it can have
> > the existence guarantee provided by RCU (usually there still needs
> > to be more to it).
> >
> > > mmu_notifier_ops members rely on kvm->mmu_lock to synchronize with
> > > that operation.
> > >
> > > However, the new mmu_notifier_ops member test_clear_young() provides
> > > a fast path that does not take kvm->mmu_lock. To implement
> > > kvm_arch_test_clear_young() for that path, orphan page tables need to
> > > be freed by RCU.
> >
> > Short version: clear the referenced bit using RCU instead of MMU lock
> > to protect against page table freeing, and there is no problem with
> > clearing the bit in a table that has been freed.
> >
> > Seems reasonable.
>
> Thanks. All above points taken.
>
> > > Unmapping, specifically kvm_unmap_radix(), does not free page tables,
> > > hence not a concern.
> >
> > Not sure if you really need to make the distinction about why the page
> > table is freed, we might free them via unmapping. The point is just
> > anything that frees them while there can be concurrent access, right?
>
> Correct.
>
> > > Signed-off-by: Yu Zhao <yuzhao@...gle.com>
> > > ---
> > >  arch/powerpc/kvm/book3s_64_mmu_radix.c | 6 ++++--
> > >  1 file changed, 4 insertions(+), 2 deletions(-)
> > >
> > > diff --git a/arch/powerpc/kvm/book3s_64_mmu_radix.c b/arch/powerpc/kvm/book3s_64_mmu_radix.c
> > > index 461307b89c3a..3b65b3b11041 100644
> > > --- a/arch/powerpc/kvm/book3s_64_mmu_radix.c
> > > +++ b/arch/powerpc/kvm/book3s_64_mmu_radix.c
> > > @@ -1469,13 +1469,15 @@ int kvmppc_radix_init(void)
> > >  {
> > >       unsigned long size = sizeof(void *) << RADIX_PTE_INDEX_SIZE;
> > >
> > > -     kvm_pte_cache = kmem_cache_create("kvm-pte", size, size, 0, pte_ctor);
> > > +     kvm_pte_cache = kmem_cache_create("kvm-pte", size, size,
> > > +                                       SLAB_TYPESAFE_BY_RCU, pte_ctor);
> > >       if (!kvm_pte_cache)
> > >               return -ENOMEM;
> > >
> > >       size = sizeof(void *) << RADIX_PMD_INDEX_SIZE;
> > >
> > > -     kvm_pmd_cache = kmem_cache_create("kvm-pmd", size, size, 0, pmd_ctor);
> > > +     kvm_pmd_cache = kmem_cache_create("kvm-pmd", size, size,
> > > +                                       SLAB_TYPESAFE_BY_RCU, pmd_ctor);
> > >       if (!kvm_pmd_cache) {
> > >               kmem_cache_destroy(kvm_pte_cache);
> > >               return -ENOMEM;
> >
> > KVM PPC HV radix PUD level page tables use the arch/powerpc allocators
> > (for some reason), which are not RCU freed. I think you need them too?
>
> We don't. The use of the arch/powerpc allocator for PUD tables seems
> appropriate to me because, unlike PMD/PTE tables, we never free PUD
> tables during the lifetime of a VM:

Ah you're right, the pud_free only comes from the double alloc case
so it's never visible to concurrent threads.

> * We don't free PUD/PMD/PTE tables when they become empty, i.e., not
> mapping any pages but still attached. (We could in theory, as
> x86/aarch64 do.)

We may try to do that at some point, but that's not related to your
patch for now so no worries.

Thanks,
Nick

Powered by blists - more mailing lists

Powered by Openwall GNU/*/Linux Powered by OpenVZ