Message-ID: <CAJF2gTSB-4hDo8ncSLVKvbnOOEwLSrV716kQM9d9HrzXFs7D8A@mail.gmail.com>
Date:   Mon, 11 Dec 2023 16:41:28 +0800
From:   Guo Ren <guoren@...nel.org>
To:     Alexandre Ghiti <alexghiti@...osinc.com>
Cc:     paul.walmsley@...ive.com, palmer@...belt.com,
        akpm@...ux-foundation.org, catalin.marinas@....com,
        willy@...radead.org, david@...hat.com, muchun.song@...ux.dev,
        will@...nel.org, peterz@...radead.org, rppt@...nel.org,
        paulmck@...nel.org, atishp@...shpatra.org, anup@...infault.org,
        alex@...ti.fr, mike.kravetz@...cle.com, dfustini@...libre.com,
        wefu@...hat.com, jszhang@...nel.org, falcon@...ylab.org,
        linux-riscv@...ts.infradead.org, linux-kernel@...r.kernel.org,
        Guo Ren <guoren@...ux.alibaba.com>
Subject: Re: [PATCH] riscv: pgtable: Enhance set_pte to prevent OoO risk

On Mon, Dec 11, 2023 at 1:52 PM Alexandre Ghiti <alexghiti@...osinc.com> wrote:
>
> Hi Guo,
>
> On Fri, Dec 8, 2023 at 4:10 PM <guoren@...nel.org> wrote:
> >
> > From: Guo Ren <guoren@...ux.alibaba.com>
> >
> > When changing from an invalid pte to a valid one for a kernel page,
> > no tlb_flush is needed. That is fine under the TSO memory model, but
> > under a weak memory model there is an out-of-order (OoO) risk, e.g.:
> >
> > sd t0, (a0) // a0 = pte address, pteval is changed from invalid to valid
> > ...
> > ld t1, (a1) // a1 = va of the above pte
> >
> > If the ld instruction is executed speculatively before the sd
> > instruction, it would bring an invalid entry into the TLB, and when
> > the ld instruction retires, a spurious page fault occurs. Because
> > vmemmap addresses are ignored by vmalloc_fault, the spurious page
> > fault would cause a kernel panic.
> >
> > This patch was inspired by commit 7f0b1bf04511 ("arm64: Fix barriers
> > used for page table modifications"). For RISC-V, there is no requirement
> > in the spec that all TLB entries be valid, and no requirement that the
> > PTW filter out invalid entries. Of course, a micro-architecture could
> > provide a more robust design, but here a software fence is used to
> > guarantee the ordering.
> >
> > Signed-off-by: Guo Ren <guoren@...ux.alibaba.com>
> > Signed-off-by: Guo Ren <guoren@...nel.org>
> > ---
> >  arch/riscv/include/asm/pgtable.h | 7 +++++++
> >  1 file changed, 7 insertions(+)
> >
> > diff --git a/arch/riscv/include/asm/pgtable.h b/arch/riscv/include/asm/pgtable.h
> > index 294044429e8e..2fae5a5438e0 100644
> > --- a/arch/riscv/include/asm/pgtable.h
> > +++ b/arch/riscv/include/asm/pgtable.h
> > @@ -511,6 +511,13 @@ static inline int pte_same(pte_t pte_a, pte_t pte_b)
> >  static inline void set_pte(pte_t *ptep, pte_t pteval)
> >  {
> >         *ptep = pteval;
> > +
> > +       /*
> > +        * Only if the new pte is present and kernel, otherwise TLB
> > +        * maintenance or update_mmu_cache() have the necessary barriers.
> > +        */
> > +       if (pte_val(pteval) & (_PAGE_PRESENT | _PAGE_GLOBAL))
> > +               RISCV_FENCE(rw,rw);
>
> Only a sfence.vma can guarantee that the PTW actually sees a new
> mapping, a fence is not enough. That being said, new kernel mappings
> (vmalloc ones) are correctly handled in the kernel by using
> flush_cache_vmap(). Did you observe something that this patch fixes?
Thx for the reply!

The sfence.vma is too expensive, so the situation is tricky. See the
similar arm64 commit 7f0b1bf04511 ("arm64: Fix barriers used for page
table modifications"). That is, Linux assumes an invalid pte won't get
into the TLB. Think about memory hotplug:

mm/sparse.c: sparse_add_section() {
...
        memmap = section_activate(nid, start_pfn, nr_pages, altmap, pgmap);
        if (IS_ERR(memmap))
                return PTR_ERR(memmap);

        /*
         * Poison uninitialized struct pages in order to catch invalid flags
         * combinations.
         */
        page_init_poison(memmap, sizeof(struct page) * nr_pages);
...
}
Here section_activate() uses set_pte() to set up the vmemmap, and
page_init_poison() then accesses those pages' struct page entries.

That means:
sd t0, (a0)    // a0 = pte address of the struct page; pteval changes
               // from invalid to valid
...
lw/sw t1, (a1) // a1 = va of the struct page

If the lw/sw instruction can be executed speculatively before the sd
in set_pte(), we need a fence to prevent that.

>
> Thanks,
>
> Alex
>
> >  }
> >
> >  void flush_icache_pte(pte_t pte);
> > --
> > 2.40.1
> >



-- 
Best Regards
 Guo Ren
