linux-kernel - Re: mm lock issue while booting Linux on 5.8-rc1 for RISC-V

Message-ID: <CANN689F=LKGprNx9_Wb5HOvT-Fvv8WUR_T2DJPhy0u2HeT-A7g@mail.gmail.com>
Date:   Tue, 16 Jun 2020 23:29:56 -0700
From:   Michel Lespinasse <walken@...gle.com>
To:     Stafford Horne <shorne@...il.com>
Cc:     Atish Patra <atishp@...shpatra.org>,
        Palmer Dabbelt <palmer@...belt.com>,
        linux-riscv <linux-riscv@...ts.infradead.org>,
        LKML <linux-kernel@...r.kernel.org>,
        Bjorn Topel <bjorn.topel@...il.com>
Subject: Re: mm lock issue while booting Linux on 5.8-rc1 for RISC-V

On Tue, Jun 16, 2020 at 11:07 PM Stafford Horne <shorne@...il.com> wrote:
> On Wed, Jun 17, 2020 at 02:35:39PM +0900, Stafford Horne wrote:
> > On Tue, Jun 16, 2020 at 01:47:24PM -0700, Michel Lespinasse wrote:
> > > This makes me wonder actually - maybe there is a latent bug that got
> > > exposed after my change added the rwsem_is_locked assertion to the
> > > lockdep_assert_held one. If that is the case, it may be helpful to
> > > bisect when that issue first appeared, by testing before my patchset
> > > with VM_BUG_ON(!rwsem_is_locked(&walk.mm->mmap_lock)) added to
> > > walk_page_range() / walk_page_range_novma() / walk_page_vma() ...
> >
> > Hello,
> >
> > I tried to bisect it, but I think this issue goes much further back.
> >
> > Just with the below patch booting fails all the way back to v5.7.
> >
> > What does this mean, by the way? Why would mmap_assert_locked() want to
> > assert that rwsem_is_locked() is not true?

It's the opposite - VM_BUG_ON(cond) triggers if cond is true, so in
other words it asserts that cond is false. Yeah, I agree it is kinda
confusing. But in our case cond is !rwsem_is_locked(...), so it asserts
that the rwsem is locked, which is what we want.
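
To make the polarity concrete, here is a minimal userspace sketch. It is
not kernel code: the VM_BUG_ON() below is a stand-in that counts hits
instead of crashing, and struct fake_rwsem, assert_locked() and
checks_fired_for() are hypothetical names made up for the illustration.

```c
#include <stdbool.h>

/* Stand-in for the kernel macro (assumption: the real one BUG()s when
 * CONFIG_DEBUG_VM is set). Here it only counts how often it fired. */
static int bug_count;
#define VM_BUG_ON(cond) do { if (cond) bug_count++; } while (0)

/* Hypothetical stand-in for an rwsem; it only tracks "held or not". */
struct fake_rwsem {
	bool locked;
};

static bool rwsem_is_locked(const struct fake_rwsem *sem)
{
	return sem->locked;
}

/* Same shape as the check discussed above: fires when NOT locked. */
static void assert_locked(const struct fake_rwsem *sem)
{
	VM_BUG_ON(!rwsem_is_locked(sem));
}

/* Returns how many times the check fired for a given lock state. */
static int checks_fired_for(bool locked)
{
	struct fake_rwsem sem = { .locked = locked };

	bug_count = 0;
	assert_locked(&sem);
	return bug_count;
}
```

With this stand-in, checks_fired_for(true) reports no hits and
checks_fired_for(false) reports one, i.e. the check complains only when
the lock is not held.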

> The openrisc code that was walking the page ranges was not locking mm. I have
> added the patch below to v5.8-rc1 and it seems to work fine. I will send a
> better patch in a bit.
>
> diff --git a/arch/openrisc/kernel/dma.c b/arch/openrisc/kernel/dma.c
> index c152a68811dd..bd5f05dd9174 100644
> --- a/arch/openrisc/kernel/dma.c
> +++ b/arch/openrisc/kernel/dma.c
> @@ -74,8 +74,10 @@ void *arch_dma_set_uncached(void *cpu_addr, size_t size)
>          * We need to iterate through the pages, clearing the dcache for
>          * them and setting the cache-inhibit bit.
>          */
> +       mmap_read_lock(&init_mm);
>         error = walk_page_range(&init_mm, va, va + size, &set_nocache_walk_ops,
>                         NULL);
> +       mmap_read_unlock(&init_mm);
>         if (error)
>                 return ERR_PTR(error);
>         return cpu_addr;
> @@ -85,9 +87,11 @@ void arch_dma_clear_uncached(void *cpu_addr, size_t size)
>  {
>         unsigned long va = (unsigned long)cpu_addr;
>
> +       mmap_read_lock(&init_mm);
>         /* walk_page_range shouldn't be able to fail here */
>         WARN_ON(walk_page_range(&init_mm, va, va + size,
>                         &clear_nocache_walk_ops, NULL));
> +       mmap_read_unlock(&init_mm);
>  }

Thanks a lot for getting to the bottom of this. I think this is the proper fix.
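
For reference, the pattern in the quoted patch (take the read lock, walk,
drop the lock, then act on the result) can be sketched in userspace
roughly as follows. Everything here is a hypothetical stand-in: struct
fake_mm and the fake_* helpers only mimic the shape of mmap_read_lock(),
mmap_read_unlock() and walk_page_range(), not their real behavior.

```c
/* Hypothetical stand-in for an mm; counts current read-lock holders. */
struct fake_mm {
	int readers;
};

static void fake_mmap_read_lock(struct fake_mm *mm)
{
	mm->readers++;
}

static void fake_mmap_read_unlock(struct fake_mm *mm)
{
	mm->readers--;
}

/* Stand-in walker: returns -1 if the caller does not hold the lock,
 * 0 otherwise (the real walker asserts instead of returning). */
static int fake_walk_page_range(struct fake_mm *mm)
{
	if (mm->readers == 0)
		return -1;
	return 0;
}

/* The pre-patch shape: walk without taking the lock. */
static int walk_unlocked(struct fake_mm *mm)
{
	return fake_walk_page_range(mm);
}

/* The patched shape: lock, walk, unlock, then report the result. */
static int walk_locked(struct fake_mm *mm)
{
	int error;

	fake_mmap_read_lock(mm);
	error = fake_walk_page_range(mm);
	fake_mmap_read_unlock(mm);	/* drop the lock before any early return */
	return error;
}
```

Unlocking before looking at the error keeps the lock balanced on both
the success and the failure path, which is what the patch does too.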