Message-ID: <CANN689F=LKGprNx9_Wb5HOvT-Fvv8WUR_T2DJPhy0u2HeT-A7g@mail.gmail.com>
Date: Tue, 16 Jun 2020 23:29:56 -0700
From: Michel Lespinasse <walken@...gle.com>
To: Stafford Horne <shorne@...il.com>
Cc: Atish Patra <atishp@...shpatra.org>,
Palmer Dabbelt <palmer@...belt.com>,
linux-riscv <linux-riscv@...ts.infradead.org>,
LKML <linux-kernel@...r.kernel.org>,
Bjorn Topel <bjorn.topel@...il.com>
Subject: Re: mm lock issue while booting Linux on 5.8-rc1 for RISC-V
On Tue, Jun 16, 2020 at 11:07 PM Stafford Horne <shorne@...il.com> wrote:
> On Wed, Jun 17, 2020 at 02:35:39PM +0900, Stafford Horne wrote:
> > On Tue, Jun 16, 2020 at 01:47:24PM -0700, Michel Lespinasse wrote:
> > > This makes me wonder actually - maybe there is a latent bug that got
> > > exposed after my change added the rwsem_is_locked assertion to the
> > > lockdep_assert_held one. If that is the case, it may be helpful to
> > > bisect when that issue first appeared, by testing before my patchset
> > > with VM_BUG_ON(!rwsem_is_locked(&walk.mm->mmap_lock)) added to
> > > walk_page_range() / walk_page_range_novma() / walk_page_vma() ...
> >
> > Hello,
> >
> > I tried to bisect it, but I think this issue goes much further back.
> >
> > Just with the below patch booting fails all the way back to v5.7.
> >
> > What does this mean, by the way? Why would mmap_assert_locked() want to assert
> > that rwsem_is_locked() is not true?
It's the opposite - VM_BUG_ON(cond) triggers if cond is true, so in
other words it asserts that cond is false. Yeah, I agree it is kinda
confusing. In our case cond is !rwsem_is_locked(...), so it asserts
that the rwsem is locked, which is what we want.
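
To make the polarity concrete, here is a minimal userspace sketch, not the
kernel code. The names fake_rwsem, FAKE_VM_BUG_ON, fake_assert_locked and
bugs_fired_with_holders are made up for illustration, and the imitation macro
only counts hits where the real one is a BUG_ON when CONFIG_DEBUG_VM is set.

```c
#include <stdbool.h>

/* Hypothetical stand-in for the kernel rwsem: it only counts holders. */
struct fake_rwsem {
	int holders;
};

static bool fake_rwsem_is_locked(const struct fake_rwsem *sem)
{
	return sem->holders > 0;
}

/* Number of times the imitation assertion fired. */
static int fake_bug_count;

/*
 * Imitation of the VM_BUG_ON(cond) convention: it fires when cond is TRUE,
 * so it asserts that cond is false. This version only counts hits.
 */
#define FAKE_VM_BUG_ON(cond) do { if (cond) fake_bug_count++; } while (0)

/* Asserts that the lock IS held by passing the negated check. */
static void fake_assert_locked(const struct fake_rwsem *sem)
{
	FAKE_VM_BUG_ON(!fake_rwsem_is_locked(sem));
}

/* Returns how many times the assertion fired for a given holder count. */
static int bugs_fired_with_holders(int holders)
{
	struct fake_rwsem sem = { .holders = holders };

	fake_bug_count = 0;
	fake_assert_locked(&sem);
	return fake_bug_count;
}
```

With no holder the assertion fires once; with one holder it stays silent,
which matches a walker that forgot to take the lock versus one that took it.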
> The openrisc code that was walking the page ranges was not locking mm. I have
> added the below patch to v5.8-rc1 and it seems to work fine. I will send a
> better patch in a bit.
>
> diff --git a/arch/openrisc/kernel/dma.c b/arch/openrisc/kernel/dma.c
> index c152a68811dd..bd5f05dd9174 100644
> --- a/arch/openrisc/kernel/dma.c
> +++ b/arch/openrisc/kernel/dma.c
> @@ -74,8 +74,10 @@ void *arch_dma_set_uncached(void *cpu_addr, size_t size)
>          * We need to iterate through the pages, clearing the dcache for
>          * them and setting the cache-inhibit bit.
>          */
> +       mmap_read_lock(&init_mm);
>         error = walk_page_range(&init_mm, va, va + size, &set_nocache_walk_ops,
>                         NULL);
> +       mmap_read_unlock(&init_mm);
>         if (error)
>                 return ERR_PTR(error);
>         return cpu_addr;
> @@ -85,9 +87,11 @@ void arch_dma_clear_uncached(void *cpu_addr, size_t size)
>  {
>         unsigned long va = (unsigned long)cpu_addr;
>
> +       mmap_read_lock(&init_mm);
>         /* walk_page_range shouldn't be able to fail here */
>         WARN_ON(walk_page_range(&init_mm, va, va + size,
>                         &clear_nocache_walk_ops, NULL));
> +       mmap_read_unlock(&init_mm);
>  }
Thanks a lot for getting to the bottom of this. I think this is the proper fix.