Message-ID: <20201124140310.GA811510@elver.google.com>
Date: Tue, 24 Nov 2020 15:03:10 +0100
From: Marco Elver <elver@...gle.com>
To: Mark Rutland <mark.rutland@....com>
Cc: Will Deacon <will@...nel.org>,
"Paul E. McKenney" <paulmck@...nel.org>,
Steven Rostedt <rostedt@...dmis.org>,
Anders Roxell <anders.roxell@...aro.org>,
Andrew Morton <akpm@...ux-foundation.org>,
Alexander Potapenko <glider@...gle.com>,
Dmitry Vyukov <dvyukov@...gle.com>,
Jann Horn <jannh@...gle.com>,
Linux Kernel Mailing List <linux-kernel@...r.kernel.org>,
Linux-MM <linux-mm@...ck.org>,
kasan-dev <kasan-dev@...glegroups.com>, rcu@...r.kernel.org,
Peter Zijlstra <peterz@...radead.org>,
Tejun Heo <tj@...nel.org>,
Lai Jiangshan <jiangshanlai@...il.com>,
linux-arm-kernel@...ts.infradead.org, boqun.feng@...il.com,
tglx@...utronix.de
Subject: Re: linux-next: stall warnings and deadlock on Arm64 (was: [PATCH]
kfence: Avoid stalling...)

On Mon, Nov 23, 2020 at 07:32PM +0000, Mark Rutland wrote:
> On Fri, Nov 20, 2020 at 03:03:32PM +0100, Marco Elver wrote:
> > On Fri, Nov 20, 2020 at 10:30AM +0000, Mark Rutland wrote:
> > > On Thu, Nov 19, 2020 at 10:53:53PM +0000, Will Deacon wrote:
> > > > FWIW, arm64 is known broken wrt lockdep and irq tracing atm. Mark has been
> > > > looking at that and I think he is close to having something workable.
> > > >
> > > > Mark -- is there anything Marco and Paul can try out?
> > >
> > > I initially traced some issues back to commit:
> > >
> > > 044d0d6de9f50192 ("lockdep: Only trace IRQ edges")
> > >
> > > ... and that change in semantics could cause us to miss edges in some
> > > cases, but IIUC mostly where we haven't done the right thing in
> > > exception entry/return.
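A toy user-space model in C may help illustrate how edge-only tracing can miss a transition. It is a hypothetical sketch, not the code of the commit named above: every toy_* name, hw_irqs_enabled, traced_irqs_enabled and edges_reported are made up for illustration. The hooks forward a notification only when their cached state flips. Once a buggy exception return unmasks interrupts without calling the "on" hook, the next real on->off transition looks redundant and is dropped.

```c
#include <assert.h>
#include <stdbool.h>

/*
 * Hypothetical toy model, NOT kernel code: one CPU, no concurrency.
 * All names are made up for illustration.
 */
static bool hw_irqs_enabled = true;     /* real interrupt mask state        */
static bool traced_irqs_enabled = true; /* tracer's cached view of it       */
static int  edges_reported;             /* notifications passed to lockdep  */

/* Edge-only hooks: report only when the cached state actually flips. */
static void toy_trace_hardirqs_off(void)
{
	if (traced_irqs_enabled) {
		traced_irqs_enabled = false;
		edges_reported++;
	}
}

static void toy_trace_hardirqs_on(void)
{
	if (!traced_irqs_enabled) {
		traced_irqs_enabled = true;
		edges_reported++;
	}
}

/* Well-behaved local_irq_disable()/local_irq_enable() analogues. */
static void toy_irq_disable(void)
{
	hw_irqs_enabled = false;
	toy_trace_hardirqs_off();
}

static void toy_irq_enable(void)
{
	toy_trace_hardirqs_on();
	hw_irqs_enabled = true;
}

/* Buggy exception return: unmasks IRQs but skips the tracing hook. */
static void toy_exception_return_buggy(void)
{
	hw_irqs_enabled = true;
}

/* Walks through the scenario; returns edges seen after the missed one. */
static int toy_demo_missed_edge(void)
{
	int seen;

	toy_irq_disable();            /* on->off: reported (1 edge)          */
	toy_exception_return_buggy(); /* hw on, tracer still believes off    */
	assert(hw_irqs_enabled && !traced_irqs_enabled);
	toy_irq_disable();            /* real on->off edge, filtered out     */
	seen = edges_reported;        /* still 1: that edge was missed       */
	toy_irq_enable();             /* off->on: reported, views agree      */
	assert(hw_irqs_enabled == traced_irqs_enabled);
	return seen;
}
```

Calling toy_demo_missed_edge() returns 1: the second disable was a genuine edge in the toy "hardware", but the filter never reported it, whereas a hook that forwarded every call would have.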
> > >
> > > I don't think my patches address this case yet, but my WIP (currently
> > > just fixing user<->kernel transitions) is at:
> > >
> > > https://git.kernel.org/pub/scm/linux/kernel/git/mark/linux.git/log/?h=arm64/irq-fixes
> > >
> > > I'm looking into the kernel<->kernel transitions now, and I know that we
> > > mess up RCU management for a small window around arch_cpu_idle, but it's
> > > not immediately clear to me if either of those cases could cause this
> > > report.
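The RCU window around idle can be modelled in the same toy style. Again this is a hypothetical sketch, not arm64's idle code or any debug patch in this thread: rcu_watching stands in for "RCU considers this CPU non-idle", and the tracing hook carries a debug check that counts (and prints) any call made while RCU is not watching.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdio.h>

/*
 * Hypothetical toy model, NOT kernel code. rcu_watching stands in for
 * "RCU considers this CPU non-idle"; all toy_* names are made up.
 */
static bool rcu_watching = true;
static int  unsafe_trace_calls; /* hook ran while RCU was not watching */

static void toy_rcu_idle_enter(void)
{
	assert(rcu_watching);
	rcu_watching = false;
}

static void toy_rcu_idle_exit(void)
{
	assert(!rcu_watching);
	rcu_watching = true;
}

/* Debug check in the IRQ-tracing hook: complain if RCU is not watching. */
static void toy_trace_hardirqs_on(void)
{
	if (!rcu_watching) {
		unsafe_trace_calls++;
		fprintf(stderr, "toy WARN: IRQ tracing without RCU watching\n");
	}
}

/* Buggy idle path: the tracing hook runs inside the RCU-idle window. */
static void toy_cpu_idle_buggy(void)
{
	toy_rcu_idle_enter();
	toy_trace_hardirqs_on(); /* flagged by the debug check */
	toy_rcu_idle_exit();
}

/* Safe ordering in this toy: trace while RCU is still watching. */
static void toy_cpu_idle_safe(void)
{
	toy_trace_hardirqs_on();
	toy_rcu_idle_enter();
	toy_rcu_idle_exit();
}
```

Running toy_cpu_idle_safe() leaves unsafe_trace_calls at 0, while toy_cpu_idle_buggy() bumps it to 1 and prints the warning.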
> >
> > Thank you -- I tried your irq-fixes, however that didn't seem to fix the
> > problem (still get warnings and then a panic). :-/
>
> I've just updated that branch with a new version which I hope covers
> kernel<->kernel transitions too. If you get a chance, would you mind
> giving that a spin?
>
> The HEAD commit should be:
>
> a51334f033f8ee88 ("HACK: check IRQ tracing has RCU watching")

Thank you! Your series appears to work and fixes the stalls and
deadlocks (3 trials)! I noticed there are a bunch of warnings in the log
that might be relevant (see attached).

Note that I also reverted

	sched/core: Allow try_invoke_on_locked_down_task() with irqs disabled

and it still works.

Thanks,
-- Marco

> Otherwise, I intend to clean that up and post it tomorrow (without the
> additional debug hacks). I've thrown my local Syzkaller instance at it
> in the meantime (and if I get the chance tomorrow I'll try to get
> rcutorture set up), and the only report I'm seeing so far looks genuine:
>
> | BUG: sleeping function called from invalid context in sta_info_move_state
>
> ... as that was reported on x86 too, per:
>
> https://syzkaller.appspot.com/bug?id=6c7899acf008be2ddcddb46a2567c2153193632a
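For reference, a "sleeping function called from invalid context" report comes from a debug check that fires when code which may block runs in atomic context (e.g. with a spinlock held or interrupts disabled). The sketch below is a hypothetical toy model of that idea, not the kernel's might_sleep() implementation: toy_preempt_count stands in for the preemption count, toy_irqs_off for disabled interrupts, and the message text only mirrors the syzkaller report title quoted above.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdio.h>

/* Hypothetical toy model, NOT kernel code; all names are made up. */
static int  toy_preempt_count; /* > 0: atomic context, e.g. lock held */
static bool toy_irqs_off;      /* true: interrupts disabled           */
static int  invalid_sleeps;    /* how many times the check fired      */

static void toy_spin_lock(void)
{
	toy_preempt_count++;
}

static void toy_spin_unlock(void)
{
	assert(toy_preempt_count > 0);
	toy_preempt_count--;
}

/* Rough analogue of a might_sleep()-style debug check. */
static void toy_might_sleep(const char *fn)
{
	if (toy_preempt_count > 0 || toy_irqs_off) {
		invalid_sleeps++;
		/* Text mirrors the syzkaller report title, not a kernel format. */
		fprintf(stderr,
			"BUG: sleeping function called from invalid context in %s\n",
			fn);
	}
}

/* Stand-in for any function that may block. */
static void toy_may_block(void)
{
	toy_might_sleep(__func__);
}
```

Calling toy_may_block() on its own is silent; calling it between toy_spin_lock() and toy_spin_unlock(), or with toy_irqs_off set, increments invalid_sleeps and prints the message.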
>
> Thanks,
> Mark.
View attachment "vm.log" of type "text/plain" (29421 bytes)