[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <e288ef193f743782df48667b6b03122bd025119f.camel@kernel.org>
Date: Tue, 09 Jun 2020 11:17:53 -0500
From: Tom Zanussi <zanussi@...nel.org>
To: Sebastian Andrzej Siewior <bigeasy@...utronix.de>
Cc: Ramon Fried <rfried.dev@...il.com>,
LKML <linux-kernel@...r.kernel.org>,
linux-rt-users <linux-rt-users@...r.kernel.org>,
Steven Rostedt <rostedt@...dmis.org>,
Thomas Gleixner <tglx@...utronix.de>,
Carsten Emde <C.Emde@...dl.org>,
John Kacur <jkacur@...hat.com>, Daniel Wagner <wagi@...om.org>,
Clark Williams <williams@...hat.com>,
Zhang Xiao <xiao.zhang@...driver.com>
Subject: Re: [PATCH RT 1/2] tasklet: Address a race resulting in
double-enqueue
Hi Sebastian,
On Tue, 2020-06-09 at 17:47 +0200, Sebastian Andrzej Siewior wrote:
> On 2020-06-04 15:51:14 [-0500], Tom Zanussi wrote:
> > >
> > > Hi, This patch introduced a regression in our kernel
> > > (v4.19.124-rt53-rebase), It occurs when we're jumping to crush
> > > kernel
> > > using kexec, in the initialization of the emmc driver.
> > > I'm still debugging the root cause, but I thought of mentioning
> > > this
> > > in the mailing list if you have any idea why this could occur.
> > > The issue doesn't happen on normal boot, only when I specifically
> > > crash the kernel into the crash kernel.
> > > Thanks,
> > > Ramon.
> >
> > I'm not very familiar with crashing the kernel into the crash
> > kernel.
> > Can you explain in enough detail how to set things up to reproduce
> > this
> > and how to trigger it? Does it happen every time?
> >
> > > From looking at the backtrace, it's hitting the WARN_ON() in the
> >
> > cmpxchg() loop below, because TASKLET_STATE is just
> > TASKLET_STATE_CHAINED.
> >
> > It seems that the only way to turn off TASKLET_STATE_CHAINED is via
> > this cmpxchg(), but TASKLET_STATE_RUN can be independently turned
> > off
> > elsewhere (tasklet_unlock() and tasklet_tryunlock()), so if that
> > happens and this loop is hit, you could loop until loops runs out
> > and
> > hit this warning.
>
> But clearing TASKLET_STATE_RUN independently happens by the task,
> that
> set it / part of tasklet_schedule().
> tasklet_tryunlock() does a cmpxchg() with only the RUN bit so it
> won't
> work if the additional CHAINED bit is set.
>
> The tasklet itself (which may run on another CPU) sets the RUN bit at
> the
> begin and clears it at the end via cmpxchg() together with the
> CHAINED
> bit.
>
> I've been staring at it for sometime and I don't see how this can
> happen.
>
I did find a problem with the patch when configured as !SMP since in
that case the RUN flag is never set (will send a patch for that
shortly), but that wouldn't be the case here.
It would help to be able to reproduce it, but I haven't been able to
yet.
Tom
> Sebastian
Powered by blists - more mailing lists