linux-kernel - Re: [PATCH RT 1/2] tasklet: Address a race resulting in double-enqueue

lists.openwall.net		lists / announce owl-users owl-dev john-users john-dev passwdqc-users yescrypt popa3d-users / oss-security kernel-hardening musl sabotage tlsify passwords / crypt-dev xvendor / Bugtraq Full-Disclosure linux-kernel linux-netdev linux-ext4 linux-hardening linux-cve-announce PHC
Open Source and information security mailing list archives

Hash Suite: Windows password security audit tool. GUI, reports in PDF.

[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]

Message-ID: <e288ef193f743782df48667b6b03122bd025119f.camel@kernel.org>
Date:   Tue, 09 Jun 2020 11:17:53 -0500
From:   Tom Zanussi <zanussi@...nel.org>
To:     Sebastian Andrzej Siewior <bigeasy@...utronix.de>
Cc:     Ramon Fried <rfried.dev@...il.com>,
        LKML <linux-kernel@...r.kernel.org>,
        linux-rt-users <linux-rt-users@...r.kernel.org>,
        Steven Rostedt <rostedt@...dmis.org>,
        Thomas Gleixner <tglx@...utronix.de>,
        Carsten Emde <C.Emde@...dl.org>,
        John Kacur <jkacur@...hat.com>, Daniel Wagner <wagi@...om.org>,
        Clark Williams <williams@...hat.com>,
        Zhang Xiao <xiao.zhang@...driver.com>
Subject: Re: [PATCH RT 1/2] tasklet: Address a race resulting in
 double-enqueue

Hi Sebastian,

On Tue, 2020-06-09 at 17:47 +0200, Sebastian Andrzej Siewior wrote:
> On 2020-06-04 15:51:14 [-0500], Tom Zanussi wrote:
> > > 
> > > Hi, This patch introduced a regression in our kernel
> > > (v4.19.124-rt53-rebase), It occurs when we're jumping to crush
> > > kernel
> > > using kexec, in the initialization of the emmc driver.
> > > I'm still debugging the root cause, but I thought of mentioning
> > > this
> > > in the mailing list if you have any idea why this could occur.
> > > The issue doesn't happen on normal boot, only when I specifically
> > > crash the kernel into the crash kernel.
> > > Thanks,
> > > Ramon.
> > 
> > I'm not very familiar with crashing the kernel into the crash
> > kernel. 
> > Can you explain in enough detail how to set things up to reproduce
> > this
> > and how to trigger it?  Does it happen every time? 
> > 
> > > From looking at the backtrace, it's hitting the WARN_ON() in the
> > 
> > cmpxchg() loop below, because TASKLET_STATE is just
> > TASKLET_STATE_CHAINED.
> > 
> > It seems that the only way to turn off TASKLET_STATE_CHAINED is via
> > this cmpxchg(), but TASKLET_STATE_RUN can be independently turned
> > off
> > elsewhere (tasklet_unlock() and tasklet_tryunlock()), so if that
> > happens and this loop is hit, you could loop until loops runs out
> > and
> > hit this warning.
> 
> But clearing TASKLET_STATE_RUN independently happens by the task,
> that
> set it / part of tasklet_schedule().
> tasklet_tryunlock() does a cmpxchg() with only the RUN bit so it
> won't
> work if the additional CHAINED bit is set.
> 
> The tasklet itself (which may run on another CPU) sets the RUN bit at
> the
> begin and clears it at the end via cmpxchg() together with the
> CHAINED
> bit. 
> 
> I've been staring at it for sometime and I don't see how this can
> happen.
> 

I did find a problem with the patch when configured as !SMP since in
that case the RUN flag is never set (will send a patch for that
shortly), but that wouldn't be the case here.

It would help to be able to reproduce it, but I haven't been able to
yet.

Tom

> Sebastian