linux-kernel - Re: [PATCH RT 1/2] tasklet: Address a race resulting in double-enqueue


Message-ID: <20200609154741.5kesuvl7txz4s3yu@linutronix.de>
Date:   Tue, 9 Jun 2020 17:47:41 +0200
From:   Sebastian Andrzej Siewior <bigeasy@...utronix.de>
To:     Tom Zanussi <zanussi@...nel.org>
Cc:     Ramon Fried <rfried.dev@...il.com>,
        LKML <linux-kernel@...r.kernel.org>,
        linux-rt-users <linux-rt-users@...r.kernel.org>,
        Steven Rostedt <rostedt@...dmis.org>,
        Thomas Gleixner <tglx@...utronix.de>,
        Carsten Emde <C.Emde@...dl.org>,
        John Kacur <jkacur@...hat.com>, Daniel Wagner <wagi@...om.org>,
        Clark Williams <williams@...hat.com>,
        Zhang Xiao <xiao.zhang@...driver.com>
Subject: Re: [PATCH RT 1/2] tasklet: Address a race resulting in
 double-enqueue

On 2020-06-04 15:51:14 [-0500], Tom Zanussi wrote:
> > 
> > Hi, this patch introduced a regression in our kernel
> > (v4.19.124-rt53-rebase). It occurs when we're jumping to the crash
> > kernel using kexec, during initialization of the emmc driver.
> > I'm still debugging the root cause, but I thought of mentioning this
> > in the mailing list if you have any idea why this could occur.
> > The issue doesn't happen on normal boot, only when I specifically
> > crash the kernel into the crash kernel.
> > Thanks,
> > Ramon.
> 
> I'm not very familiar with crashing the kernel into the crash kernel. 
> Can you explain in enough detail how to set things up to reproduce this
> and how to trigger it?  Does it happen every time? 
> 
> From looking at the backtrace, it's hitting the WARN_ON() in the
> cmpxchg() loop below, because TASKLET_STATE is just
> TASKLET_STATE_CHAINED.
> 
> It seems that the only way to turn off TASKLET_STATE_CHAINED is via
> this cmpxchg(), but TASKLET_STATE_RUN can be independently turned off
> elsewhere (tasklet_unlock() and tasklet_tryunlock()), so if that
> happens and this loop is hit, you could loop until loops runs out and
> hit this warning.

But the independent clearing of TASKLET_STATE_RUN is done by the task
that set it, as part of tasklet_schedule().
tasklet_tryunlock() does a cmpxchg() with only the RUN bit so it won't
work if the additional CHAINED bit is set.

The tasklet itself (which may run on another CPU) sets the RUN bit at the
beginning and clears it at the end via cmpxchg() together with the CHAINED
bit.
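
To illustrate the two cmpxchg() cases described above, here is a minimal
userspace sketch using C11 atomics. It is not the kernel code: the bit
values, struct model_tasklet and the model_*() helpers are hypothetical
names made up for this example, and only the compare-and-exchange
behaviour is modelled.

```c
#include <stdatomic.h>
#include <stdbool.h>

/* Illustrative bit values only; assumed, not taken from the kernel. */
#define MODEL_STATE_RUN      (1UL << 1)
#define MODEL_STATE_CHAINED  (1UL << 2)

/* Hypothetical stand-in for the state word of a tasklet. */
struct model_tasklet {
	atomic_ulong state;
};

/*
 * Models the tryunlock case: cmpxchg from "only RUN set" to 0.
 * It fails, leaving the state untouched, if any other bit
 * (for example CHAINED) is also set.
 */
static bool model_tryunlock(struct model_tasklet *t)
{
	unsigned long expected = MODEL_STATE_RUN;

	return atomic_compare_exchange_strong(&t->state, &expected, 0UL);
}

/*
 * Models the end of the tasklet run: RUN and CHAINED are cleared
 * together in a single cmpxchg, which only succeeds if exactly
 * those two bits are set.
 */
static bool model_finish(struct model_tasklet *t)
{
	unsigned long expected = MODEL_STATE_RUN | MODEL_STATE_CHAINED;

	return atomic_compare_exchange_strong(&t->state, &expected, 0UL);
}
```

With the state initialised to RUN | CHAINED, model_tryunlock() returns
false and changes nothing, while model_finish() returns true and leaves
the state at 0.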

I've been staring at it for some time and I don't see how this can
happen.

Sebastian