Message-ID: <20200609154741.5kesuvl7txz4s3yu@linutronix.de>
Date:   Tue, 9 Jun 2020 17:47:41 +0200
From:   Sebastian Andrzej Siewior <bigeasy@...utronix.de>
To:     Tom Zanussi <zanussi@...nel.org>
Cc:     Ramon Fried <rfried.dev@...il.com>,
        LKML <linux-kernel@...r.kernel.org>,
        linux-rt-users <linux-rt-users@...r.kernel.org>,
        Steven Rostedt <rostedt@...dmis.org>,
        Thomas Gleixner <tglx@...utronix.de>,
        Carsten Emde <C.Emde@...dl.org>,
        John Kacur <jkacur@...hat.com>, Daniel Wagner <wagi@...om.org>,
        Clark Williams <williams@...hat.com>,
        Zhang Xiao <xiao.zhang@...driver.com>
Subject: Re: [PATCH RT 1/2] tasklet: Address a race resulting in
 double-enqueue

On 2020-06-04 15:51:14 [-0500], Tom Zanussi wrote:
> > 
> > Hi, this patch introduced a regression in our kernel
> > (v4.19.124-rt53-rebase). It occurs when we're jumping to the crash
> > kernel using kexec, in the initialization of the emmc driver.
> > I'm still debugging the root cause, but I thought of mentioning this
> > in the mailing list if you have any idea why this could occur.
> > The issue doesn't happen on normal boot, only when I specifically
> > crash the kernel into the crash kernel.
> > Thanks,
> > Ramon.
> 
> I'm not very familiar with crashing the kernel into the crash kernel. 
> Can you explain in enough detail how to set things up to reproduce this
> and how to trigger it?  Does it happen every time? 
> 
> From looking at the backtrace, it's hitting the WARN_ON() in the
> cmpxchg() loop below, because TASKLET_STATE is just
> TASKLET_STATE_CHAINED.
> 
> It seems that the only way to turn off TASKLET_STATE_CHAINED is via
> this cmpxchg(), but TASKLET_STATE_RUN can be independently turned off
> elsewhere (tasklet_unlock() and tasklet_tryunlock()), so if that
> happens and this loop is hit, you could loop until loops runs out and
> hit this warning.

But clearing TASKLET_STATE_RUN independently is done only by the task
that set it, as part of tasklet_schedule().
tasklet_tryunlock() does a cmpxchg() with only the RUN bit so it won't
work if the additional CHAINED bit is set.
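
To illustrate that point, here is a minimal userspace sketch using C11
atomics in place of the kernel's cmpxchg(). The names STATEF_RUN,
STATEF_CHAINED and sketch_tryunlock() are made up for this example, and
the bit positions are assumptions chosen for illustration; this is not
the kernel code itself.

```c
#include <stdatomic.h>
#include <stdbool.h>

/* Illustrative flag values; the bit positions are an assumption. */
#define STATEF_RUN     (1UL << 1)
#define STATEF_CHAINED (1UL << 3)

/*
 * Hypothetical stand-in for a tryunlock that compares against the RUN
 * bit only: the exchange succeeds only if the state is exactly RUN.
 */
static bool sketch_tryunlock(atomic_ulong *state)
{
    unsigned long expected = STATEF_RUN;

    /* On a mismatch, state is left untouched and false is returned. */
    return atomic_compare_exchange_strong(state, &expected, 0UL);
}
```

With the state at RUN only, sketch_tryunlock() clears it and returns
true. With RUN and CHAINED both set, the compare fails and the state is
left as it was.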

The tasklet itself (which may run on another CPU) sets the RUN bit at
the beginning and clears it at the end via cmpxchg() together with the
CHAINED bit.
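
As a rough sketch of that final step, again in userspace C11 and not
the actual patch: a bounded loop tries to clear RUN and CHAINED in one
compare-and-exchange. The helper name sketch_clear_rc(), the max_loops
parameter and the bit positions are assumptions for illustration.

```c
#include <stdatomic.h>
#include <stdbool.h>

/* Illustrative flag values; the bit positions are an assumption. */
#define STATEF_RUN     (1UL << 1)
#define STATEF_CHAINED (1UL << 3)
#define STATEF_RC      (STATEF_RUN | STATEF_CHAINED)

/*
 * Hypothetical helper: try to clear RUN|CHAINED in a single step.
 * Returns false once max_loops attempts have failed, which is the
 * point where the code under discussion would hit its warning.
 */
static bool sketch_clear_rc(atomic_ulong *state, int max_loops)
{
    for (int i = 0; i < max_loops; i++) {
        unsigned long expected = STATEF_RC;

        if (atomic_compare_exchange_strong(state, &expected, 0UL))
            return true;
    }
    return false;
}
```

If the state is RUN|CHAINED, the first attempt succeeds. If the state
were ever CHAINED alone, no attempt could match and the loop would run
out, which is the situation described in the quoted text.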

I've been staring at it for some time and I don't see how this can
happen.

Sebastian
