lists.openwall.net   lists  /  announce  owl-users  owl-dev  john-users  john-dev  passwdqc-users  yescrypt  popa3d-users  /  oss-security  kernel-hardening  musl  sabotage  tlsify  passwords  /  crypt-dev  xvendor  /  Bugtraq  Full-Disclosure  linux-kernel  linux-netdev  linux-ext4  linux-hardening  linux-cve-announce  PHC 
Open Source and information security mailing list archives
 
Hash Suite: Windows password security audit tool. GUI, reports in PDF.
[<prev] [next>] [<thread-prev] [day] [month] [year] [list]
Message-ID: <1354140755.3588.27.camel@thor>
Date:	Wed, 28 Nov 2012 17:12:35 -0500
From:	Peter Hurley <peter@...leysoftware.com>
To:	Andrew Morton <akpm@...ux-foundation.org>,
	Xiaotian Feng <dannyfeng@...cent.com>,
	Xiaotian Feng <xtfeng@...il.com>
Cc:	linux-kernel@...r.kernel.org, Thomas Gleixner <tglx@...utronix.de>,
	Ingo Molnar <mingo@...e.hu>, linux-next@...r.kernel.org
Subject: Re: [BUG -next-20121127] kernel BUG at kernel/softirq.c:471!

[cc'ing linux-next]

On Wed, 2012-11-28 at 13:00 -0500, Peter Hurley wrote:
> Hi all,
> 
> I couldn't find the v2 patch of this on linux-kernel but this commit
> 
>   4660e32 "tasklet: ignore disabled tasklet in tasklet_action()"
> 
> BUGS in -next-20121127.
> 
> -----------[cut here ]----------
> kernel BUG at /home/peter/src/kernels/next/kernel/softirq.c:471!
> invalid opcode: 0000 [#1] PREEMPT SMP
> ....
> 
> The registers/stack dump isn't useful so I didn't include it here.
> 
> I'm still trying to track down the execution sequence that causes this,
> but the high-level trigger is a firewire bus reset.
> 
> Hopefully I'll have more information soon.

>From the original v1 of this patch [where is v2?] ...

On Fri, 2012-11-02 at 10:48 +0800, Xiaotian Feng wrote:
> We met a ksoftirqd 100% issue, the perf top shows kernel is busy
> with tasklet_action(), but no actual action is shown. From dumped
> kernel, there's only one disabled tasklet on the tasklet_vec.
> 
> tasklet_action might be handled after tasklet is disabled, this will
> make disabled tasklet stayed on tasklet_vec. tasklet_action will not
> handle disabled tasklet, but place it on the tail of tasklet_vec,
> still raise softirq for this tasklet. Things will become worse if
> device driver uses tasklet_disable on its device remove/close code.
> The disabled tasklet will stay on the vec, frequently __raise_softirq_off()
> and make ksoftirqd wakeup even if no tasklets need to be handled.
> 
> This patch introduced a new TASKLET_STATE_HI bit to indicate HI_SOFTIRQ,
> in tasklet_action(), simply ignore the disabled tasklet and don't raise
> the softirq nr. In my previous patch, I remove tasklet_hi_enable() since
> it is the same as tasklet_enable(). So only tasklet_enable() needs to be
> modified, if tasklet state is changed from disable to enable, use
> __tasklet_schedule() to put it on the right vec.
> 
> Signed-off-by: Xiaotian Feng <dannyfeng@...cent.com>
> Cc: Andrew Morton <akpm@...ux-foundation.org>
> ---
>  include/linux/interrupt.h |   12 ++++++++++--
>  kernel/softirq.c          |   10 +++++-----
>  2 files changed, 15 insertions(+), 7 deletions(-)
> 
> diff --git a/include/linux/interrupt.h b/include/linux/interrupt.h
> index 5e4e617..7e5bb00 100644
> --- a/include/linux/interrupt.h
> +++ b/include/linux/interrupt.h
> @@ -521,7 +521,8 @@ struct tasklet_struct name = { NULL, 0, ATOMIC_INIT(1), func, data }
>  enum
>  {
>  	TASKLET_STATE_SCHED,	/* Tasklet is scheduled for execution */
> -	TASKLET_STATE_RUN	/* Tasklet is running (SMP only) */
> +	TASKLET_STATE_RUN,	/* Tasklet is running (SMP only) */
> +	TASKLET_STATE_HI	/* Tasklet is HI_SOFTIRQ */
>  };
>  
>  #ifdef CONFIG_SMP
> @@ -593,7 +594,14 @@ static inline void tasklet_disable(struct tasklet_struct *t)
>  static inline void tasklet_enable(struct tasklet_struct *t)
>  {
>  	smp_mb__before_atomic_dec();
> -	atomic_dec(&t->count);
> +	if (atomic_dec_and_test(&t->count)) {
> +		if (!test_bit(TASKLET_STATE_SCHED, &t->state))
> +			return;
> +		if (test_bit(TASKLET_STATE_HI, &t->state))
> +			__tasklet_hi_schedule(t);
> +		else
> 

Since this isn't protected by locks, all of the conditions that __were__
met to arrive here (t->count == 0 && t->state & TASKLET_STATE_SCHED) may
no longer be true now because another cpu may have run tasklet_action(),
so now this tasklet will be scheduled when it should not be.

Plus __tasklet_schedule() can't just be called from any cpu that happens
to be calling tasklet_enable(). What you're doing here means that the
tasklet could be scheduled on multiple cpus at the same time -- that's
not going to work.

+			__tasklet_schedule(t);
> 


--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@...r.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

Powered by blists - more mailing lists

Powered by Openwall GNU/*/Linux Powered by OpenVZ