linux-kernel - Re: rcu_preempt detected stalls.


Message-ID: <20141023191319.GA5137@redhat.com>
Date:	Thu, 23 Oct 2014 21:13:19 +0200
From:	Oleg Nesterov <oleg@...hat.com>
To:	"Paul E. McKenney" <paulmck@...ux.vnet.ibm.com>
Cc:	Dave Jones <davej@...hat.com>,
	Linux Kernel <linux-kernel@...r.kernel.org>, htejun@...il.com
Subject: Re: rcu_preempt detected stalls.

On 10/23, Paul E. McKenney wrote:
>
> On Mon, Oct 13, 2014 at 01:35:04PM -0400, Dave Jones wrote:
> > Today in "rcu stall while fuzzing" news:
> >
> > INFO: rcu_preempt detected stalls on CPUs/tasks:
> > 	Tasks blocked on level-0 rcu_node (CPUs 0-3): P766 P646
> > 	Tasks blocked on level-0 rcu_node (CPUs 0-3): P766 P646
> > 	(detected by 0, t=6502 jiffies, g=75434, c=75433, q=0)
> > trinity-c342    R  running task    13384   766  32295 0x00000000
> >  ffff880068943d58 0000000000000002 0000000000000002 ffff880193c8c680
> >  00000000001d4100 0000000000000000 ffff880068943fd8 00000000001d4100
> >  ffff88024302c680 ffff880193c8c680 ffff880068943fd8 0000000000000000
> > Call Trace:
> >  [<ffffffff888368e2>] preempt_schedule_irq+0x52/0xb0
> >  [<ffffffff8883df10>] retint_kernel+0x20/0x30
> >  [<ffffffff880d9424>] ? lock_acquire+0xd4/0x2b0
> >  [<ffffffff8808d495>] ? kill_pid_info+0x5/0x130
> >  [<ffffffff8808d4d5>] kill_pid_info+0x45/0x130
> >  [<ffffffff8808d495>] ? kill_pid_info+0x5/0x130
> >  [<ffffffff8808d6d2>] SYSC_kill+0xf2/0x2f0
> >  [<ffffffff8808d67b>] ? SYSC_kill+0x9b/0x2f0
> >  [<ffffffff8819c2b7>] ? context_tracking_user_exit+0x57/0x280
> >  [<ffffffff880136bd>] ? syscall_trace_enter+0x13d/0x310
> >  [<ffffffff8808fd9e>] SyS_kill+0xe/0x10
> >  [<ffffffff8883d3a4>] tracesys+0xdd/0xe2
>
> Well, there is a loop in kill_pid_info().  I am surprised that it
> would loop indefinitely, but if it did, you would certainly get
> RCU CPU stalls.  Please see patch below, adding Oleg for his thoughts.

Yes, this loop should not be a problem; we only restart if we race with
a multi-threaded exec from a non-leader thread.

But I have already seen a couple of bug reports which look like task_struct
corruption (->signal/creds == NULL), so it looks like something was broken
recently. Perhaps an unbalanced put_task_struct...

_Perhaps_ this is another case. If ->sighand was nullified then it will
loop forever.
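
To illustrate the failure mode described above, here is a minimal userspace
sketch of the retry logic. It is not the kernel code: struct model_task,
model_pid_task(), model_send_sig(), model_kill() and MODEL_STUCK are all
hypothetical names, and the sketch assumes that the loop retries whenever
delivery fails with -ESRCH while the pid lookup still finds the task. The
retry bound exists only so the demonstration terminates.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

/* Hypothetical result code for "retry bound exhausted"; not a kernel value. */
#define MODEL_STUCK (-1000)

/* Hypothetical stand-in for a task; not the real task_struct. */
struct model_task {
    int hashed;          /* nonzero while a pid lookup still finds the task */
    const void *sighand; /* NULL models the suspected ->sighand corruption */
};

/* Models the pid lookup: returns NULL once the task is unhashed. */
static struct model_task *model_pid_task(struct model_task *t)
{
    return (t && t->hashed) ? t : NULL;
}

/* Models signal delivery: fails with -ESRCH when there is no ->sighand. */
static int model_send_sig(const struct model_task *t)
{
    return t->sighand ? 0 : -ESRCH;
}

/*
 * Models the retry loop, bounded by max_tries (an assumption of this
 * sketch; the loop being modelled has no such bound).
 * Returns 0 on delivery, -ESRCH if the task is gone, and MODEL_STUCK
 * if every attempt found the task but could not deliver the signal.
 */
static int model_kill(struct model_task *t, int max_tries)
{
    assert(max_tries > 0);
    for (int i = 0; i < max_tries; i++) {
        struct model_task *p = model_pid_task(t);
        if (!p)
            return -ESRCH;      /* task is really gone: stop */
        int error = model_send_sig(p);
        if (error != -ESRCH)
            return error;       /* delivered, or failed for another reason */
        /* Still hashed but delivery said -ESRCH: look up again. */
    }
    return MODEL_STUCK;
}
```

In this model, a task with hashed = 1 and a non-NULL sighand gets its signal
on the first pass, and a task with hashed = 0 ends the loop with -ESRCH. A
task with hashed = 1 and sighand = NULL never leaves the retry path, so
model_kill() returns MODEL_STUCK; without the bound it would spin forever.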

Oleg.
