Message-ID: <20141023193807.GZ4977@linux.vnet.ibm.com>
Date:	Thu, 23 Oct 2014 12:38:07 -0700
From:	"Paul E. McKenney" <paulmck@...ux.vnet.ibm.com>
To:	Oleg Nesterov <oleg@...hat.com>
Cc:	Dave Jones <davej@...hat.com>,
	Linux Kernel <linux-kernel@...r.kernel.org>, htejun@...il.com
Subject: Re: rcu_preempt detected stalls.

On Thu, Oct 23, 2014 at 09:13:19PM +0200, Oleg Nesterov wrote:
> On 10/23, Paul E. McKenney wrote:
> >
> > On Mon, Oct 13, 2014 at 01:35:04PM -0400, Dave Jones wrote:
> > > Today in "rcu stall while fuzzing" news:
> > >
> > > INFO: rcu_preempt detected stalls on CPUs/tasks:
> > > 	Tasks blocked on level-0 rcu_node (CPUs 0-3): P766 P646
> > > 	Tasks blocked on level-0 rcu_node (CPUs 0-3): P766 P646
> > > 	(detected by 0, t=6502 jiffies, g=75434, c=75433, q=0)
> > > trinity-c342    R  running task    13384   766  32295 0x00000000
> > >  ffff880068943d58 0000000000000002 0000000000000002 ffff880193c8c680
> > >  00000000001d4100 0000000000000000 ffff880068943fd8 00000000001d4100
> > >  ffff88024302c680 ffff880193c8c680 ffff880068943fd8 0000000000000000
> > > Call Trace:
> > >  [<ffffffff888368e2>] preempt_schedule_irq+0x52/0xb0
> > >  [<ffffffff8883df10>] retint_kernel+0x20/0x30
> > >  [<ffffffff880d9424>] ? lock_acquire+0xd4/0x2b0
> > >  [<ffffffff8808d495>] ? kill_pid_info+0x5/0x130
> > >  [<ffffffff8808d4d5>] kill_pid_info+0x45/0x130
> > >  [<ffffffff8808d495>] ? kill_pid_info+0x5/0x130
> > >  [<ffffffff8808d6d2>] SYSC_kill+0xf2/0x2f0
> > >  [<ffffffff8808d67b>] ? SYSC_kill+0x9b/0x2f0
> > >  [<ffffffff8819c2b7>] ? context_tracking_user_exit+0x57/0x280
> > >  [<ffffffff880136bd>] ? syscall_trace_enter+0x13d/0x310
> > >  [<ffffffff8808fd9e>] SyS_kill+0xe/0x10
> > >  [<ffffffff8883d3a4>] tracesys+0xdd/0xe2
> >
> > Well, there is a loop in kill_pid_info().  I am surprised that it
> > would loop indefinitely, but if it did, you would certainly get
> > RCU CPU stalls.  Please see patch below, adding Oleg for his thoughts.
> 
> Yes, this loop should not be a problem; we only restart if we race with
> a multi-threaded exec from a non-leader thread.
> 
> But I have already seen a couple of bug reports that look like task_struct
> corruption (->signal/creds == NULL); it looks like something was broken
> recently. Perhaps an unbalanced put_task_struct...
> 
> _Perhaps_ this is another case. If ->sighand was nullified then it will
> loop forever.

OK, so making each pass through the loop a separate RCU read-side critical
section might be considered to be suppressing notification of an error
condition?

							Thanx, Paul
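
The trade-off raised above can be illustrated with a small userspace sketch.
This is not the kernel's kill_pid_info() and not the proposed patch: every
name here (struct fake_rcu, fake_rcu_read_lock(), fake_rcu_read_unlock(),
fake_send(), kill_one_section(), kill_section_per_pass()) is a hypothetical
stand-in, and the "RCU" is just a pair of counters. It only shows how many
loop passes end up inside a single read-side critical section under each
structure, assuming the loop restarts whenever the send step returns -ESRCH.

```c
#include <assert.h>
#include <errno.h>

/* Hypothetical userspace stand-ins; none of this is kernel code. */
struct fake_rcu {
	int nesting;      /* current read-side nesting depth */
	int passes_in_cs; /* loop passes in the current critical section */
	int max_passes;   /* most passes seen in any one critical section */
};

static void fake_rcu_read_lock(struct fake_rcu *r)
{
	/* Entering an outermost critical section restarts the pass count. */
	if (r->nesting++ == 0)
		r->passes_in_cs = 0;
}

static void fake_rcu_read_unlock(struct fake_rcu *r)
{
	assert(r->nesting > 0);
	r->nesting--;
}

/* Counts one loop pass; returns -ESRCH while *failures_left > 0. */
static int fake_send(struct fake_rcu *r, int *failures_left)
{
	assert(r->nesting > 0); /* must run inside a critical section */
	if (++r->passes_in_cs > r->max_passes)
		r->max_passes = r->passes_in_cs;
	if (*failures_left > 0) {
		(*failures_left)--;
		return -ESRCH;
	}
	return 0;
}

/* Variant A: one critical section spans every retry. */
static int kill_one_section(struct fake_rcu *r, int failures)
{
	int error;

	fake_rcu_read_lock(r);
	do {
		error = fake_send(r, &failures);
	} while (error == -ESRCH);
	fake_rcu_read_unlock(r);
	return error;
}

/* Variant B: each pass through the loop is its own critical section. */
static int kill_section_per_pass(struct fake_rcu *r, int failures)
{
	int error;

	do {
		fake_rcu_read_lock(r);
		error = fake_send(r, &failures);
		fake_rcu_read_unlock(r);
	} while (error == -ESRCH);
	return error;
}
```

If the send step never stops failing (the nullified ->sighand case), variant A
never leaves its critical section, so a stall would eventually be reported.
Variant B would spin just as long, but every pass ends a critical section, so
no stall would be reported and the underlying bug would go unnoticed.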
