lists.openwall.net   lists  /  announce  owl-users  owl-dev  john-users  john-dev  passwdqc-users  yescrypt  popa3d-users  /  oss-security  kernel-hardening  musl  sabotage  tlsify  passwords  /  crypt-dev  xvendor  /  Bugtraq  Full-Disclosure  linux-kernel  linux-netdev  linux-ext4  linux-hardening  linux-cve-announce  PHC 
Open Source and information security mailing list archives
 
Hash Suite for Android: free password hash cracker in your pocket
[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Date:	Mon, 31 Mar 2014 17:48:01 -0700
From:	"Paul E. McKenney" <paulmck@...ux.vnet.ibm.com>
To:	Dave Jones <davej@...hat.com>,
	Linux Kernel <linux-kernel@...r.kernel.org>
Subject: Re: rcu_prempt stalls / lockup

On Mon, Mar 31, 2014 at 07:35:52PM -0400, Dave Jones wrote:
> On Mon, Mar 31, 2014 at 04:22:21PM -0700, Paul E. McKenney wrote:
>  > On Mon, Mar 31, 2014 at 07:02:41PM -0400, Dave Jones wrote:
>  > > You can tell the merge window is open, because I'm back to breaking RCU.
>  > > 
>  > > ... 
>  > > [ 3558.120739] INFO: Stall ended before state dump start
>  > > 
>  > > at that point, userspace stopped responding. cursor on console was blinking,
>  > > but I couldn't even switch tty's, or sysrq dump.

Hmmm...  I am having a very hard time imagining any of this merge
window's RCU changes preventing a sysrq dump.  On the other hand,
having a single grace period persist without anything blocking it
is pretty strange as well.

I would hope that the sysrq path does not allocate memory, but who knows?
After all, one possible reason for the eventual hang is memory exhaustion.
So one thing to try is to do sysrq earlier in the process.  (Yeah,
I know, tough to do if you have lots of scripted systems.)

>  > > rc8 was fine, so this is todays rcu changes.
>  > 
>  > New one on me!  Any chance of a .config file?
> 
> http://paste.fedoraproject.org/90449/30888213/raw/

Given that you have CONFIG_RCU_NOCB_CPU_ALL=y, all the grace-period
activity is being driven by the grace-period kthreads ("rcu_preempt"
in this case).  This leads me to wonder if your workload if preventing
RCU's grace-period kthreads from running.  These kthreads are SCHED_OTHER,
so could potentially be preempted for a long time.  But I would expect
a softlockup message in that case.

Alternatively, I suppose a wakeup could be getting lost.  The main change
related to that this merge window was ffa83fb565fb, which eliminated
idle wakeups from RCU in the CONFIG_RCU_NOCB_CPU_ALL=y case.

So, could you please try reverting ffa83fb565fb?

If that doesn't work, I will need to put together some diagnostic patches.
Starting with the one below.

							Thanx, Paul

------------------------------------------------------------------------

diff --git a/kernel/rcu/tree.c b/kernel/rcu/tree.c
index 0c47e300210a..c5a163378710 100644
--- a/kernel/rcu/tree.c
+++ b/kernel/rcu/tree.c
@@ -936,7 +936,7 @@ static void print_other_cpu_stall(struct rcu_state *rsp)
 	       smp_processor_id(), (long)(jiffies - rsp->gp_start),
 	       rsp->gpnum, rsp->completed, totqlen);
 	if (ndetected == 0)
-		pr_err("INFO: Stall ended before state dump start\n");
+		pr_err("INFO: Stall ended before state dump start, gp_kthread state: %#lx\n", rsp->gp_kthread->state);
 	else if (!trigger_all_cpu_backtrace())
 		rcu_dump_cpu_stacks(rsp);
 

--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@...r.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

Powered by blists - more mailing lists

Powered by Openwall GNU/*/Linux Powered by OpenVZ