Message-ID: <20150120152531.GR116159@redhat.com>
Date:	Tue, 20 Jan 2015 10:25:31 -0500
From:	Don Zickus <dzickus@...hat.com>
To:	Zhang Zhen <zhenzhang.zhang@...wei.com>
Cc:	paulmck@...ux.vnet.ibm.com, linux-kernel@...r.kernel.org,
	morgan.wang@...wei.com, josh@...edesktop.org, dipankar@...ibm.com
Subject: Re: RCU CPU stall console spews  leads to soft lockup disabled is
 reasonable ?

On Tue, Jan 20, 2015 at 11:09:19AM +0800, Zhang Zhen wrote:
> 
> > Of course back then, touch_nmi_watchdog touched all cpus.  So a problem
> > like this was masked.  I believe this upstream commit 62572e29bc53, solved
> > the problem.
> 
> Thanks for your suggestion.
> 
> Commit 62572e29bc53 changed the semantics of touch_nmi_watchdog and made it
> touch only the local cpu, not every one.
> But watchdog_nmi_touch = true only guarantees no hard lockup check on this cpu.
> 
> Commit 62572e29bc53 didn't change the semantics of touch_softlockup_watchdog.

Ah, yes.  I reviewed the commit too quickly yesterday.  I thought
touch_softlockup_watchdog was called on every cpu and that commit changed
it to the local cpu.  But that was incorrect.

> > 
> > You can apply that commit and see if you get both RCU stall
> > messages _and_ softlockup messages.  I believe that is what you were
> > expecting, correct?
> > 
> Correct, I expect to get both RCU stall messages _and_ softlockup messages.
> I applied that commit, and I only got RCU stall messages.

Hmm, I believe the act of printing to the console calls touch_nmi_watchdog
which calls touch_softlockup_watchdog.  I think that is the problem there.

This may not cause other problems but what happens if you comment out the
'touch_softlockup_watchdog' from the touch_nmi_watchdog function like
below (based on latest upstream cb59670870)?

The idea is that console printing for that cpu won't reset the softlockup
detector.  Again, other bad things might happen and this patch may not be a
good final solution, but it can help give me a clue about what is going
on.

Cheers,
Don

diff --git a/kernel/watchdog.c b/kernel/watchdog.c
index 70bf118..833c015 100644
--- a/kernel/watchdog.c
+++ b/kernel/watchdog.c
@@ -209,7 +209,7 @@ void touch_nmi_watchdog(void)
 	 * going off.
 	 */
 	raw_cpu_write(watchdog_nmi_touch, true);
-	touch_softlockup_watchdog();
+	//touch_softlockup_watchdog();
 }
 EXPORT_SYMBOL(touch_nmi_watchdog);
 