Date:	Wed, 6 Jul 2016 22:05:07 -0700
From:	Joel Fernandes <agnel.joel@...il.com>
To:	Linux Kernel Mailing List <linux-kernel@...r.kernel.org>,
	Frédéric Weisbecker <fweisbec@...il.com>,
	Thomas Gleixner <tglx@...utronix.de>
Subject: High rate of touch_softlockup makes Soft Lockup detector useless

Hi,

On a system running a recent kernel, I am trying to use the soft lockup
detector to detect soft lockups.
During this exercise, I found that the kernel fails to detect even
real soft lockups.

Digging further, I found that the softlockup watchdog is touched
thousands of times per second by the NOHZ code.
Debug prints revealed the following two functions calling touch_softlockup_watchdog:
[  165.960292] CPU0 touch: tick_nohz_restart_sched_tick
[  165.960309] CPU1 touch: tick_nohz_update_jiffies

I am wondering, do we really need to touch the softlockup watchdog
from the tick_nohz code?
From the code comments, it looks like the watchdog is touched because
the tick was off and is being turned back on, so the watchdog may
not have been touched for a long time.
BUT, wouldn't the hrtimer interrupt for the watchdog timer cause the
watchdog thread to be scheduled even though the tick was off for a
long time? Then in that case do we really need to touch the softlockup
watchdog from the tick_nohz code?

In any case, it looks like softlockup detection is broken and doesn't
work with nohz.
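
To illustrate why a high touch rate can hide a real lockup, here is a
minimal user-space sketch. It is a simplified model and not the kernel's
code: the struct, the function names, and the millisecond time base are
all hypothetical, and it only assumes that the detector reports a lockup
when the time since the last touch exceeds a threshold.

```c
#include <stdbool.h>

/* Hypothetical, simplified model of a softlockup-style detector. */
struct model_watchdog {
	unsigned long touch_ts;   /* time (ms) of the most recent touch */
	unsigned long thresh_ms;  /* silence longer than this is a lockup */
};

/* Any touch, from any caller, resets the timestamp. */
static void model_touch(struct model_watchdog *wd, unsigned long now_ms)
{
	wd->touch_ts = now_ms;
}

/* Periodic check: has the watchdog gone untouched for too long? */
static bool model_check(const struct model_watchdog *wd, unsigned long now_ms)
{
	return now_ms - wd->touch_ts > wd->thresh_ms;
}

/*
 * Simulate a CPU that is stuck for duration_ms, so the watchdog thread
 * itself never gets to run. An unrelated code path touches the watchdog
 * every touch_period_ms (0 means it never does). The check runs every
 * check_period_ms. Returns true if the lockup is reported.
 */
static bool model_simulate_lockup(unsigned long duration_ms,
				  unsigned long thresh_ms,
				  unsigned long check_period_ms,
				  unsigned long touch_period_ms)
{
	struct model_watchdog wd = { .touch_ts = 0, .thresh_ms = thresh_ms };
	unsigned long now;

	for (now = 1; now <= duration_ms; now++) {
		if (touch_period_ms && now % touch_period_ms == 0)
			model_touch(&wd, now);
		if (now % check_period_ms == 0 && model_check(&wd, now))
			return true;
	}
	return false;
}
```

In this model, a 60 second stall with a 20 second threshold is reported
when nothing else touches the watchdog, but is never reported when some
other path touches it every millisecond, which matches the behaviour
described above.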

BTW, commenting out the touch_softlockup seems to make soft lockup
detection work again. Any suggestions for a real fix and the right way
forward?

Thanks,

Joel
