linux-kernel - Re: [PATCH] watchdog: nohz: don't run watchdog on nohz_full cores


Message-ID: <20150402133502.GA175361@redhat.com>
Date:	Thu, 2 Apr 2015 09:35:02 -0400
From:	Don Zickus <dzickus@...hat.com>
To:	Chris Metcalf <cmetcalf@...hip.com>
Cc:	Ingo Molnar <mingo@...nel.org>,
	Andrew Morton <akpm@...ux-foundation.org>,
	Andrew Jones <drjones@...hat.com>,
	chai wen <chaiw.fnst@...fujitsu.com>,
	Ulrich Obergfell <uobergfe@...hat.com>,
	Fabian Frederick <fabf@...net.be>,
	Aaron Tomlin <atomlin@...hat.com>,
	Ben Zhang <benzh@...omium.org>,
	Christoph Lameter <cl@...ux.com>,
	Frederic Weisbecker <fweisbec@...il.com>,
	Gilad Ben-Yossef <gilad@...yossef.com>,
	Steven Rostedt <rostedt@...dmis.org>,
	open list <linux-kernel@...r.kernel.org>
Subject: Re: [PATCH] watchdog: nohz: don't run watchdog on nohz_full cores

On Tue, Mar 31, 2015 at 02:30:44PM -0400, Chris Metcalf wrote:
> On 03/31/2015 03:25 AM, Ingo Molnar wrote:
> >* cmetcalf@...hip.com <cmetcalf@...hip.com> wrote:
> >
> >>From: Chris Metcalf <cmetcalf@...hip.com>
> >>
> >>Running watchdog can be a helpful debugging feature on regular
> >>cores, but it's incompatible with nohz_full, since it forces
> >>regular scheduling events.  Accordingly, just exit out immediately
> >>from any nohz_full core.
> >>
> >>An alternate approach would be to add a flags field or function to
> >>smp_hotplug_thread to control on which cores the percpu threads
> >>are created, but it wasn't clear that much mechanism was useful.
> >>
> >>[...]
> >So what happens if someone wants to enable the lockup detector, with a
> >long timeout, even on nohz-full CPUs? This patch makes that
> >impossible.
> >
> >A better solution would be to tweak the defaults:
> >
> >  - to default the watchdog(s) to disabled when nohz-full is
> >    enabled, even if HARDLOCKUP_DETECTOR=y or DETECT_HUNG_TASK=y, and
> >    allow it to be re-enabled via its sysctl.
> 
> That's certainly a reasonable thing to do; it looks like just an #ifdef
> at the top of watchdog.c would suffice.  Does this look right?
> 
> diff --git a/kernel/watchdog.c b/kernel/watchdog.c
> index 8a46d9d8a66f..c8555c211e65 100644
> --- a/kernel/watchdog.c
> +++ b/kernel/watchdog.c
> @@ -25,7 +25,11 @@
>  #include <linux/kvm_para.h>
>  #include <linux/perf_event.h>
>  
> +#ifdef CONFIG_NO_HZ_FULL
> +int watchdog_user_enabled = 0;
> +#else
>  int watchdog_user_enabled = 1;
> +#endif
>  int __read_mostly watchdog_thresh = 10;
>  #ifdef CONFIG_SMP
>  int __read_mostly sysctl_softlockup_all_cpu_backtrace;
> 
> It doesn't look like I need to do anything else special to disable
> HARDLOCKUP_DETECTOR, and khungtaskd can happily run on
> a non-nohz core, so that should be OK.
> 
> What I was trying to achieve with my proposed patch was kind
> of orthogonal: to allow the watchdog to run on standard cores,
> but not run on nohz cores, so we could benefit from it on the
> cores where it was safe for it to run.  Do you see value in this,
> or better to just enable/disable all watchdog threads collectively?
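
The two policies being weighed above (one global default that is off when
nohz_full is configured, versus skipping only the nohz_full cores) can be
contrasted in a small userspace sketch. This is not kernel code: every name
below is hypothetical, the CPU count and the nohz_full mask are arbitrary
assumptions, and it only models the decision of whether a watchdog thread
would run on a given core.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stand-ins for illustration; none of these are kernel APIs. */
#define SKETCH_NR_CPUS 8

/* Bit n set means CPU n is treated as a nohz_full core (assumed: CPUs 4-7). */
static const unsigned int sketch_nohz_full_mask = 0xF0u;

static bool sketch_cpu_is_nohz_full(unsigned int cpu)
{
	assert(cpu < SKETCH_NR_CPUS);
	return (sketch_nohz_full_mask >> cpu) & 1u;
}

/*
 * Policy A: a single global switch. All cores follow the user setting,
 * so a default of "off" disables the watchdog everywhere until the
 * user turns it back on.
 */
static bool sketch_global_policy(bool user_enabled, unsigned int cpu)
{
	assert(cpu < SKETCH_NR_CPUS);
	return user_enabled;
}

/*
 * Policy B: per-core. The watchdog runs on regular cores whenever the
 * user setting is on, and never runs on nohz_full cores.
 */
static bool sketch_per_core_policy(bool user_enabled, unsigned int cpu)
{
	return user_enabled && !sketch_cpu_is_nohz_full(cpu);
}
```

Under policy A, re-enabling the watchdog brings it back on the nohz_full
cores too; under policy B, the regular cores stay covered but the nohz_full
cores can never opt in, which is the limitation Ingo pointed out.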


Hmm, I am not sure I am a big fan of this approach.  I know RHEL keeps the
watchdogs enabled for customers and it would be a regression if we disabled
them.  And at the same time, I could see RHEL leaning towards enabling
CONFIG_NO_HZ_FULL, which would just delay this problem a number of years
until RHEL-8 gets around to ramping up.

So I guess I would prefer to figure out a better co-existing solution now.

Can I ask how the NO_HZ_FULL technology works from userspace?  Is there a
system command that has to be sent?  How does the kernel know to turn off
ticks and trust userspace to do the right thing?

Cheers,
Don

> 
> -- 
> Chris Metcalf, EZChip Semiconductor
> http://www.ezchip.com
> 