Message-ID: <20150402133502.GA175361@redhat.com>
Date: Thu, 2 Apr 2015 09:35:02 -0400
From: Don Zickus <dzickus@...hat.com>
To: Chris Metcalf <cmetcalf@...hip.com>
Cc: Ingo Molnar <mingo@...nel.org>,
Andrew Morton <akpm@...ux-foundation.org>,
Andrew Jones <drjones@...hat.com>,
chai wen <chaiw.fnst@...fujitsu.com>,
Ulrich Obergfell <uobergfe@...hat.com>,
Fabian Frederick <fabf@...net.be>,
Aaron Tomlin <atomlin@...hat.com>,
Ben Zhang <benzh@...omium.org>,
Christoph Lameter <cl@...ux.com>,
Frederic Weisbecker <fweisbec@...il.com>,
Gilad Ben-Yossef <gilad@...yossef.com>,
Steven Rostedt <rostedt@...dmis.org>,
open list <linux-kernel@...r.kernel.org>
Subject: Re: [PATCH] watchdog: nohz: don't run watchdog on nohz_full cores
On Tue, Mar 31, 2015 at 02:30:44PM -0400, Chris Metcalf wrote:
> On 03/31/2015 03:25 AM, Ingo Molnar wrote:
> >* cmetcalf@...hip.com <cmetcalf@...hip.com> wrote:
> >
> >>From: Chris Metcalf <cmetcalf@...hip.com>
> >>
> >>Running watchdog can be a helpful debugging feature on regular
> >>cores, but it's incompatible with nohz_full, since it forces
> >>regular scheduling events. Accordingly, just exit out immediately
> >>from any nohz_full core.
> >>
> >>An alternate approach would be to add a flags field or function to
> >>smp_hotplug_thread to control on which cores the percpu threads
> >>are created, but it wasn't clear that much mechanism was useful.
> >>
> >>[...]
> >So what happens if someone wants to enable the lockup detector, with a
> >long timeout, even on nohz-full CPUs? This patch makes that
> >impossible.
> >
> >A better solution would be to tweak the defaults:
> >
> > - to default the watchdog(s) to disabled when nohz-full is
> > enabled, even if HARDLOCKUP_DETECTOR=y or DETECT_HUNG_TASK=y, and
> > allow it to be re-enabled via its sysctl.
>
> That's certainly a reasonable thing to do; it looks like just an #ifdef
> at the top of watchdog.c would suffice. Does this look right?
>
> diff --git a/kernel/watchdog.c b/kernel/watchdog.c
> index 8a46d9d8a66f..c8555c211e65 100644
> --- a/kernel/watchdog.c
> +++ b/kernel/watchdog.c
> @@ -25,7 +25,11 @@
> #include <linux/kvm_para.h>
> #include <linux/perf_event.h>
> +#ifdef CONFIG_NO_HZ_FULL
> +int watchdog_user_enabled = 0;
> +#else
> int watchdog_user_enabled = 1;
> +#endif
> int __read_mostly watchdog_thresh = 10;
> #ifdef CONFIG_SMP
> int __read_mostly sysctl_softlockup_all_cpu_backtrace;
>
> It doesn't look like I need to do anything else special to disable
> HARDLOCKUP_DETECTOR, and khungtaskd can happily run on
> a non-nohz core, so that should be OK.
>
> What I was trying to achieve with my proposed patch was kind
> of orthogonal: to allow the watchdog to run on standard cores,
> but not run on nohz cores, so we could benefit from it on the
> cores where it was safe for it to run. Do you see value in this,
> or better to just enable/disable all watchdog threads collectively?
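[Illustration, not part of the original thread.] The per-core policy described above (run the watchdog on ordinary cores, skip the nohz_full ones) can be modelled in a few lines of userspace C. This is only a sketch under stated assumptions: it is not the proposed patch and not kernel code. The names parse_cpu_list(), watchdog_should_run() and MAX_CPUS are hypothetical, and the parser accepts only simple lists such as "1-3,5", the basic form taken by the nohz_full= boot parameter.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdlib.h>

#define MAX_CPUS 64 /* hypothetical limit chosen for this sketch */

/*
 * Parse a simple CPU list such as "1-3,5" into mask[].
 * Returns 0 on success and -1 on malformed input.
 * An empty string means "no CPUs" and is accepted.
 */
int parse_cpu_list(const char *list, bool mask[MAX_CPUS])
{
    const char *p = list;

    assert(list != NULL);
    for (int i = 0; i < MAX_CPUS; i++)
        mask[i] = false;

    while (*p != '\0') {
        char *end;
        long lo, hi;

        if (*p < '0' || *p > '9')
            return -1;
        lo = strtol(p, &end, 10);
        hi = lo;
        p = end;

        if (*p == '-') { /* range "lo-hi" */
            p++;
            if (*p < '0' || *p > '9')
                return -1;
            hi = strtol(p, &end, 10);
            p = end;
        }
        if (lo > hi || hi >= MAX_CPUS)
            return -1;
        for (long cpu = lo; cpu <= hi; cpu++)
            mask[cpu] = true;

        if (*p == ',') {
            p++;
            if (*p == '\0') /* trailing comma */
                return -1;
        } else if (*p != '\0') {
            return -1;
        }
    }
    return 0;
}

/* Per-core policy: the watchdog runs only on CPUs outside the nohz_full set. */
bool watchdog_should_run(int cpu, const bool nohz_full[MAX_CPUS])
{
    return cpu >= 0 && cpu < MAX_CPUS && !nohz_full[cpu];
}
```

With the list "1-3,5", this model would keep the watchdog on CPUs 0 and 4 and skip CPUs 1, 2, 3 and 5. The global alternative discussed above corresponds to a single on/off flag that ignores the mask entirely.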
Hmm, I am not sure I am a big fan of this approach. I know RHEL keeps the
watchdogs enabled for customers and it would be a regression if we disabled
them. And at the same time, I could see RHEL leaning towards enabling
CONFIG_NO_HZ_FULL, which would just delay this problem a number of years
until RHEL-8 gets around to ramping up.

So I guess I would prefer to figure out a better co-existing solution now.

Can I ask how the NO_HZ_FULL technology works from userspace? Is there a
system command that has to be sent? How does the kernel know to turn off
ticks and trust userspace to do the right thing?

Cheers,
Don
>
> --
> Chris Metcalf, EZChip Semiconductor
> http://www.ezchip.com
>