Message-ID: <5519A4E7.8040708@ezchip.com>
Date: Mon, 30 Mar 2015 15:32:55 -0400
From: Chris Metcalf <cmetcalf@...hip.com>
To: Don Zickus <dzickus@...hat.com>
CC: Andrew Morton <akpm@...ux-foundation.org>,
Andrew Jones <drjones@...hat.com>,
chai wen <chaiw.fnst@...fujitsu.com>,
Ingo Molnar <mingo@...nel.org>,
Ulrich Obergfell <uobergfe@...hat.com>,
Fabian Frederick <fabf@...net.be>,
Aaron Tomlin <atomlin@...hat.com>,
Ben Zhang <benzh@...omium.org>,
Christoph Lameter <cl@...ux.com>,
Frederic Weisbecker <fweisbec@...il.com>,
Gilad Ben-Yossef <gilad@...yossef.com>,
Steven Rostedt <rostedt@...dmis.org>,
open list <linux-kernel@...r.kernel.org>
Subject: [PATCH v2] watchdog: nohz: don't run watchdog on nohz_full cores

On 03/30/2015 03:12 PM, Don Zickus wrote:
> On Mon, Mar 30, 2015 at 02:51:05PM -0400, cmetcalf@...hip.com wrote:
>> From: Chris Metcalf <cmetcalf@...hip.com>
>>
>> Running the watchdog can be a helpful debugging feature on regular
>> cores, but it's incompatible with nohz_full, since it forces
>> regular scheduling events. Accordingly, just exit out immediately
>> from any nohz_full core.
>>
>> An alternate approach would be to add a flags field or function to
>> smp_hotplug_thread to control on which cores the percpu threads
>> are created, but it wasn't clear that much mechanism was useful.
> Hi Chris,
>
> It seems like the correct solution would be to hook into the idle_loop
> somehow. If the cpu is idle, then it seems unlikely that a lockup could
> occur.

With nohz_full, though, the cpu might be running userspace code
with the intention of keeping kernel ticks disabled. Even returning
to kernel mode to try to figure out if we "should" be running the
watchdog on a given core will induce exactly the kind of interrupts
that nohz_full is designed to prevent.

My assumption is generally that nohz_full cores don't spend a lot of
time in the kernel anyway, as they are optimized for user space.

I guess you could imagine doing something per-cpu on the nohz_full
cores where we effectively call watchdog_disable() whenever a
nohz_full core enters userspace, and watchdog_enable() whenever it
enters the kernel. We could add some per-cpu state in the watchdog
code to track whether the watchdog is currently enabled or disabled
on that core, to avoid double-enabling or double-disabling. I would
think context_tracking_user_exit()/_enter() would be the place to do
this. This feels like a lot of overhead, potentially. Thoughts?
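
Concretely, something like the following (completely untested) sketch in
kernel/watchdog.c is what I have in mind. The hook functions and the
per-cpu flag are made-up names, purely for illustration;
watchdog_enable()/watchdog_disable() are the existing smpboot callbacks
in watchdog.c:

#include <linux/percpu.h>
#include <linux/smp.h>
#include <linux/tick.h>

/* Illustrative only: is the watchdog currently armed on this cpu? */
static DEFINE_PER_CPU(bool, watchdog_armed);

/* Hypothetical hook, called as a nohz_full cpu returns to userspace,
 * e.g. from context_tracking_user_enter(). */
void watchdog_nohz_user_enter(void)
{
        unsigned int cpu = smp_processor_id();

        if (tick_nohz_full_cpu(cpu) && __this_cpu_read(watchdog_armed)) {
                __this_cpu_write(watchdog_armed, false);
                watchdog_disable(cpu);  /* stop the hrtimer and perf event */
        }
}

/* Hypothetical hook, called when the cpu re-enters the kernel,
 * e.g. from context_tracking_user_exit(). */
void watchdog_nohz_user_exit(void)
{
        unsigned int cpu = smp_processor_id();

        if (tick_nohz_full_cpu(cpu) && !__this_cpu_read(watchdog_armed)) {
                __this_cpu_write(watchdog_armed, true);
                watchdog_enable(cpu);   /* re-arm the hrtimer and perf event */
        }
}

That would keep the hrtimer and perf event off while the core is purely
in userspace, but it still adds two tests to every user/kernel transition
on exactly the cores we are trying to keep lightweight.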
> My fear with this approach is a lockup would occur on the nohz cpu and it
> would go undetected because that cpu is disabled. Further, no printk is
> thrown out to even indicate a cpu is disabled making it more difficult to
> debug.

Assuming we stick with my simple "exit the watchdog thread" model,
we probably wouldn't want to warn for every nohz_full core, but a
one-shot message makes sense, e.g. just "watchdog: disabled on
nohz_full cores".
>> Signed-off-by: Chris Metcalf <cmetcalf@...hip.com>
>> ---
>> kernel/watchdog.c | 5 +++++
>> 1 file changed, 5 insertions(+)
>>
>> diff --git a/kernel/watchdog.c b/kernel/watchdog.c
>> index 3174bf8e3538..8a46d9d8a66f 100644
>> --- a/kernel/watchdog.c
>> +++ b/kernel/watchdog.c
>> @@ -19,6 +19,7 @@
>> #include <linux/sysctl.h>
>> #include <linux/smpboot.h>
>> #include <linux/sched/rt.h>
>> +#include <linux/tick.h>
>>
>> #include <asm/irq_regs.h>
>> #include <linux/kvm_para.h>
>> @@ -431,6 +432,10 @@ static void watchdog_enable(unsigned int cpu)
>> hrtimer_init(hrtimer, CLOCK_MONOTONIC, HRTIMER_MODE_REL);
>> hrtimer->function = watchdog_timer_fn;
>>
>> + /* nohz_full cpus do not do watchdog checking. */
>> + if (tick_nohz_full_cpu(cpu))
>> + do_exit(0);
>> +
>> /* Enable the perf event */
>> watchdog_nmi_enable(cpu);
>>
>> --
>> 2.1.2
>>
--
Chris Metcalf, EZChip Semiconductor
http://www.ezchip.com