Message-ID: <CAPM31RKR4w75Y8oNxS-cZ77AauvCFFXRzH=hhWXfr6LLQt2Myw@mail.gmail.com>
Date: Thu, 5 Mar 2020 14:07:37 -0800
From: Paul Turner <pjt@...gle.com>
To: Thomas Gleixner <tglx@...utronix.de>
Cc: Peter Zijlstra <peterz@...radead.org>, Xi Wang <xii@...gle.com>,
Ingo Molnar <mingo@...hat.com>,
Juri Lelli <juri.lelli@...hat.com>,
Vincent Guittot <vincent.guittot@...aro.org>,
Dietmar Eggemann <dietmar.eggemann@....com>,
Steven Rostedt <rostedt@...dmis.org>,
Josh Don <joshdon@...gle.com>,
LKML <linux-kernel@...r.kernel.org>,
linux-fsdevel@...r.kernel.org
Subject: Re: [PATCH] sched: watchdog: Touch kernel watchdog in sched code

On Thu, Mar 5, 2020 at 10:07 AM Thomas Gleixner <tglx@...utronix.de> wrote:
>
> Peter Zijlstra <peterz@...radead.org> writes:
>
> > On Wed, Mar 04, 2020 at 01:39:41PM -0800, Xi Wang wrote:
> >> The main purpose of kernel watchdog is to test whether scheduler can
> >> still schedule tasks on a cpu. In order to reduce latency from
> >> periodically invoking watchdog reset in thread context, we can simply
> >> touch watchdog from pick_next_task in scheduler. Compared to actually
> >> resetting watchdog from cpu stop / migration threads, we lose coverage
> >> on: a migration thread actually gets picked and we actually context
> >> switch to the migration thread. Both steps are heavily protected by
> >> kernel locks and unlikely to silently fail. Thus the change would
> >> provide the same level of protection with less overhead.
> >>
> >> The new way vs the old way to touch the watchdogs is configurable
> >> from:
> >>
> >> /proc/sys/kernel/watchdog_touch_in_thread_interval
> >>
> >> The value means:
> >> 0: Always touch watchdog from pick_next_task
> >> 1: Always touch watchdog from migration thread
> >> N (N>0): Touch watchdog from migration thread once in every N
> >> invocations, and touch watchdog from pick_next_task for
> >> other invocations.
> >>
> >
> > This is configurable madness. What are we really trying to do here?
>
> Create yet another knob which will be advertised in random web blogs to
> solve all problems of the world and some more. Like the one which got
> silently turned into a NOOP ~10 years ago :)
>
The knob can obviously be removed; it's vestigial and reflects caution
from when we were implementing this and rolling things over to it. We
have default values that we know work at scale. I don't think this
actually needs or wants to be tunable beyond on or off (and even that
could be strictly compile-time or boot-time only).
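
For reference, the interval semantics quoted above could be modeled in
user space roughly as follows. This is only a sketch, not code from the
patch: the function name touch_in_thread() and the per-CPU invocation
counter it takes are hypothetical, and it assumes "once in every N
invocations" means every invocation whose count is a multiple of N.

```c
#include <stdbool.h>

/*
 * Hypothetical model of watchdog_touch_in_thread_interval.
 *
 * interval   - value of the sysctl (0, 1, or N > 1)
 * invocation - assumed running count of watchdog touch opportunities
 *
 * Returns true when the touch should come from the migration thread,
 * false when it should come from pick_next_task.
 */
static bool touch_in_thread(unsigned int interval, unsigned long invocation)
{
	/* 0: always touch from pick_next_task */
	if (interval == 0)
		return false;

	/*
	 * 1: every invocation is a multiple of 1, so always the thread.
	 * N: only every Nth invocation goes through the thread.
	 */
	return invocation % interval == 0;
}
```

Reducing the knob to on or off would collapse this to the interval == 0
and interval == 1 cases only.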