[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <1686084684.35307565.1449235166196.JavaMail.zimbra@redhat.com>
Date: Fri, 4 Dec 2015 08:19:26 -0500 (EST)
From: Ulrich Obergfell <uobergfe@...hat.com>
To: Tejun Heo <tj@...nel.org>
Cc: Don Zickus <dzickus@...hat.com>, Ingo Molnar <mingo@...hat.com>,
Peter Zijlstra <peterz@...radead.org>,
Andrew Morton <akpm@...ux-foundation.org>,
linux-kernel@...r.kernel.org, kernel-team@...com
Subject: Re: [PATCH 2/2] workqueue: implement lockup detector
Tejun,
> Sure, separating the knobs out isn't difficult. I still don't like
> the idea of having multiple set of similar knobs controlling about the
> same thing tho.
>
> For example, let's say there's a user who boots with "nosoftlockup"
> explicitly. I'm pretty sure the user wouldn't be intending to keep
> workqueue watchdog running. The same goes for threshold adjustments,
> so here's my question. What are the reasons for the concern? What
> are we worrying about?
I'm not sure if it is obvious to a user that a stall of workqueues is
"about the same thing" as a soft lockup, and that one could thus argue
that both should be controlled by the same knob. Looking at this from
perspective of usability, I would still vote for having separate knobs
for each lockup detector. For example
/proc/sys/kernel/wq_watchdog_thresh
could control the on|off state of the workqueue watchdog and the timeout
at the same time (0 means off, > 0 means on and specifies the timeout).
Separating wq_watchdog_thresh from watchdog_thresh might also be useful
for diagnostic purposes for example, if during the investigation of a
problem one would want to explicitly increase or lower one threshold
without impacting the other.
>> And another question that comes to my mind is: Would the workqueue watchdog
>> participate in the lockup detector suspend/resume mechanism, and if yes, how
>> would it be integrated into this ?
>
> From the usage, I can't quite tell what the purpose of the mechanism
> is. The only user seems to be fixup_ht_bug() and when it fails it
> says "failed to disable PMU erratum BJ122, BV98, HSD29 workaround" so
> if you could give me a pointer, it'd be great. But at any rate, if
> shutting down watchdog is all that's necessary, it shouldn't be a
> problem to integrate.
The patch post that introduced the mechanism is here:
http://marc.info/?l=linux-kernel&m=143843318208917&w=2
The watchdog_{suspend|resume} functions were later renamed:
http://marc.info/?l=linux-kernel&m=143894132129982&w=2
At the moment I don't see a reason why the workqueue watchdog would have to
participate in that mechanism. However, if the workqueue watchdog would be
connected to the soft lockup detector as you proposed, I think it should be
participating for the 'sake of consistency' (it would seem hard to under-
stand if the interface would only suspend parts of the lockup detector).
Regards,
Uli
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@...r.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/
Powered by blists - more mailing lists