linux-kernel - Re: [PATCH 2/2] workqueue: implement lockup detector

lists.openwall.net		lists / announce owl-users owl-dev john-users john-dev passwdqc-users yescrypt popa3d-users / oss-security kernel-hardening musl sabotage tlsify passwords / crypt-dev xvendor / Bugtraq Full-Disclosure linux-kernel linux-netdev linux-ext4 linux-hardening linux-cve-announce PHC
Open Source and information security mailing list archives

Hash Suite: Windows password security audit tool. GUI, reports in PDF.

[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]

Message-ID: <1686084684.35307565.1449235166196.JavaMail.zimbra@redhat.com>
Date:	Fri, 4 Dec 2015 08:19:26 -0500 (EST)
From:	Ulrich Obergfell <uobergfe@...hat.com>
To:	Tejun Heo <tj@...nel.org>
Cc:	Don Zickus <dzickus@...hat.com>, Ingo Molnar <mingo@...hat.com>,
	Peter Zijlstra <peterz@...radead.org>,
	Andrew Morton <akpm@...ux-foundation.org>,
	linux-kernel@...r.kernel.org, kernel-team@...com
Subject: Re: [PATCH 2/2] workqueue: implement lockup detector

Tejun,

> Sure, separating the knobs out isn't difficult.  I still don't like
> the idea of having multiple set of similar knobs controlling about the
> same thing tho.
>
> For example, let's say there's a user who boots with "nosoftlockup"
> explicitly.  I'm pretty sure the user wouldn't be intending to keep
> workqueue watchdog running.  The same goes for threshold adjustments,
> so here's my question.  What are the reasons for the concern?  What
> are we worrying about?

I'm not sure if it is obvious to a user that a stall of workqueues is
"about the same thing" as a soft lockup, and that one could thus argue
that both should be controlled by the same knob. Looking at this from
perspective of usability, I would still vote for having separate knobs
for each lockup detector. For example

  /proc/sys/kernel/wq_watchdog_thresh

could control the on|off state of the workqueue watchdog and the timeout
at the same time (0 means off, > 0 means on and specifies the timeout).
Separating wq_watchdog_thresh from watchdog_thresh might also be useful
for diagnostic purposes for example, if during the investigation of a
problem one would want to explicitly increase or lower one threshold
without impacting the other.

>> And another question that comes to my mind is: Would the workqueue watchdog
>> participate in the lockup detector suspend/resume mechanism, and if yes, how
>> would it be integrated into this ?
>
> From the usage, I can't quite tell what the purpose of the mechanism
> is.  The only user seems to be fixup_ht_bug() and when it fails it
> says "failed to disable PMU erratum BJ122, BV98, HSD29 workaround" so
> if you could give me a pointer, it'd be great.  But at any rate, if
> shutting down watchdog is all that's necessary, it shouldn't be a
> problem to integrate.

The patch post that introduced the mechanism is here:

  http://marc.info/?l=linux-kernel&m=143843318208917&w=2

The watchdog_{suspend|resume} functions were later renamed:

  http://marc.info/?l=linux-kernel&m=143894132129982&w=2

At the moment I don't see a reason why the workqueue watchdog would have to
participate in that mechanism. However, if the workqueue watchdog would be
connected to the soft lockup detector as you proposed, I think it should be
participating for the 'sake of consistency' (it would seem hard to under-
stand if the interface would only suspend parts of the lockup detector).

Regards,

Uli
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@...r.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/