Message-ID: <YGGoa/AAk86FmYgn@alley>
Date: Mon, 29 Mar 2021 12:14:03 +0200
From: Petr Mladek <pmladek@...e.com>
To: Wang Qing <wangqing@...o.com>
Cc: Tejun Heo <tj@...nel.org>, Lai Jiangshan <jiangshanlai@...il.com>,
Andrew Morton <akpm@...ux-foundation.org>,
"Guilherme G. Piccoli" <gpiccoli@...onical.com>,
Vlastimil Babka <vbabka@...e.cz>,
Santosh Sivaraj <santosh@...six.org>,
linux-kernel@...r.kernel.org
Subject: Re: [PATCH V3] workqueue/watchdog: Make unbound workqueues aware of
On Wed 2021-03-24 19:34:02, Wang Qing wrote:
> There are two workqueue-specific watchdog timestamps:
>
> + @wq_watchdog_touched_cpu (per-CPU) updated by
> touch_softlockup_watchdog()
>
> + @wq_watchdog_touched (global) updated by
> touch_all_softlockup_watchdogs()
>
> watchdog_timer_fn() checks only the global @wq_watchdog_touched for
> unbound workqueues. As a result, unbound workqueues are not aware
> of touch_softlockup_watchdog(). The watchdog might report a stall
> even when the unbound workqueues are blocked by known slow code.
>
> Solution:
> touch_softlockup_watchdog() must also touch the global @wq_watchdog_touched
> timestamp.
>
> The global timestamp can no longer be used for bound workqueues
> because it is updated on all CPUs. Instead, bound workqueues
> have to check only @wq_watchdog_touched_cpu, and this timestamp
> has to be updated for all CPUs in touch_all_softlockup_watchdogs().
>
> Beware:
> The change might cause the opposite problem. An unbound workqueue
> might get blocked on CPU A because of a real softlockup. The workqueue
> watchdog would miss it when the timestamp got touched on CPU B.
>
> It is acceptable because softlockups are detected by softlockup
> watchdog. The workqueue watchdog is there to detect stalls where
> a work never finishes, for example, because of dependencies of works
> queued into the same workqueue.
>
> V3:
> - Clarify the commit message according to Petr's suggestion.
>
> Signed-off-by: Wang Qing <wangqing@...o.com>
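
For readers following the thread, the timestamp rules described in the
commit message can be sketched as a small userspace model. This is not
the kernel patch: the model_* functions, NR_MODEL_CPUS, UNBOUND_CPU and
the plain integer time are assumptions made for illustration only. Only
the two timestamp names come from the commit message.

```c
#include <stdbool.h>

#define NR_MODEL_CPUS 4     /* hypothetical CPU count for the model */
#define UNBOUND_CPU   (-1)  /* hypothetical marker for an unbound pool */

/* The two timestamps named in the commit message. */
static unsigned long wq_watchdog_touched;                    /* global */
static unsigned long wq_watchdog_touched_cpu[NR_MODEL_CPUS]; /* per-CPU */

/* Models touch_softlockup_watchdog() running on @cpu at time @now. */
static void model_touch_one(int cpu, unsigned long now)
{
	wq_watchdog_touched_cpu[cpu] = now;
	/* Proposed change: also touch the global timestamp. */
	wq_watchdog_touched = now;
}

/* Models touch_all_softlockup_watchdogs() at time @now. */
static void model_touch_all(unsigned long now)
{
	/* Bound pools check only the per-CPU value, so update every CPU. */
	for (int cpu = 0; cpu < NR_MODEL_CPUS; cpu++)
		wq_watchdog_touched_cpu[cpu] = now;
	wq_watchdog_touched = now;
}

/*
 * Models the stall check for one pool. @pool_ts is the time of the
 * pool's last forward progress. Time is a plain counter here; the
 * model ignores counter wraparound.
 */
static bool model_pool_stalled(int pool_cpu, unsigned long pool_ts,
			       unsigned long now, unsigned long thresh)
{
	/* Unbound pools use the global stamp, bound pools the per-CPU one. */
	unsigned long touched = (pool_cpu == UNBOUND_CPU) ?
		wq_watchdog_touched : wq_watchdog_touched_cpu[pool_cpu];
	unsigned long ts = (pool_ts > touched) ? pool_ts : touched;

	return now > ts && now - ts > thresh;
}
```

In this model, a touch on one CPU suppresses the report for unbound
pools but not for pools bound to other CPUs, which is the trade-off the
"Beware" paragraph describes.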
The patch fixes a real problem:
Reviewed-by: Petr Mladek <pmladek@...e.com>
Best Regards,
Petr