Date:   Mon, 29 Mar 2021 12:14:03 +0200
From:   Petr Mladek <pmladek@...e.com>
To:     Wang Qing <wangqing@...o.com>
Cc:     Tejun Heo <tj@...nel.org>, Lai Jiangshan <jiangshanlai@...il.com>,
        Andrew Morton <akpm@...ux-foundation.org>,
        "Guilherme G. Piccoli" <gpiccoli@...onical.com>,
        Vlastimil Babka <vbabka@...e.cz>,
        Santosh Sivaraj <santosh@...six.org>,
        linux-kernel@...r.kernel.org
Subject: Re: [PATCH V3] workqueue/watchdog: Make unbound workqueues aware of

On Wed 2021-03-24 19:34:02, Wang Qing wrote:
> There are two workqueue-specific watchdog timestamps:
> 
>     + @wq_watchdog_touched_cpu (per-CPU) updated by
>       touch_softlockup_watchdog()
> 
>     + @wq_watchdog_touched (global) updated by
>       touch_all_softlockup_watchdogs()
> 
> watchdog_timer_fn() checks only the global @wq_watchdog_touched for
> unbound workqueues. As a result, unbound workqueues are not aware
> of touch_softlockup_watchdog(). The watchdog might report a stall
> even when the unbound workqueues are blocked by known slow code.
> 
> Solution:
> touch_softlockup_watchdog() must also touch the global @wq_watchdog_touched
> timestamp.
> 
> The global timestamp can no longer be used for bound workqueues
> because it is updated on all CPUs. Instead, bound workqueues
> have to check only @wq_watchdog_touched_cpu, and this per-CPU timestamp
> has to be updated for all CPUs in touch_all_softlockup_watchdogs().
> 
> Beware:
> The change might cause the opposite problem. An unbound workqueue
> might get blocked on CPU A because of a real softlockup. The workqueue
> watchdog would miss it when the timestamp got touched on CPU B.
> 
> It is acceptable because softlockups are detected by softlockup
> watchdog. The workqueue watchdog is there to detect stalls where
> a work never finishes, for example, because of dependencies of works
> queued into the same workqueue.
> 
> V3:
> - Clarified the commit message according to Petr's suggestion.
> 
> Signed-off-by: Wang Qing <wangqing@...o.com>
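For illustration only, here is a simplified userspace model of the timestamp logic the commit message describes, comparing the behaviour before and after the change. It is not the kernel implementation: time is plain seconds instead of jiffies, wraparound (time_after()) and locking are ignored, and NR_CPUS, WQ_THRESH, the `patched` flag, reset_model(), max_ul() and pool_stalled() are hypothetical helpers. The two touch functions merely mimic what the commit message says touch_softlockup_watchdog() and touch_all_softlockup_watchdogs() do to the two timestamps.

```c
/*
 * Simplified, hypothetical model of the workqueue watchdog timestamps.
 * Not kernel code: seconds instead of jiffies, no wraparound, no locking.
 */
#include <assert.h>
#include <stdbool.h>

#define NR_CPUS   4
#define WQ_THRESH 30 /* hypothetical stall threshold, in seconds */

static unsigned long wq_watchdog_touched;              /* global timestamp */
static unsigned long wq_watchdog_touched_cpu[NR_CPUS]; /* per-CPU timestamps */
static bool patched; /* false: behaviour before the patch, true: after */

static unsigned long max_ul(unsigned long a, unsigned long b)
{
	return a > b ? a : b;
}

/* Clear all timestamps and select old or new behaviour. */
static void reset_model(bool use_patch)
{
	patched = use_patch;
	wq_watchdog_touched = 0;
	for (int cpu = 0; cpu < NR_CPUS; cpu++)
		wq_watchdog_touched_cpu[cpu] = 0;
}

/* Models touch_softlockup_watchdog() being called on @cpu at time @now. */
static void touch_softlockup_watchdog(int cpu, unsigned long now)
{
	assert(cpu >= 0 && cpu < NR_CPUS);
	wq_watchdog_touched_cpu[cpu] = now;
	if (patched)
		wq_watchdog_touched = now; /* the fix: touch the global stamp too */
}

/* Models touch_all_softlockup_watchdogs() at time @now. */
static void touch_all_softlockup_watchdogs(unsigned long now)
{
	wq_watchdog_touched = now;
	if (patched) /* the fix: bound pools now rely on per-CPU stamps only */
		for (int cpu = 0; cpu < NR_CPUS; cpu++)
			wq_watchdog_touched_cpu[cpu] = now;
}

/*
 * Would a stall be reported for a pool whose last progress was at
 * @pool_ts? @cpu >= 0 means a bound pool, @cpu < 0 an unbound one.
 */
static bool pool_stalled(int cpu, unsigned long pool_ts, unsigned long now)
{
	unsigned long ts = pool_ts;

	if (cpu < 0) {
		/* Unbound pools look only at the global stamp. */
		ts = max_ul(ts, wq_watchdog_touched);
	} else {
		ts = max_ul(ts, wq_watchdog_touched_cpu[cpu]);
		if (!patched) /* before the patch, bound pools also used it */
			ts = max_ul(ts, wq_watchdog_touched);
	}
	return now > ts + WQ_THRESH;
}
```

In this model, with `patched` false, a touch_softlockup_watchdog() on CPU 1 at t=100 still leaves an unbound pool (last progress at t=50) reported as stalled at t=110. With `patched` true it is not, while a bound pool on CPU 2 is still reported, because bound pools no longer consult the global stamp. That last case is the trade-off the "Beware" paragraph accepts for unbound pools.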

The patch fixes a real problem:

Reviewed-by: Petr Mladek <pmladek@...e.com>

Best Regards,
Petr