Message-ID: <2024092754-CVE-2024-46839-cfab@gregkh>
Date: Fri, 27 Sep 2024 14:40:07 +0200
From: Greg Kroah-Hartman <gregkh@...uxfoundation.org>
To: linux-cve-announce@...r.kernel.org
Cc: Greg Kroah-Hartman <gregkh@...uxfoundation.org>
Subject: CVE-2024-46839: workqueue: Improve scalability of workqueue watchdog touch
Description
===========
In the Linux kernel, the following vulnerability has been resolved:
workqueue: Improve scalability of workqueue watchdog touch
On a ~2000 CPU powerpc system, hard lockups have been observed in the
workqueue code when stop_machine runs (in this case due to CPU hotplug).
This is due to lots of CPUs spinning in multi_cpu_stop, calling
touch_nmi_watchdog() which ends up calling wq_watchdog_touch().
wq_watchdog_touch() writes to the global variable wq_watchdog_touched,
and that can find itself in the same cacheline as other important
workqueue data, which slows down operations to the point of lockups.
In the case of the following abridged trace, worker_pool_idr was in
the hot line, causing the lockups to always appear at idr_find.

  watchdog: CPU 1125 self-detected hard LOCKUP @ idr_find
  Call Trace:
  get_work_pool
  __queue_work
  call_timer_fn
  run_timer_softirq
  __do_softirq
  do_softirq_own_stack
  irq_exit
  timer_interrupt
  decrementer_common_virt
  * interrupt: 900 (timer) at multi_cpu_stop
  multi_cpu_stop
  cpu_stopper_thread
  smpboot_thread_fn
  kthread

Fix this by having wq_watchdog_touch() only write to the line if the
time since the last recorded touch exceeds 1/4 of the watchdog threshold.
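
The rate-limiting idea can be illustrated with a minimal userspace sketch.
This is not the kernel patch itself: touched_ts, store_count, tick_after()
and watchdog_touch() are hypothetical stand-ins, and the tick values are
passed in by the caller rather than read from jiffies. It only shows how
reading the shared timestamp first, and storing only when more than 1/4 of
the threshold has elapsed, reduces writes to a contended cacheline.

```c
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical stand-ins for kernel state; names are illustrative only. */
static _Atomic unsigned long touched_ts; /* shared timestamp (the hot line) */
static unsigned long store_count;        /* demo-only counter, not thread-safe */

/* Wrap-safe "a is later than b" comparison on unsigned tick counters. */
static bool tick_after(unsigned long a, unsigned long b)
{
    return (long)(b - a) < 0;
}

/*
 * Record a touch at tick `now`. The shared timestamp is always read, but it
 * is only written when more than thresh / 4 ticks have passed since the
 * last recorded touch, so most callers never dirty the cacheline.
 */
static void watchdog_touch(unsigned long now, unsigned long thresh)
{
    unsigned long last = atomic_load_explicit(&touched_ts,
                                              memory_order_relaxed);

    if (tick_after(now, last + thresh / 4)) {
        atomic_store_explicit(&touched_ts, now, memory_order_relaxed);
        store_count++;
    }
}
```

In this sketch, with a threshold of 40 ticks, calling watchdog_touch() once
per tick for ticks 1 through 100 results in 9 stores instead of 100.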
The Linux kernel CVE team has assigned CVE-2024-46839 to this issue.
Affected and fixed versions
===========================
Fixed in 5.15.167 with commit 9d08fce64dd7
Fixed in 6.1.110 with commit a2abd35e7dc5
Fixed in 6.6.51 with commit 241bce1c757d
Fixed in 6.10.10 with commit da5f374103a1
Fixed in 6.11 with commit 98f887f820c9
Please see https://www.kernel.org for a full list of currently supported
kernel versions by the kernel community.
Unaffected versions might change over time as fixes are backported to
older supported kernel versions. The official CVE entry at
https://cve.org/CVERecord/?id=CVE-2024-46839
will be updated if fixes are backported; please check it for the most
up-to-date information about this issue.
Affected files
==============
The file(s) affected by this issue are:
kernel/workqueue.c
Mitigation
==========
The Linux kernel CVE team recommends that you update to the latest
stable kernel version for this, and many other bugfixes. Individual
changes are never tested alone, but rather are part of a larger kernel
release. Cherry-picking individual commits is not recommended or
supported by the Linux kernel community at all. If, however, updating to
the latest release is impossible, the individual changes to resolve this
issue can be found at these commits:
https://git.kernel.org/stable/c/9d08fce64dd77f42e2361a4818dbc4b50f3c7dad
https://git.kernel.org/stable/c/a2abd35e7dc55bf9ed01e2b3481fa78e086d3bf4
https://git.kernel.org/stable/c/241bce1c757d0587721512296952e6bba69631ed
https://git.kernel.org/stable/c/da5f374103a1e0881bbd35847dc57b04ac155eb0
https://git.kernel.org/stable/c/98f887f820c993e05a12e8aa816c80b8661d4c87