linux-cve-announce - CVE-2024-39476: md/raid5: fix deadlock that raid5d() wait for itself to clear MD_SB_CHANGE_PENDING

Message-ID: <2024070518-CVE-2024-39476-aa2d@gregkh>
Date: Fri,  5 Jul 2024 08:55:20 +0200
From: Greg Kroah-Hartman <gregkh@...uxfoundation.org>
To: linux-cve-announce@...r.kernel.org
Cc: Greg Kroah-Hartman <gregkh@...uxfoundation.org>
Subject: CVE-2024-39476: md/raid5: fix deadlock that raid5d() wait for itself to clear MD_SB_CHANGE_PENDING

Description
===========

In the Linux kernel, the following vulnerability has been resolved:

md/raid5: fix deadlock that raid5d() wait for itself to clear MD_SB_CHANGE_PENDING

Xiao reported that the lvm2 test lvconvert-raid-takeover.sh can hang with
a small probability; the root cause is exactly the same as the one
addressed by commit bed9e27baf52 ("Revert "md/raid5: Wait for
MD_SB_CHANGE_PENDING in raid5d"").

However, Dan reported another hang after that, and junxiao investigated
the problem and found that it is caused by plugged bios that cannot be
issued from raid5d().

The current implementation of raid5d() has an odd circular dependency:

1) md_check_recovery(), called from raid5d(), must hold 'reconfig_mutex'
   to clear MD_SB_CHANGE_PENDING;
2) raid5d() handles IO in a busy loop until all IO has been issued;
3) IO from raid5d() must wait for MD_SB_CHANGE_PENDING to be cleared.

This behaviour was introduced before v2.6. As a consequence, if another
context holds 'reconfig_mutex' so that md_check_recovery() cannot update
the superblock, raid5d() spins in the busy loop, consuming 100% of one
CPU, until 'reconfig_mutex' is released.

Following the implementation in raid1 and raid10, fix this problem by
skipping IO issue if MD_SB_CHANGE_PENDING is still set after
md_check_recovery(); the daemon thread will be woken up again when
'reconfig_mutex' is released. This also fixes the hang.

The Linux kernel CVE team has assigned CVE-2024-39476 to this issue.


Affected and fixed versions
===========================

	Issue introduced in 4.19.262 with commit f3d55bd5b7b9 and fixed in 4.19.316 with commit b32aa95843ca
	Issue introduced in 5.4.220 with commit 1c00bb624cd0 and fixed in 5.4.278 with commit 634ba3c97ec4
	Issue introduced in 5.10.150 with commit 782b3e71c957 and fixed in 5.10.219 with commit aa64464c8f4d
	Issue introduced in 5.15.75 with commit 9e86dffd0b02 and fixed in 5.15.161 with commit 098d54934814
	Issue introduced in 6.1 with commit 5e2cf333b7bd and fixed in 6.1.94 with commit 3f8d5e802d4c
	Issue introduced in 6.1 with commit 5e2cf333b7bd and fixed in 6.6.34 with commit cd2538e5af49
	Issue introduced in 6.1 with commit 5e2cf333b7bd and fixed in 6.9.5 with commit e332a12f65d8
	Issue introduced in 6.1 with commit 5e2cf333b7bd and fixed in 6.10-rc1 with commit 151f66bb618d
	Issue introduced in 4.14.296 with commit 7d808fe6af84
	Issue introduced in 4.19.307 with commit 1e8c1c2a9269
	Issue introduced in 5.4.269 with commit 7f71d9817cea
	Issue introduced in 5.10.210 with commit 88ec9bbcd33c
	Issue introduced in 5.15.148 with commit 0ee3ded745ca
	Issue introduced in 5.19.17 with commit 2cab058f2b14
	Issue introduced in 6.0.3 with commit 91962e40ec3d
	Issue introduced in 6.1.74 with commit bed0acf330b2

Please see https://www.kernel.org for a full list of currently supported
kernel versions by the kernel community.

Unaffected versions might change over time as fixes are backported to
older supported kernel versions.  The official CVE entry at
	https://cve.org/CVERecord/?id=CVE-2024-39476
will be updated if fixes are backported; please check it for the most
up-to-date information about this issue.


Affected files
==============

The file(s) affected by this issue are:
	drivers/md/raid5.c


Mitigation
==========

The Linux kernel CVE team recommends that you update to the latest
stable kernel version for this, and many other bugfixes.  Individual
changes are never tested alone, but rather are part of a larger kernel
release.  Cherry-picking individual commits is not recommended or
supported by the Linux kernel community at all.  If, however, updating to
the latest release is impossible, the individual changes to resolve this
issue can be found at these commits:
	https://git.kernel.org/stable/c/b32aa95843cac6b12c2c014d40fca18aef24a347
	https://git.kernel.org/stable/c/634ba3c97ec413cb10681c7b196db43ee461ecf4
	https://git.kernel.org/stable/c/aa64464c8f4d2ab92f6d0b959a1e0767b829d787
	https://git.kernel.org/stable/c/098d54934814dd876963abfe751c3b1cf7fbe56a
	https://git.kernel.org/stable/c/3f8d5e802d4cedd445f9a89be8c3fd2d0e99024b
	https://git.kernel.org/stable/c/cd2538e5af495b3c747e503db346470fc1ffc447
	https://git.kernel.org/stable/c/e332a12f65d8fed8cf63bedb4e9317bb872b9ac7
	https://git.kernel.org/stable/c/151f66bb618d1fd0eeb84acb61b4a9fa5d8bb0fa