Message-ID: <CAMGffE=Mbfp=7xD_hYxXk1PAaCZNSEAVeQGKGy7YF9f2S4=NEA@mail.gmail.com>
Date: Mon, 19 Jan 2026 16:14:05 +0100
From: Jinpu Wang <jinpu.wang@...os.com>
To: linux-raid <linux-raid@...r.kernel.org>, yukuai@...as.com,
Song Liu <song@...nel.org>, open list <linux-kernel@...r.kernel.org>
Subject: [BUG] md: race between bitmap_daemon_work and __bitmap_resize leading to use-after-free
Hello folks,
We are seeing a general protection fault in the md bitmap code during
array resize operations. This appears to be a race condition between
the bitmap daemon work and the bitmap resize code path in kernel 6.1
(and likely later versions).
[Crash Details]
The crash occurs at write_page+0x22b when dereferencing page_buffers(page).
general protection fault, probably for non-canonical address
0xc2f57c2ef374586f: 0000 [#1] PREEMPT SMP
CPU: 18 PID: 1598035 Comm: md53_raid1 Kdump: loaded Tainted: G O 6.1.118-pserver
RIP: 0010:write_page+0x22b/0x3c0 [md_mod]
Code: f0 ff 83 f0 00 00 00 e8 13 6d a3 cb 48 85 db 74 cb 48 8b 53 28
49 8b b6 40 03 00 00 48 85 d2 0f 84 41 01 00 00 49 8b 44 24 70 <49> 8b
7d 20 b9 00 10 00 00 48 83 e8 01 48 39 c7 0f 84 da 00 00 00
RSP: 0018:ffffa82f3b857c40 EFLAGS: 00010246
RAX: 0000000000000001 RBX: ffff99abc0e39400 RCX: 0000000000000000
RDX: ffff99bfc21a3c00 RSI: 0000000000000008 RDI: ffff9a2c9ce358c0
RBP: ffff99ac72048018 R08: 0000000000000000 R09: ffff99ac720482c0
R10: 0000000000000000 R11: 0000000000000000 R12: ffff99b151373e00
R13: c2f57c2ef374586f R14: ffff99ac72048000 R15: ffff99abc0e394f0
Call Trace:
<TASK>
? exc_general_protection+0x222/0x4b0
? asm_exc_general_protection+0x22/0x30
? write_page+0x22b/0x3c0 [md_mod]
bitmap_daemon_work+0x26b/0x3a0 [md_mod]
md_check_recovery+0x58/0x5d0 [md_mod]
raid1d+0x8e/0x1940 [raid1]
[Analysis]
The root cause is a use-after-free race between __bitmap_resize() and
bitmap_daemon_work().

bitmap_daemon_work() (running in the md thread) iterates over
bitmap->storage.filemap[] and calls write_page():

	for (j = 0; j < bitmap->storage.file_pages; j++) {
		if (bitmap->storage.filemap && ...) {
			write_page(bitmap, bitmap->storage.filemap[j], 0);
		}
	}
Crucially, this access to filemap[j] is done without holding any lock
that would prevent the storage from being replaced and freed.
__bitmap_resize() (triggered by the resize ioctl) replaces the bitmap storage:

	spin_lock_irq(&bitmap->counts.lock);
	md_bitmap_file_unmap(&bitmap->storage); // Frees old pages and kfrees filemap
	bitmap->storage = store;
	spin_unlock_irq(&bitmap->counts.lock);
Even though __bitmap_resize() calls quiesce(), this only suspends normal
I/O. It does NOT stop the md thread itself, which continues to run and
can enter md_check_recovery() -> bitmap_daemon_work().
[Race Window]
Thread 1 (md thread) reads a page pointer from
bitmap->storage.filemap[j]. Simultaneously, Thread 2 (resize) calls
md_bitmap_file_unmap(), which calls free_buffers(page) and
kfree(filemap). When Thread 1 then enters write_page(), it dereferences
the now-freed page/buffer_head, resulting in the GPF.
The current locking (counts.lock) protects the bitmap counters, but not
the bitmap->storage structure itself between the filemap[j] load and the
dereference inside write_page().
We are looking for suggestions on the best way to synchronize this. It
seems we need to either:

a) ensure the md thread's daemon work is stopped/flushed before
__bitmap_resize() proceeds with unmapping, or

b) protect the bitmap->storage replacement with a lock that
bitmap_daemon_work() also respects.
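For option (b), something along these lines might work (rough, untested
sketch against the 6.1 sources; storage_mutex is a hypothetical new field
in struct bitmap, and a sleepable lock seems required because write_page()
blocks, so counts.lock alone cannot cover the walk):

	/* bitmap_daemon_work() */
	mutex_lock(&bitmap->storage_mutex);
	for (j = 0; j < bitmap->storage.file_pages; j++) {
		if (bitmap->storage.filemap && ...) {
			write_page(bitmap, bitmap->storage.filemap[j], 0);
		}
	}
	mutex_unlock(&bitmap->storage_mutex);

	/* __bitmap_resize() */
	mutex_lock(&bitmap->storage_mutex);
	spin_lock_irq(&bitmap->counts.lock);
	md_bitmap_file_unmap(&bitmap->storage);
	bitmap->storage = store;
	spin_unlock_irq(&bitmap->counts.lock);
	mutex_unlock(&bitmap->storage_mutex);

Option (a) would instead freeze/flush the daemon work before the unmap,
which avoids adding a lock but serializes the resize path against the md
thread.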
Any thoughts on the preferred approach?
Best regards,
Jinpu Wang @ IONOS Cloud