Message-ID: <CAMGffE=Mbfp=7xD_hYxXk1PAaCZNSEAVeQGKGy7YF9f2S4=NEA@mail.gmail.com>
Date: Mon, 19 Jan 2026 16:14:05 +0100
From: Jinpu Wang <jinpu.wang@...os.com>
To: linux-raid <linux-raid@...r.kernel.org>, yukuai@...as.com,
Song Liu <song@...nel.org>, open list <linux-kernel@...r.kernel.org>
Subject: [BUG] md: race between bitmap_daemon_work and __bitmap_resize leading to use-after-free
Hello folks,
We are seeing a general protection fault in the md bitmap code during
array resize operations. This appears to be a race condition between
the bitmap daemon work and the bitmap resize code path in kernel 6.1
(and likely later versions).
[Crash Details]
The crash occurs at write_page+0x22b when dereferencing page_buffers(page).
general protection fault, probably for non-canonical address
0xc2f57c2ef374586f: 0000 [#1] PREEMPT SMP
CPU: 18 PID: 1598035 Comm: md53_raid1 Kdump: loaded Tainted: G O 6.1.118-pserver
RIP: 0010:write_page+0x22b/0x3c0 [md_mod]
Code: f0 ff 83 f0 00 00 00 e8 13 6d a3 cb 48 85 db 74 cb 48 8b 53 28
49 8b b6 40 03 00 00 48 85 d2 0f 84 41 01 00 00 49 8b 44 24 70 <49> 8b
7d 20 b9 00 10 00 00 48 83 e8 01 48 39 c7 0f 84 da 00 00 00
RSP: 0018:ffffa82f3b857c40 EFLAGS: 00010246
RAX: 0000000000000001 RBX: ffff99abc0e39400 RCX: 0000000000000000
RDX: ffff99bfc21a3c00 RSI: 0000000000000008 RDI: ffff9a2c9ce358c0
RBP: ffff99ac72048018 R08: 0000000000000000 R09: ffff99ac720482c0
R10: 0000000000000000 R11: 0000000000000000 R12: ffff99b151373e00
R13: c2f57c2ef374586f R14: ffff99ac72048000 R15: ffff99abc0e394f0
Call Trace:
<TASK>
? exc_general_protection+0x222/0x4b0
? asm_exc_general_protection+0x22/0x30
? write_page+0x22b/0x3c0 [md_mod]
bitmap_daemon_work+0x26b/0x3a0 [md_mod]
md_check_recovery+0x58/0x5d0 [md_mod]
raid1d+0x8e/0x1940 [raid1]
[Analysis]
The root cause is a use-after-free race between __bitmap_resize() and
bitmap_daemon_work().

bitmap_daemon_work() (running in the md thread) iterates over
bitmap->storage.filemap[] and calls write_page():

	for (j = 0; j < bitmap->storage.file_pages; j++) {
		if (bitmap->storage.filemap && ...) {
			write_page(bitmap, bitmap->storage.filemap[j], 0);
		}
	}
Crucially, this access to filemap[j] is done without holding any lock
that would prevent the storage from being replaced and freed.
__bitmap_resize() (triggered by the resize ioctl) replaces the bitmap storage:

	spin_lock_irq(&bitmap->counts.lock);
	md_bitmap_file_unmap(&bitmap->storage); // Frees old pages and kfrees filemap
	bitmap->storage = store;
	spin_unlock_irq(&bitmap->counts.lock);
Even though __bitmap_resize() calls quiesce(), this only suspends normal
I/O. It does NOT stop the md thread itself, which continues to run and
can enter md_check_recovery() -> bitmap_daemon_work().
[Race Window]
Thread 1 (md thread) reads a page pointer from
bitmap->storage.filemap[j]. Simultaneously, Thread 2 (resize) calls
md_bitmap_file_unmap(), which calls free_buffers(page) and
kfree(filemap). When Thread 1 then enters write_page(), it dereferences
the now-freed page/buffer_head, resulting in the GPF.
The current locking (counts.lock) protects the bitmap counters, but not
the bitmap->storage structure itself between the filemap[j] load and the
dereference inside write_page().
We are looking for suggestions on the best way to synchronize this. It
seems we need to either:

a) ensure the md thread's daemon work is stopped/flushed before
__bitmap_resize() proceeds with unmapping, or

b) protect the bitmap->storage replacement with a lock that
bitmap_daemon_work() also respects.
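For option (b), something along these lines might work (rough, untested
sketch against the 6.1 sources; storage_mutex is a hypothetical new field
in struct bitmap, and a sleepable lock seems required because write_page()
blocks, so counts.lock alone cannot cover the walk):

	/* bitmap_daemon_work() */
	mutex_lock(&bitmap->storage_mutex);
	for (j = 0; j < bitmap->storage.file_pages; j++) {
		if (bitmap->storage.filemap && ...) {
			write_page(bitmap, bitmap->storage.filemap[j], 0);
		}
	}
	mutex_unlock(&bitmap->storage_mutex);

	/* __bitmap_resize() */
	mutex_lock(&bitmap->storage_mutex);
	spin_lock_irq(&bitmap->counts.lock);
	md_bitmap_file_unmap(&bitmap->storage);
	bitmap->storage = store;
	spin_unlock_irq(&bitmap->counts.lock);
	mutex_unlock(&bitmap->storage_mutex);

Option (a) would instead freeze/flush the daemon work before the unmap,
which avoids adding a lock but serializes the resize path against the md
thread.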
Any thoughts on the preferred approach?
Best regards,
Jinpu Wang @ IONOS Cloud