Message-ID: <CAFRLqsXdo9YMYS_drZkaBSzeBSPWfAF_3JNr=ng__8SBNe9cyA@mail.gmail.com>
Date: Sat, 28 Jun 2025 06:49:56 +0800
From: cen zhang <zzzccc427@...il.com>
To: Filipe Manana <fdmanana@...nel.org>
Cc: clm@...com, josef@...icpanda.com, dsterba@...e.com,
linux-btrfs@...r.kernel.org, linux-kernel@...r.kernel.org,
baijiaju1990@...il.com, zhenghaoran154@...il.com
Subject: Re: [BUG] btrfs: Data race between btrfs_quota_disable and
btrfs_qgroup_rescan leads to a UAF

Hello Btrfs maintainers,

Thank you again for your detailed response to our previous report. We
have carefully studied the two synchronization mechanisms you
mentioned and understand their design intent.

After a more in-depth analysis, we have identified a specific timing
window in which both of these synchronization mechanisms can be
bypassed, leading to a use-after-free (UAF). This email describes that
scenario and explains why we believe it is a race condition that needs
to be fixed.

The core of the problem is a data race between
qgroup_rescan_zero_tracking(), called during the preparation phase of
btrfs_qgroup_rescan(), and btrfs_free_qgroup_config(), called by
btrfs_quota_disable(). Neither of the two synchronization mechanisms
covers this specific conflict point.

Below we explain why each of the two synchronization mechanisms fails
in this case.

You mentioned that btrfs_quota_disable() first clears the
BTRFS_FS_QUOTA_ENABLED flag and then calls
btrfs_qgroup_wait_for_completion() to wait for the rescan worker to
stop. This is intended to ensure a graceful shutdown. However, the
effectiveness of this mechanism relies on
btrfs_qgroup_wait_for_completion() correctly determining whether a
rescan task is running, which it does by checking the
fs_info->qgroup_rescan_running flag. In btrfs_qgroup_rescan(), that
flag is set to true only after all preparation work (including the
call to qgroup_rescan_zero_tracking()) is complete. This creates a
window in which a rescan task has started its setup but is not yet
marked as "running". If btrfs_quota_disable() executes during this
window, it sees qgroup_rescan_running as false, skips the wait, and
proceeds directly to the cleanup.
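
To make this window concrete, here is a minimal userspace sketch. It is not kernel code: struct fs_model and the model_* helpers are hypothetical names of ours, and they only model the ordering described above (quota disable clears the enabled flag and checks the running flag, while rescan sets the running flag only at the end of its setup).

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical userspace model; it only mirrors the flags discussed above. */
struct fs_model {
    bool quota_enabled;    /* models BTRFS_FS_QUOTA_ENABLED */
    bool rescan_running;   /* models fs_info->qgroup_rescan_running */
    bool rescan_in_setup;  /* model-only: setup started, not yet "running" */
};

/* Models the start of the rescan setup: the running flag is untouched. */
static void model_rescan_begin_setup(struct fs_model *fs)
{
    fs->rescan_in_setup = true;
}

/* Models the end of the setup: only now is the rescan marked as running. */
static void model_rescan_mark_running(struct fs_model *fs)
{
    assert(fs->rescan_in_setup);
    fs->rescan_in_setup = false;
    fs->rescan_running = true;
}

/*
 * Models the start of quota disable: clear the enabled flag, then decide
 * whether to wait based only on the running flag.
 * Returns true if the disable side would wait for the rescan.
 */
static bool model_quota_disable_begin(struct fs_model *fs)
{
    fs->quota_enabled = false;
    return fs->rescan_running;
}
```

In this model, calling model_rescan_begin_setup() and then model_quota_disable_begin() returns false even though rescan_in_setup is still true, which is the window we are describing. If model_rescan_mark_running() runs first, the disable side waits as intended.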

Regarding the second synchronization mechanism, you mentioned that
btrfs_quota_disable() acquires a transaction handle before calling
btrfs_free_qgroup_config(). Since the background rescan worker also
needs to commit transactions, the mutual exclusion provided by the
transaction system should block the worker and thus prevent concurrent
modifications. However, this mechanism only protects operations that
require a Btrfs transaction. The problematic function,
qgroup_rescan_zero_tracking(), is a memory-only operation: it
traverses and modifies the in-memory qgroup red-black tree without
requiring or holding a Btrfs transaction handle. In the execution flow
of btrfs_qgroup_rescan(), btrfs_commit_current_transaction() completes
first, after which the task no longer holds a transaction handle, and
only then is qgroup_rescan_zero_tracking() called. Therefore, even if
btrfs_quota_disable() holds a transaction handle, it cannot block the
execution of qgroup_rescan_zero_tracking(), because the latter does
not contend for the transaction.
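
The same point can be shown with a small userspace sketch. It is a deliberately simplified model and not the kernel's transaction code: struct txn_model, its fields, and the model_* helpers are hypothetical. The only property it models is that a commit has to wait while another task holds a handle, whereas a memory-only pass never consults the handle count.

```c
#include <assert.h>
#include <stdbool.h>

#define MODEL_NR_QGROUPS 3

/* Hypothetical userspace model of the state discussed above. */
struct txn_model {
    int open_handles;            /* tasks currently holding a handle */
    int rfer[MODEL_NR_QGROUPS];  /* stands in for per-qgroup counters */
};

static void model_start_transaction(struct txn_model *t)
{
    t->open_handles++;
}

static void model_end_transaction(struct txn_model *t)
{
    assert(t->open_handles > 0);
    t->open_handles--;
}

/* In this model a commit may only proceed once no handle is held. */
static bool model_commit_can_proceed(const struct txn_model *t)
{
    return t->open_handles == 0;
}

/*
 * Memory-only pass: it resets the counters and never looks at
 * open_handles, so a held handle cannot delay it.
 * Returns the number of entries it touched.
 */
static int model_zero_tracking(struct txn_model *t)
{
    int touched = 0;

    for (int i = 0; i < MODEL_NR_QGROUPS; i++) {
        t->rfer[i] = 0;
        touched++;
    }
    return touched;
}
```

With one handle held (the disable side), model_commit_can_proceed() returns false, yet model_zero_tracking() still runs over every entry. This is the asymmetry we believe applies to qgroup_rescan_zero_tracking().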

Detailed Execution Trace Triggering the UAF

The following steps show how the two threads can interleave to trigger
the UAF:

1. TID 1 (Rescan Thread): Executes btrfs quota rescan, entering
   btrfs_qgroup_rescan(). After successfully executing
   btrfs_commit_current_transaction() but before calling
   qgroup_rescan_zero_tracking(), it is preempted by the OS or delayed
   by our tool. At this point qgroup_rescan_running is still false and
   TID 1 does not hold a transaction handle.

2. TID 2 (Disable Thread): Executes btrfs quota disable, entering
   btrfs_quota_disable().
   a. It calls btrfs_qgroup_wait_for_completion(). Since
      qgroup_rescan_running is false, the wait is skipped.
   b. It successfully starts a new transaction
      (btrfs_start_transaction).
   c. It calls btrfs_free_qgroup_config(). This function, without
      taking a lock, loops through calls to rb_erase and kfree,
      freeing all nodes and memory of the qgroup red-black tree.

3. TID 1 (Rescan Thread): Resumes execution.
   a. It calls qgroup_rescan_zero_tracking().
   b. Inside this function, the for (n = rb_first(&fs_info->qgroup_tree);
      ...) loop walks the tree that TID 2 is tearing down, so n can
      point to a node that has already been freed.
   c. qgroup = rb_entry(n, ...) then results in an access to freed
      memory.
   d. Finally, the call to list_add(&qgroup->dirty, ...) inside
      qgroup_dirty() causes a kernel panic due to the illegal memory
      access.

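
The interleaving can be replayed deterministically in userspace. The sketch below is a hypothetical model and not kernel code: struct qgroup_model, struct fs_model, and the model_* helpers are our own names, a 'freed' flag stands in for kfree() so that the replay itself stays well-defined, and we assume the two loops overlap so that the rescan side still holds a node that the disable side has already freed.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define MODEL_NR_QGROUPS 2

/* Hypothetical stand-in for a qgroup; 'freed' replaces a real kfree(). */
struct qgroup_model {
    bool freed;
    bool dirty;
};

struct fs_model {
    struct qgroup_model nodes[MODEL_NR_QGROUPS];
    bool in_tree[MODEL_NR_QGROUPS];  /* models membership in the tree */
    int stale_accesses;              /* accesses to already-freed nodes */
};

/* Models rb_first(): the first node still linked, or NULL. */
static struct qgroup_model *model_first(struct fs_model *fs)
{
    for (int i = 0; i < MODEL_NR_QGROUPS; i++)
        if (fs->in_tree[i])
            return &fs->nodes[i];
    return NULL;
}

/* Disable side: one iteration of the free loop (unlink, then "free"). */
static bool model_free_one(struct fs_model *fs)
{
    struct qgroup_model *q = model_first(fs);

    if (!q)
        return false;
    fs->in_tree[q - fs->nodes] = false;
    q->freed = true;
    return true;
}

/* Rescan side: mark a previously fetched node dirty. */
static void model_mark_dirty(struct fs_model *fs, struct qgroup_model *q)
{
    assert(q != NULL);
    if (q->freed)
        fs->stale_accesses++;  /* the kernel would touch freed memory here */
    else
        q->dirty = true;
}
```

Replaying "TID 1 fetches a node with model_first(), TID 2 runs model_free_one(), TID 1 calls model_mark_dirty()" leaves stale_accesses at 1, whereas running the rescan side alone leaves it at 0. In the kernel there is no such flag, so the equivalent access is what KASAN reports.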
Thank you again for your patience and time.
Best regards,
Cen Zhang

On Thu, Jun 26, 2025 at 21:22, Filipe Manana <fdmanana@...nel.org> wrote:
>
> On Thu, Jun 26, 2025 at 2:00 PM cen zhang <zzzccc427@...il.com> wrote:
> >
> > Hello maintainers,
> >
> > I've encountered a kernel panic while using Btrfs. The issue appears
> > to be a use-after-free (UAF) caused by a data race between the
> > btrfs_quota_disable operation and a background btrfs_qgroup_rescan
> > task.
>
> You have already reported this yesterday:
>
> https://lore.kernel.org/linux-btrfs/CAFRLqsXQMknPBgYkds=ARWFC0vj1xAP77USG+ZG5GH3rbqB5xQ@mail.gmail.com/
>
> >
> > Kernel Version: 6.16.0-rc1-g7f6432600434-dirty
> > Environment: QEMU Virtual Machine
> >
> > Problem Description & Root Cause Analysis:
> >
> > The issue is triggered by the concurrent execution of the
> > btrfs_quota_disable function and the background btrfs_qgroup_rescan
> > worker.
>
> That can't happen, for two different reasons:
>
> 1) At btrfs_quota_disable() we clear the bit BTRFS_FS_QUOTA_ENABLED
> from fs_info and then wait for the rescan worker to complete.
> The rescan worker stops as soon as it sees that bit is not set,
> and if anyone tries to queue the rescan worker, the worker won't do
> nothing since rescan_should_stop() returns true because the bit is not
> set in fs_info anymore;
>
> 2) The quota disable and the rescan worker concurrently calling
> btrfs_commit_transaction() -> commit_cowonly_roots() ->
> btrfs_run_qgroups() is just not possible due to transaction states.
> At btrfs_free_qgroup_config(), besides having ensured early that
> the rescan worker is not running and can't run anymore, as said before
> there's also this detail:
>
> btrfs_free_qgroup_config() is called after acquiring a transaction
> handle - that means anyone trying to commit a transaction will block
> waiting for the task running btrfs_quota_disable() to release its
> transaction handle.
> So there's no way the rescan worker could get into
> commit_cowonly_roots() since that happens when there is no one else
> holding a handle for the transaction, the transaction state is
> TRANS_STATE_COMMIT_DOING.
>
> More comments below.
>
>
> >
> > When quotas are disabled via an ioctl call, the btrfs_quota_disable ->
> > btrfs_free_qgroup_config path iterates through the qgroup tree and
> > frees the memory for each btrfs_qgroup object using kfree.
>
> Yes, while holding a transaction handle open and having waited for the
> rescan worker to complete and preventing new runs of the rescan
> worker, as just explained above.
>
> >
> > Simultaneously, a background qgroup rescan task (btrfs_qgroup_rescan
> > -> qgroup_rescan_zero_tracking) may be running, which iterates over
> > and accesses the very same qgroup tree.
>
> No, as said before, the rescan worker can just commit a transaction
> while the quota disable task is freeing qgroups.
>
> >
> > Due to a lack of proper locking to synchronize these two operations,
> > the qgroup_rescan_zero_tracking function can access a btrfs_qgroup
> > object that has just been freed by btrfs_free_qgroup_config.
>
> There's is proper locking - transaction states and waiting for rescan
> worker and making sure it can't start before we start freeing qrgoups.
> >
> > When it then attempts to modify a member of this dangling pointer
> > (e.g., calling list_add in qgroup_dirty), it triggers a
> > use-after-free, which ultimately leads to the kernel panic.
> >
> > We found this bug using our proprietary data-race detector in
>
> Ok, so this is an out of tree thing, and it's buggy for this case.
>
> > conjunction with syzkaller. Our tool first detected a race condition
> > and then actively intervened by swapping the execution order of the
> > conflicting operations. This controlled reordering directly exposed a
> > latent use-after-free (UAF) vulnerability, which was subsequently
> > caught and reported by KASAN.
>
> Ok, so your tool is reordering execution into a sequence that is not
> possible otherwise as explained before regarding transaction states
> and the rescan stop.
>
> Thanks.
>
> >
> > I believe this issue could be fixed by adding an appropriate
> > synchronization mechanism (e.g., a mutex) between the
> > btrfs_quota_disable path and the background qgroup scanning tasks.
> >
> > Full kernel panic log is attached below:
> >
> > var addr ffff888168296888, addr masked 888168296888
> > Kernel panic: ============ DATARACE ============
> > VarName 2268446652518064666, BlockLineNumber 9, IrLineNumber 1, is write 1
> > Function: found_watchpoint kernel/kccwf/wp_checker.c:75 [inline]
> > Function: watchpoints_monitor+0x1237/0x19a0 kernel/kccwf/wp_checker.c:155
> > Function: kccwf_rec_mem_access+0x7d0/0xae0 kernel/kccwf/core.c:582
> > Function: list_del include/linux/list.h:230 [inline]
> > Function: __del_qgroup_rb+0x2c2/0x17c0 fs/btrfs/qgroup.c:233
> > Function: btrfs_free_qgroup_config+0xa1/0x2b0 fs/btrfs/qgroup.c:645
> > Function: btrfs_quota_disable+0x826/0x25e0 fs/btrfs/qgroup.c:1393
> > Function: btrfs_ioctl_quota_ctl+0x3b3/0x4e0 fs/btrfs/ioctl.c:3703
> > Function: btrfs_ioctl+0xb3f/0x1480 fs/btrfs/ioctl.c:5323
> > Function: vfs_ioctl fs/ioctl.c:51 [inline]
> > Function: __do_sys_ioctl fs/ioctl.c:907 [inline]
> > Function: __se_sys_ioctl+0xd1/0x130 fs/ioctl.c:893
> > Function: do_syscall_x64 arch/x86/entry/syscall_64.c:63 [inline]
> > Function: do_syscall_64+0xcf/0x240 arch/x86/entry/syscall_64.c:94
> > Function: entry_SYSCALL_64_after_hwframe+0x77/0x7f
> > Function: 0x0
> > ============OTHER_INFO============
> > VarName 18129403906671370250, BlockLineNumber 22, IrLineNumber 1,
> > watchpoint index 19991
> > Function: set_report_info+0xa6/0x1f0 kernel/kccwf/report.c:49
> > Function: setup_watchpoint kernel/kccwf/wp_checker.c:102 [inline]
> > Function: watchpoints_monitor+0x7eb/0x19a0 kernel/kccwf/wp_checker.c:167
> > Function: kccwf_rec_mem_access+0x7d0/0xae0 kernel/kccwf/core.c:582
> > Function: __list_add include/linux/list.h:155 [inline]
> > Function: list_add include/linux/list.h:169 [inline]
> > Function: qgroup_dirty fs/btrfs/qgroup.c:1434 [inline]
> > Function: qgroup_rescan_zero_tracking fs/btrfs/qgroup.c:4005 [inline]
> > Function: btrfs_qgroup_rescan+0x4dc/0xa30 fs/btrfs/qgroup.c:4036
> > Function: btrfs_ioctl_quota_rescan+0x42a/0x530 fs/btrfs/ioctl.c:3943
> > Function: btrfs_ioctl+0x1187/0x1480 fs/btrfs/ioctl.c:5331
> > Function: vfs_ioctl fs/ioctl.c:51 [inline]
> > Function: __do_sys_ioctl fs/ioctl.c:907 [inline]
> > Function: __se_sys_ioctl+0xd1/0x130 fs/ioctl.c:893
> > Function: do_syscall_x64 arch/x86/entry/syscall_64.c:63 [inline]
> > Function: do_syscall_64+0xcf/0x240 arch/x86/entry/syscall_64.c:94
> > Function: entry_SYSCALL_64_after_hwframe+0x77/0x7f
> > =================END==============
> > Found watch_point 19991
> > BTRFS info (device sdb): balance: start -s
> > BTRFS info (device sdb): balance: ended with status: 0
> > BTRFS warning (device sdb): get dev_stats failed, device not found
> > [...]
> > ==================================================================
> > BUG: KASAN: slab-use-after-free in __list_del_entry
> > include/linux/list.h:218 [inline]
> > BUG: KASAN: slab-use-after-free in list_del_init
> > include/linux/list.h:287 [inline]
> > BUG: KASAN: slab-use-after-free in btrfs_run_qgroups+0x3cd/0x1ec0
> > fs/btrfs/qgroup.c:3132
> > Read of size 8 at addr ffff888168296890 by task btrfs-transacti/228
> >
> > CPU: 0 UID: 0 PID: 228 Comm: btrfs-transacti Not tainted
> > 6.16.0-rc1-g7f6432600434-dirty #52 PREEMPT(voluntary)
> > Hardware name: QEMU Ubuntu 24.04 PC (i440FX + PIIX, 1996), BIOS
> > 1.16.3-debian-1.16.3-2 04/01/2014
> > Call Trace:
> > <TASK>
> > __dump_stack lib/dump_stack.c:94 [inline]
> > dump_stack_lvl+0x108/0x150 lib/dump_stack.c:120
> > print_address_description mm/kasan/report.c:408 [inline]
> > print_report+0x191/0x5b0 mm/kasan/report.c:521
> > kasan_report+0x139/0x170 mm/kasan/report.c:634
> > __list_del_entry include/linux/list.h:218 [inline]
> > list_del_init include/linux/list.h:287 [inline]
> > btrfs_run_qgroups+0x3cd/0x1ec0 fs/btrfs/qgroup.c:3132
> > commit_cowonly_roots+0x67c/0x1c10 fs/btrfs/transaction.c:1354
> > btrfs_commit_transaction+0x2a5b/0xc800 fs/btrfs/transaction.c:2457
> > transaction_kthread+0x5b7/0xcc0 fs/btrfs/disk-io.c:1590
> > kthread+0x351/0x780 kernel/kthread.c:464
> > ret_from_fork+0x10e/0x1c0 arch/x86/kernel/process.c:148
> > ret_from_fork_asm+0x1a/0x30 arch/x86/entry/entry_64.S:245
> > </TASK>
> >
> > Allocated by task 127769:
> > kasan_save_stack mm/kasan/common.c:47 [inline]
> > kasan_save_track+0x3f/0x80 mm/kasan/common.c:68
> > poison_kmalloc_redzone mm/kasan/common.c:377 [inline]
> > __kasan_kmalloc+0x82/0x90 mm/kasan/common.c:394
> > kmalloc_noprof include/linux/slab.h:905 [inline]
> > kzalloc_noprof include/linux/slab.h:1039 [inline]
> > btrfs_quota_enable+0x2d07/0x5d10 fs/btrfs/qgroup.c:1201
> > btrfs_ioctl_quota_ctl+0x36c/0x4e0 fs/btrfs/ioctl.c:3673
> > btrfs_ioctl+0xb3f/0x1480 fs/btrfs/ioctl.c:5323
> > vfs_ioctl fs/ioctl.c:51 [inline]
> > __do_sys_ioctl fs/ioctl.c:907 [inline]
> > __se_sys_ioctl+0xd1/0x130 fs/ioctl.c:893
> > do_syscall_x64 arch/x86/entry/syscall_64.c:63 [inline]
> > do_syscall_64+0xcf/0x240 arch/x86/entry/syscall_64.c:94
> > entry_SYSCALL_64_after_hwframe+0x77/0x7f
> >
> > Freed by task 127948:
> > kasan_save_stack mm/kasan/common.c:47 [inline]
> > kasan_save_track+0x3f/0x80 mm/kasan/common.c:68
> > kasan_save_free_info+0x40/0x50 mm/kasan/generic.c:576
> > poison_slab_object mm/kasan/common.c:247 [inline]
> > __kasan_slab_free+0x36/0x50 mm/kasan/common.c:264
> > kasan_slab_free include/linux/kasan.h:233 [inline]
> > slab_free_hook mm/slub.c:2388 [inline]
> > slab_free mm/slub.c:4670 [inline]
> > kfree+0xfd/0x340 mm/slub.c:4869
> > btrfs_free_qgroup_config+0xcd/0x2b0 fs/btrfs/qgroup.c:647
> > btrfs_quota_disable+0x826/0x25e0 fs/btrfs/qgroup.c:1393
> > btrfs_ioctl_quota_ctl+0x3b3/0x4e0 fs/btrfs/ioctl.c:3703
> > btrfs_ioctl+0xb3f/0x1480 fs/btrfs/ioctl.c:5323
> > vfs_ioctl fs/ioctl.c:51 [inline]
> > __do_sys_ioctl fs/ioctl.c:907 [inline]
> > __se_sys_ioctl+0xd1/0x130 fs/ioctl.c:893
> > do_syscall_x64 arch/x86/entry/syscall_64.c:63 [inline]
> > do_syscall_64+0xcf/0x240 arch/x86/entry/syscall_64.c:94
> > entry_SYSCALL_64_after_hwframe+0x77/0x7f
> >
> > The buggy address belongs to the object at ffff888168296800
> > which belongs to the cache kmalloc-512 of size 512
> > The buggy address is located 144 bytes inside of
> > freed 512-byte region [ffff888168296800, ffff888168296a00)
> > [...]
> > ==================================================================
> > Hope this helps in tracking down and fixing the issue.
> >
> > Here is the detail:
> >
> > Thank you for your attention to this matter.
> >
> > Best regards,
> > Cen Zhang
> >