Message-Id: <8b5487db-d15a-4dd1-901c-e33ec1418c75@fnnas.com>
Date: Fri, 9 Jan 2026 00:22:46 +0800
From: "Yu Kuai" <yukuai@...as.com>
To: "Zheng Qixing" <zhengqixing@...weicloud.com>, <tj@...nel.org>,
<josef@...icpanda.com>, <axboe@...nel.dk>
Cc: <cgroups@...r.kernel.org>, <linux-block@...r.kernel.org>,
<linux-kernel@...r.kernel.org>, <yi.zhang@...wei.com>,
<yangerkun@...wei.com>, <houtao1@...wei.com>, <zhengqixing@...wei.com>,
<yukuai@...as.com>
Subject: Re: [PATCH 3/3] blk-cgroup: skip dying blkg in blkcg_activate_policy()
Hi,
On 2026/1/8 9:44, Zheng Qixing wrote:
> From: Zheng Qixing <zhengqixing@...wei.com>
>
> When switching IO schedulers on a block device, blkcg_activate_policy()
> can race with concurrent blkcg deletion, leading to a use-after-free of
> the blkg.
>
> T1:                                         T2:
> elv_iosched_store                           blkg_destroy
> elevator_switch                             kill(&blkg->refcnt) // blkg->refcnt=0
> ...                                         blkg_release // call_rcu
> blkcg_activate_policy                       __blkg_release
> list for blkg                               blkg_free
>                                             blkg_free_workfn
>                                             ->pd_free_fn(pd)
> blkg_get(blkg) // blkg->refcnt=0->1
>                                             list_del_init(&blkg->q_node)
>                                             kfree(blkg)
> blkg_put(pinned_blkg) // blkg->refcnt=1->0
> blkg_release // call_rcu again
> call_rcu(..., __blkg_release)
This stack is not clear to me. Can this problem be fixed by protecting the
q->blkg_list iteration with blkcg_mutex, as I suggested for patch 2?
>
> Fix this by replacing blkg_get() with blkg_tryget(), which fails if
> the blkg's refcount has already reached zero. If blkg_tryget() fails,
> skip processing this blkg since it's already being destroyed.
>
> The uaf call trace is as follows:
>
> ==================================================================
> BUG: KASAN: slab-use-after-free in rcu_accelerate_cbs+0x114/0x120
> Read of size 8 at addr ffff88815a20b5d8 by task bash/1068
> CPU: 0 PID: 1068 Comm: bash Not tainted 6.6.0-g6918ead378dc-dirty #31
> Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.16.1-2.fc37 04/01/2014
> Call Trace:
> <IRQ>
> rcu_accelerate_cbs+0x114/0x120
> rcu_report_qs_rdp+0x1fb/0x3e0
> rcu_core+0x4d7/0x6f0
> handle_softirqs+0x198/0x550
> irq_exit_rcu+0x130/0x190
> sysvec_apic_timer_interrupt+0x6e/0x90
> </IRQ>
> <TASK>
> asm_sysvec_apic_timer_interrupt+0x16/0x20
>
> Allocated by task 1031:
> kasan_save_stack+0x1c/0x40
> kasan_set_track+0x21/0x30
> __kasan_kmalloc+0x8b/0x90
> blkg_alloc+0xb6/0x9c0
> blkg_create+0x8c6/0x1010
> blkg_lookup_create+0x2ca/0x660
> bio_associate_blkg_from_css+0xfb/0x4e0
> bio_associate_blkg+0x62/0xf0
> bio_init+0x272/0x8d0
> bio_alloc_bioset+0x45a/0x760
> ext4_bio_write_folio+0x68e/0x10d0
> mpage_submit_folio+0x14a/0x2b0
> mpage_process_page_bufs+0x1b1/0x390
> mpage_prepare_extent_to_map+0xa91/0x1060
> ext4_do_writepages+0x948/0x1c50
> ext4_writepages+0x23f/0x4a0
> do_writepages+0x162/0x5e0
> filemap_fdatawrite_wbc+0x11a/0x180
> __filemap_fdatawrite_range+0x9d/0xd0
> file_write_and_wait_range+0x91/0x110
> ext4_sync_file+0x1c1/0xaa0
> __x64_sys_fsync+0x55/0x90
> do_syscall_64+0x55/0x100
> entry_SYSCALL_64_after_hwframe+0x78/0xe2
>
> Freed by task 24:
> kasan_save_stack+0x1c/0x40
> kasan_set_track+0x21/0x30
> kasan_save_free_info+0x27/0x40
> __kasan_slab_free+0x106/0x180
> __kmem_cache_free+0x162/0x350
> process_one_work+0x573/0xd30
> worker_thread+0x67f/0xc30
> kthread+0x28b/0x350
> ret_from_fork+0x30/0x70
> ret_from_fork_asm+0x1b/0x30
>
> Fixes: f1c006f1c685 ("blk-cgroup: synchronize pd_free_fn() from blkg_free_workfn() and blkcg_deactivate_policy()")
> Signed-off-by: Zheng Qixing <zhengqixing@...wei.com>
> ---
> block/blk-cgroup.c | 3 ++-
> 1 file changed, 2 insertions(+), 1 deletion(-)
>
> diff --git a/block/blk-cgroup.c b/block/blk-cgroup.c
> index af468676cad1..ac7702db0836 100644
> --- a/block/blk-cgroup.c
> +++ b/block/blk-cgroup.c
> @@ -1645,9 +1645,10 @@ int blkcg_activate_policy(struct gendisk *disk, const struct blkcg_policy *pol)
> * GFP_NOWAIT failed. Free the existing one and
> * prealloc for @blkg w/ GFP_KERNEL.
> */
Why is this check not done before pd_alloc_fn()? What if pd_alloc_fn() succeeds
for a blkg that has already been removed?
> + if (!blkg_tryget(blkg))
> + continue;
> if (pinned_blkg)
> blkg_put(pinned_blkg);
> - blkg_get(blkg);
> pinned_blkg = blkg;
>
> spin_unlock_irq(&q->queue_lock);
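
For reference, the difference between an unconditional get and a tryget on a
refcount that has already dropped to zero can be sketched in userspace as
below. This is only an illustration under stated assumptions: struct obj,
obj_get(), obj_tryget() and obj_put() are hypothetical stand-ins built on C11
atomics, whereas the real blkg->refcnt is a percpu_ref and the real release
path goes through call_rcu().

```c
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical stand-in for a blkg; only the refcount matters here. */
struct obj {
	atomic_int refcnt;
};

/* Unconditional get: bumps the count even if it already reached zero. */
void obj_get(struct obj *o)
{
	atomic_fetch_add(&o->refcnt, 1);
}

/* Conditional get: succeeds only while the count is still non-zero. */
bool obj_tryget(struct obj *o)
{
	int old = atomic_load(&o->refcnt);

	while (old != 0) {
		/* On failure, old is refreshed with the current value. */
		if (atomic_compare_exchange_weak(&o->refcnt, &old, old + 1))
			return true;
	}
	return false;
}

/* Drop a reference; returns true when this put hit zero (release point). */
bool obj_put(struct obj *o)
{
	return atomic_fetch_sub(&o->refcnt, 1) == 1;
}
```

With the count at zero after the final put, obj_tryget() fails and the caller
can skip the object. obj_get() followed by obj_put() instead reaches the
release point a second time, which corresponds to the "call_rcu again" step in
the quoted race.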
--
Thanks,
Kuai