linux-kernel - Re: [PATCH v2 1/3] blk-cgroup: fix race between policy activation and blkg destruction


Message-ID: <edf84e44-d7e3-4a34-ad49-90ab5a4f545e@huaweicloud.com>
Date: Thu, 15 Jan 2026 11:27:47 +0800
From: Zheng Qixing <zhengqixing@...weicloud.com>
To: Michal Koutný <mkoutny@...e.com>
Cc: tj@...nel.org, josef@...icpanda.com, axboe@...nel.dk, yukuai3@...wei.com,
 hch@...radead.org, cgroups@...r.kernel.org, linux-block@...r.kernel.org,
 linux-kernel@...r.kernel.org, yi.zhang@...wei.com, yangerkun@...wei.com,
 houtao1@...wei.com, zhengqixing@...wei.com
Subject: Re: [PATCH v2 1/3] blk-cgroup: fix race between policy activation and
 blkg destruction


On 2026/1/14 18:40, Michal Koutný wrote:
> On Tue, Jan 13, 2026 at 02:10:33PM +0800, Zheng Qixing <zhengqixing@...weicloud.com> wrote:
>> From: Zheng Qixing <zhengqixing@...wei.com>
>>
>> When switching an IO scheduler on a block device, blkcg_activate_policy()
>> allocates blkg_policy_data (pd) for all blkgs attached to the queue.
>> However, blkcg_activate_policy() may race with concurrent blkcg deletion,
>> leading to use-after-free and memory leak issues.
>>
>> The use-after-free occurs in the following race:
>>
>> T1 (blkcg_activate_policy):
>>    - Successfully allocates pd for blkg1 (loop0->queue, blkcgA)
>>    - Fails to allocate pd for blkg2 (loop0->queue, blkcgB)
>>    - Enters the enomem rollback path to release blkg1 resources
>>
>> T2 (blkcg deletion):
>>    - blkcgA is deleted concurrently
>>    - blkg1 is freed via blkg_free_workfn()
>>    - blkg1->pd is freed
>>
>> T1 (continued):
>>    - Rollback path accesses blkg1->pd->online after pd is freed
> The rollback path is under q->queue_lock same like the list removal in
> blkg_free_workfn().
> Why is queue_lock not enough for synchronization in this case?
>
> (BTW have you observed this case "naturally" or have you injected the
> memory allocation failure?)
>
Yes, this issue was discovered by injecting a memory allocation failure at
->pd_alloc_fn(..., GFP_KERNEL) in blkcg_activate_policy().

In blkg_free_workfn(), q->queue_lock only protects the
list_del_init(&blkg->q_node). However, ->pd_free_fn() is called before
list_del_init(), meaning the pd is already freed before the blkg is removed
from the queue's list.

     blkcg_activate_policy()                  blkg_free_workfn()
     -----------------------                  ------------------
     spin_lock(&q->queue_lock)
     ...
     if (!pd) {
         spin_unlock(&q->queue_lock)
         ...
         goto enomem
     }
     enomem:
         spin_lock(&q->queue_lock)
         if (pd) {
                                              ->pd_free_fn()  // pd freed
             pd->online // uaf
         ...
         }
                                              spin_lock(&q->queue_lock)
                                              list_del_init(&blkg->q_node)
                                              spin_unlock(&q->queue_lock)
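
To make the ordering problem concrete, here is a small userspace toy model
of the interleaving above. It is only a sketch and not kernel code: every
toy_* name is made up for illustration, the "freed" flag stands in for
->pd_free_fn() having run so that nothing is really freed, and the steps
are replayed sequentially instead of on two threads.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Toy stand-ins for blkg_policy_data and the blkg; not the kernel structs. */
struct toy_pd {
	bool online;
	bool freed;		/* set instead of really freeing the pd */
};

struct toy_blkg {
	struct toy_pd *pd;
	bool on_queue_list;	/* models list membership via blkg->q_node */
};

/* Free-worker step 1: runs without the queue lock in this model. */
static void toy_worker_free_pd(struct toy_blkg *blkg)
{
	assert(blkg && blkg->pd);
	blkg->pd->freed = true;
}

/* Free-worker step 2: the only step done under the queue lock. */
static void toy_worker_unlink(struct toy_blkg *blkg)
{
	assert(blkg);
	blkg->on_queue_list = false;
}

/*
 * Rollback step: under the queue lock, visit the blkg if it is still on
 * the list. Returns true when that visit would read an already freed pd.
 */
static bool toy_rollback_hits_freed_pd(const struct toy_blkg *blkg)
{
	assert(blkg);
	if (!blkg->on_queue_list || !blkg->pd)
		return false;
	return blkg->pd->freed;
}

/* Replays the interleaving from the diagram in program order. */
static bool toy_replay_race(void)
{
	struct toy_pd pd = { .online = true, .freed = false };
	struct toy_blkg blkg = { .pd = &pd, .on_queue_list = true };
	bool uaf;

	toy_worker_free_pd(&blkg);		/* T2: ->pd_free_fn() */
	uaf = toy_rollback_hits_freed_pd(&blkg);	/* T1: pd->online */
	toy_worker_unlink(&blkg);		/* T2: list_del_init() */
	return uaf;
}
```

In this model the rollback only stays safe once the blkg has left the list,
which is why holding q->queue_lock around list_del_init() alone does not
cover the window between freeing the pd and the unlink.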
>>    - Triggers use-after-free
>>
>> In addition, blkg_free_workfn() frees pd before removing the blkg from
>> q->blkg_list.
> Yeah, this looks weirdly reversed.

Commit f1c006f1c685 ("blk-cgroup: synchronize pd_free_fn() from 
blkg_free_workfn() and blkcg_deactivate_policy()") delays 
list_del_init(&blkg->q_node) until after pd_free_fn() in 
blkg_free_workfn(). This keeps blkgs visible in the queue list during 
policy deactivation, preventing parent policy data from being freed 
before child policy data and avoiding use-after-free.

Kind Regards,
Qixing