linux-kernel - Re: [PATCH for-6.18/block 2/2] blk-cgroup: fix possible deadlock while configuring policy

lists.openwall.net		lists / announce owl-users owl-dev john-users john-dev passwdqc-users yescrypt popa3d-users / oss-security kernel-hardening musl sabotage tlsify passwords / crypt-dev xvendor / Bugtraq Full-Disclosure linux-kernel linux-netdev linux-ext4 linux-hardening linux-cve-announce PHC
Open Source and information security mailing list archives

Hash Suite: Windows password security audit tool. GUI, reports in PDF.

[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]

Message-ID: <CAFj5m9Jjwcurm-EuM177ermySQEctDgwOFh8vHiczEfz_xtrmg@mail.gmail.com>
Date: Tue, 23 Sep 2025 19:13:48 +0800
From: Ming Lei <ming.lei@...hat.com>
To: Yu Kuai <yukuai1@...weicloud.com>
Cc: nilay@...ux.ibm.com, tj@...nel.org, josef@...icpanda.com, axboe@...nel.dk, 
	cgroups@...r.kernel.org, linux-block@...r.kernel.org, 
	linux-kernel@...r.kernel.org, yukuai3@...wei.com, yi.zhang@...wei.com, 
	yangerkun@...wei.com, johnny.chenyi@...wei.com, 
	Ming Lei <ming.lei@...hat.com>
Subject: Re: [PATCH for-6.18/block 2/2] blk-cgroup: fix possible deadlock
 while configuring policy

On Tue, Sep 23, 2025 at 4:06 PM Yu Kuai <yukuai1@...weicloud.com> wrote:
>
> From: Yu Kuai <yukuai3@...wei.com>
>
> Following deadlock can be triggered easily by lockdep:
>
> WARNING: possible circular locking dependency detected
> 6.17.0-rc3-00124-ga12c2658ced0 #1665 Not tainted
> ------------------------------------------------------
> check/1334 is trying to acquire lock:
> ff1100011d9d0678 (&q->sysfs_lock){+.+.}-{4:4}, at: blk_unregister_queue+0x53/0x180
>
> but task is already holding lock:
> ff1100011d9d00e0 (&q->q_usage_counter(queue)#3){++++}-{0:0}, at: del_gendisk+0xba/0x110
>
> which lock already depends on the new lock.
>
> the existing dependency chain (in reverse order) is:
>
> -> #2 (&q->q_usage_counter(queue)#3){++++}-{0:0}:
>        blk_queue_enter+0x40b/0x470
>        blkg_conf_prep+0x7b/0x3c0
>        tg_set_limit+0x10a/0x3e0
>        cgroup_file_write+0xc6/0x420
>        kernfs_fop_write_iter+0x189/0x280
>        vfs_write+0x256/0x490
>        ksys_write+0x83/0x190
>        __x64_sys_write+0x21/0x30
>        x64_sys_call+0x4608/0x4630
>        do_syscall_64+0xdb/0x6b0
>        entry_SYSCALL_64_after_hwframe+0x76/0x7e
>
> -> #1 (&q->rq_qos_mutex){+.+.}-{4:4}:
>        __mutex_lock+0xd8/0xf50
>        mutex_lock_nested+0x2b/0x40
>        wbt_init+0x17e/0x280
>        wbt_enable_default+0xe9/0x140
>        blk_register_queue+0x1da/0x2e0
>        __add_disk+0x38c/0x5d0
>        add_disk_fwnode+0x89/0x250
>        device_add_disk+0x18/0x30
>        virtblk_probe+0x13a3/0x1800
>        virtio_dev_probe+0x389/0x610
>        really_probe+0x136/0x620
>        __driver_probe_device+0xb3/0x230
>        driver_probe_device+0x2f/0xe0
>        __driver_attach+0x158/0x250
>        bus_for_each_dev+0xa9/0x130
>        driver_attach+0x26/0x40
>        bus_add_driver+0x178/0x3d0
>        driver_register+0x7d/0x1c0
>        __register_virtio_driver+0x2c/0x60
>        virtio_blk_init+0x6f/0xe0
>        do_one_initcall+0x94/0x540
>        kernel_init_freeable+0x56a/0x7b0
>        kernel_init+0x2b/0x270
>        ret_from_fork+0x268/0x4c0
>        ret_from_fork_asm+0x1a/0x30
>
> -> #0 (&q->sysfs_lock){+.+.}-{4:4}:
>        __lock_acquire+0x1835/0x2940
>        lock_acquire+0xf9/0x450
>        __mutex_lock+0xd8/0xf50
>        mutex_lock_nested+0x2b/0x40
>        blk_unregister_queue+0x53/0x180
>        __del_gendisk+0x226/0x690
>        del_gendisk+0xba/0x110
>        sd_remove+0x49/0xb0 [sd_mod]
>        device_remove+0x87/0xb0
>        device_release_driver_internal+0x11e/0x230
>        device_release_driver+0x1a/0x30
>        bus_remove_device+0x14d/0x220
>        device_del+0x1e1/0x5a0
>        __scsi_remove_device+0x1ff/0x2f0
>        scsi_remove_device+0x37/0x60
>        sdev_store_delete+0x77/0x100
>        dev_attr_store+0x1f/0x40
>        sysfs_kf_write+0x65/0x90
>        kernfs_fop_write_iter+0x189/0x280
>        vfs_write+0x256/0x490
>        ksys_write+0x83/0x190
>        __x64_sys_write+0x21/0x30
>        x64_sys_call+0x4608/0x4630
>        do_syscall_64+0xdb/0x6b0
>        entry_SYSCALL_64_after_hwframe+0x76/0x7e
>
> other info that might help us debug this:
>
> Chain exists of:
>   &q->sysfs_lock --> &q->rq_qos_mutex --> &q->q_usage_counter(queue)#3
>
>  Possible unsafe locking scenario:
>
>        CPU0                    CPU1
>        ----                    ----
>   lock(&q->q_usage_counter(queue)#3);
>                                lock(&q->rq_qos_mutex);
>                                lock(&q->q_usage_counter(queue)#3);
>   lock(&q->sysfs_lock);
>
> Root cause is that queue_usage_counter is grabbed with rq_qos_mutex
> held in blkg_conf_prep(), while queue should be freezed before
> rq_qos_mutex from other context.
>
> The blk_queue_enter() from blkg_conf_prep() is used to protect against
> policy deactivation, which is already protected with blkcg_mutex, hence
> convert blk_queue_enter() to blkcg_mutex to fix this problem. Meanwhile,
> consider that blkcg_mutex is held after queue is freezed from policy
> deactivation, also convert blkg_alloc() to use GFP_NOIO.

Looks good, and seems one example of abusing blk_queue_enter():

Reviewed-by: Ming Lei <ming.lei@...hat.com>