Message-ID: <CAHj4cs9p9H5yx+ywsb3CMUdbqGPhM+8tuBvhW=9ADiCjAqza9w@mail.gmail.com>
Date: Tue, 16 Sep 2025 16:15:06 +0800
From: Yi Zhang <yi.zhang@...hat.com>
To: Yu Kuai <yukuai1@...weicloud.com>
Cc: tj@...nel.org, josef@...icpanda.com, axboe@...nel.dk, liangjie@...iang.com,
yukuai3@...wei.com, hanguangjiang@...iang.com, cgroups@...r.kernel.org,
linux-block@...r.kernel.org, linux-kernel@...r.kernel.org,
yi.zhang@...wei.com, yangerkun@...wei.com, johnny.chenyi@...wei.com
Subject: Re: [PATCH] blk-throttle: fix throtl_data leak during disk release
Hi Yu,
A new issue was triggered with this change. Please check it. Thanks.
[ 285.804104] run blktests throtl/001 at 2025-09-16 04:11:12
[ 286.161894] null_blk: disk dev_nullb created
[ 293.388583] ======================================================
[ 293.394762] WARNING: possible circular locking dependency detected
[ 293.400940] 6.17.0-rc6.v1+ #2 Not tainted
[ 293.404952] ------------------------------------------------------
[ 293.411131] find/1609 is trying to acquire lock:
[ 293.415751] ffff8882911b50b0 ((&sq->pending_timer)){+.-.}-{0:0}, at: __timer_delete_sync+0x23/0x120
[ 293.424817]
but task is already holding lock:
[ 293.430648] ffff8882b7794948 (&blkcg->lock){....}-{3:3}, at: blkcg_deactivate_policy+0x1e7/0x4e0
[ 293.439445]
which lock already depends on the new lock.
[ 293.447619]
the existing dependency chain (in reverse order) is:
[ 293.455096]
-> #2 (&blkcg->lock){....}-{3:3}:
[ 293.460948] __lock_acquire+0x57c/0xbd0
[ 293.465315] lock_acquire.part.0+0xbd/0x260
[ 293.470020] _raw_spin_lock+0x37/0x80
[ 293.474214] blkg_create+0x3e2/0x1060
[ 293.478401] blkcg_init_disk+0x8f/0x460
[ 293.482769] __alloc_disk_node+0x27f/0x600
[ 293.487397] __blk_mq_alloc_disk+0x5f/0xd0
[ 293.492025] nvme_alloc_ns+0x202/0x17a0 [nvme_core]
[ 293.497458] nvme_scan_ns+0x30b/0x380 [nvme_core]
[ 293.502709] async_run_entry_fn+0x9a/0x4f0
[ 293.507330] process_one_work+0xd8b/0x1320
[ 293.511956] worker_thread+0x5f3/0xfe0
[ 293.516231] kthread+0x3b4/0x770
[ 293.519992] ret_from_fork+0x393/0x480
[ 293.524272] ret_from_fork_asm+0x1a/0x30
[ 293.528728]
-> #1 (&q->queue_lock){..-.}-{3:3}:
[ 293.534749] __lock_acquire+0x57c/0xbd0
[ 293.539108] lock_acquire.part.0+0xbd/0x260
[ 293.543814] _raw_spin_lock_irq+0x3f/0x90
[ 293.548348] throtl_pending_timer_fn+0x11c/0x15b0
[ 293.553581] call_timer_fn+0x19c/0x3e0
[ 293.557853] __run_timers+0x627/0x9f0
[ 293.562041] run_timer_base+0xe6/0x140
[ 293.566312] run_timer_softirq+0x1a/0x30
[ 293.570758] handle_softirqs+0x1fd/0x890
[ 293.575205] __irq_exit_rcu+0xfd/0x250
[ 293.579477] irq_exit_rcu+0xe/0x30
[ 293.583402] sysvec_apic_timer_interrupt+0xa1/0xd0
[ 293.588717] asm_sysvec_apic_timer_interrupt+0x1a/0x20
[ 293.594383] cpuidle_enter_state+0xf5/0x2f0
[ 293.599090] cpuidle_enter+0x4e/0xa0
[ 293.603197] cpuidle_idle_call+0x213/0x370
[ 293.607816] do_idle+0x131/0x200
[ 293.611568] cpu_startup_entry+0x54/0x60
[ 293.616017] start_secondary+0x20d/0x290
[ 293.620471] common_startup_64+0x13e/0x141
[ 293.625096]
-> #0 ((&sq->pending_timer)){+.-.}-{0:0}:
[ 293.631642] check_prev_add+0xf1/0xcd0
[ 293.635921] validate_chain+0x487/0x570
[ 293.640281] __lock_acquire+0x57c/0xbd0
[ 293.644641] lock_acquire.part.0+0xbd/0x260
[ 293.649345] __timer_delete_sync+0x40/0x120
[ 293.654052] throtl_pd_free+0x19/0x40
[ 293.658238] blkcg_deactivate_policy+0x2c9/0x4e0
[ 293.663378] blk_throtl_exit+0xa5/0x100
[ 293.667743] blkcg_exit_disk+0x1f/0x270
[ 293.672104] disk_release+0x11b/0x3a0
[ 293.676299] device_release+0x9f/0x210
[ 293.680579] kobject_cleanup+0x105/0x360
[ 293.685027] null_del_dev.part.0+0x1e5/0x480 [null_blk]
[ 293.690788] nullb_group_drop_item+0xa5/0xd0 [null_blk]
[ 293.696544] configfs_rmdir+0x69f/0xac0
[ 293.700910] vfs_rmdir+0x1a5/0x5b0
[ 293.704836] do_rmdir+0x276/0x330
[ 293.708677] __x64_sys_unlinkat+0x16b/0x1e0
[ 293.713393] do_syscall_64+0x94/0x8d0
[ 293.717584] entry_SYSCALL_64_after_hwframe+0x76/0x7e
[ 293.723160]
other info that might help us debug this:
[ 293.731159] Chain exists of:
(&sq->pending_timer) --> &q->queue_lock --> &blkcg->lock
[ 293.742045] Possible unsafe locking scenario:
[ 293.747972]        CPU0                    CPU1
[ 293.752511]        ----                    ----
[ 293.757043]   lock(&blkcg->lock);
[ 293.760371]                                lock(&q->queue_lock);
[ 293.766387]                                lock(&blkcg->lock);
[ 293.772226]   lock((&sq->pending_timer));
[ 293.776248]
*** DEADLOCK ***
[ 293.782166] 8 locks held by find/1609:
[ 293.785921] #0: ffff88813ddf6448 (sb_writers#16){.+.+}-{0:0}, at: do_rmdir+0x1a8/0x330
[ 293.793945] #1: ffff88829e48a108 (&default_group_class[depth - 1]/1){+.+.}-{4:4}, at: do_rmdir+0x1ec/0x330
[ 293.803704] #2: ffff8881f918cb48 (&sb->s_type->i_mutex_key#22){+.+.}-{4:4}, at: vfs_rmdir+0xc0/0x5b0
[ 293.812943] #3: ffffffffc1cc4698 (&nullb_subsys.su_mutex){+.+.}-{4:4}, at: configfs_rmdir+0x57b/0xac0
[ 293.822267] #4: ffffffffc1ccc130 (&lock){+.+.}-{4:4}, at: nullb_group_drop_item+0x50/0xd0 [null_blk]
[ 293.831516] #5: ffff88829ddb9980 (&q->blkcg_mutex){+.+.}-{4:4}, at: blkcg_deactivate_policy+0xf6/0x4e0
[ 293.840926] #6: ffff88829ddb9650 (&q->queue_lock){..-.}-{3:3}, at: blkcg_deactivate_policy+0x10a/0x4e0
[ 293.850339] #7: ffff8882b7794948 (&blkcg->lock){....}-{3:3}, at: blkcg_deactivate_policy+0x1e7/0x4e0
[ 293.859577]
stack backtrace:
[ 293.863939] CPU: 11 UID: 0 PID: 1609 Comm: find Kdump: loaded Not tainted 6.17.0-rc6.v1+ #2 PREEMPT(voluntary)
[ 293.863946] Hardware name: Dell Inc. PowerEdge R6515/07PXPY, BIOS 2.17.0 12/04/2024
[ 293.863949] Call Trace:
[ 293.863953] <TASK>
[ 293.863959] dump_stack_lvl+0x6f/0xb0
[ 293.863970] print_circular_bug.cold+0x38/0x45
[ 293.863981] check_noncircular+0x148/0x160
[ 293.863997] check_prev_add+0xf1/0xcd0
[ 293.864001] ? alloc_chain_hlocks+0x13e/0x1d0
[ 293.864007] ? srso_return_thunk+0x5/0x5f
[ 293.864013] ? add_chain_cache+0x12c/0x310
[ 293.864022] validate_chain+0x487/0x570
[ 293.864027] ? srso_return_thunk+0x5/0x5f
[ 293.864037] __lock_acquire+0x57c/0xbd0
[ 293.864043] ? srso_return_thunk+0x5/0x5f
[ 293.864052] lock_acquire.part.0+0xbd/0x260
[ 293.864057] ? __timer_delete_sync+0x23/0x120
[ 293.864066] ? srso_return_thunk+0x5/0x5f
[ 293.864071] ? rcu_is_watching+0x15/0xb0
[ 293.864076] ? blkcg_deactivate_policy+0x1e7/0x4e0
[ 293.864080] ? srso_return_thunk+0x5/0x5f
[ 293.864085] ? lock_acquire+0x10b/0x150
[ 293.864092] ? __timer_delete_sync+0x23/0x120
[ 293.864098] __timer_delete_sync+0x40/0x120
[ 293.864103] ? __timer_delete_sync+0x23/0x120
[ 293.864111] throtl_pd_free+0x19/0x40
[ 293.864117] blkcg_deactivate_policy+0x2c9/0x4e0
[ 293.864132] blk_throtl_exit+0xa5/0x100
[ 293.864140] blkcg_exit_disk+0x1f/0x270
[ 293.864150] disk_release+0x11b/0x3a0
[ 293.864157] device_release+0x9f/0x210
[ 293.864164] kobject_cleanup+0x105/0x360
[ 293.864172] null_del_dev.part.0+0x1e5/0x480 [null_blk]
[ 293.864189] nullb_group_drop_item+0xa5/0xd0 [null_blk]
[ 293.864202] configfs_rmdir+0x69f/0xac0
[ 293.864210] ? __pfx_may_link+0x10/0x10
[ 293.864221] ? __pfx_configfs_rmdir+0x10/0x10
[ 293.864235] vfs_rmdir+0x1a5/0x5b0
[ 293.864244] do_rmdir+0x276/0x330
[ 293.864252] ? __pfx_do_rmdir+0x10/0x10
[ 293.864262] ? ktime_get_coarse_real_ts64+0x121/0x180
[ 293.864271] ? srso_return_thunk+0x5/0x5f
[ 293.864276] ? getname_flags.part.0+0xf8/0x480
[ 293.864285] __x64_sys_unlinkat+0x16b/0x1e0
[ 293.864293] do_syscall_64+0x94/0x8d0
[ 293.864298] ? srso_return_thunk+0x5/0x5f
[ 293.864304] ? srso_return_thunk+0x5/0x5f
[ 293.864309] ? fput_close_sync+0x156/0x1c0
[ 293.864316] ? __pfx_fput_close_sync+0x10/0x10
[ 293.864326] ? srso_return_thunk+0x5/0x5f
[ 293.864333] ? entry_SYSCALL_64_after_hwframe+0x76/0x7e
[ 293.864337] ? srso_return_thunk+0x5/0x5f
[ 293.864342] ? lockdep_hardirqs_on+0x78/0x100
[ 293.864347] ? srso_return_thunk+0x5/0x5f
[ 293.864351] ? do_syscall_64+0x139/0x8d0
[ 293.864355] ? srso_return_thunk+0x5/0x5f
[ 293.864362] ? entry_SYSCALL_64_after_hwframe+0x76/0x7e
[ 293.864366] ? srso_return_thunk+0x5/0x5f
[ 293.864371] ? lockdep_hardirqs_on+0x78/0x100
[ 293.864376] ? srso_return_thunk+0x5/0x5f
[ 293.864381] ? do_syscall_64+0x139/0x8d0
[ 293.864385] ? srso_return_thunk+0x5/0x5f
[ 293.864393] entry_SYSCALL_64_after_hwframe+0x76/0x7e
[ 293.864398] RIP: 0033:0x7f13bc09bb9b
[ 293.864404] Code: 77 05 c3 0f 1f 40 00 48 8b 15 71 52 0d 00 f7 d8 64 89 02 b8 ff ff ff ff c3 66 0f 1f 44 00 00 f3 0f 1e fa b8 07 01 00 00 0f 05 <48> 3d 01 f0 ff ff 73 01 c3 48 8b 0d 45 52 0d 00 f7 d8 64 89 01 48
[ 293.864408] RSP: 002b:00007ffee8a880e8 EFLAGS: 00000216 ORIG_RAX: 0000000000000107
[ 293.864414] RAX: ffffffffffffffda RBX: 00005593f2ad7fb0 RCX: 00007f13bc09bb9b
[ 293.864417] RDX: 0000000000000200 RSI: 00005593f2ad7fb0 RDI: 0000000000000005
[ 293.864421] RBP: 0000000000000200 R08: 00000000ffffffff R09: 00005593f2acfd50
[ 293.864424] R10: 00005593f2ac8010 R11: 0000000000000216 R12: 00005593f2ace330
[ 293.864427] R13: 0000000000000001 R14: 0000000000000005 R15: 00007ffee8a890bb
[ 293.864445] </TASK>
On Tue, Sep 16, 2025 at 9:20 AM Yu Kuai <yukuai1@...weicloud.com> wrote:
>
> From: Yu Kuai <yukuai3@...wei.com>
>
> Tightening the throttle activation check in blk_throtl_activated() to
> require both q->td presence and the policy bit being set introduced a
> memory leak during disk release:
>
> blkg_destroy_all() clears the policy bit first during queue deactivation,
> causing the subsequent blk_throtl_exit() to skip throtl_data cleanup when
> blk_throtl_activated() fails the policy check.
>
> Fix this by reordering operations in disk_release() to call blk_throtl_exit()
> while the throttle policy is still active. We avoid modifying the activation
> check in blk_throtl_exit() because it is intuitive that blk-throtl starts in
> blk_throtl_init() and ends in blk_throtl_exit().
>
> Fixes: bd9fd5be6bc0 ("blk-throttle: fix access race during throttle policy activation")
> Reported-by: Yi Zhang <yi.zhang@...hat.com>
> Closes: https://lore.kernel.org/all/CAHj4cs-p-ZwBEKigBj7T6hQCOo-H68-kVwCrV6ZvRovrr9Z+HA@mail.gmail.com/
> Signed-off-by: Yu Kuai <yukuai3@...wei.com>
> ---
> block/blk-cgroup.c | 2 +-
> 1 file changed, 1 insertion(+), 1 deletion(-)
>
> diff --git a/block/blk-cgroup.c b/block/blk-cgroup.c
> index 7246fc256315..64a56c8697f9 100644
> --- a/block/blk-cgroup.c
> +++ b/block/blk-cgroup.c
> @@ -1539,8 +1539,8 @@ int blkcg_init_disk(struct gendisk *disk)
>
> void blkcg_exit_disk(struct gendisk *disk)
> {
> - blkg_destroy_all(disk);
> blk_throtl_exit(disk);
> + blkg_destroy_all(disk);
> }
>
> static void blkcg_exit(struct task_struct *tsk)
> --
> 2.39.2
>
>
--
Best Regards,
Yi Zhang