Message-ID: <c0b453e1-c141-412c-ae46-9686656be2f6@huaweicloud.com>
Date: Fri, 28 Nov 2025 17:44:22 +0800
From: Zheng Qixing <zhengqixing@...weicloud.com>
To: Nilay Shroff <nilay@...ux.ibm.com>
Cc: cve@...nel.org, linux-kernel@...r.kernel.org, yukuai@...as.com,
 ming.lei@...hat.com, "zhangyi (F)" <yi.zhang@...wei.com>,
 Hou Tao <houtao1@...wei.com>, yangerkun <yangerkun@...wei.com>,
 Greg KH <gregkh@...uxfoundation.org>, zhengqixing@...wei.com
Subject: Re: CVE-2025-40146: blk-mq: fix potential deadlock while nr_requests
 grown


On 2025/11/28 15:15, Nilay Shroff wrote:
>> commit b86433721f46d934940528f28d49c1dedb690df1 (HEAD -> master)
>> Author: Yu Kuai <yukuai3@...wei.com>
>> Date:   Wed Sep 10 16:04:43 2025 +0800
>>
>>      blk-mq: fix potential deadlock while nr_requests grown
>>
>>      Allocate and free sched_tags while queue is freezed can deadlock[1],
>>      this is a long term problem, hence allocate memory before freezing
>>      queue and free memory after queue is unfreezed.
>>
>>      [1] https://lore.kernel.org/all/0659ea8d-a463-47c8-9180-43c719e106eb@linux.ibm.com/
>>      Fixes: e3a2b3f931f5 ("blk-mq: allow changing of queue depth through sysfs")
>>
>>      Signed-off-by: Yu Kuai <yukuai3@...wei.com>
>>      Reviewed-by: Nilay Shroff <nilay@...ux.ibm.com>
>>      Signed-off-by: Jens Axboe <axboe@...nel.dk>
>>
>> We assume the problem Yu describes is that updating nr_requests may
>> require memory allocation (when nr_requests grows). That allocation
>> may trigger memory reclaim, which in turn issues I/O, and since the
>> request_queue has been frozen, a deadlock results.
>>
>> But after checking the source code, the call chain is
>> queue_requests_store->blk_mq_freeze_queue->memalloc_noio_save, so
>> memory allocation anywhere in this path won't trigger I/O. So the
>> deadlock cannot happen... And if that's true, this patch
>> does not fix any problem.
>>
> Yes, memalloc_noio_save() is invoked before we freeze the queue (e.g., in
> elv_iosched_store()), but that does not prevent the deadlock scenario described
> in the lockdep splat.
>
> If you look closely at the splat, the problematic lock is not fs_reclaim (which
> may be the first impression), but rather ->pcpu_alloc_mutex. From the splat, the
> chain of dependencies looks like this:
>
> thread #0: blocked on q->elevator_lock
>    thread #1: blocked on ->pcpu_alloc_mutex
>      thread #2: blocked on fs_reclaim
>
> Here is the key detail:
>
> Thread #0 is running under GFP_NOIO scope (due to memalloc_noio_save()).
> However, it is not blocked on fs_reclaim. Instead, it is blocked
> on ->elevator_lock.
>
> Thread #1 is also running with GFP_NOIO and holds ->elevator_lock
> while the queue is frozen. It is blocked on ->pcpu_alloc_mutex,
> which is already held by Thread #2 (the thread that is stuck in
> fs_reclaim). Thread #2 is running without GFP_NOIO scope.
>
> In other words:
> - GFP_NOIO prevents a thread from entering fs_reclaim, but it does
>    not prevent triggering per-CPU memory allocations, which require
>    taking ->pcpu_alloc_mutex.
> - This ->pcpu_alloc_mutex is the actual source of contention in the
>    splat, and it sits outside the protections offered by GFP_NOIO.
>
> That means:
> - Even though memalloc_noio_save() avoids fs reclaim recursion,
>    it does not prevent per-CPU allocations from blocking, and thus
>    it cannot prevent the deadlock involving ->pcpu_alloc_mutex.
>

Thank you for the detailed explanation.

Now I understand that there could indeed be a deadlock issue here :)
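
To restate the chain above in a form that is easy to check, here is a
minimal user-space sketch (not kernel code) that models the three threads
as a wait-for graph and looks for a cycle. The names waits_for, held_by,
find_cycle and "frozen_queue_io" are hypothetical. The last edge, that
reclaim in thread #2 needs I/O on the queue thread #1 keeps frozen, is an
assumption taken from the earlier discussion rather than something read
directly from the splat.

```python
# Wait-for graph model of the splat (illustrative only, not kernel code).
# Each thread waits for at most one resource; each resource has one holder.

def find_cycle(start, waits_for, held_by):
    """Follow waiter -> resource -> holder edges from `start`.

    Returns the list of threads forming a cycle, or [] if the chain ends.
    """
    seen = []
    thread = start
    while thread is not None and thread not in seen:
        seen.append(thread)
        resource = waits_for.get(thread)   # None if the thread is runnable
        thread = held_by.get(resource)     # None if nobody holds it
    if thread is None:
        return []
    return seen[seen.index(thread):]

held_by = {
    "elevator_lock": "thread1",
    "pcpu_alloc_mutex": "thread2",
    # Assumption: reclaim I/O cannot complete while thread1 keeps the
    # queue frozen, so the frozen queue is modelled as held by thread1.
    "frozen_queue_io": "thread1",
}

# Before the patch: thread1 does a per-CPU allocation with the queue frozen.
waits_before = {
    "thread0": "elevator_lock",
    "thread1": "pcpu_alloc_mutex",
    "thread2": "frozen_queue_io",
}

# After the patch: the allocation happens before the freeze, so thread1
# waits for nothing while it holds elevator_lock and the frozen queue.
waits_after = {
    "thread0": "elevator_lock",
    "thread2": "frozen_queue_io",
}

cycle_before = find_cycle("thread0", waits_before, held_by)
cycle_after = find_cycle("thread0", waits_after, held_by)
print("before:", cycle_before)
print("after:", cycle_after)
```

In this model GFP_NOIO plays no role in breaking the cycle: thread #1
never enters reclaim itself, it only waits on ->pcpu_alloc_mutex, whose
holder is the one stuck in reclaim. Moving the allocation ahead of the
freeze removes that edge, which matches what the patch does.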


Thanks,

Qixing