Message-ID: <08ce91d9-756a-a8fa-a988-a13ec74d8c1c@huaweicloud.com>
Date: Sat, 5 Jul 2025 09:15:17 +0800
From: Yu Kuai <yukuai1@...weicloud.com>
To: Yu Kuai <yukuai1@...weicloud.com>, Nilay Shroff <nilay@...ux.ibm.com>,
Ming Lei <ming.lei@...hat.com>
Cc: josef@...icpanda.com, axboe@...nel.dk, hch@...radead.org, hare@...e.de,
linux-block@...r.kernel.org, nbd@...er.debian.org,
linux-kernel@...r.kernel.org, yi.zhang@...wei.com, yangerkun@...wei.com,
johnny.chenyi@...wei.com, "yukuai (C)" <yukuai3@...wei.com>
Subject: Re: [PATCH] nbd: fix false lockdep deadlock warning
Hi,
On 2025/07/02 15:30, Yu Kuai wrote:
> Hi,
>
> On 2025/07/02 14:22, Nilay Shroff wrote:
>>
>>
>> On 7/2/25 8:02 AM, Ming Lei wrote:
>>> On Wed, Jul 02, 2025 at 09:12:09AM +0800, Yu Kuai wrote:
>>>> Hi,
>>>>
>>>> On 2025/07/01 21:28, Nilay Shroff wrote:
>>>>>
>>>>>
>>>>> On 6/28/25 6:18 AM, Yu Kuai wrote:
>>>>>> Hi,
>>>>>>
>>>>>> On 2025/06/27 19:04, Ming Lei wrote:
>>>>>>> I guess the patch in the following link may be simpler; both take a
>>>>>>> similar approach:
>>>>>>>
>>>>>>> https://lore.kernel.org/linux-block/aFjbavzLAFO0Q7n1@fedora/
>>>>>>
>>>>>> I think the above approach has concurrency problems when
>>>>>> nbd_start_device calls run concurrently with nbd_add_socket and with
>>>>>> each other:
>>>>>>
>>>>>> t1:
>>>>>> nbd_start_device
>>>>>>   lock
>>>>>>   num_connections = 1
>>>>>>   unlock
>>>>>>             t2:
>>>>>>             nbd_add_socket
>>>>>>               lock
>>>>>>               config->num_connections++
>>>>>>               unlock
>>>>>>                         t3:
>>>>>>                         nbd_start_device
>>>>>>                           lock
>>>>>>                           num_connections = 2
>>>>>>                           unlock
>>>>>>                           blk_mq_update_nr_hw_queues
>>>>>>                           // succeeds, nr_hw_queues = 2
>>>>>> blk_mq_update_nr_hw_queues
>>>>>> // nr_hw_queues updated to 1 before failure
>>>>>> return -EINVAL
>>>>>>
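To make the interleaving above concrete, here is a minimal userspace model
of the sample/update/re-check pattern (plain C with pthreads, not the actual
driver code; num_connections and nr_hw_queues mirror the nbd fields, and
start_device() is only my reading of the proposed nbd_start_device()):

#include <errno.h>
#include <pthread.h>

static pthread_mutex_t config_lock = PTHREAD_MUTEX_INITIALIZER;
static int num_connections = 1; /* models config->num_connections */
static int nr_hw_queues = 1;    /* models the tag set's nr_hw_queues */

/*
 * Models the proposed nbd_start_device(): sample under config_lock,
 * update nr_hw_queues outside of it, then re-check for a concurrent
 * nbd_add_socket().
 */
static int start_device(void)
{
        int sampled, stale;

        pthread_mutex_lock(&config_lock);
        sampled = num_connections;
        pthread_mutex_unlock(&config_lock);

        nr_hw_queues = sampled; /* models blk_mq_update_nr_hw_queues() */

        pthread_mutex_lock(&config_lock);
        stale = (sampled != num_connections);
        pthread_mutex_unlock(&config_lock);

        return stale ? -EINVAL : 0;
}

With the schedule above, t1 samples 1, t3 samples 2 and completes
(nr_hw_queues = 2, returns 0), and only then does t1 write nr_hw_queues = 1
and return -EINVAL; the failure cannot undo the clobbered queue count.
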
>>>>>
>>>>> In the above case, yes, I see that t1 would return -EINVAL (as
>>>>> config->num_connections doesn't match num_connections),
>>>>> but then t3 would succeed in updating nr_hw_queues (as both
>>>>> config->num_connections and num_connections are set to 2 this
>>>>> time). Isn't it? If yes, then the above patch (from Ming)
>>>>> seems good.
>>>>
>>>> Emm, I'm confused. If you agree with the concurrent process above, then
>>>> t3 updates nr_hw_queues to 2 first and returns success, and later t1
>>>> updates nr_hw_queues back to 1 and returns failure.
>>>
>>> It should be easy to avoid the failure by simply retrying.
>>>
>> Yeah, I think a retry should be a safe bet here.
>>
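A bounded retry on top of the model above would look roughly like this
(again only a sketch, not the patch; the retry limit is an arbitrary choice
of mine):

/*
 * Retry the sampled-count check from start_device() above; bounded
 * only so this model cannot spin forever.
 */
static int start_device_retry(void)
{
        int ret = -EINVAL, tries;

        for (tries = 0; tries < 10; tries++) {
                ret = start_device();
                if (!ret)
                        break;
        }
        return ret;
}

Each failed pass re-samples num_connections, so a stale nr_hw_queues left
behind by one losing nbd_start_device() is repaired by its own next
iteration.
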
>
> I'm really not sure about the retry; the above is just one scenario that
> I thought of during a quick review, and there are still many concurrent
> scenarios that need to be checked. I'm kind of lost here.
>
> Other than nbd_start_device() and nbd_add_socket(), I'm not confident
> that the other contexts synchronized by config_lock are not broken.
> However, I'm OK with the bet.
>
>> On another note, synchronizing nbd_start_device and nbd_add_socket
>> using nbd->task_setup looks more complex; instead we may use
>> nbd->pid to synchronize both. We need to move the setting of nbd->pid
>> to before we invoke blk_mq_update_nr_hw_queues in nbd_start_device.
>> Then in nbd_add_socket we can check nbd->pid, and if it's
>> non-NULL we can assume that either an nr_hw_queues update is in
>> progress or the device has been set up, and so return -EBUSY. I think
>> updating the number of connections once the device is configured
>> would not be possible anyway, so once nbd_start_device is initiated, we
>> should prevent the user from adding more connections. If we follow this
>> approach then IMO we don't need the retry discussed above.
>
> It's OK with me to forbid nbd_add_socket after nbd is configured; there
> is nowhere to use the added sock. And if there really are other contexts
> that need to be synchronized, I think nbd->pid can be used as well.
>
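For reference, a sketch of my understanding of the nbd->pid idea, on the
same userspace model as above (pid stands in for nbd->pid; the real driver
would use the task pid, this only shows the gating logic):

static long pid;        /* models nbd->pid, 0 when unset */

/* Models nbd_add_socket(): refuse new sockets once setup has begun. */
static int add_socket(void)
{
        int ret = 0;

        pthread_mutex_lock(&config_lock);
        if (pid)
                ret = -EBUSY;   /* setup in progress or device configured */
        else
                num_connections++;
        pthread_mutex_unlock(&config_lock);

        return ret;
}

nbd_start_device() would then set pid under config_lock before calling
blk_mq_update_nr_hw_queues(), so the t1/t2/t3 window above cannot open at
all and no retry is needed.
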
Do we have a conclusion now? Feel free to send the retry version, or let
me know if I should send a new version based on the synchronization
approach.
Thanks,
Kuai
>>
>> Thanks,
>> --Nilay
>>
>
>