lists.openwall.net   lists  /  announce  owl-users  owl-dev  john-users  john-dev  passwdqc-users  yescrypt  popa3d-users  /  oss-security  kernel-hardening  musl  sabotage  tlsify  passwords  /  crypt-dev  xvendor  /  Bugtraq  Full-Disclosure  linux-kernel  linux-netdev  linux-ext4  linux-hardening  linux-cve-announce  PHC 
Open Source and information security mailing list archives
 
Hash Suite for Android: free password hash cracker in your pocket
[<prev] [next>] [<thread-prev] [day] [month] [year] [list]
Message-ID: <63ae97dd-8284-4123-a879-a2680b6fab26@linux.ibm.com>
Date: Tue, 8 Jul 2025 16:43:05 +0530
From: Nilay Shroff <nilay@...ux.ibm.com>
To: Ming Lei <ming.lei@...hat.com>
Cc: Yu Kuai <yukuai1@...weicloud.com>, josef@...icpanda.com, axboe@...nel.dk,
        hch@...radead.org, hare@...e.de, linux-block@...r.kernel.org,
        nbd@...er.debian.org, linux-kernel@...r.kernel.org,
        yi.zhang@...wei.com, yangerkun@...wei.com, johnny.chenyi@...wei.com,
        "yukuai (C)" <yukuai3@...wei.com>
Subject: Re: [PATCH] nbd: fix false lockdep deadlock warning



On 7/8/25 1:04 PM, Ming Lei wrote:
> On Tue, Jul 08, 2025 at 10:42:00AM +0530, Nilay Shroff wrote:
>>
>>
>> On 7/5/25 6:45 AM, Yu Kuai wrote:
>>> Hi,
>>>
>>> 在 2025/07/02 15:30, Yu Kuai 写道:
>>>> Hi,
>>>>
>>>> 在 2025/07/02 14:22, Nilay Shroff 写道:
>>>>>
>>>>>
>>>>> On 7/2/25 8:02 AM, Ming Lei wrote:
>>>>>> On Wed, Jul 02, 2025 at 09:12:09AM +0800, Yu Kuai wrote:
>>>>>>> Hi,
>>>>>>>
>>>>>>> 在 2025/07/01 21:28, Nilay Shroff 写道:
>>>>>>>>
>>>>>>>>
>>>>>>>> On 6/28/25 6:18 AM, Yu Kuai wrote:
>>>>>>>>> Hi,
>>>>>>>>>
>>>>>>>>> 在 2025/06/27 19:04, Ming Lei 写道:
>>>>>>>>>> I guess the patch in the following link may be simper, both two take
>>>>>>>>>> similar approach:
>>>>>>>>>>
>>>>>>>>>> https://lore.kernel.org/linux-block/aFjbavzLAFO0Q7n1@fedora/
>>>>>>>>>
>>>>>>>>> I this the above approach has concurrent problems if nbd_start_device
>>>>>>>>> concurrent with nbd_start_device:
>>>>>>>>>
>>>>>>>>> t1:
>>>>>>>>> nbd_start_device
>>>>>>>>> lock
>>>>>>>>> num_connections = 1
>>>>>>>>> unlock
>>>>>>>>>       t2:
>>>>>>>>>       nbd_add_socket
>>>>>>>>>       lock
>>>>>>>>>       config->num_connections++
>>>>>>>>>       unlock
>>>>>>>>>           t3:
>>>>>>>>>           nbd_start_device
>>>>>>>>>           lock
>>>>>>>>>           num_connections = 2
>>>>>>>>>           unlock
>>>>>>>>>           blk_mq_update_nr_hw_queues
>>>>>>>>>
>>>>>>>>> blk_mq_update_nr_hw_queues
>>>>>>>>> //nr_hw_queues updated to 1 before failure
>>>>>>>>> return -EINVAL
>>>>>>>>>
>>>>>>>>
>>>>>>>> In the above case, yes I see that t1 would return -EINVAL (as
>>>>>>>> config->num_connections doesn't match with num_connections)
>>>>>>>> but then t3 would succeed to update nr_hw_queue (as both
>>>>>>>> config->num_connections and num_connections set to 2 this
>>>>>>>> time). Isn't it? If yes, then the above patch (from Ming)
>>>>>>>> seems good.
>>>>>>>
>>>>>>> Emm, I'm confused, If you agree with the concurrent process, then
>>>>>>> t3 update nr_hw_queues to 2 first and return sucess, later t1 update
>>>>>>> nr_hw_queues back to 1 and return failure.
>>>>>>
>>>>>> It should be easy to avoid failure by simple retrying.
>>>>>>
>>>>> Yeah I think retry should be a safe bet here.
>>>>>
>>>>
>>>> I really not sure about the retry, the above is just a scenario that I
>>>> think of with a quick review, and there are still many concurrent
>>>> scenarios that need to be checked, I'm kind of lost here.
>>>>
>>>> Except nbd_start_device() and nbd_add_socked(), I'm not confident
>>>> other context that is synchronized with config_lock is not broken.
>>>> However, I'm ok with the bet.
>>>>
>>>>> On another note, synchronizing nbd_start_device and nbd_add_socket
>>>>> using nbd->task_setup looks more complex and rather we may use
>>>>> nbd->pid to synchronize both. We need to move setting of nbd->pid
>>>>> before we invoke blk_mq_update_nr_hw_queues in nbd_start_device.
>>>>> Then in nbd_add_socket we can evaluate nbd->pid and if it's
>>>>> non-NULL then we could assume that either nr_hw_queues update is in
>>>>> progress or device has been setup and so return -EBUSY. I think
>>>>> anyways updating number of connections once device is configured
>>>>> would not be possible, so once nbd_start_device is initiated, we
>>>>> shall prevent user adding more connections. If we follow this
>>>>> approach then IMO we don't need to add retry discussed above.
>>>>
>>>> It's ok for me to forbit nbd_add_socked after nbd is configured, there
>>>> is nowhere to use the added sock. And if there really are other contexts
>>>> need to be synchronized, I think nbd->pid can be used as well.
>>>>
>>>
>>> Do we have a conclusion now? Feel free to send the retry version, or let
>>> me know if I should send a new synchronize version.
>>>
>> Personally, I prefer synchronizing nbd_start_device and nbd_add_socket
>> using nbd->pid but I do agree retry version would also work. Having
>> said that, lets wait for Ming's feedback as well.
> 
> IMO simpler fix, especially in concept, is more effective for addressing
> this kind of issue, with less chance to introduce regression.
> 
> If no one objects, I may send the retry version.
> 
Adding new socket/connection, after nr_hw_queue update is initiated or 
device is setup, is of no use. But I am also good with retry version. 
So please send your patch.

Thanks,
--Nilay


Powered by blists - more mailing lists

Powered by Openwall GNU/*/Linux Powered by OpenVZ