Message-ID: <aF56oVEzTygIOUTN@fedora>
Date: Fri, 27 Jun 2025 19:04:01 +0800
From: Ming Lei <ming.lei@...hat.com>
To: Yu Kuai <yukuai1@...weicloud.com>
Cc: josef@...icpanda.com, axboe@...nel.dk, hch@...radead.org,
nilay@...ux.ibm.com, hare@...e.de, linux-block@...r.kernel.org,
nbd@...er.debian.org, linux-kernel@...r.kernel.org,
yukuai3@...wei.com, yi.zhang@...wei.com, yangerkun@...wei.com,
johnny.chenyi@...wei.com
Subject: Re: [PATCH] nbd: fix false lockdep deadlock warning
On Fri, Jun 27, 2025 at 05:23:48PM +0800, Yu Kuai wrote:
> From: Yu Kuai <yukuai3@...wei.com>
>
> The deadlock is reported because there is a circular dependency:
>
> t1: disk->open_mutex -> nbd->config_lock
>
> blkdev_release
> bdev_release
> //lock disk->open_mutex
> blkdev_put_whole
> nbd_release
> nbd_config_put
> refcount_dec_and_mutex_lock
> //lock nbd->config_lock
>
> t2: nbd->config_lock -> set->update_nr_hwq_lock
>
> nbd_genl_connect
> //lock nbd->config_lock
> nbd_start_device
> blk_mq_update_nr_hw_queues
> //lock set->update_nr_hwq_lock
>
> t3: set->update_nr_hwq_lock -> disk->open_mutex
>
> nbd_dev_remove_work
> nbd_dev_remove
> del_gendisk
> down_read(&set->update_nr_hwq_lock);
> __del_gendisk
> mutex_lock(&disk->open_mutex);
>
> This is a false warning because t1 and t2 should be synchronized by
> nbd->refs: t1 still holds the reference, while t2 is triggered only
> when the reference drops to 0. However, the lock order is broken.
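
[Editorial sketch] The three chains quoted above can be read as edges in a lock-order graph: each "A -> B" means some path takes B while holding A. The toy model below is only an illustration of why the three edges together close a cycle, and why removing the t2 edge breaks it. It is not how lockdep is implemented; the enum, struct and function names are all hypothetical. Note that a model like this only looks at ordering between lock classes, not at whether the paths can really run at the same time, which is how a report can be a false positive.

```c
#include <stdbool.h>

/* Hypothetical lock classes, named after the locks in the report. */
enum lock_class { OPEN_MUTEX, CONFIG_LOCK, UPDATE_NR_HWQ_LOCK, NR_LOCKS };

/* dep[a][b] is true when some path acquires b while holding a. */
struct lock_graph {
	bool dep[NR_LOCKS][NR_LOCKS];
};

static void add_dep(struct lock_graph *g, int held, int taken)
{
	g->dep[held][taken] = true;
}

/* Depth-first search: can we get from `from` to `target` in >= 1 step? */
static bool reaches(const struct lock_graph *g, int from, int target,
		    bool *seen)
{
	for (int next = 0; next < NR_LOCKS; next++) {
		if (!g->dep[from][next])
			continue;
		if (next == target)
			return true;
		if (!seen[next]) {
			seen[next] = true;
			if (reaches(g, next, target, seen))
				return true;
		}
	}
	return false;
}

/* A cycle exists if any lock class can reach itself. */
static bool has_cycle(const struct lock_graph *g)
{
	for (int start = 0; start < NR_LOCKS; start++) {
		bool seen[NR_LOCKS] = { false };

		if (reaches(g, start, start, seen))
			return true;
	}
	return false;
}

/* Build the graph from the quoted chains; t2 is optional. */
static struct lock_graph reported_graph(bool include_t2)
{
	struct lock_graph g = { 0 };

	add_dep(&g, OPEN_MUTEX, CONFIG_LOCK);			/* t1 */
	if (include_t2)
		add_dep(&g, CONFIG_LOCK, UPDATE_NR_HWQ_LOCK);	/* t2 */
	add_dep(&g, UPDATE_NR_HWQ_LOCK, OPEN_MUTEX);		/* t3 */
	return g;
}
```

With all three edges has_cycle() returns true; with reported_graph(false), i.e. without the t2 edge, it returns false.
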
>
> Fix the problem by breaking the dependency in t2: call
> blk_mq_update_nr_hw_queues() outside of the nbd-internal config_lock. Since
> other contexts can now run concurrently with nbd_start_device(), also make
> sure they still return -EBUSY; the only difference is that they no longer
> wait for nbd_start_device() to finish.
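
[Editorial sketch] The approach described in the paragraph above (publish a "busy" marker, drop the inner lock around the call that takes another lock, then retake it) can be sketched in userspace with pthreads. This is a simplified model under stated assumptions, not the nbd code: struct toy_dev, toy_update_nr_hw_queues() and toy_start_device() are hypothetical stand-ins, and the real patch also handles nbd->task_setup, which is left out here.

```c
#include <errno.h>
#include <pthread.h>

/* Hypothetical stand-in for the device; not struct nbd_device. */
struct toy_dev {
	pthread_mutex_t config_lock;
	int pid;		/* nonzero once a start has claimed the device */
	int nr_hw_queues;
};

/* Stand-in for the call that takes another lock internally. */
static void toy_update_nr_hw_queues(struct toy_dev *d, int n)
{
	d->nr_hw_queues = n;
}

/*
 * Caller holds d->config_lock; it is held again on return.
 * The lock is dropped around toy_update_nr_hw_queues(), so no
 * config_lock -> (other lock) ordering is created.
 */
static int toy_start_device(struct toy_dev *d, int self_pid, int nr_conn)
{
	if (d->pid)
		return -EBUSY;	/* someone else already started it */

	/* Publish the marker before unlocking so that concurrent
	 * callers see it and fail with -EBUSY instead of waiting. */
	d->pid = self_pid;

	pthread_mutex_unlock(&d->config_lock);
	toy_update_nr_hw_queues(d, nr_conn);
	pthread_mutex_lock(&d->config_lock);

	return 0;
}
```

A second caller that takes config_lock while the first is inside toy_update_nr_hw_queues() sees pid set and gets -EBUSY right away, which matches the behaviour change the commit message describes.
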
>
> Fixes: 98e68f67020c ("block: prevent adding/deleting disk during updating nr_hw_queues")
> Reported-by: syzbot+2bcecf3c38cb3e8fdc8d@...kaller.appspotmail.com
> Closes: https://lore.kernel.org/all/6855034f.a00a0220.137b3.0031.GAE@google.com/
> Signed-off-by: Yu Kuai <yukuai3@...wei.com>
> ---
> drivers/block/nbd.c | 28 ++++++++++++++++++++++------
> 1 file changed, 22 insertions(+), 6 deletions(-)
>
> diff --git a/drivers/block/nbd.c b/drivers/block/nbd.c
> index 7bdc7eb808ea..d43e8e73aeb3 100644
> --- a/drivers/block/nbd.c
> +++ b/drivers/block/nbd.c
> @@ -1457,10 +1457,13 @@ static void nbd_config_put(struct nbd_device *nbd)
> }
> }
>
> -static int nbd_start_device(struct nbd_device *nbd)
> +static int nbd_start_device(struct nbd_device *nbd, bool netlink)
> + __releases(&nbd->config_lock)
> + __acquires(&nbd->config_lock)
> {
> struct nbd_config *config = nbd->config;
> int num_connections = config->num_connections;
> + struct task_struct *old;
> int error = 0, i;
>
> if (nbd->pid)
> @@ -1473,8 +1476,21 @@ static int nbd_start_device(struct nbd_device *nbd)
> return -EINVAL;
> }
>
> - blk_mq_update_nr_hw_queues(&nbd->tag_set, config->num_connections);
> + /*
> + * synchronize with concurrent nbd_start_device() and
> + * nbd_add_socket()
> + */
> nbd->pid = task_pid_nr(current);
> + if (!netlink) {
> + old = nbd->task_setup;
> + nbd->task_setup = current;
> + }
> +
> + mutex_unlock(&nbd->config_lock);
> + blk_mq_update_nr_hw_queues(&nbd->tag_set, config->num_connections);
> + mutex_lock(&nbd->config_lock);
> + if (!netlink)
> + nbd->task_setup = old;
I guess the patch in the following link may be simpler; both take a
similar approach:
https://lore.kernel.org/linux-block/aFjbavzLAFO0Q7n1@fedora/
thanks,
Ming