Message-ID: <20251204195759.GC337106-mkhalfella@purestorage.com>
Date: Thu, 4 Dec 2025 11:57:59 -0800
From: Mohamed Khalfella <mkhalfella@...estorage.com>
To: Bart Van Assche <bvanassche@....org>
Cc: Chaitanya Kulkarni <kch@...dia.com>, Christoph Hellwig <hch@....de>,
	Jens Axboe <axboe@...nel.dk>, Keith Busch <kbusch@...nel.org>,
	Sagi Grimberg <sagi@...mberg.me>,
	Casey Chen <cachen@...estorage.com>,
	Yuanyuan Zhong <yzhong@...estorage.com>,
	Hannes Reinecke <hare@...e.de>, Ming Lei <ming.lei@...hat.com>,
	Waiman Long <llong@...hat.com>, Hillf Danton <hdanton@...a.com>,
	linux-nvme@...ts.infradead.org, linux-block@...r.kernel.org,
	linux-kernel@...r.kernel.org
Subject: Re: [PATCH 1/1] block: Use RCU in blk_mq_[un]quiesce_tagset()
 instead of set->tag_list_lock

On Thu 2025-12-04 09:31:55 -1000, Bart Van Assche wrote:
> On 12/4/25 9:15 AM, Mohamed Khalfella wrote:
> > The stacktraces are from old 6.6.9 kernel.
> 
> Please always include stack traces from a recent upstream kernel in
> patch descriptions.
> 

Good point. I will do that in the next version of the patch.

> > However, the issue is still
> > applicable to recent kernels. This is an example from 6.13 kernel.
> 
> Thanks, these stack traces make it clear what is causing the deadlock.
> 
>  From nvme_timeout():
> 
> 	/*
> 	 * Reset immediately if the controller is failed
> 	 */
> 	if (nvme_should_reset(dev, csts)) {
> 		nvme_warn_reset(dev, csts);
> 		nvme_dev_disable(dev, false);
> 		nvme_reset_ctrl(&dev->ctrl);
> 		return BLK_EH_DONE;
> 	}
> 
> Is my understanding correct that the above code is involved in the
> reported deadlock? If so, has it been considered to run the code inside
> the if-statement asynchronously (queue_work()) instead of calling it
> synchronously? Would this be sufficient to fix the deadlock?
> 

Yes, the above code is involved in the deadlock. I do not see how
running this code in another thread will solve the problem. It will
still cause a deadlock between blk_mq_quiesce_tagset() and
blk_mq_del_queue_tag_set(). The latter holds the mutex while waiting
for the queue to be frozen. The former needs the mutex in order to
make progress and cancel inflight requests, which is what lets the
queue be frozen. Moving the call to a workqueue does not change this
dependency.
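
To illustrate the dependency, here is a minimal userspace sketch using
pthreads. It is not kernel code. Only the name tag_list_lock comes from
the patch subject; the inflight counter, quiesce_side() and
simulate_deadlock() are hypothetical stand-ins. The quiesce side uses
trylock instead of lock so that the sketch terminates and reports the
cycle rather than actually hanging.

```c
#include <errno.h>
#include <pthread.h>
#include <stdbool.h>
#include <stdio.h>

/* Stand-in for set->tag_list_lock (a userspace mutex, not the kernel one). */
static pthread_mutex_t tag_list_lock = PTHREAD_MUTEX_INITIALIZER;

/* Hypothetical count of inflight requests that keeps the queue from freezing. */
static int inflight = 1;

/*
 * Models the blk_mq_quiesce_tagset() side: it needs the mutex before it
 * can go on to cancel inflight requests. It may run in any thread, which
 * is the point: a workqueue would not change the outcome.
 */
static void *quiesce_side(void *arg)
{
	bool *got_lock = arg;

	*got_lock = (pthread_mutex_trylock(&tag_list_lock) == 0);
	if (*got_lock) {
		inflight = 0;	/* requests cancelled, the freeze could finish */
		pthread_mutex_unlock(&tag_list_lock);
	}
	return NULL;
}

/*
 * Models the blk_mq_del_queue_tag_set() side: it takes the mutex and then
 * waits for the queue to be frozen, i.e. for inflight to reach zero.
 * Returns true if the two sides would block each other forever.
 */
static bool simulate_deadlock(void)
{
	pthread_t t;
	bool got_lock = false;
	bool would_deadlock;

	inflight = 1;
	pthread_mutex_lock(&tag_list_lock);

	/* The error path runs in a separate thread while the mutex is held. */
	pthread_create(&t, NULL, quiesce_side, &got_lock);
	pthread_join(t, NULL);

	/* Holder waits on inflight == 0; the other side could not get the lock. */
	would_deadlock = !got_lock && inflight > 0;

	pthread_mutex_unlock(&tag_list_lock);
	return would_deadlock;
}
```

Calling simulate_deadlock() from a main() and building with -pthread
shows that the quiesce side cannot take the mutex while the holder is
still waiting for inflight requests to drain, regardless of which
thread the quiesce side runs in.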

> Thanks,
> 
> Bart.
