lists.openwall.net   lists  /  announce  owl-users  owl-dev  john-users  john-dev  passwdqc-users  yescrypt  popa3d-users  /  oss-security  kernel-hardening  musl  sabotage  tlsify  passwords  /  crypt-dev  xvendor  /  Bugtraq  Full-Disclosure  linux-kernel  linux-netdev  linux-ext4  linux-hardening  linux-cve-announce  PHC 
Open Source and information security mailing list archives
 
Hash Suite: Windows password security audit tool. GUI, reports in PDF.
[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <22e48664-63f3-4cc0-8b99-f56e98204e5b@flourine.local>
Date: Wed, 16 Apr 2025 08:57:19 +0200
From: Daniel Wagner <dwagner@...e.de>
To: Mohamed Khalfella <mkhalfella@...estorage.com>
Cc: Daniel Wagner <wagi@...nel.org>, Christoph Hellwig <hch@....de>, 
	Sagi Grimberg <sagi@...mberg.me>, Keith Busch <kbusch@...nel.org>, Hannes Reinecke <hare@...e.de>, 
	John Meneghini <jmeneghi@...hat.com>, randyj@...estorage.com, linux-nvme@...ts.infradead.org, 
	linux-kernel@...r.kernel.org
Subject: Re: [PATCH RFC 3/3] nvme: delay failover by command quiesce timeout

On Tue, Apr 15, 2025 at 05:17:38PM -0700, Mohamed Khalfella wrote:
> Help me see this:
> 
> - nvme_failover_req() is the only place reqs are added to failover_list.
> - nvme_decide_disposition() returns FAILOVER only if req has REQ_NVME_MPATH set.
> 
> How/where do admin requests get REQ_NVME_MPATH set?

Admin commands don't set REQ_NVME_MPATH. This is what the current code
does and I have deliberately decided not to touch this with this RFC.

Given how much discussion the CQT/CCR feature triggers, I don't think
it's a good idea to add this topic to this discussion.

> > > - What about requests that do not go through nvme_failover_req(), like
> > >   passthrough requests, do we not want to hold these requests until it
> > >   is safe for them to be retried?
> > 
> > Pasthrough commands should fail immediately. Userland is in charge here,
> > not the kernel. At least this what should happen here.
> > 
> > > - In case of controller reset or delete if nvme_disable_ctrl()
> > >   successfully disables the controller, then we do not want to add
> > >   canceled requests to failover_list, right? Does this implementation
> > >   consider this case?
> > 
> > Not sure. I've tested a few things but I am pretty sure this RFC is far
> > from being complete.
> 
> I think it does not, and maybe it should honor this. Otherwise every
> controller reset/delete will end up holding requests unnecessarily.

Yes, this is one of the problems with the failover queue. It could be
solved by really starting to track the delay timeout for each commands.
But this is a lot of logic code and complexity. Thus during the
discussion at LSFMM everyone including me, said failover queue idea
should not be our first choice.

Powered by blists - more mailing lists

Powered by Openwall GNU/*/Linux Powered by OpenVZ