Message-ID: <3dad09ce-151d-41fc-8137-56a931c4c224@flourine.local>
Date: Wed, 16 Apr 2025 10:30:11 +0200
From: Daniel Wagner <dwagner@...e.de>
To: Mohamed Khalfella <mkhalfella@...estorage.com>
Cc: Daniel Wagner <wagi@...nel.org>, Christoph Hellwig <hch@....de>,
Sagi Grimberg <sagi@...mberg.me>, Keith Busch <kbusch@...nel.org>, Hannes Reinecke <hare@...e.de>,
John Meneghini <jmeneghi@...hat.com>, randyj@...estorage.com, linux-nvme@...ts.infradead.org,
linux-kernel@...r.kernel.org
Subject: Re: [PATCH RFC 3/3] nvme: delay failover by command quiesce timeout
On Tue, Apr 15, 2025 at 05:40:16PM -0700, Mohamed Khalfella wrote:
> On 2025-04-15 14:17:48 +0200, Daniel Wagner wrote:
> > Passthrough commands should fail immediately. Userland is in charge here,
> > not the kernel. At least this is what should happen here.
>
> I see your point. Unless I am missing something, these requests should be
> held equally to bio requests from the multipath layer. Let us say an app
> submitted a write request that got canceled immediately; how does the app
> know when it is safe to retry the write request?
Good question, but nothing new as far as I can tell. If the kernel
doesn't start to retry passthru IO commands, we have to figure out how
to pass additional information to userland.
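For illustration, a minimal userspace sketch (an assumption on my part,
not from this series; the opcode/CDW values and the retry count/delay
are made up) of what an application issuing an IO passthru command has
to do today, since the retry decision lives entirely in the application:

  #include <errno.h>
  #include <stdint.h>
  #include <unistd.h>
  #include <sys/ioctl.h>
  #include <linux/nvme_ioctl.h>

  /* Read one block at LBA 0 via NVME_IOCTL_IO_CMD and retry in userland
   * when the command fails immediately (EINTR/EAGAIN), e.g. while the
   * transport is down. */
  static int read_lba0(int fd, __u32 nsid, void *buf, __u32 len)
  {
          struct nvme_passthru_cmd cmd = {
                  .opcode   = 0x02,              /* NVMe Read */
                  .nsid     = nsid,
                  .addr     = (__u64)(uintptr_t)buf,
                  .data_len = len,
                  .cdw10    = 0,                 /* SLBA low */
                  .cdw11    = 0,                 /* SLBA high */
                  .cdw12    = 0,                 /* NLB 0 -> one block */
          };
          int retries = 5;                       /* arbitrary policy */

          for (;;) {
                  int ret = ioctl(fd, NVME_IOCTL_IO_CMD, &cmd);

                  if (ret == 0)
                          return 0;              /* success */
                  if (ret > 0)
                          return ret;            /* NVMe status code */
                  if ((errno == EINTR || errno == EAGAIN) && retries-- > 0) {
                          /* No way to know when a retry is actually safe. */
                          sleep(1);
                          continue;
                  }
                  return -errno;
          }
  }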
> Holding requests like writes until it is safe to retry them is the whole
> point of this work, right?
My first goal was to address the IO commands submitted via the block
layer; I didn't have the IO passthru interface on my radar. I agree,
getting the IO passthru path correct is also a good idea.
Anyway, I don't have a lot of experience with the IO passthru interface.
A quick test to figure out what the status quo is: 'nvme read' results
in EINTR or 'Resource temporarily unavailable' (EAGAIN or EWOULDBLOCK)
when the TCP connection between host and target is blocked (*), though
it comes from an admin command. It looks like I have to dive into this
a bit more.
(*) I think this might be the same problem as the systemd folks have
reported.
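For reference, roughly how the return values of the passthru ioctl map
(again only a sketch based on my assumptions; the Identify command and
device path are just examples): a positive return value is the NVMe
status code of a completed command, while -1 with errno set (EINTR,
EAGAIN, ...) is a kernel/transport level error:

  #include <errno.h>
  #include <fcntl.h>
  #include <stdint.h>
  #include <stdio.h>
  #include <string.h>
  #include <unistd.h>
  #include <sys/ioctl.h>
  #include <linux/nvme_ioctl.h>

  int main(int argc, char **argv)
  {
          static __u8 id[4096];
          struct nvme_admin_cmd cmd = {
                  .opcode   = 0x06,              /* Identify */
                  .addr     = (__u64)(uintptr_t)id,
                  .data_len = sizeof(id),
                  .cdw10    = 1,                 /* CNS 1: Identify Controller */
          };
          int fd, ret;

          fd = open(argc > 1 ? argv[1] : "/dev/nvme0", O_RDONLY);
          if (fd < 0) {
                  perror("open");
                  return 1;
          }

          ret = ioctl(fd, NVME_IOCTL_ADMIN_CMD, &cmd);
          if (ret < 0)
                  printf("kernel error: %s\n", strerror(errno));
          else if (ret > 0)
                  printf("NVMe status: 0x%x\n", ret);
          else
                  printf("success\n");

          close(fd);
          return 0;
  }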