Message-ID: <4cd2cbb4-95ff-4f3b-b33b-9c066147d12b@flourine.local>
Date: Tue, 15 Apr 2025 14:11:04 +0200
From: Daniel Wagner <dwagner@...e.de>
To: Sagi Grimberg <sagi@...mberg.me>
Cc: Mohamed Khalfella <mkhalfella@...estorage.com>,
Daniel Wagner <wagi@...nel.org>, Christoph Hellwig <hch@....de>, Keith Busch <kbusch@...nel.org>,
Hannes Reinecke <hare@...e.de>, John Meneghini <jmeneghi@...hat.com>, randyj@...estorage.com,
linux-nvme@...ts.infradead.org, linux-kernel@...r.kernel.org
Subject: Re: [PATCH RFC 3/3] nvme: delay failover by command quiesce timeout
On Tue, Apr 15, 2025 at 01:28:15AM +0300, Sagi Grimberg wrote:
> > > +void nvme_schedule_failover(struct nvme_ctrl *ctrl)
> > > +{
> > > +	unsigned long delay;
> > > +
> > > +	if (ctrl->cqt)
> > > +		delay = msecs_to_jiffies(ctrl->cqt);
> > > +	else
> > > +		delay = ctrl->kato * HZ;
> > I thought that delay = m * ctrl->kato + ctrl->cqt
> > where m = ctrl->ctratt & NVME_CTRL_ATTR_TBKAS ? 3 : 2
> > no?
>
> This was said before, but if we are going to always start waiting for kato
> for failover purposes, we first need a patch that prevents kato from being
> arbitrarily long.
That should be addressed with the cross controller reset (CCR). KATO * n
+ CQT is the upper limit for the target recovery. As soon as we have CCR,
the recovery delay is reduced to the time the CCR exchange takes.
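
To make that bound concrete, here is a minimal userspace sketch, assuming
the formula quoted above (n = 3 with TBKAS, n = 2 without) and the units
the patch implies (kato in seconds, cqt in milliseconds). The sketch_ and
SKETCH_ names are made up for illustration and are not kernel API; the
flag's bit position is a stand-in as well.

```c
#include <stdint.h>

/*
 * Hypothetical stand-in for the kernel's NVME_CTRL_ATTR_TBKAS flag.
 * The bit position is illustrative only.
 */
#define SKETCH_CTRL_ATTR_TBKAS (1u << 6)

/*
 * Upper limit for the failover delay in milliseconds:
 *   n * KATO + CQT
 * with n = 3 when TBKAS is set in ctratt and n = 2 otherwise.
 * kato_sec is given in seconds, cqt_ms in milliseconds.
 */
static uint64_t sketch_failover_delay_ms(uint32_t kato_sec, uint32_t cqt_ms,
					 uint32_t ctratt)
{
	uint64_t n = (ctratt & SKETCH_CTRL_ATTR_TBKAS) ? 3 : 2;

	/* Convert KATO to milliseconds before adding CQT. */
	return n * kato_sec * 1000ull + cqt_ms;
}
```

With kato = 5 s, cqt = 2000 ms and TBKAS set this gives 17000 ms; without
TBKAS and with cqt = 0 it gives 10000 ms. In the kernel the result would
still have to go through msecs_to_jiffies().
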
> Let's cap kato to something like 10 seconds (which is 2x the default which
> apparently no one is touching).
If I understand TP4129 correctly, the upper limit is now defined there, so
we don't have to define our own.