Message-ID: <df0c4a9cce5928fdc8ba3a1858e4c6611edb4474.camel@infradead.org>
Date: Thu, 25 Apr 2019 07:45:31 +0200
From: David Woodhouse <dwmw2@...radead.org>
To: Sagi Grimberg <sagi@...mberg.me>, Keith Busch <kbusch@...nel.org>
Cc: Jens Axboe <axboe@...com>, James Smart <james.smart@...adcom.com>,
linux-nvme@...ts.infradead.org, linux-kernel@...r.kernel.org,
Keith Busch <keith.busch@...el.com>,
Maximilian Heyne <mheyne@...zon.de>,
Amit Shah <aams@...zon.de>, Christoph Hellwig <hch@....de>
Subject: Re: [PATCH v2 0/2] Adding per-controller timeout support to nvme

On Wed, 2019-04-24 at 13:58 -0700, Sagi Grimberg wrote:
> > It isn't that the media is slow; the max timeout is based on the SLA
> > for certain classes of "fabric" outages. Linux copes *really* badly
> > with I/O errors, and if we can make the timeout last long enough to
> > cover the switch restart worst case, then users are a lot happier.
>
> Well, what is usually done to handle fabric outages is having multiple
> paths to the storage device, not sure if that is applicable for you or
> not...
Yeah, that turns out to be impractical in this case.
> What do you mean by "Linux copes *really* badly with I/O errors"? What
> can be done better?
There's not a lot that can be done here in the short term. If file
systems get errors on certain I/O, then graceful recovery would be
complicated to achieve.

It is better to set the I/O timeout higher than the known worst-case
time for successful completion.