Message-ID: <db034829-7140-1374-0132-7d27240b8b3b@grimberg.me>
Date:   Wed, 24 Apr 2019 13:58:56 -0700
From:   Sagi Grimberg <sagi@...mberg.me>
To:     David Woodhouse <dwmw2@...radead.org>,
        Keith Busch <kbusch@...nel.org>
Cc:     Jens Axboe <axboe@...com>, James Smart <james.smart@...adcom.com>,
        linux-nvme@...ts.infradead.org, linux-kernel@...r.kernel.org,
        Keith Busch <keith.busch@...el.com>,
        Maximilian Heyne <mheyne@...zon.de>,
        Amit Shah <aams@...zon.de>, Christoph Hellwig <hch@....de>
Subject: Re: [PATCH v2 0/2] Adding per-controller timeout support to nvme


>>>> As different nvme controllers are connected via different fabrics, some require
>>>> different timeout settings than others. This series implements per-controller
>>>> timeouts in the nvme subsystem which can be set via sysfs.
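
For context, a per-controller attribute of this kind would follow the
usual device-attribute pattern in drivers/nvme/host/core.c. A minimal
sketch, assuming a new io_timeout field on struct nvme_ctrl (the field
name and units are assumptions here; the actual series may differ):

/* Illustrative only -- not the patch under review. */
#include <linux/device.h>
#include <linux/kernel.h>

#include "nvme.h"	/* driver-private header: struct nvme_ctrl */

static ssize_t io_timeout_show(struct device *dev,
			       struct device_attribute *attr, char *buf)
{
	struct nvme_ctrl *ctrl = dev_get_drvdata(dev);

	/* Assumed field, stored in jiffies; reported in seconds. */
	return sprintf(buf, "%u\n", ctrl->io_timeout / HZ);
}

static ssize_t io_timeout_store(struct device *dev,
				struct device_attribute *attr,
				const char *buf, size_t count)
{
	struct nvme_ctrl *ctrl = dev_get_drvdata(dev);
	unsigned int secs;
	int err;

	err = kstrtouint(buf, 10, &secs);
	if (err)
		return err;
	if (!secs)
		return -EINVAL;

	ctrl->io_timeout = secs * HZ;	/* assumed field */
	return count;
}
static DEVICE_ATTR_RW(io_timeout);
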
>>>
>>> How much of a real issue is this?
>>>
>>> block io_timeout defaults to 30 seconds, which is considered a universal
>>> eternity for pretty much any nvme fabric. Moreover, io_timeout is
>>> mutable already on a per-namespace level.
>>>
>>> This leaves the admin_timeout which goes beyond this to 60 seconds...
>>>
>>> Can you describe what exactly you are trying to solve?
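
Both existing knobs are easy to poke at runtime. A quick userspace
sketch; the device name nvme0n1 and the timeout values are just
examples, and note that admin_timeout is a global nvme_core module
parameter rather than per-controller, which is the gap the series is
aiming at:

#include <stdio.h>

static int write_sysfs(const char *path, const char *val)
{
	FILE *f = fopen(path, "w");

	if (!f) {
		perror(path);
		return -1;
	}
	fprintf(f, "%s\n", val);
	return fclose(f);
}

int main(void)
{
	/* Per-namespace I/O timeout in seconds, mutable at runtime. */
	write_sysfs("/sys/block/nvme0n1/queue/io_timeout", "60");

	/* Admin command timeout in seconds -- global to all controllers. */
	write_sysfs("/sys/module/nvme_core/parameters/admin_timeout", "120");
	return 0;
}
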
>>
>> I think they must have an nvme target that is backed by slow media
>> (i.e. non-SSD). If that's the case, I think it may be a better option
>> if the target advertises relatively shallow queue depths and/or lower
>> MDTS that better aligns to the backing storage capabilities.
> 
> It isn't that the media is slow; the max timeout is based on the SLA
> for certain classes of "fabric" outages. Linux copes *really* badly
> with I/O errors, and if we can make the timeout last long enough to
> cover the switch restart worst case, then users are a lot happier.

Well, what is usually done to handle fabric outages is to have multiple
paths to the storage device; I'm not sure whether that is applicable for
you or not...
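
Checking whether native NVMe multipathing is even in play on the host
is straightforward; a small sketch, assuming a kernel built with
CONFIG_NVME_MULTIPATH (the parameter file is absent otherwise):

#include <stdio.h>

int main(void)
{
	char flag = 0;
	FILE *f = fopen("/sys/module/nvme_core/parameters/multipath", "r");

	if (!f) {
		perror("nvme_core.multipath");	/* likely not compiled in */
		return 1;
	}
	if (fscanf(f, "%c", &flag) != 1)
		flag = '?';
	fclose(f);
	printf("native nvme multipath: %s\n",
	       flag == 'Y' ? "enabled" : "disabled");
	return 0;
}
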

What do you mean by "Linux copes *really* badly with I/O errors"? What
can be done better?
