Message-ID: <4CBC35AE.9050002@suse.de>
Date: Mon, 18 Oct 2010 13:55:26 +0200
From: Hannes Reinecke <hare@...e.de>
To: Jun'ichi Nomura <j-nomura@...jp.nec.com>
Cc: device-mapper development <dm-devel@...hat.com>,
Kiyoshi Ueda <k-ueda@...jp.nec.com>, michaelc@...wisc.edu,
tytso@....edu, linux-scsi@...r.kernel.org,
Mike Snitzer <snitzer@...hat.com>, jaxboe@...ionio.com,
jack@...e.cz, vst@...b.net, linux-kernel@...r.kernel.org,
swhiteho@...hat.com, linux-raid@...r.kernel.org,
linux-ide@...r.kernel.org, James.Bottomley@...e.de,
chris.mason@...cle.com, konishi.ryusuke@....ntt.co.jp,
linux-fsdevel@...r.kernel.org, Tejun Heo <tj@...nel.org>,
rwheeler@...hat.com, Christoph Hellwig <hch@....de>,
Sergei Shtylyov <sshtylyov@...sta.com>
Subject: Re: [RFC] training mpath to discern between SCSI errors
On 10/18/2010 10:09 AM, Jun'ichi Nomura wrote:
> Hi Hannes,
>
> Thank you for working on this issue and sorry for very late reply...
>
> (08/30/10 23:52), Hannes Reinecke wrote:
>> From: Hannes Reinecke <hare@...e.de>
>> Date: Mon, 30 Aug 2010 16:21:10 +0200
>> Subject: [RFC][PATCH] scsi: Detailed I/O errors
>>
>> Instead of just passing 'EIO' for any I/O errors we should be
>> notifying the upper layers with some more details about the cause
>> of this error.
>> This patch updates the possible I/O errors to:
>>
>> - ENOLINK: Link failure between host and target
>> - EIO: Retryable I/O error
>> - EREMOTEIO: Non-retryable I/O error
>>
>> 'Retryable' in this context means that an I/O error _might_ be
>> restricted to the I_T_L nexus (vulgo: path), so retrying on another
>> nexus / path might succeed.
>
> Does 'retryable' of EIO mean retryable in multipath layer?
> If so, what is the difference between EIO and ENOLINK?
>
Yes, EIO is intended for errors which should be retried at the
multipath layer. This does _not_ include transport errors, which are
signalled by ENOLINK.
Basically, ENOLINK is a transport error, and EIO just means
something is wrong and we weren't able to classify it properly.
If we were, it'd be either ENOLINK or EREMOTEIO.
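As a rough illustration of how a multipath layer might act on these three
codes, here is a minimal userspace sketch. The enum, its values, and
mpath_classify() are hypothetical names made up for this example, not part of
the patch, and treating any other error code like EIO is an assumption.

```c
#include <errno.h>

/* Hypothetical actions a multipath layer could take; illustrative only. */
enum mpath_action {
    MPATH_COMPLETE,   /* no error, complete the I/O */
    MPATH_FAIL_PATH,  /* transport error: fail this path, resend on another */
    MPATH_RETRY,      /* unclassified error: a retry might succeed */
    MPATH_FAIL_IO     /* target error: retrying on another path will not help */
};

/* Map a negative errno-style completion code to an action. */
static enum mpath_action mpath_classify(int error)
{
    switch (error) {
    case 0:
        return MPATH_COMPLETE;
    case -ENOLINK:
        return MPATH_FAIL_PATH;
    case -EREMOTEIO:
        return MPATH_FAIL_IO;
    case -EIO:
    default:
        /* Assumption: anything we cannot classify is handled like EIO. */
        return MPATH_RETRY;
    }
}
```

The point of the split is that only ENOLINK says something definite about the
path, and only EREMOTEIO says something definite about the target; EIO stays
the catch-all.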
> I've heard of a case where just retrying within path-group is
> preferred to (relatively costly) switching group.
> So, if EIO (or other error code) can be used to indicate such type
> of errors, it's nice.
>
Yes, that was one of the intentions.
>
> Also (although this might be a bit off topic from your patch),
> can we expand such a distinction to what should be logged?
> Currently, it's difficult to distinguish important SCSI/block errors
> and less important ones in kernel log.
> For example, when I get a link failure on sda, kernel prints something
> like below, regardless of whether the I/O is recovered by multipathing or not:
> end_request: I/O error, dev sda, sector XXXXX
>
Indeed, with the above error codes in place we could modify that
message, e.g. to
end_request: transport error, dev sda, sector XXXXX
or
end_request: target error, dev sda, sector XXXXX
which would improve the output noticeably.
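A small userspace sketch of that idea, assuming the message text is simply
picked from the error code. error_label() and format_end_request() are
hypothetical helpers written for this mail, not existing block-layer
functions, and keeping "I/O error" for everything else is an assumption.

```c
#include <errno.h>
#include <stdio.h>

/* Pick the wording for the log line from the (negative) error code. */
static const char *error_label(int error)
{
    switch (error) {
    case -ENOLINK:
        return "transport error";
    case -EREMOTEIO:
        return "target error";
    default:
        /* Assumption: unclassified errors keep today's wording. */
        return "I/O error";
    }
}

/* Build the log line into buf; returns what snprintf() returns. */
static int format_end_request(char *buf, size_t len, int error,
                              const char *disk, unsigned long long sector)
{
    return snprintf(buf, len, "end_request: %s, dev %s, sector %llu",
                    error_label(error), disk, sector);
}
```

With this, format_end_request(buf, sizeof(buf), -ENOLINK, "sda", sector)
yields a line starting with "end_request: transport error, dev sda".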
> Setting REQ_QUIET in dm-multipath could mask the message
> but also other important ones in SCSI.
>
Hmm. Not sure about that, but I think the above modifications will
be useful already.
I'll be sending an updated patch.
Cheers,
Hannes
--
Dr. Hannes Reinecke zSeries & Storage
hare@...e.de +49 911 74053 688
SUSE LINUX Products GmbH, Maxfeldstr. 5, 90409 Nürnberg
GF: Markus Rex, HRB 16746 (AG Nürnberg)