linux-kernel - Re: [RFC] training mpath to discern between SCSI errors

Message-ID: <4CBC35AE.9050002@suse.de>
Date:	Mon, 18 Oct 2010 13:55:26 +0200
From:	Hannes Reinecke <hare@...e.de>
To:	Jun'ichi Nomura <j-nomura@...jp.nec.com>
Cc:	device-mapper development <dm-devel@...hat.com>,
	Kiyoshi Ueda <k-ueda@...jp.nec.com>, michaelc@...wisc.edu,
	tytso@....edu, linux-scsi@...r.kernel.org,
	Mike Snitzer <snitzer@...hat.com>, jaxboe@...ionio.com,
	jack@...e.cz, vst@...b.net, linux-kernel@...r.kernel.org,
	swhiteho@...hat.com, linux-raid@...r.kernel.org,
	linux-ide@...r.kernel.org, James.Bottomley@...e.de,
	chris.mason@...cle.com, konishi.ryusuke@....ntt.co.jp,
	linux-fsdevel@...r.kernel.org, Tejun Heo <tj@...nel.org>,
	rwheeler@...hat.com, Christoph Hellwig <hch@....de>,
	Sergei Shtylyov <sshtylyov@...sta.com>
Subject: Re: [RFC] training mpath to discern between SCSI errors

On 10/18/2010 10:09 AM, Jun'ichi Nomura wrote:
> Hi Hannes,
> 
> Thank you for working on this issue, and sorry for the very late reply.
> 
> (08/30/10 23:52), Hannes Reinecke wrote:
>> From: Hannes Reinecke <hare@...e.de>
>> Date: Mon, 30 Aug 2010 16:21:10 +0200
>> Subject: [RFC][PATCH] scsi: Detailed I/O errors
>>
>> Instead of just passing 'EIO' for any I/O errors we should be
>> notifying the upper layers with some more details about the cause
>> of this error.
>> This patch updates the possible I/O errors to:
>>
>> - ENOLINK: Link failure between host and target
>> - EIO: Retryable I/O error
>> - EREMOTEIO: Non-retryable I/O error
>>
>> 'Retryable' in this context means that an I/O error _might_ be
>> restricted to the I_T_L nexus (vulgo: path), so retrying on another
>> nexus / path might succeed.
> 
> Does 'retryable' of EIO mean retryable in multipath layer?
> If so, what is the difference between EIO and ENOLINK?
> 
Yes, EIO is intended for errors which should be retried at the
multipath layer. This does _not_ include transport errors, which are
signalled by ENOLINK.

Basically, ENOLINK is a transport error, and EIO just means
something is wrong and we weren't able to classify it properly.
If we were, it'd be either ENOLINK or EREMOTEIO.
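
To make the three-way split concrete, here is a minimal user-space sketch
of how a consumer might classify the codes. It is not the patch itself:
the enum, the function names, and the choice to treat any unknown code
like EIO are assumptions made for illustration, as is the idea that a
transport error is also worth trying on another path.

```c
#include <errno.h>

/* Illustrative error classes; these names are not from the patch. */
enum io_err_class {
    IO_OK,            /* request completed without error */
    IO_ERR_TRANSPORT, /* -ENOLINK: link failure between host and target */
    IO_ERR_TARGET,    /* -EREMOTEIO: non-retryable I/O error */
    IO_ERR_RETRYABLE  /* -EIO: unclassified, may succeed on another path */
};

/* Map a negative errno value to one of the classes above. */
enum io_err_class classify_io_error(int error)
{
    switch (error) {
    case 0:
        return IO_OK;
    case -ENOLINK:
        return IO_ERR_TRANSPORT;
    case -EREMOTEIO:
        return IO_ERR_TARGET;
    default:
        /* Assumption: -EIO and anything unclassified end up here. */
        return IO_ERR_RETRYABLE;
    }
}

/* Assumption: only a non-retryable error is pointless to try elsewhere. */
int worth_retrying_on_other_path(enum io_err_class c)
{
    return c == IO_ERR_TRANSPORT || c == IO_ERR_RETRYABLE;
}
```

A caller would pass the completion status through classify_io_error()
first and only then decide whether to pick another path.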

> I've heard of a case where just retrying within the path-group is
> preferred to the (relatively costly) switch to another group.
> So if EIO (or another error code) can be used to indicate that type
> of error, that would be nice.
> 
Yes, that was one of the intentions.

> 
> Also (although this might be a bit off topic from your patch),
> can we expand such a distinction to what should be logged?
> Currently, it's difficult to distinguish important SCSI/block errors
> and less important ones in kernel log.
> For example, when I get a link failure on sda, the kernel prints something
> like the line below, regardless of whether the I/O is recovered by
> multipathing or not:
>   end_request: I/O error, dev sda, sector XXXXX
> 
Indeed. With the error codes above we could modify that message,
e.g. to

end_request: transport error, dev sda, sector XXXXX

or

end_request: target error, dev sda, sector XXXXX

which would improve the output noticeably.
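
As a rough user-space sketch of that idea (not kernel code), the
description could be chosen from the error code before the line is
printed. The function names are hypothetical, and mapping "target error"
to EREMOTEIO is an assumption; only the message wording comes from the
examples above.

```c
#include <errno.h>
#include <stdio.h>

/* Pick the description to print for a negative errno value. */
const char *io_error_desc(int error)
{
    switch (error) {
    case -ENOLINK:
        return "transport error";
    case -EREMOTEIO:
        /* Assumption: a non-retryable error is reported as a target error. */
        return "target error";
    default:
        /* -EIO and anything else keep the current wording. */
        return "I/O error";
    }
}

/* Format the log line into buf; returns whatever snprintf() returns. */
int format_end_request(char *buf, size_t len, int error,
                       const char *dev, unsigned long long sector)
{
    return snprintf(buf, len, "end_request: %s, dev %s, sector %llu",
                    io_error_desc(error), dev, sector);
}
```

Plain EIO still produces the existing "I/O error" text, so anything that
parses the current log line would keep working for that case.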

> Setting REQ_QUIET in dm-multipath could mask the message
> but also other important ones in SCSI.
> 
Hmm. Not sure about that, but I think the above modifications will
be useful already.

I'll be sending an updated patch.

Cheers,

Hannes
-- 
Dr. Hannes Reinecke		      zSeries & Storage
hare@...e.de			      +49 911 74053 688
SUSE LINUX Products GmbH, Maxfeldstr. 5, 90409 Nürnberg
GF: Markus Rex, HRB 16746 (AG Nürnberg)