Message-ID: <4B7593F4.2050102@cs.wisc.edu>
Date:	Fri, 12 Feb 2010 11:46:28 -0600
From:	Mike Christie <michaelc@...wisc.edu>
To:	Tomohiro Kusumi <kusumi.tomohiro@...fujitsu.com>
CC:	linux-scsi@...r.kernel.org, James.Bottomley@...e.de,
	linux-kernel@...r.kernel.org
Subject: Re: [PATCH] scsi_transport_fc: handle transient error on multipath
 environment

On 02/12/2010 02:09 AM, Tomohiro Kusumi wrote:
> @@ -1953,6 +1987,13 @@
>   {
>   	struct fc_rport *rport = starget_to_rport(scsi_target(scmd->device));
> 
> +	if (rport->recover_transient_error) {
> +		fc_queue_work(scmd->device->host, &rport->rport_te_work);
> +		scmd->result = ((scmd->result & 0xFF00FFFF) |
> +				(DID_TRANSPORT_DISRUPTED << 16));
> +		return BLK_EH_HANDLED;
> +	}
> +
>   	if (rport->port_state == FC_PORTSTATE_BLOCKED)
>   		return BLK_EH_RESET_TIMER;

- For the link down case you mentioned, wouldn't we see that the rport is
blocked here and then return BLK_EH_RESET_TIMER? If fast_io_fail_tmo is
set, that would fail IO upwards quickly (the fast IO fail timeout would
probably fire before the cmd even timed out).
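
To make the ordering in the quoted hunk concrete, here is a small userspace
model of it. This is a sketch, not kernel code: every model_/MODEL_ name is
a hypothetical stand-in, the 0x0e value for DID_TRANSPORT_DISRUPTED is
assumed from include/scsi/scsi.h, and fc_queue_work() is reduced to setting
a flag.

```c
#include <stdint.h>

/* Userspace stand-ins for the kernel names; the enum values are illustrative. */
enum model_eh_ret { MODEL_EH_NOT_HANDLED, MODEL_EH_HANDLED, MODEL_EH_RESET_TIMER };
enum model_port_state { MODEL_PORT_ONLINE, MODEL_PORT_BLOCKED };

/* Assumed to match DID_TRANSPORT_DISRUPTED in include/scsi/scsi.h. */
#define MODEL_DID_TRANSPORT_DISRUPTED 0x0eu

struct model_rport {
	enum model_port_state port_state;
	int recover_transient_error;	/* flag introduced by the patch */
	int te_work_queued;		/* stands in for fc_queue_work() */
};

/* Replace only the host byte (bits 16-23) of a SCSI result word. */
static uint32_t model_set_host_byte(uint32_t result, uint32_t host)
{
	return (result & 0xFF00FFFFu) | ((host & 0xFFu) << 16);
}

/* Same ordering as the hunk: transient-error check first, then blocked. */
static enum model_eh_ret model_fc_timed_out(struct model_rport *rport,
					    uint32_t *result)
{
	if (rport->recover_transient_error) {
		rport->te_work_queued = 1;
		*result = model_set_host_byte(*result,
					      MODEL_DID_TRANSPORT_DISRUPTED);
		return MODEL_EH_HANDLED;
	}
	if (rport->port_state == MODEL_PORT_BLOCKED)
		return MODEL_EH_RESET_TIMER;
	return MODEL_EH_NOT_HANDLED;
}
```

In this model a blocked rport without the flag still just resets the timer,
while a blocked rport with the flag set takes the new path and never reaches
the blocked check.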

What transport problems are you seeing where the rport is not blocked
and the scsi cmd timer fires? Would it be mostly buggy switches or
something like that?

- Maybe you want to instead hook something into dm-multipath's
request (no more bios like in 2004 :)). Can you set a timer on that
level of request? If that times out, then dm-multipath could do
something like call blk_abort_queue.

I think the problem with blk_abort_queue might be that it stops all IO
to the entire host, whereas you probably just want to work on the remote
port/path. For that you could call something like
recover_transient_error. Maybe it could just be a call to
terminate_rport_io from a workqueue though.
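
The scope difference can be sketched with a toy userspace model. It assumes
the host-wide behaviour really is as guessed above, and the model_ names are
hypothetical, not the kernel's blk_abort_queue or terminate_rport_io.

```c
#include <stddef.h>

struct model_cmd {
	int rport_id;	/* remote port (path) the command was sent through */
	int failed;	/* set once the command has been failed upwards */
};

/* Host-wide abort: every outstanding command on the host is failed. */
static size_t model_abort_host(struct model_cmd *cmds, size_t n)
{
	size_t hit = 0;

	for (size_t i = 0; i < n; i++) {
		if (!cmds[i].failed) {
			cmds[i].failed = 1;
			hit++;
		}
	}
	return hit;
}

/* Per-rport termination: only commands on the given path are failed. */
static size_t model_terminate_rport(struct model_cmd *cmds, size_t n,
				    int rport_id)
{
	size_t hit = 0;

	for (size_t i = 0; i < n; i++) {
		if (cmds[i].rport_id == rport_id && !cmds[i].failed) {
			cmds[i].failed = 1;
			hit++;
		}
	}
	return hit;
}
```

With four outstanding commands spread over three rports, terminating one
rport fails only its own commands and leaves the other paths running, while
the host-wide variant fails whatever is still outstanding.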