linux-kernel - RE: [PATCH 1/1] scsi: scsi_transport_fc: Fix a bug in the error handling function

Date:	Fri, 8 Jan 2016 20:12:40 +0000
From:	KY Srinivasan <kys@...rosoft.com>
To:	James Bottomley <James.Bottomley@...senPartnership.com>,
	"gregkh@...uxfoundation.org" <gregkh@...uxfoundation.org>,
	"linux-kernel@...r.kernel.org" <linux-kernel@...r.kernel.org>,
	"devel@...uxdriverproject.org" <devel@...uxdriverproject.org>,
	"ohering@...e.com" <ohering@...e.com>,
	"jbottomley@...allels.com" <jbottomley@...allels.com>,
	"hch@...radead.org" <hch@...radead.org>,
	"linux-scsi@...r.kernel.org" <linux-scsi@...r.kernel.org>,
	"apw@...onical.com" <apw@...onical.com>,
	"vkuznets@...hat.com" <vkuznets@...hat.com>,
	"jasowang@...hat.com" <jasowang@...hat.com>,
	"martin.petersen@...cle.com" <martin.petersen@...cle.com>,
	"hare@...e.de" <hare@...e.de>
CC:	"stable@...r.kernel.org" <stable@...r.kernel.org>
Subject: RE: [PATCH 1/1] scsi: scsi_transport_fc: Fix a bug in the error
 handling function



> -----Original Message-----
> From: James Bottomley [mailto:James.Bottomley@...senPartnership.com]
> Sent: Friday, January 8, 2016 11:21 AM
> To: KY Srinivasan <kys@...rosoft.com>; gregkh@...uxfoundation.org; linux-
> kernel@...r.kernel.org; devel@...uxdriverproject.org; ohering@...e.com;
> jbottomley@...allels.com; hch@...radead.org; linux-scsi@...r.kernel.org;
> apw@...onical.com; vkuznets@...hat.com; jasowang@...hat.com;
> martin.petersen@...cle.com; hare@...e.de
> Cc: stable@...r.kernel.org
> Subject: Re: [PATCH 1/1] scsi: scsi_transport_fc: Fix a bug in the error
> handling function
> 
> On Fri, 2016-01-08 at 18:58 +0000, KY Srinivasan wrote:
> >
> > > -----Original Message-----
> > > > From: James Bottomley [mailto:James.Bottomley@...senPartnership.com]
> > > Sent: Thursday, January 7, 2016 3:49 PM
> > > To: KY Srinivasan <kys@...rosoft.com>; gregkh@...uxfoundation.org;
> > > linux-
> > > kernel@...r.kernel.org; devel@...uxdriverproject.org;
> > > ohering@...e.com;
> > > jbottomley@...allels.com; hch@...radead.org;
> > > linux-scsi@...r.kernel.org;
> > > apw@...onical.com; vkuznets@...hat.com; jasowang@...hat.com;
> > > martin.petersen@...cle.com; hare@...e.de
> > > Cc: stable@...r.kernel.org
> > > Subject: Re: [PATCH 1/1] scsi: scsi_transport_fc: Fix a bug in the
> > > error
> > > handling function
> > >
> > > On Thu, 2016-01-07 at 16:40 -0800, K. Y. Srinivasan wrote:
> > > > > The macro starget_to_rport() can return NULL; handle that case
> > > > properly.
> > >
> > > OK, can we unwind why you think you could possibly need this?  It
> > > would
> > > mean that fc_timed_out was called for a non-FC device, which was
> > > thought to be an impossibility when the fc transport class was
> > > designed.
> >
> > As you know, on Hyper-V, FC devices are handled exactly like normal
> > scsi devices and the only additional information that is provided for
> > FC devices is the WWN for port and node. Till recently, I was not
> > publishing the WWN in the guest and so I was not even using the FC
> > transport. Recently, I implemented support for publishing the WWN in
> > the guest and for that I am using the FC transport for FC hosts. When
> > an FC LUN is dynamically removed, sometimes I see the timeout
> > occurring and since there is no rport associated with these devices I am
> > hitting the issue this patch is addressing. I could have addressed
> > this problem by establishing a storvsc specific time out function
> > even for FC devices - the same timeout function that I currently use
> > for scsi devices -  storvsc_eh_timed_out(). I chose to instead fix
> > the fc_timed_out() function since the code was not handling a
> > possible condition.
> 
> OK, so the specific problem is that the device is partly torn down when
> the timeout fires?  I'm having a hard time seeing how we get a null
> rport in that case.  The starget_to_rport() can only return NULL if the
> parent isn't an rport ... that shouldn't depend on the state of the FC
> device because the parent is torn down after the child.

In our case, the parent is not an rport because I don't invoke fc_remote_port_add(),
so starget_to_rport() does return NULL.
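
To make the case concrete, here is a minimal userspace sketch of the lookup and the
NULL handling under discussion. It is a model only: the model_* types and functions
are hypothetical stand-ins and not the kernel structures, and the choice to reset the
timer when there is no rport is an assumption taken from the behavior described in
this thread.

```c
#include <assert.h>
#include <stddef.h>

/* Simplified stand-ins for the kernel types; all names here are hypothetical. */
enum model_eh_return { MODEL_EH_NOT_HANDLED, MODEL_EH_RESET_TIMER };
enum model_parent_kind { PARENT_IS_RPORT, PARENT_IS_OTHER };
enum model_port_state { PORT_ONLINE, PORT_BLOCKED };

struct model_parent {
	enum model_parent_kind kind;
	enum model_port_state port_state;	/* meaningful only for an rport */
};

struct model_target {
	struct model_parent *parent;
};

/* Models the idea of starget_to_rport(): NULL unless the parent is an rport. */
static struct model_parent *model_starget_to_rport(const struct model_target *t)
{
	assert(t != NULL);
	if (t->parent != NULL && t->parent->kind == PARENT_IS_RPORT)
		return t->parent;
	return NULL;
}

/* Models a timeout handler that tolerates a target without an rport. */
static enum model_eh_return model_fc_timed_out(const struct model_target *t)
{
	struct model_parent *rport = model_starget_to_rport(t);

	/* Assumption: with no rport, keep waiting by re-arming the timer. */
	if (rport == NULL)
		return MODEL_EH_RESET_TIMER;

	/* A blocked port also re-arms the timer. */
	if (rport->port_state == PORT_BLOCKED)
		return MODEL_EH_RESET_TIMER;

	return MODEL_EH_NOT_HANDLED;
}
```

Without the NULL check, the port_state read would dereference a NULL pointer for a
target whose parent was never registered as an rport.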

> 
> In any case, returning BLK_EH_RESET_TIMER will cause all sorts of
> problems because it resets the timer to fire again for the device.
>  What you want is something to return BLK_EH_HANDLED which will just
> complete the request ... probably at a generic level, since this
> doesn't sound to be specific to FC.

On Hyper-V, the host implements a variety of recovery strategies. For that reason,
the eh_timed_out handler for standard SCSI devices effectively has an infinite timeout:
storvsc_eh_timed_out() just resets the timer. This is the behavior I wanted for the FC devices as well.
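
As a rough illustration of why a handler that always resets the timer amounts to an
infinite timeout, here is a small userspace simulation. The sim_* names are
hypothetical and this is not the block layer's timer code; it only contrasts a
handler that re-arms the timer with one that completes the request.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for the two handler outcomes discussed in this thread. */
enum sim_eh_return { SIM_EH_HANDLED, SIM_EH_RESET_TIMER };

/* Models a handler that always asks for the timer to be re-armed. */
static enum sim_eh_return sim_always_reset(void)
{
	return SIM_EH_RESET_TIMER;
}

/* Models a handler that completes the request on the first expiry. */
static enum sim_eh_return sim_complete(void)
{
	return SIM_EH_HANDLED;
}

/*
 * Simulate up to max_expiries timer expirations for one request.
 * Returns the expiry count at which the handler completed the request,
 * or -1 if the request is still pending after max_expiries.
 */
static int sim_run_timer(enum sim_eh_return (*handler)(void), int max_expiries)
{
	assert(handler != NULL);
	for (int fired = 1; fired <= max_expiries; fired++) {
		if (handler() == SIM_EH_HANDLED)
			return fired;
		/* SIM_EH_RESET_TIMER: the timer is re-armed and will fire again. */
	}
	return -1;
}
```

In this model, sim_run_timer(sim_always_reset, n) never completes the request for any
n, which is the "wait for the host to recover" policy; the request ends only when
something outside the timeout path completes it.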

K. Y

> 
> Something like the below ... assuming the teardown issue is the real
> problem.
> 
> James
> 
> ---
> 
> diff --git a/drivers/scsi/scsi_error.c b/drivers/scsi/scsi_error.c
> index 984ddcb..3c514c6 100644
> --- a/drivers/scsi/scsi_error.c
> +++ b/drivers/scsi/scsi_error.c
> @@ -273,6 +273,10 @@ enum blk_eh_timer_return scsi_times_out(struct request *req)
>  	enum blk_eh_timer_return rtn = BLK_EH_NOT_HANDLED;
>  	struct Scsi_Host *host = scmd->device->host;
> 
> +	/* timeout for an already dead device, just kill the request */
> +	if (scmd->device->sdev_state == SDEV_DEL)
> +		return BLK_EH_HANDLED;
> +
>  	trace_scsi_dispatch_cmd_timeout(scmd);
>  	scsi_log_completion(scmd, TIMEOUT_ERROR);
>