linux-kernel - Re: [RFC v3 01/15] scsi: Drop struct Scsi_Host->host_lock usage in scsi_dispatch

lists.openwall.net
Open Source and information security mailing list archives



Message-Id: <1285362769.1849.256.camel@haakon2.linux-iscsi.org>
Date:	Fri, 24 Sep 2010 14:12:49 -0700
From:	"Nicholas A. Bellinger" <nab@...ux-iscsi.org>
To:	Brian King <brking@...ux.vnet.ibm.com>
Cc:	linux-scsi <linux-scsi@...r.kernel.org>,
	linux-kernel <linux-kernel@...r.kernel.org>,
	Vasu Dev <vasu.dev@...ux.intel.com>,
	Tim Chen <tim.c.chen@...ux.intel.com>,
	Andi Kleen <ak@...ux.intel.com>,
	Matthew Wilcox <willy@...ux.intel.com>,
	James Bottomley <James.Bottomley@...e.de>,
	Mike Christie <michaelc@...wisc.edu>,
	James Smart <james.smart@...lex.com>,
	Andrew Vasquez <andrew.vasquez@...gic.com>,
	FUJITA Tomonori <fujita.tomonori@....ntt.co.jp>,
	Hannes Reinecke <hare@...e.de>,
	Joe Eykholt <jeykholt@...co.com>,
	Christoph Hellwig <hch@....de>,
	MPTFusionLinux <DL-MPTFusionLinux@....com>,
	"eata.c maintainer" <dario.ballabio@...ind.it>
Subject: Re: [RFC v3 01/15] scsi: Drop struct Scsi_Host->host_lock usage in
	scsi_dispatch_cmd()

On Fri, 2010-09-24 at 16:10 -0500, Brian King wrote:
> On 09/24/2010 03:44 PM, Nicholas A. Bellinger wrote:
> > On Fri, 2010-09-24 at 08:41 -0500, Brian King wrote:
> >> On 09/23/2010 06:37 PM, Nicholas A. Bellinger wrote:
> >>> @@ -651,7 +655,6 @@ static inline void scsi_cmd_get_serial(struct Scsi_Host *host, struct scsi_cmnd
> >>>  int scsi_dispatch_cmd(struct scsi_cmnd *cmd)
> >>>  {
> >>>  	struct Scsi_Host *host = cmd->device->host;
> >>> -	unsigned long flags = 0;
> >>>  	unsigned long timeout;
> >>>  	int rtn = 0;
> >>>
> >>> @@ -736,15 +739,11 @@ int scsi_dispatch_cmd(struct scsi_cmnd *cmd)
> >>>  		scsi_done(cmd);
> >>>  		goto out;
> >>>  	}
> >>> -
> >>> -	spin_lock_irqsave(host->host_lock, flags);
> >>>  	/*
> >>> -	 * AK: unlikely race here: for some reason the timer could
> >>> -	 * expire before the serial number is set up below.
> >>> -	 *
> >>> -	 * TODO: kill serial or move to blk layer
> >>> +	 * Note that scsi_cmd_get_serial() used to be called here, but
> >>> +	 * now we expect the legacy SCSI LLDs that actually need this
> >>> +	 * to call it directly within their SHT->queuecommand() caller.
> >>>  	 */
> >>> -	scsi_cmd_get_serial(host, cmd); 
> >>>
> >>>  	if (unlikely(host->shost_state == SHOST_DEL)) {
> >>>  		cmd->result = (DID_NO_CONNECT << 16);
> >>> @@ -753,7 +752,7 @@ int scsi_dispatch_cmd(struct scsi_cmnd *cmd)
> >>>  		trace_scsi_dispatch_cmd_start(cmd);
> >>>  		rtn = host->hostt->queuecommand(cmd, scsi_done);
> >>>  	}
> >>> -	spin_unlock_irqrestore(host->host_lock, flags);
> >>> +
> >>>  	if (rtn) {
> >>>  		trace_scsi_dispatch_cmd_error(cmd, rtn);
> >>>  		if (rtn != SCSI_MLQUEUE_DEVICE_BUSY &&
> >>
> >> Are you planning a future revision that moves the acquiring of the host lock
> >> into the LLDD's queuecommand for all the other drivers you don't currently
> >> touch in this patch set?
> >>
> > 
> > Hi Brian,
> > 
> > I was under the impression that this would be unnecessary for the vast
> > majority of existing LLD drivers, but if you are aware of specific LLDs
> > that would still need the struct Scsi_Host->host_lock held in their
> > SHT->queuecommand() for whatever reason please let me know and I would
> > be happy to include this into an RFCv4.
> 
> I would think that most drivers might have issues without some pretty careful
> auditing. When Christoph did this for the EH handlers, the first step was to
> simply move acquiring the host lock into the LLDs. That way we can optimize
> drivers one at a time after ensuring they can run lockless in their queuecommand
> handler.
> 
> A couple examples of possible issues with drivers I'm familiar with (ibmvfc, ipr):
> 
> * Some drivers will do list manipulation for resources needed to send commands. If
>   done lockless, this could result in list corruption with multiple readers/writers.
> 
> * Some drivers check the state of the hardware before sending a command. Failing to
>   do this when the hardware is being reset may result in nasty things like PCI bus
>   errors or even sending a command to the wrong device.
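The two hazards above can be illustrated with a small userspace sketch of a hypothetical LLD queuecommand path. Everything here is an assumed stand-in: struct adapter, struct cmd_block, lld_queuecommand() and QUEUE_BUSY (playing the role of a busy return such as SCSI_MLQUEUE_HOST_BUSY) are invented for illustration, and the lock is a single-threaded placeholder for a driver spinlock. The point is only that the free-list unlink and the adapter-state check both happen inside one critical section, which is what the midlayer's host_lock used to provide implicitly.

```c
#include <assert.h>
#include <stddef.h>

/*
 * Hypothetical LLD queuecommand path showing both hazards:
 * (1) popping a command block off a shared free list, and
 * (2) refusing to issue while the adapter is being reset.
 * fake_lock only records whether it is held; a real driver would use
 * a spinlock taken with spin_lock_irqsave().
 */
#define QUEUE_BUSY 1	/* stand-in for a busy return to the midlayer */

struct fake_lock { int held; };

static void fake_lock_acquire(struct fake_lock *l) { assert(!l->held); l->held = 1; }
static void fake_lock_release(struct fake_lock *l) { assert(l->held); l->held = 0; }

struct cmd_block {
	struct cmd_block *next;
	int tag;
};

enum adapter_state { ADAPTER_RUNNING, ADAPTER_RESETTING };

struct adapter {
	struct fake_lock lock;		/* protects free_list and state */
	struct cmd_block *free_list;
	enum adapter_state state;
	int issued;			/* commands "sent" to the hardware */
};

static int lld_queuecommand(struct adapter *a, int tag)
{
	struct cmd_block *blk;

	fake_lock_acquire(&a->lock);
	/* Hazard 2: never touch the hardware while it is being reset. */
	if (a->state != ADAPTER_RUNNING || a->free_list == NULL) {
		fake_lock_release(&a->lock);
		return QUEUE_BUSY;	/* the midlayer can retry later */
	}
	/* Hazard 1: unlink the head under the lock so two submitters
	 * cannot grab, and corrupt, the same list entry. */
	blk = a->free_list;
	a->free_list = blk->next;
	blk->next = NULL;
	blk->tag = tag;
	a->issued++;		/* stand-in for writing to the adapter */
	fake_lock_release(&a->lock);
	return 0;
}
```

Returning a busy status instead of issuing the command lets the midlayer requeue it rather than having the driver touch hardware that is in the middle of a reset.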
> 

Indeed, I can very much see some older LLDs making these types of
assumptions.

> These are all the sorts of errors that will be very difficult to hit but have
> pretty bad consequence when they are hit.
> 

I think "pretty bad" would be an understatement when running into either
of the above two items in ancient LLD code.

In that case, I will re-spin a v4 series that holds the legacy host_lock
within SHT->queuecommand() for all of the "historically high host_lock
optimized in ->queuecommand()" LLDs that are in RFCv3, and that includes
the other specific ones (namely mpt/fusion and mpt2sas) that we know are
safe to run without host_lock.
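One way to apply this plan across many drivers without rewriting each body is a wrapper that takes host_lock and then calls the driver's existing, lock-assuming function, while audited drivers register a lockless function directly. The sketch below assumes that approach; LEGACY_LOCKED_QCMD, old_lld_qcmd, audited_lld_qcmd and the fake_* types are hypothetical userspace stand-ins, not names from this patch series or from the kernel.

```c
#include <assert.h>

/*
 * Hypothetical sketch: LEGACY_LOCKED_QCMD(name) generates an entry point
 * `name` that takes host_lock and then calls the driver's unchanged,
 * lock-assuming body `name##_locked`. Audited drivers (the email names
 * mpt/fusion and mpt2sas) would register a lockless function directly.
 * fake_lock only records whether it is held; nothing is concurrent.
 */
struct fake_lock { int held; };
struct fake_host { struct fake_lock host_lock; int queued; };

static void fake_lock_acquire(struct fake_lock *l) { assert(!l->held); l->held = 1; }
static void fake_lock_release(struct fake_lock *l) { assert(l->held); l->held = 0; }

#define LEGACY_LOCKED_QCMD(name)				\
	static int name(struct fake_host *host, int cmd)	\
	{							\
		int rtn;					\
		fake_lock_acquire(&host->host_lock);		\
		rtn = name##_locked(host, cmd);			\
		fake_lock_release(&host->host_lock);		\
		return rtn;					\
	}

/* Legacy driver: body still assumes its caller holds host_lock. */
static int old_lld_qcmd_locked(struct fake_host *host, int cmd)
{
	(void)cmd;
	assert(host->host_lock.held);
	host->queued++;
	return 0;
}
LEGACY_LOCKED_QCMD(old_lld_qcmd)

/* Audited driver: safe to run without host_lock, registered as-is. */
static int audited_lld_qcmd(struct fake_host *host, int cmd)
{
	(void)cmd;
	assert(!host->host_lock.held);	/* no host_lock on this path */
	host->queued++;
	return 0;
}
```

The design keeps each legacy driver's behavior unchanged while the midlayer stops taking the lock, so individual drivers can later be audited and switched to the lockless form one at a time.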

Many thanks for your invaluable input on some of the non-obvious items
at play here.

Best,

--nab

