linux-kernel - Re: [RFC v3 01/15] scsi: Drop struct Scsi_Host->host_lock usage in scsi_dispatch

Message-ID: <4C9D13B6.5080404@linux.vnet.ibm.com>
Date:	Fri, 24 Sep 2010 16:10:14 -0500
From:	Brian King <brking@...ux.vnet.ibm.com>
To:	"Nicholas A. Bellinger" <nab@...ux-iscsi.org>
CC:	linux-scsi <linux-scsi@...r.kernel.org>,
	linux-kernel <linux-kernel@...r.kernel.org>,
	Vasu Dev <vasu.dev@...ux.intel.com>,
	Tim Chen <tim.c.chen@...ux.intel.com>,
	Andi Kleen <ak@...ux.intel.com>,
	Matthew Wilcox <willy@...ux.intel.com>,
	James Bottomley <James.Bottomley@...e.de>,
	Mike Christie <michaelc@...wisc.edu>,
	James Smart <james.smart@...lex.com>,
	Andrew Vasquez <andrew.vasquez@...gic.com>,
	FUJITA Tomonori <fujita.tomonori@....ntt.co.jp>,
	Hannes Reinecke <hare@...e.de>,
	Joe Eykholt <jeykholt@...co.com>,
	Christoph Hellwig <hch@....de>,
	MPTFusionLinux <DL-MPTFusionLinux@....com>,
	"eata.c maintainer" <dario.ballabio@...ind.it>
Subject: Re: [RFC v3 01/15] scsi: Drop struct Scsi_Host->host_lock usage in
 scsi_dispatch_cmd()

On 09/24/2010 03:44 PM, Nicholas A. Bellinger wrote:
> On Fri, 2010-09-24 at 08:41 -0500, Brian King wrote:
>> On 09/23/2010 06:37 PM, Nicholas A. Bellinger wrote:
>>> @@ -651,7 +655,6 @@ static inline void scsi_cmd_get_serial(struct Scsi_Host *host, struct scsi_cmnd
>>>  int scsi_dispatch_cmd(struct scsi_cmnd *cmd)
>>>  {
>>>  	struct Scsi_Host *host = cmd->device->host;
>>> -	unsigned long flags = 0;
>>>  	unsigned long timeout;
>>>  	int rtn = 0;
>>>
>>> @@ -736,15 +739,11 @@ int scsi_dispatch_cmd(struct scsi_cmnd *cmd)
>>>  		scsi_done(cmd);
>>>  		goto out;
>>>  	}
>>> -
>>> -	spin_lock_irqsave(host->host_lock, flags);
>>>  	/*
>>> -	 * AK: unlikely race here: for some reason the timer could
>>> -	 * expire before the serial number is set up below.
>>> -	 *
>>> -	 * TODO: kill serial or move to blk layer
>>> +	 * Note that scsi_cmd_get_serial() used to be called here, but
>>> +	 * now we expect the legacy SCSI LLDs that actually need this
>>> +	 * to call it directly within their SHT->queuecommand() caller.
>>>  	 */
>>> -	scsi_cmd_get_serial(host, cmd); 
>>>
>>>  	if (unlikely(host->shost_state == SHOST_DEL)) {
>>>  		cmd->result = (DID_NO_CONNECT << 16);
>>> @@ -753,7 +752,7 @@ int scsi_dispatch_cmd(struct scsi_cmnd *cmd)
>>>  		trace_scsi_dispatch_cmd_start(cmd);
>>>  		rtn = host->hostt->queuecommand(cmd, scsi_done);
>>>  	}
>>> -	spin_unlock_irqrestore(host->host_lock, flags);
>>> +
>>>  	if (rtn) {
>>>  		trace_scsi_dispatch_cmd_error(cmd, rtn);
>>>  		if (rtn != SCSI_MLQUEUE_DEVICE_BUSY &&
>>
>> Are you planning a future revision that moves the acquiring of the host lock
>> into the LLDD's queuecommand for all the other drivers you don't currently
>> touch in this patch set?
>>
> 
> Hi Brian,
> 
> I was under the impression that this would be unnecessary for the vast
> majority of existing LLD drivers, but if you are aware of specific LLDs
> that would still need the struct Scsi_Host->host_lock held in their
> SHT->queuecommand() for whatever reason please let me know and I would
> be happy to include this into an RFCv4.

I would think that most drivers might have issues without some pretty careful
auditing. When Christoph did this for the EH handlers, the first step was to
simply move acquiring the host lock into the LLDs. That way we can optimize
drivers one at a time after ensuring they can run lockless in their queuecommand
handler.
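
As a rough illustration of that first step, here is a minimal userspace sketch,
not kernel code. Everything in it is hypothetical: struct demo_host and
struct demo_cmd stand in for struct Scsi_Host and struct scsi_cmnd, a pthread
mutex stands in for host_lock, and the demo_* function names are made up. The
driver's existing body stays unchanged and still assumes the lock is held; a
thin wrapper takes the lock that the midlayer used to take on its behalf.

```c
#include <assert.h>
#include <pthread.h>

/* Hypothetical stand-in for struct Scsi_Host. */
struct demo_host {
	pthread_mutex_t host_lock;	/* models Scsi_Host->host_lock */
	int lock_held;			/* debugging aid: nonzero while host_lock is held */
	int queued;			/* number of commands accepted so far */
};

/* Hypothetical stand-in for struct scsi_cmnd. */
struct demo_cmd {
	struct demo_host *host;
};

/* The driver's original body, still written assuming host_lock is held. */
static int demo_queuecommand_locked(struct demo_cmd *cmd)
{
	assert(cmd->host->lock_held);	/* documents the locking contract */
	cmd->host->queued++;
	return 0;
}

/* Step one: the wrapper takes the lock the midlayer no longer takes. */
int demo_queuecommand(struct demo_cmd *cmd)
{
	struct demo_host *host = cmd->host;
	int rtn;

	pthread_mutex_lock(&host->host_lock);
	host->lock_held = 1;
	rtn = demo_queuecommand_locked(cmd);
	host->lock_held = 0;
	pthread_mutex_unlock(&host->host_lock);

	return rtn;
}
```

Once a driver has been audited, the wrapper can be dropped for that driver
alone, leaving the others untouched.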

A couple of examples of possible issues with drivers I'm familiar with (ibmvfc, ipr):

* Some drivers will do list manipulation for resources needed to send commands. If
  done lockless, this could result in list corruption with multiple readers/writers.

* Some drivers check the state of the hardware before sending a command. Failing to
  do this when the hardware is being reset may result in nasty things like PCI bus
  errors or even sending a command to the wrong device.

These are all the sorts of errors that will be very difficult to hit but have
pretty bad consequences when they are hit.

Thanks,

Brian

-- 
Brian King
Linux on Power Virtualization
IBM Linux Technology Center