[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-Id: <1285362769.1849.256.camel@haakon2.linux-iscsi.org>
Date: Fri, 24 Sep 2010 14:12:49 -0700
From: "Nicholas A. Bellinger" <nab@...ux-iscsi.org>
To: Brian King <brking@...ux.vnet.ibm.com>
Cc: linux-scsi <linux-scsi@...r.kernel.org>,
linux-kernel <linux-kernel@...r.kernel.org>,
Vasu Dev <vasu.dev@...ux.intel.com>,
Tim Chen <tim.c.chen@...ux.intel.com>,
Andi Kleen <ak@...ux.intel.com>,
Matthew Wilcox <willy@...ux.intel.com>,
James Bottomley <James.Bottomley@...e.de>,
Mike Christie <michaelc@...wisc.edu>,
James Smart <james.smart@...lex.com>,
Andrew Vasquez <andrew.vasquez@...gic.com>,
FUJITA Tomonori <fujita.tomonori@....ntt.co.jp>,
Hannes Reinecke <hare@...e.de>,
Joe Eykholt <jeykholt@...co.com>,
Christoph Hellwig <hch@....de>,
MPTFusionLinux <DL-MPTFusionLinux@....com>,
"eata.c maintainer" <dario.ballabio@...ind.it>
Subject: Re: [RFC v3 01/15] scsi: Drop struct Scsi_Host->host_lock usage in
scsi_dispatch_cmd()
On Fri, 2010-09-24 at 16:10 -0500, Brian King wrote:
> On 09/24/2010 03:44 PM, Nicholas A. Bellinger wrote:
> > On Fri, 2010-09-24 at 08:41 -0500, Brian King wrote:
> >> On 09/23/2010 06:37 PM, Nicholas A. Bellinger wrote:
> >>> @@ -651,7 +655,6 @@ static inline void scsi_cmd_get_serial(struct Scsi_Host *host, struct scsi_cmnd
> >>> int scsi_dispatch_cmd(struct scsi_cmnd *cmd)
> >>> {
> >>> struct Scsi_Host *host = cmd->device->host;
> >>> - unsigned long flags = 0;
> >>> unsigned long timeout;
> >>> int rtn = 0;
> >>>
> >>> @@ -736,15 +739,11 @@ int scsi_dispatch_cmd(struct scsi_cmnd *cmd)
> >>> scsi_done(cmd);
> >>> goto out;
> >>> }
> >>> -
> >>> - spin_lock_irqsave(host->host_lock, flags);
> >>> /*
> >>> - * AK: unlikely race here: for some reason the timer could
> >>> - * expire before the serial number is set up below.
> >>> - *
> >>> - * TODO: kill serial or move to blk layer
> >>> + * Note that scsi_cmd_get_serial() used to be called here, but
> >>> + * now we expect the legacy SCSI LLDs that actually need this
> >>> + * to call it directly within their SHT->queuecommand() caller.
> >>> */
> >>> - scsi_cmd_get_serial(host, cmd);
> >>>
> >>> if (unlikely(host->shost_state == SHOST_DEL)) {
> >>> cmd->result = (DID_NO_CONNECT << 16);
> >>> @@ -753,7 +752,7 @@ int scsi_dispatch_cmd(struct scsi_cmnd *cmd)
> >>> trace_scsi_dispatch_cmd_start(cmd);
> >>> rtn = host->hostt->queuecommand(cmd, scsi_done);
> >>> }
> >>> - spin_unlock_irqrestore(host->host_lock, flags);
> >>> +
> >>> if (rtn) {
> >>> trace_scsi_dispatch_cmd_error(cmd, rtn);
> >>> if (rtn != SCSI_MLQUEUE_DEVICE_BUSY &&
> >>
> >> Are you planning a future revision that moves the acquiring of the host lock
> >> into the LLDD's queuecommand for all the other drivers you don't currently
> >> touch in this patch set?
> >>
> >
> > Hi Brian,
> >
> > I was under the impression that this would be unnecessary for the vast
> > majority of existing LLD drivers, but if you are aware of specific LLDs
> > that would still need the struct Scsi_Host->host_lock held in their
> > SHT->queuecommand() for whaterver reason please let me know and I would
> > be happy to include this into an RFCv4.
>
> I would think that most drivers might have issues without some pretty careful
> auditing. When Christoph did this for the EH handlers, the first step was to
> simply move acquiring the host lock into the LLDs. That way we can optimize
> drivers one at a time after ensuring they can run lockless in their queuecommand
> handler.
>
> A couple examples of possible issues with drivers I'm familiar with (ibmvfc, ipr):
>
> * Some drivers will do list manipulation for resources needed to send commands. If
> done lockless, this could result in list corruption with multiple readers/writers.
>
> * Some drivers check the state of the hardware before sending a command. Failing to
> do this when the hardware is being reset may result in nasty things like PCI bus
> errors or even sending a command to the wrong device.
>
Indeed, I can very much see some older LLDs making these types of
assumptions.
> These are all the sorts of errors that will be very difficult to hit but have
> pretty bad consequence when they are hit.
>
I think pretty bad would be an under-statement when running into either
of the above two items in ancient LLD code.
In that case, I will re-spin a v4 series that contains a legacy
host_lock held w/ SHT->queuecomand() for all of the "historically high
host_lock optimized in ->queuecommand()" LLDs that are in RFCv3, and
include the other specific ones (namely mpt/fusion and mpt2sas) that we
know are safe to drop host_lock.
Many thanks for your invaluable input on some of the non-obvious items
at play here.
Best,
--nab
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@...r.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/
Powered by blists - more mailing lists