linux-kernel - Re: [ANNOUNCE] Status of unlocked

lists.openwall.net		lists / announce owl-users owl-dev john-users john-dev passwdqc-users yescrypt popa3d-users / oss-security kernel-hardening musl sabotage tlsify passwords / crypt-dev xvendor / Bugtraq Full-Disclosure linux-kernel linux-netdev linux-ext4 linux-hardening linux-cve-announce PHC
Open Source and information security mailing list archives
Hash Suite: Windows password security audit tool. GUI, reports in PDF.
[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <1288132071.8283.689.camel@mulgrave.site>
Date:	Tue, 26 Oct 2010 17:27:50 -0500
From:	James Bottomley <James.Bottomley@...e.de>
To:	"Nicholas A. Bellinger" <nab@...ux-iscsi.org>
Cc:	Mike Anderson <andmike@...ux.vnet.ibm.com>,
	linux-kernel <linux-kernel@...r.kernel.org>,
	linux-scsi <linux-scsi@...r.kernel.org>,
	Vasu Dev <vasu.dev@...ux.intel.com>,
	Tim Chen <tim.c.chen@...ux.intel.com>,
	Andi Kleen <ak@...ux.intel.com>,
	Matthew Wilcox <willy@...ux.intel.com>,
	Mike Christie <michaelc@...wisc.edu>,
	Jens Axboe <jaxboe@...ionio.com>,
	James Smart <james.smart@...lex.com>,
	Andrew Vasquez <andrew.vasquez@...gic.com>,
	FUJITA Tomonori <fujita.tomonori@....ntt.co.jp>,
	Hannes Reinecke <hare@...e.de>,
	Joe Eykholt <jeykholt@...co.com>,
	Christoph Hellwig <hch@....de>,
	Jon Hawley <warthog9@...nel.org>,
	Brian King <brking@...ux.vnet.ibm.com>,
	Christof Schmitt <christof.schmitt@...ibm.com>,
	Tejun Heo <tj@...nel.org>,
	Andrew Morton <akpm@...ux-foundation.org>,
	"H. Peter Anvin" <hpa@...or.com>
Subject: Re: [ANNOUNCE] Status of unlocked_qcmds=1 operation for .37

On Tue, 2010-10-26 at 15:08 -0700, Nicholas A. Bellinger wrote:
> On Thu, 2010-10-21 at 08:08 -0700, Mike Anderson wrote:
> > Nicholas A. Bellinger <nab@...ux-iscsi.org> wrote:
> > > *) core drivers/scsi remaining issue(s):
> > > 
> > > The issue raised by andmike during RFCv4 described as:
> > > 
> > > "If we skip __scsi_try_to_abort_cmd when REQ_ATOM_COMPLETE is set it
> > > would be correct for the scsi_decide_disposition cases but it would
> > > appear this would stop __scsi_try_to_abort_cmd from being called in the
> > > time out case as REQ_ATOM_COMPLETE is set prior to calling
> > > blk_rq_timed_out."
> > > 
> > > The complete discussion is here:
> > > 
> > > http://marc.info/?l=linux-scsi&m=128535319915212&w=2
> > > 
> > > We still need folks with experience to dig into this code, so you know
> > > the scsi_error.c code please jump in!
> > > 
> > 
> > I provided two logging traces below for the two cases mentioned in the
> > above email thread. The second trace used a modified version of
> > scsi_debug so that I could generate the needed unit attention codes.
> > 1.) On the check for complete comment the logging below marked with
> > "***" indicated that during a timeout that the complete bit is set. We
> > would not want to have a check for complete skip the call to
> > __scsi_try_to_abort_cmd. 
> > 
> > scsi_debug: cmd 28 00 00 00 00 00 00 00 20 00 
> > sd 1:0:0:0: [sdb] Done: TIMEOUT
> > sd 1:0:0:0: [sdb]  Result: hostbyte=DID_OK driverbyte=DRIVER_OK
> > sd 1:0:0:0: [sdb] CDB: Read(10): 28 00 00 00 00 00 00 00 20 00
> > Waking error handler thread
> > Error handler scsi_eh_1 waking up
> > sd 1:0:0:0: scsi_eh_prt_fail_stats: cmds failed: 0, cancel: 1
> > Total of 1 commands on 1 devices require eh work
> > scsi_eh_1: aborting cmd:0xffff88019e171b80
> > 
> > ***
> > sd 1:0:0:0: scsi_try_to_abort_cmd: Comp bit set
> > ***
> > 
> > scsi_debug: abort
> > scsi_debug: cmd 00 00 00 00 00 00 
> > scsi_eh_done scmd: ffff88019e171b80 result: 0
> > scsi_send_eh_cmnd: scmd: ffff88019e171b80, timeleft: 10000
> > scsi_send_eh_cmnd: scsi_eh_completed_normally 2002
> > scsi_eh_tur: scmd ffff88019e171b80 rtn 2002
> > scsi_eh_1: flush retry cmd: ffff88019e171b80
> > scsi_restart_operations: waking up host to restart
> > scsi_debug: cmd 28 00 00 00 00 00 00 00 20 00 
> > Error handler scsi_eh_1 sleeping
> > 
> 
> Hi Mike and Co,
> 
> After considering a couple of different approches here, I ended up with
> the following simple patch that has been tested on linus HEAD from this
> afternoon w/ forcing scsi_debug cmd TIMEOUT during modprobe and via
> sg_dd ops.  (See comments below)
> 
> [PATCH] scsi: Add SCSI_EH_SOFTIRQ_DONE usage
> 
> This patch introduces a SCSI_EH_SOFTIRQ_DONE flag that is set in scsi_softirq_done()
> from block soft_irq context that is used to signal when scsi_try_to_abort_cmd() should
> be calling __scsi_try_to_abort_cmd() for a timed out struct scsi_cmnd instead of
> returning SUCCESS via checking only blk_test_rq_complete().  This is done because
> blk_rq_timed_out_timer() calls blk_mark_rq_complete() before blk_rq_timed_out() ->
> struct request_queue->rq_timed_out_fn().

This is getting pretty far off into the weeds.  I think the first step
should be queue lock push down into ->queuecommand.  This would still
necessitate locking around the serial number.  The next step might be
serial number elimination/reduction which might need to address issues
like this (or not ... block already knows the information, there's not
really much need for us to track it twice).  There might also be other
lock aquisition reduction as part of step 2.

James


--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@...r.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/