Message-ID: <1302633208.2661.29.camel@dolmen>
Date: Tue, 12 Apr 2011 19:33:28 +0100
From: Steven Whitehouse <swhiteho@...hat.com>
To: James Bottomley <James.Bottomley@...senPartnership.com>
Cc: Tejun Heo <tj@...nel.org>, linux-kernel@...r.kernel.org,
Jens Axboe <jaxboe@...ionio.com>
Subject: Re: Strange block/scsi/workqueue issue
Hi,
On Tue, 2011-04-12 at 12:41 -0500, James Bottomley wrote:
> On Tue, 2011-04-12 at 17:51 +0100, Steven Whitehouse wrote:
> > Still not quite there, but looking more hopeful now,
>
> Not sure I share your optimism; but this one
>
Neither do I any more :-) Looks like we are back in blk_peek_request()
again.
> > scsi 0:2:1:0: Direct-Access DELL PERC 6/i 1.22 PQ: 0 ANSI: 5
> > scsi: killing requests for dead queue
> > ------------[ cut here ]------------
> > WARNING: at lib/kref.c:34 kref_get+0x2d/0x30()
> > Hardware name: PowerEdge R710
> > Modules linked in:
> > Pid: 386, comm: kworker/6:1 Not tainted 2.6.39-rc2+ #193
> > Call Trace:
> > [<ffffffff8108fa9a>] warn_slowpath_common+0x7a/0xb0
> > [<ffffffff8108fae5>] warn_slowpath_null+0x15/0x20
> > [<ffffffff813c984d>] kref_get+0x2d/0x30
> > [<ffffffff813c824a>] kobject_get+0x1a/0x30
> > [<ffffffff81460874>] get_device+0x14/0x20
> > [<ffffffff81478bd7>] scsi_request_fn+0x37/0x4a0
>
> Is definitely a race between the last put of the SCSI device and the
> block delayed work. The signal that mediates that race is supposed to
> be the q->queuedata being null, but that doesn't get set until some time
> into the release function (by which time the ref is already zero).
>
> Closing the window completely involves setting this to NULL before we do
> the final put when we know everything else is gone. So, here's the next
> incremental.
>
> James
>
> ---
>
> Index: linux-2.6/drivers/scsi/scsi_sysfs.c
> ===================================================================
> --- linux-2.6.orig/drivers/scsi/scsi_sysfs.c
> +++ linux-2.6/drivers/scsi/scsi_sysfs.c
> @@ -323,7 +323,6 @@ static void scsi_device_dev_release_user
> }
>
> if (sdev->request_queue) {
> - sdev->request_queue->queuedata = NULL;
> /* user context needed to free queue */
> scsi_free_queue(sdev->request_queue);
> /* temporary expedient, try to catch use of queue lock
> @@ -937,6 +936,7 @@ void __scsi_remove_device(struct scsi_de
> if (sdev->host->hostt->slave_destroy)
> sdev->host->hostt->slave_destroy(sdev);
> transport_destroy_device(dev);
> + sdev->request_queue->queuedata = NULL;
> put_device(dev);
> }
>
>
>
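To illustrate the ordering argument quoted above, here is a minimal user-space sketch. It is not kernel code: toy_device, toy_queue, toy_get(), toy_put(), toy_request_fn(), old_ordering() and new_ordering() are hypothetical names invented for this example, and the model is strictly sequential, so it only shows why the order of "clear queuedata" and "final put" matters, not that the real race is fully closed. Unlike kref_get(), which warns and still increments, toy_get() simply refuses when the count is already zero.

```c
#include <stddef.h>

/* Hypothetical user-space stand-ins; these are NOT the kernel structures. */
struct toy_device {
	int refcount;		/* stands in for the kref behind get_device() */
};

struct toy_queue {
	void *queuedata;	/* NULL is the "device is gone" signal */
};

enum toy_result {
	TOY_SKIPPED,		/* saw queuedata == NULL and bailed out */
	TOY_OK,			/* took and dropped a valid reference */
	TOY_GET_ON_ZERO		/* tried to take a reference at refcount 0 */
};

/* Refuse to take a reference on an object whose count already hit zero. */
static int toy_get(struct toy_device *dev)
{
	if (dev->refcount == 0)
		return -1;
	dev->refcount++;
	return 0;
}

static void toy_put(struct toy_device *dev)
{
	dev->refcount--;
}

/* Models the delayed work: check the signal, then take a reference. */
static enum toy_result toy_request_fn(struct toy_queue *q)
{
	struct toy_device *dev = q->queuedata;

	if (dev == NULL)
		return TOY_SKIPPED;
	if (toy_get(dev) < 0)
		return TOY_GET_ON_ZERO;
	toy_put(dev);
	return TOY_OK;
}

/* Old ordering: final put first, queuedata cleared later in release. */
static enum toy_result old_ordering(void)
{
	struct toy_device dev = { .refcount = 1 };
	struct toy_queue q = { .queuedata = &dev };
	enum toy_result r;

	toy_put(&dev);			/* final put, count is now zero */
	r = toy_request_fn(&q);		/* delayed work runs in the window */
	q.queuedata = NULL;		/* release clears the signal too late */
	return r;
}

/* New ordering: clear queuedata before the final put. */
static enum toy_result new_ordering(void)
{
	struct toy_device dev = { .refcount = 1 };
	struct toy_queue q = { .queuedata = &dev };

	q.queuedata = NULL;		/* signal first */
	toy_put(&dev);			/* then the final put */
	return toy_request_fn(&q);	/* delayed work sees NULL and bails */
}
```

In this model old_ordering() ends up attempting a get on a zero count, which corresponds to the kref_get warning in the trace, while new_ordering() bails out on the NULL check.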
__elv_next_request():
/home/steve/linux-2.6/block/blk.h:60
static inline struct request *__elv_next_request(struct request_queue *q)
{
struct request *rq;
while (1) {
if (!list_empty(&q->queue_head)) {
6d59: 49 39 dc cmp %rbx,%r12
6d5c: 0f 85 2e ff ff ff jne 6c90 <blk_peek_request+0x30>
/home/steve/linux-2.6/block/blk.h:65
rq = list_entry_rq(q->queue_head.next);
return rq;
}
if (!q->elevator->ops || !q->elevator->ops->elevator_dispatch_fn
(q, 0))
6d62: 49 8b 44 24 18 mov 0x18(%r12),%rax
6d67: 48 8b 00 mov (%rax),%rax
6d6a: 48 85 c0 test %rax,%rax
6d6d: 74 0c je 6d7b <blk_peek_request+0x11b>
6d6f: 31 f6 xor %esi,%esi
6d71: 4c 89 e7 mov %r12,%rdi <----- here
6d74: ff 50 28 callq *0x28(%rax)
6d77: 85 c0 test %eax,%eax
6d79: 75 da jne 6d55 <blk_peek_request+0xf5>
6d7b: 45 31 ed xor %r13d,%r13d
blk_peek_request():
/home/steve/linux-2.6/block/blk-core.c:1929
break;
}
}
return rq;
}
Boot logs attached as usual,
Steve.
View attachment "james5.txt" of type "text/plain" (25010 bytes)