Message-ID: <1302633208.2661.29.camel@dolmen>
Date:	Tue, 12 Apr 2011 19:33:28 +0100
From:	Steven Whitehouse <swhiteho@...hat.com>
To:	James Bottomley <James.Bottomley@...senPartnership.com>
Cc:	Tejun Heo <tj@...nel.org>, linux-kernel@...r.kernel.org,
	Jens Axboe <jaxboe@...ionio.com>
Subject: Re: Strange block/scsi/workqueue issue

Hi,

On Tue, 2011-04-12 at 12:41 -0500, James Bottomley wrote:
> On Tue, 2011-04-12 at 17:51 +0100, Steven Whitehouse wrote:
> > Still not quite there, but looking more hopeful now,
> 
> Not sure I share your optimism; but this one
> 
Neither do I any more :-) Looks like we are back in blk_peek_request()
again. 

> > scsi 0:2:1:0: Direct-Access     DELL     PERC 6/i         1.22 PQ: 0 ANSI: 5
> > scsi: killing requests for dead queue
> > ------------[ cut here ]------------
> > WARNING: at lib/kref.c:34 kref_get+0x2d/0x30()
> > Hardware name: PowerEdge R710
> > Modules linked in:
> > Pid: 386, comm: kworker/6:1 Not tainted 2.6.39-rc2+ #193
> > Call Trace:
> >  [<ffffffff8108fa9a>] warn_slowpath_common+0x7a/0xb0
> >  [<ffffffff8108fae5>] warn_slowpath_null+0x15/0x20
> >  [<ffffffff813c984d>] kref_get+0x2d/0x30
> >  [<ffffffff813c824a>] kobject_get+0x1a/0x30
> >  [<ffffffff81460874>] get_device+0x14/0x20
> >  [<ffffffff81478bd7>] scsi_request_fn+0x37/0x4a0
> 
> Is definitely a race between the last put of the SCSI device and the
> block delayed work.  The signal that mediates that race is supposed to
> be the q->queuedata being null, but that doesn't get set until some time
> into the release function (by which time the ref is already zero).
> 
> Closing the window completely involves setting this to NULL before we do
> the final put when we know everything else is gone.  So, here's the next
> incremental.
> 
> James
> 
> ---
> 
> Index: linux-2.6/drivers/scsi/scsi_sysfs.c
> ===================================================================
> --- linux-2.6.orig/drivers/scsi/scsi_sysfs.c
> +++ linux-2.6/drivers/scsi/scsi_sysfs.c
> @@ -323,7 +323,6 @@ static void scsi_device_dev_release_user
>  	}
>  
>  	if (sdev->request_queue) {
> -		sdev->request_queue->queuedata = NULL;
>  		/* user context needed to free queue */
>  		scsi_free_queue(sdev->request_queue);
>  		/* temporary expedient, try to catch use of queue lock
> @@ -937,6 +936,7 @@ void __scsi_remove_device(struct scsi_de
>  	if (sdev->host->hostt->slave_destroy)
>  		sdev->host->hostt->slave_destroy(sdev);
>  	transport_destroy_device(dev);
> +	sdev->request_queue->queuedata = NULL;
>  	put_device(dev);
>  }
>  
> 
> 
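
The ordering argument in the quoted explanation can be illustrated with a small userspace sketch. This is not kernel code: `mock_device`, `mock_queue`, `mock_request_fn` and the two teardown helpers are hypothetical stand-ins, and the "delayed work" is simply called inline at the point where the window would be, so it only models the ordering James describes and says nothing about whether the patch closes every path.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical userspace stand-ins; these are not the kernel structures. */
struct mock_device {
    int refcount;              /* models the kref inside struct device */
};

struct mock_queue {
    void *queuedata;           /* models q->queuedata; NULL means "device gone" */
};

/* Models kref_get(): taking a reference on a zero count is the case that
 * lib/kref.c warns about. Returns 0 on success, -1 on the warned case. */
int mock_get(struct mock_device *dev)
{
    if (dev->refcount == 0)
        return -1;
    dev->refcount++;
    return 0;
}

void mock_put(struct mock_device *dev)
{
    assert(dev->refcount > 0);
    dev->refcount--;
}

/* Models the start of a request function: bail out if queuedata is NULL,
 * otherwise take a device reference for the duration of the call.
 * Returns 1 if it bailed out, 0 if it ran normally, -1 if it tried to
 * take a reference on an already-dead device. */
int mock_request_fn(struct mock_queue *q)
{
    struct mock_device *dev = q->queuedata;

    if (dev == NULL)
        return 1;
    if (mock_get(dev) < 0)
        return -1;
    mock_put(dev);
    return 0;
}

/* Old ordering: the final put happens first, and queuedata is only
 * cleared later, inside the release path. */
int teardown_old_order(void)
{
    struct mock_device dev = { .refcount = 1 };
    struct mock_queue q = { .queuedata = &dev };
    int result;

    mock_put(&dev);                 /* final put: refcount is now 0 */
    result = mock_request_fn(&q);   /* delayed work runs inside the window */
    q.queuedata = NULL;             /* cleared too late to act as a signal */
    return result;
}

/* New ordering: queuedata is cleared before the final put. */
int teardown_new_order(void)
{
    struct mock_device dev = { .refcount = 1 };
    struct mock_queue q = { .queuedata = &dev };

    q.queuedata = NULL;             /* signal "gone" first */
    mock_put(&dev);                 /* then drop the last reference */
    return mock_request_fn(&q);     /* delayed work sees NULL and bails */
}
```

In this model `teardown_old_order()` reaches the get-on-zero case, while `teardown_new_order()` bails out on the NULL check. A real fix also has to cope with true concurrency and locking, which this single-threaded sketch deliberately ignores.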

__elv_next_request():
/home/steve/linux-2.6/block/blk.h:60
static inline struct request *__elv_next_request(struct request_queue *q)
{
        struct request *rq;

        while (1) {
                if (!list_empty(&q->queue_head)) {
    6d59:       49 39 dc                cmp    %rbx,%r12
    6d5c:       0f 85 2e ff ff ff       jne    6c90 <blk_peek_request+0x30>
/home/steve/linux-2.6/block/blk.h:65
                        rq = list_entry_rq(q->queue_head.next);
                        return rq;
                }

                if (!q->elevator->ops || !q->elevator->ops->elevator_dispatch_fn(q, 0))
    6d62:       49 8b 44 24 18          mov    0x18(%r12),%rax
    6d67:       48 8b 00                mov    (%rax),%rax
    6d6a:       48 85 c0                test   %rax,%rax
    6d6d:       74 0c                   je     6d7b <blk_peek_request+0x11b>
    6d6f:       31 f6                   xor    %esi,%esi
    6d71:       4c 89 e7                mov    %r12,%rdi                                <----- here
    6d74:       ff 50 28                callq  *0x28(%rax)
    6d77:       85 c0                   test   %eax,%eax
    6d79:       75 da                   jne    6d55 <blk_peek_request+0xf5>
    6d7b:       45 31 ed                xor    %r13d,%r13d
blk_peek_request():
/home/steve/linux-2.6/block/blk-core.c:1929
                        break;
                }
        }

        return rq;
}



Boot logs attached as usual,

Steve.


View attachment "james5.txt" of type "text/plain" (25010 bytes)
