linux-kernel - Re: Strange block/scsi/workqueue issue

Message-ID: <1302598152.2661.11.camel@dolmen>
Date:	Tue, 12 Apr 2011 09:49:12 +0100
From:	Steven Whitehouse <swhiteho@...hat.com>
To:	Tejun Heo <tj@...nel.org>
Cc:	linux-kernel@...r.kernel.org, Jens Axboe <jaxboe@...ionio.com>,
	James Bottomley <James.Bottomley@...senPartnership.com>
Subject: Re: Strange block/scsi/workqueue issue

Hi,

On Tue, 2011-04-12 at 09:14 +0900, Tejun Heo wrote:
> Hello,
> 
> On Mon, Apr 11, 2011 at 06:52:10PM +0100, Steven Whitehouse wrote:
> > WARNING: at lib/kref.c:34 kref_get+0x2d/0x30()
> > Hardware name: PowerEdge R710
> > Modules linked in:
> > Pid: 12, comm: kworker/2:0 Not tainted 2.6.39-rc2+ #188
> > Call Trace:
> >  [<ffffffff8108fa9a>] warn_slowpath_common+0x7a/0xb0
> >  [<ffffffff8108fae5>] warn_slowpath_null+0x15/0x20
> >  [<ffffffff813c97cd>] kref_get+0x2d/0x30
> >  [<ffffffff813c81ca>] kobject_get+0x1a/0x30
> >  [<ffffffff814607f4>] get_device+0x14/0x20
> >  [<ffffffff81478b57>] scsi_request_fn+0x37/0x4a0
> >  [<ffffffff813aff2a>] __blk_run_queue+0x6a/0x110
> >  [<ffffffff813b1f66>] blk_delay_work+0x26/0x40
> >  [<ffffffff810aa9c7>] process_one_work+0x197/0x520
> >  [<ffffffff810acfec>] worker_thread+0x15c/0x330
> >  [<ffffffff810b1f16>] kthread+0xa6/0xb0
> >  [<ffffffff816870e4>] kernel_thread_helper+0x4/0x10
> > ---[ end trace 3681e9da2630a94b ]---
> 
> Hmm, it could be that the root cause of the problem is
> premature/double put of scsi_device.  Without the patch, it makes
> scsi_request_fn() call into device destruction path prematurely
> triggering deadlock while after the patch, the deadlock is gone but
> the ref count reaches zero prematurely triggering kref warning on the
> next request.
> 
> The problem doesn't seem widespread so something about the setup is
> peculiar.  Steven, can you please detail the setup (and steps needed
> to trigger the problem) and attach the full boot log?  James, any
> ideas?
> 
> Thanks.
> 
The hardware is as follows:

Dell R710 server with two 2GHz 4-core CPUs
Two 146G SAS disks with hardware mirroring as root/OS disk
Two 300G SAS disks with hardware mirroring as GFS2 test disk (note: not
mounted during boot process)
12G RAM (boots with mem=4G since I was originally running tests with
lower memory, but I don't think the memory size affects this at all)

I've attached the boot log from yesterday's test of your patch. If you
want the boot logs with James' patch or the original boot log, I can
send those too.

Steve.


View attachment "tejun.txt" of type "text/plain" (164473 bytes)