Message-ID: <1302598152.2661.11.camel@dolmen>
Date:	Tue, 12 Apr 2011 09:49:12 +0100
From:	Steven Whitehouse <swhiteho@...hat.com>
To:	Tejun Heo <tj@...nel.org>
Cc:	linux-kernel@...r.kernel.org, Jens Axboe <jaxboe@...ionio.com>,
	James Bottomley <James.Bottomley@...senPartnership.com>
Subject: Re: Strange block/scsi/workqueue issue

Hi,

On Tue, 2011-04-12 at 09:14 +0900, Tejun Heo wrote:
> Hello,
> 
> On Mon, Apr 11, 2011 at 06:52:10PM +0100, Steven Whitehouse wrote:
> > WARNING: at lib/kref.c:34 kref_get+0x2d/0x30()
> > Hardware name: PowerEdge R710
> > Modules linked in:
> > Pid: 12, comm: kworker/2:0 Not tainted 2.6.39-rc2+ #188
> > Call Trace:
> >  [<ffffffff8108fa9a>] warn_slowpath_common+0x7a/0xb0
> >  [<ffffffff8108fae5>] warn_slowpath_null+0x15/0x20
> >  [<ffffffff813c97cd>] kref_get+0x2d/0x30
> >  [<ffffffff813c81ca>] kobject_get+0x1a/0x30
> >  [<ffffffff814607f4>] get_device+0x14/0x20
> >  [<ffffffff81478b57>] scsi_request_fn+0x37/0x4a0
> >  [<ffffffff813aff2a>] __blk_run_queue+0x6a/0x110
> >  [<ffffffff813b1f66>] blk_delay_work+0x26/0x40
> >  [<ffffffff810aa9c7>] process_one_work+0x197/0x520
> >  [<ffffffff810acfec>] worker_thread+0x15c/0x330
> >  [<ffffffff810b1f16>] kthread+0xa6/0xb0
> >  [<ffffffff816870e4>] kernel_thread_helper+0x4/0x10
> > ---[ end trace 3681e9da2630a94b ]---
> 
> Hmm, it could be that the root cause of the problem is
> premature/double put of scsi_device.  Without the patch, it makes
> scsi_request_fn() call into device destruction path prematurely
> triggering deadlock while after the patch, the deadlock is gone but
> the ref count reaches zero prematurely triggering kref warning on the
> next request.
> 
> The problem doesn't seem widespread so something about the setup is
> peculiar.  Steven, can you please detail the setup (and steps needed
> to trigger the problem) and attach the full boot log?  James, any
> ideas?
> 
> Thanks.
> 
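As an aside, the sequence described above (an extra put drops the count to zero early, and the next get then trips the warning) can be modelled with a toy counter. This is only a simplified sketch, not the kernel's kref or SCSI code: `Kref`, `run_request` and `extra_put` are hypothetical names made up for illustration, and where the extra put actually comes from is still unknown.

```python
class Kref:
    """Toy reference counter (an assumption-laden model, not struct kref)."""

    def __init__(self):
        self.refcount = 1      # the creator holds the initial reference
        self.released = False  # set when the count drops to zero
        self.warnings = 0      # how many times get() saw a zero count

    def get(self):
        # Taking a reference on an object whose count is already zero
        # means a reference was dropped by someone who did not own it.
        if self.refcount == 0:
            self.warnings += 1
        self.refcount += 1

    def put(self):
        self.refcount -= 1
        if self.refcount == 0:
            self.released = True  # stands in for the release callback


def run_request(dev, extra_put=False):
    """Hypothetical request path: pin the device, do work, unpin it."""
    dev.get()
    # ... request processing would happen here ...
    dev.put()
    if extra_put:
        dev.put()  # the suspected premature/double put


dev = Kref()
run_request(dev, extra_put=True)  # count reaches zero while still in use
run_request(dev)                  # next request: get() on a zero count
print(dev.released, dev.warnings)
```

In this model the first request releases the device early, and the second request is the one that reports the problem, which matches the order of events suggested above.
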
The hardware is as follows:

Dell R710 server with two 2GHz 4-core CPUs
Two 146G SAS disks with hardware mirroring as root/OS disk
Two 300G SAS disks with hardware mirroring as GFS2 test disk (note: not
mounted during boot process)
12G RAM (boots with mem=4G since I was originally running tests with
lower memory, but I don't think the memory size affects this at all)

I've attached the boot log from yesterday's test of your patch. If you
want the boot logs with James' patch applied, or the original boot log,
I can send those too.

Steve.


View attachment "tejun.txt" of type "text/plain" (164473 bytes)
