Message-ID: <1313075131.4166.10.camel@mulgrave>
Date:	Thu, 11 Aug 2011 10:05:31 -0500
From:	James Bottomley <James.Bottomley@...senPartnership.com>
To:	Alan Stern <stern@...land.harvard.edu>
Cc:	Jun'ichi Nomura <j-nomura@...jp.nec.com>, jaxboe@...ionio.com,
	roland@...estorage.com, linux-scsi@...r.kernel.org,
	"linux-kernel@...r.kernel.org" <linux-kernel@...r.kernel.org>,
	device-mapper development <dm-devel@...hat.com>,
	Kiyoshi Ueda <k-ueda@...jp.nec.com>
Subject: Re: [BUG] Oops when SCSI device under multipath is removed

On Thu, 2011-08-11 at 10:59 -0400, Alan Stern wrote:
> On Thu, 11 Aug 2011, James Bottomley wrote:
> 
> > > If the reason you moved scsi_free_queue into scsi_remove_device
> > > is marking the queue dead, how about the following patch?
> > > Do you think it's acceptable?
> > 
> > Well, it's just hiding the problem.  The essential problem is that only
> > block has the correctly refcounted knowledge to know the last release of
> > the queue reference.  Until that time, the holder of the reference can
> > use the queue regardless of whether blk_cleanup_queue() has been called.
> > This is the race you complain about since use of the queue involves the
> > lock which should be guarded by QUEUE_DEAD checks.
> > 
> > This is essentially unfixable with function calls.  The only way to fix
> > it is to have a callback model for freeing the external lock.
> 
> Assuming the queue is associated with a device, the queue could take a
> reference to the device, dropping that reference when the queue is
> freed.  Then the lock could safely be freed at the same time as the 
> device.

If that assumption is correct, there's no point refcounting the queue at
all because its use is entirely subordinated to the lifecycle of the
associated device.  Plus all the wittering about my previous patch is
pointless, because blk_cleanup_queue() has to do the final put of the
queue in the lock free path (otherwise the assumption is violated).

However, much as I'd like to accept this rosy view, the original oops
that started all of this in 2.6.38 came from something being caught
holding a reference to a SCSI queue after the device release function
had been called.

James
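
The callback model mentioned in this thread can be illustrated with a
minimal user-space sketch. It is not kernel code and not the patch under
discussion: struct queue, struct ext_lock, queue_get(), queue_put(),
queue_cleanup() and driver_release_lock() are all hypothetical names, and
the refcount is a plain int with no atomicity or locking. The only point
it shows is that the externally owned lock is released from a callback
run on the final put, so a holder that outlives cleanup never touches a
freed lock.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical stand-in for a lock owned by the driver, not by the queue. */
struct ext_lock {
	int freed;
};

/* Hypothetical queue: refcounted, with a release callback for the lock. */
struct queue {
	int refcount;                        /* plain int: single-threaded sketch */
	int dead;                            /* set by cleanup, like a DEAD flag */
	struct ext_lock *lock;
	void (*release)(struct ext_lock *);  /* run only on the final put */
};

static void driver_release_lock(struct ext_lock *l)
{
	l->freed = 1;   /* a real driver would free the lock's storage here */
}

static struct queue *queue_alloc(struct ext_lock *l,
				 void (*release)(struct ext_lock *))
{
	struct queue *q = calloc(1, sizeof(*q));

	if (!q)
		return NULL;
	q->refcount = 1;
	q->lock = l;
	q->release = release;
	return q;
}

static void queue_get(struct queue *q)
{
	q->refcount++;
}

/* Marks the queue dead but deliberately leaves the lock alone. */
static void queue_cleanup(struct queue *q)
{
	q->dead = 1;
}

/* Drops one reference; returns 1 if this was the final put. */
static int queue_put(struct queue *q)
{
	if (--q->refcount > 0)
		return 0;
	if (q->release)
		q->release(q->lock);
	free(q);
	return 1;
}

/* Returns 1 if the lock survived cleanup and was freed on the last put. */
static int demo(void)
{
	struct ext_lock lock = { 0 };
	struct queue *q = queue_alloc(&lock, driver_release_lock);
	int survived;

	if (!q)
		return 0;
	queue_get(q);       /* a second holder, e.g. a stacked driver */
	queue_cleanup(q);   /* the owner marks the queue dead ... */
	queue_put(q);       /* ... and drops its own reference */
	survived = !lock.freed;
	queue_put(q);       /* last holder goes away: callback fires */
	return survived && lock.freed;
}
```

In this sketch the owner cannot know when the lock goes away; only the
refcounting code does, which is why the release has to be a callback
rather than a function the owner calls at cleanup time.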