Message-ID: <1313075131.4166.10.camel@mulgrave>
Date: Thu, 11 Aug 2011 10:05:31 -0500
From: James Bottomley <James.Bottomley@...senPartnership.com>
To: Alan Stern <stern@...land.harvard.edu>
Cc: Jun'ichi Nomura <j-nomura@...jp.nec.com>, jaxboe@...ionio.com,
roland@...estorage.com, linux-scsi@...r.kernel.org,
"linux-kernel@...r.kernel.org" <linux-kernel@...r.kernel.org>,
device-mapper development <dm-devel@...hat.com>,
Kiyoshi Ueda <k-ueda@...jp.nec.com>
Subject: Re: [BUG] Oops when SCSI device under multipath is removed
On Thu, 2011-08-11 at 10:59 -0400, Alan Stern wrote:
> On Thu, 11 Aug 2011, James Bottomley wrote:
>
> > > If the reason you moved scsi_free_queue into scsi_remove_device
> > > is marking the queue dead, how about the following patch?
> > > Do you think it's acceptable?
> >
> > Well, it's just hiding the problem. The essential problem is that only
> > block has the correctly refcounted knowledge to know the last release of
> > the queue reference. Until that time, the holder of the reference can
> > use the queue regardless of whether blk_cleanup_queue() has been called.
> > This is the race you complain about since use of the queue involves the
> > lock which should be guarded by QUEUE_DEAD checks.
> >
> > This is essentially unfixable with function calls. The only way to fix
> > it is to have a callback model for freeing the external lock.
>
> Assuming the queue is associated with a device, the queue could take a
> reference to the device, dropping that reference when the queue is
> freed. Then the lock could safely be freed at the same time as the
> device.
If that assumption is correct, there's no point refcounting the queue at
all because its use is entirely subordinated to the lifecycle of the
associated device. Plus all the wittering about my previous patch is
pointless, because blk_cleanup_queue() has to do the final put of the
queue in the lock free path (otherwise the assumption is violated).
However, much as I'd like to accept this rosy view, the original oops
that started all of this in 2.6.38 was a case where someone caught
something still holding a reference to a SCSI queue after the device
release function had been called.
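For illustration, the scheme Alan describes (the queue pins the device, so
the lock owned by the device cannot go away before the last queue
reference does) can be modelled in user space roughly as below. This is
only a sketch: device_model, queue_model and every function name are
hypothetical stand-ins, plain ints replace real refcounts and spinlocks,
and nothing here is the actual block or SCSI code.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical user-space stand-ins; not the kernel's structures. */
struct device_model {
    int refcount;
    int lock;        /* stands in for the externally owned queue lock */
    int *released;   /* set to 1 when the device (and its lock) is freed */
};

struct queue_model {
    int refcount;
    struct device_model *dev;  /* reference held for the queue's lifetime */
    int *lock;                 /* points at the lock inside dev */
};

static struct device_model *device_create(int *released)
{
    struct device_model *dev = calloc(1, sizeof(*dev));
    if (!dev)
        abort();
    dev->refcount = 1;
    dev->released = released;
    return dev;
}

static void device_put(struct device_model *dev)
{
    if (--dev->refcount == 0) {
        *dev->released = 1;    /* the lock dies here, with the device */
        free(dev);
    }
}

static struct queue_model *queue_create(struct device_model *dev)
{
    struct queue_model *q = calloc(1, sizeof(*q));
    if (!q)
        abort();
    q->refcount = 1;
    dev->refcount++;           /* the queue takes a device reference */
    q->dev = dev;
    q->lock = &dev->lock;
    return q;
}

static void queue_get(struct queue_model *q)
{
    q->refcount++;
}

static void queue_put(struct queue_model *q)
{
    if (--q->refcount == 0) {
        struct device_model *dev = q->dev;
        free(q);
        device_put(dev);       /* dropped only when the queue is freed */
    }
}

/* Returns 1 if the lock stayed valid until the last queue reference. */
static int late_holder_scenario(void)
{
    int released = 0;
    struct device_model *dev = device_create(&released);
    struct queue_model *q = queue_create(dev);
    int alive_after_removal;

    queue_get(q);      /* something else keeps a queue reference */
    queue_put(q);      /* owner drops its queue reference at removal */
    device_put(dev);   /* owner drops its own device reference */

    alive_after_removal = (released == 0);
    *q->lock = 1;      /* the late holder can still touch the lock */
    queue_put(q);      /* final put frees the queue, then the device */

    return alive_after_removal && released == 1;
}
```

In this model late_holder_scenario() returns 1, because the device cannot
be released while any queue reference is outstanding. That is exactly the
property the 2.6.38 oops suggests does not hold in the real code.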
James