[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <20110316194726.GY12760@beardog.cce.hp.com>
Date:	Wed, 16 Mar 2011 14:47:26 -0500
From:	scameron@...rdog.cce.hp.com
To:	Arnd Bergmann <arnd@...db.de>
Cc:	linux-kernel@...r.kernel.org, axboe@...nel.dk, dab@...com,
	mike.miller@...com, scameron@...rdog.cce.hp.com
Subject: Re: Question about driver ioctl, big kernel lock, mutexes
On Wed, Mar 16, 2011 at 08:06:14PM +0100, Arnd Bergmann wrote:
> On Wednesday 16 March 2011 16:50:33 scameron@...rdog.cce.hp.com wrote:
> > 
> > I just noticed the existence of this change (I guess
> > I'm a little slow):
> > 
> > 	commit 2a48fc0ab24241755dc93bfd4f01d68efab47f5a
> > 	Author: Arnd Bergmann <arnd@...db.de>
> > 	Date:   Wed Jun 2 14:28:52 2010 +0200
> > 
> > 	    block: autoconvert trivial BKL users to private mutex
> > 	    
> > 	    The block device drivers have all gained new lock_kernel
> > 	    calls from a recent pushdown, and some of the drivers
> > 	    were already using the BKL before.
> > 	    
> > 	    This turns the BKL into a set of per-driver mutexes.
> > 	    Still need to check whether this is safe to do.
> > 
> > This changes the behavior of, for example, the CCISS_PASSTHRU and 
> > other ioctls.  Previously, these were "protected" by the Big
> > Kernel Lock.  Now, from what I understand and what I've observed,
> > the Big Kernel Lock is kind of strange, in that it auto-releases 
> > whenever you sleep.  This means that the ioctl path in cciss used
> > to allow concurrency (which is fine, they should all be thread
> > safe.  I'm kind of wondering why the BKL was there in the first
> > place.  I guess I assumed it was necessary for reasons having to
> > do with the upper layers.)  
> 
> The reason to have the BKL there is that the BKL used to protect
> every syscall. The block ioctl handling was simply one of the
> final places to lose it, after it was gone almost everywhere
> else.
> 
> Originally (before Linux-1.3!), the only concurrency that was
> going on in the kernel was when one task was sleeping, so it was
> straightforward to add the semantics of the BKL that look so
> strange in retrospect.
> 
> > Now, if I understand things correctly, with this new driver specific mutex
> > surrounding the ioctl path, only one thread is permitted in the ioctl path at a
> > time, and whether it sleeps or not, the mutex is not released until it's done.
> 
> Yes.
> 
> > That's a pretty big change in behavior.  Some commands sent through
> > the passthrough ioctls may take awhile, and they're going to block
> > any other ioctls from getting started while they're going.  Previously,
> > it was possible to saturate the controller using multiple threads or
> > processes hammering on the ioctl path.  This would appear to no longer
> > be the case.
> 
> Correct. Almost all ioctls in the kernel are of the "instantly
> return" kind, any ioctl that blocks for a potentially unlimited
> amount of time under a mutex would be a bug that I introduced
> in the conversion.
> 
> > That being said, there was previously no real limit (other than
> > pci_alloc_consistent eventually failing) on how many commands
> > could be shoved through the cciss ioctl path at once, which was probably
> > a bug.  It has occurred to me to put a limit on this, and return
> > -EBUSY when the limit is hit, though I don't know if programs using
> > the ioctl are prepared to receive EBUSY... but maybe that's not my
> > problem.
> > 
> > Thoughts?
> > 
> > I'm thinking that if there's no reason the ioctl path can't allow
> > concurrency, then it should allow concurrency.
> 
> Absolutely. The best way forward would be to prove that the mutex
> I introduced is actually pointless, or alternatively change the code
> so that the mutex is only held in the places where it is really
> required, if any.
Pretty sure the ioctl stuff is already thread safe, but I will look at it
again.
There are a couple of other places the mutex is used, open, release, I think
one other place, that I'm less sure about, only because I haven't thought
about them.  It should be a safe assumption that nothing outside
the driver depends on these mutexes (obviously, since they're declared
only in the driver) so that should make it pretty easy to figure out.
 
> 
> When I submitted the patches, I always tried to make it clear that
> most of the new mutexes are not required at all, but that proving
> this for every single driver takes a lot of hard work. 
> 
> Having a mutex that should not be there causes a performance regression
> or a deadlock, both of which are fairly easy to bisect and fix, but
> missing a mutex because of a subtle BKL dependency would have meant
> introducing a race condition that would be extremely hard to analyse.
> 
> Limiting the number of concurrent I/Os submitted to a device might
> be a good idea, but I think that's a separate discussion.
Ok.  Thanks for the explanation and confirmation of what I was thinking.
I'll start working on a fix for this.  May take me a little while as 
there are a bunch of other things I have to work on, but it's been
this way since June of 2010 so far with no complaints (that I know of),
so I guess it won't hurt too much for it to be this way for a little
while longer.
-- steve
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@...r.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/
Powered by blists - more mailing lists
 
