[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <20120829210838.GA15218@moria.home.lan>
Date: Wed, 29 Aug 2012 14:08:38 -0700
From: Kent Overstreet <koverstreet@...gle.com>
To: John Stoffel <john@...ffel.org>
Cc: Vivek Goyal <vgoyal@...hat.com>, Jens Axboe <axboe@...nel.dk>,
dm-devel@...hat.com, linux-kernel@...r.kernel.org,
linux-bcache@...r.kernel.org, mpatocka@...hat.com,
bharrosh@...asas.com, Tejun Heo <tj@...nel.org>
Subject: Re: [dm-devel] [PATCH v7 9/9] block: Avoid deadlocks with bio
allocation by stacking drivers
On Wed, Aug 29, 2012 at 05:01:09PM -0400, John Stoffel wrote:
> >>>>> "Kent" == Kent Overstreet <koverstreet@...gle.com> writes:
>
> Kent> On Wed, Aug 29, 2012 at 03:39:14PM +0100, Alasdair G Kergon wrote:
> >> It's also instructive to remember why the code is the way it is: it used
> >> to process bios for underlying devices immediately, but this sometimes
> >> meant too much recursive stack growth. If a per-device rescuer thread
> >> is to be made available (as well as the mempool), the option of
> >> reinstating recursion is there too - only punting to workqueue when the
> >> stack actually becomes "too big". (Also bear in mind that some dm
> >> targets may have dependencies on their own mempools - submission can
> >> block there too.) I find it helpful only to consider splitting into two
> >> pieces - it must always be possible to process the first piece (i.e.
> >> process it at the next layer down in the stack) and complete it
> >> independently of what happens to the second piece (which might require
> >> further splitting and block until the first piece has completed).
>
> Kent> I'm sure it could be made to work (and it may well simpler), but it
> Kent> seems problematic from a performance pov.
>
> Kent> With stacked devices you'd then have to switch stacks on _every_ bio.
> Kent> That could be made fast enough I'm sure, but it wouldn't be free and I
> Kent> don't know of any existing code in the kernel that implements what we'd
> Kent> need (though if you know how you'd go about doing that, I'd love to
> Kent> know! Would be useful for other things).
>
> Kent> The real problem is that because we'd need these extra stacks for
> Kent> handling all bios we couldn't get by with just one per bio_set. We'd
> Kent> only need one to make forward progress so the rest could be allocated
> Kent> on demand (i.e. what the workqueue code does) but that sounds like it's
> Kent> starting to get expensive.
>
> Maybe we need to limit the size of BIOs to that of the bottom-most
> underlying device instead? Or maybe limit BIOs to some small
> multiple? As you stack up DM targets one on top of each other, they
> should respect the limits of the underlying devices and pass those
> limits up the chain.
That's the approach the block layer currently tries to take. It's
brittle, tricky and inefficient, and it completely breaks down when the
limits are dynamic - so things like dm and bcache are always going to
have to split anyways.
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@...r.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/
Powered by blists - more mailing lists