Message-ID: <5389393D.2030305@kernel.dk>
Date: Fri, 30 May 2014 20:06:53 -0600
From: Jens Axboe <axboe@...nel.dk>
To: Linus Torvalds <torvalds@...ux-foundation.org>,
Dave Chinner <david@...morbit.com>
CC: Minchan Kim <minchan@...nel.org>,
Linux Kernel Mailing List <linux-kernel@...r.kernel.org>,
Andrew Morton <akpm@...ux-foundation.org>,
linux-mm <linux-mm@...ck.org>, "H. Peter Anvin" <hpa@...or.com>,
Ingo Molnar <mingo@...nel.org>,
Peter Zijlstra <a.p.zijlstra@...llo.nl>,
Mel Gorman <mgorman@...e.de>, Rik van Riel <riel@...hat.com>,
Johannes Weiner <hannes@...xchg.org>,
Hugh Dickins <hughd@...gle.com>,
Rusty Russell <rusty@...tcorp.com.au>,
"Michael S. Tsirkin" <mst@...hat.com>,
Dave Hansen <dave.hansen@...el.com>,
Steven Rostedt <rostedt@...dmis.org>
Subject: Re: [RFC 2/2] x86_64: expand kernel stack to 16K
On 2014-05-28 20:42, Linus Torvalds wrote:
>> Regardless of whether it is swap or something external that queues the
>> bio on the plug, perhaps we should look at why it's done inline
>> rather than by kblockd, which is where it was moved because it was
>> blowing the stack from schedule():
>
> So it sounds like we need to do this for io_schedule() too.
>
> In fact, we've generally found it to be a mistake every time we
> "automatically" unblock some IO queue. And I'm not saying that because
> of stack space, but because we've _often_ had the situation where eager
> unblocking results in IO that could have been done as bigger requests.
We definitely need to auto-unplug on the schedule path, otherwise we run
into all sorts of trouble. But making it async off the IO schedule path
is fine. By definition, it's not latency sensitive if we are hitting
unplug on schedule. I'm pretty sure it was run inline out of CPU
overhead concerns, since running inline is certainly cheaper than
punting to kblockd.
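For reference, this is roughly what the helpers look like right now - I'm
writing this from memory, so treat it as a sketch of the idea rather than
the exact code in the tree:

	/* include/linux/blkdev.h, approximately */
	static inline void blk_flush_plug(struct task_struct *tsk)
	{
		struct blk_plug *plug = tsk->plug;

		if (plug)
			blk_flush_plug_list(plug, false);	/* run queues inline */
	}

	static inline void blk_schedule_flush_plug(struct task_struct *tsk)
	{
		struct blk_plug *plug = tsk->plug;

		if (plug)
			blk_flush_plug_list(plug, true);	/* punt queue runs to kblockd */
	}

If I remember correctly, plain schedule() already goes through the second
variant, so making io_schedule() async would essentially just mean flushing
with from_schedule == true there as well.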
> Looking at that callchain, I have to say that ext4 doesn't look
> horrible compared to the whole block layer and virtio.. Yes,
> "ext4_writepages()" is using almost 400 bytes of stack, and most of
> that seems to be due to:
>
> struct mpage_da_data mpd;
> struct blk_plug plug;
Plus blk_plug is pretty tiny as it is. I queued up a patch to kill the
magic member of it, since that has never caught any bugs. It only saves 8
bytes, but we may as well take that, especially if we end up with nested
plugs.
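For reference, the struct currently looks roughly like this (again from
memory, so consider it a sketch of the layout rather than the exact
definition); the magic member is the unsigned long in question, hence the
8 bytes on 64-bit:

	struct blk_plug {
		unsigned long magic;		/* detect uninitialized use-cases */
		struct list_head list;		/* requests */
		struct list_head mq_list;	/* blk-mq requests */
		struct list_head cb_list;	/* md requires an unplug callback */
	};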
> Well, we've definitely had some issues with deeper callchains
> with md, but I suspect virtio might be worse, and the new blk-mq code
> is likely worse in this respect too.
I don't think blk-mq is worse than the older stack; in fact it should be
better. The call chains are shorter, and there is a lot less cruft on the
stack. Historically, though, the stack issues have come from nested
devices. And for sync IO we do run it inline, so if the driver chews up a
lot of stack, well...
Looks like I'm late here and the decision has been made to go to 16K
stacks, which I think is a good one. We've been living on the edge (and
sometimes over) with heavy dm/md setups for a while, and have been
patching around that fact in the IO stack for years.
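For anyone following along, the actual change is essentially just bumping
the order of the thread stack allocation in
arch/x86/include/asm/page_64_types.h - paraphrasing the patch rather than
quoting it:

	#define THREAD_SIZE_ORDER	2	/* was 1: 2 pages (8K) -> 4 pages (16K) */
	#define THREAD_SIZE		(PAGE_SIZE << THREAD_SIZE_ORDER)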
--
Jens Axboe