Message-ID: <5389393D.2030305@kernel.dk>
Date: Fri, 30 May 2014 20:06:53 -0600
From: Jens Axboe <axboe@...nel.dk>
To: Linus Torvalds <torvalds@...ux-foundation.org>,
Dave Chinner <david@...morbit.com>
CC: Minchan Kim <minchan@...nel.org>,
Linux Kernel Mailing List <linux-kernel@...r.kernel.org>,
Andrew Morton <akpm@...ux-foundation.org>,
linux-mm <linux-mm@...ck.org>, "H. Peter Anvin" <hpa@...or.com>,
Ingo Molnar <mingo@...nel.org>,
Peter Zijlstra <a.p.zijlstra@...llo.nl>,
Mel Gorman <mgorman@...e.de>, Rik van Riel <riel@...hat.com>,
Johannes Weiner <hannes@...xchg.org>,
Hugh Dickins <hughd@...gle.com>,
Rusty Russell <rusty@...tcorp.com.au>,
"Michael S. Tsirkin" <mst@...hat.com>,
Dave Hansen <dave.hansen@...el.com>,
Steven Rostedt <rostedt@...dmis.org>
Subject: Re: [RFC 2/2] x86_64: expand kernel stack to 16K
On 2014-05-28 20:42, Linus Torvalds wrote:
>> Regardless of whether it is swap or something external that queues the
>> bio on the plug, perhaps we should look at why it's done inline
>> rather than by kblockd, which is where it was moved because it was
>> blowing the stack from schedule():
>
> So it sounds like we need to do this for io_schedule() too.
>
> In fact, we've generally found it to be a mistake every time we
> "automatically" unblock some IO queue. And I'm not saying that because
> of stack space, but because we've _often_ had the situation where eager
> unblocking results in IO that could have been done as bigger requests.
We definitely need to auto-unplug on the schedule path, otherwise we run
into all sorts of trouble. But making it async off the IO schedule path
is fine. By definition, it's not latency sensitive if we are hitting
unplug on schedule. I'm pretty sure it was run inline out of CPU
overhead concerns, since running inline is certainly cheaper than
punting to kblockd.
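For reference, this is roughly what the helpers look like right now - I'm
writing this from memory, so treat it as a sketch of the idea rather than
the exact code in the tree:

	/* include/linux/blkdev.h, approximately */
	static inline void blk_flush_plug(struct task_struct *tsk)
	{
		struct blk_plug *plug = tsk->plug;

		if (plug)
			blk_flush_plug_list(plug, false);	/* run queues inline */
	}

	static inline void blk_schedule_flush_plug(struct task_struct *tsk)
	{
		struct blk_plug *plug = tsk->plug;

		if (plug)
			blk_flush_plug_list(plug, true);	/* punt queue runs to kblockd */
	}

If I remember correctly, plain schedule() already goes through the second
variant, so making io_schedule() async would essentially just mean flushing
with from_schedule == true there as well.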
> Looking at that callchain, I have to say that ext4 doesn't look
> horrible compared to the whole block layer and virtio.. Yes,
> "ext4_writepages()" is using almost 400 bytes of stack, and most of
> that seems to be due to:
>
> struct mpage_da_data mpd;
> struct blk_plug plug;
Plus blk_plug is pretty tiny as it is. I queued up a patch to kill the
magic member of it, since that has never caught any bugs. It only saves 8
bytes, but we may as well take that, especially if we end up with nested
plugs.
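For reference, the struct currently looks roughly like this (again from
memory, so consider it a sketch of the layout rather than the exact
definition); the magic member is the unsigned long in question, hence the
8 bytes on 64-bit:

	struct blk_plug {
		unsigned long magic;		/* detect uninitialized use-cases */
		struct list_head list;		/* requests */
		struct list_head mq_list;	/* blk-mq requests */
		struct list_head cb_list;	/* md requires an unplug callback */
	};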
> Well, we've definitely had some issues with deeper callchains
> with md, but I suspect virtio might be worse, and the new blk-mq code
> is likely worse in this respect too.
I don't think blk-mq is worse than the older stack; in fact it should be
better. The call chains are shorter, and there is a lot less cruft on the
stack. Historically, though, the stack issues have come from nested
devices. And for sync IO we do run it inline, so if the driver chews up a
lot of stack, well...
Looks like I'm late here and the decision has been made to go to 16K
stacks, which I think is a good one. We've been living on the edge (and
sometimes over) with heavy dm/md setups for a while, and have been
patching around that fact in the IO stack for years.
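For anyone following along, the actual change is essentially just bumping
the order of the thread stack allocation in
arch/x86/include/asm/page_64_types.h - paraphrasing the patch rather than
quoting it:

	#define THREAD_SIZE_ORDER	2	/* was 1: 2 pages (8K) -> 4 pages (16K) */
	#define THREAD_SIZE		(PAGE_SIZE << THREAD_SIZE_ORDER)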
--
Jens Axboe