linux-kernel - Re: [RFC 2/2] x86_64: expand kernel stack to 16K

lists.openwall.net		lists / announce owl-users owl-dev john-users john-dev passwdqc-users yescrypt popa3d-users / oss-security kernel-hardening musl sabotage tlsify passwords / crypt-dev xvendor / Bugtraq Full-Disclosure linux-kernel linux-netdev linux-ext4 linux-hardening linux-cve-announce PHC
Open Source and information security mailing list archives

Hash Suite: Windows password security audit tool. GUI, reports in PDF.

[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]

Date:	Thu, 29 May 2014 08:31:42 +1000
From:	Dave Chinner <david@...morbit.com>
To:	Linus Torvalds <torvalds@...ux-foundation.org>
Cc:	Minchan Kim <minchan@...nel.org>,
	Linux Kernel Mailing List <linux-kernel@...r.kernel.org>,
	Andrew Morton <akpm@...ux-foundation.org>,
	linux-mm <linux-mm@...ck.org>, "H. Peter Anvin" <hpa@...or.com>,
	Ingo Molnar <mingo@...nel.org>,
	Peter Zijlstra <a.p.zijlstra@...llo.nl>,
	Mel Gorman <mgorman@...e.de>, Rik van Riel <riel@...hat.com>,
	Johannes Weiner <hannes@...xchg.org>,
	Hugh Dickins <hughd@...gle.com>,
	Rusty Russell <rusty@...tcorp.com.au>,
	"Michael S. Tsirkin" <mst@...hat.com>,
	Dave Hansen <dave.hansen@...el.com>,
	Steven Rostedt <rostedt@...dmis.org>
Subject: Re: [RFC 2/2] x86_64: expand kernel stack to 16K

On Wed, May 28, 2014 at 09:09:23AM -0700, Linus Torvalds wrote:
> On Tue, May 27, 2014 at 11:53 PM, Minchan Kim <minchan@...nel.org> wrote:
> >
> > So, my stupid idea is just let's expand stack size and keep an eye
> > toward stack consumption on each kernel functions via stacktrace of ftrace.
.....
> But what *does* stand out (once again) is that we probably shouldn't
> do swap-out in direct reclaim. This came up the last time we had stack
> issues (XFS) too. I really do suspect that direct reclaim should only
> do the kind of reclaim that does not need any IO at all.
> 
> I think we _do_ generally avoid IO in direct reclaim, but swap is
> special. And not for a good reason, afaik. DaveC, remind me, I think
> you said something about the swap case the last time this came up..

Right, we do generally avoid IO through filesystems via direct
reclaim because delayed allocation requires significant amounts
of additional memory, stack space and IO.

However, swap doesn't have that overhead - it's just the IO stack
that it drives through submit_bio(), and the worst case I'd seen
through that path was much less than other reclaim stack path usage.
I haven't seen swap in any of the stack overflows from production
machines, and I only rarely see it in worst case stack usage
profiles on my test machines.

Indeed, the call chain reported here is not caused by swap issuing
IO.  We scheduled in the swap code (throttling waiting for
congestion, I think) with a plugged block device (from the ext4
writeback layer) with pending bios queued on it and the scheduler
has triggered a flush of the device.  submit_bio in the swap path
has much less stack usage than io_schedule() because it doesn't have
any of the scheduler or plug list flushing overhead in the stack.

So, realistically, the swap path is not worst case stack usage here
and disabling it won't prevent this stack overflow from happening.
Direct reclaim will simply throttle elsewhere and that will still
cause the plug to be flushed, the IO to be issued and the stack to
overflow.

Cheers,

Dave.
-- 
Dave Chinner
david@...morbit.com
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@...r.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/