Message-ID: <20140529015830.GG6677@dastard>
Date:	Thu, 29 May 2014 11:58:30 +1000
From:	Dave Chinner <david@...morbit.com>
To:	Linus Torvalds <torvalds@...ux-foundation.org>
Cc:	Minchan Kim <minchan@...nel.org>,
	Linux Kernel Mailing List <linux-kernel@...r.kernel.org>,
	Andrew Morton <akpm@...ux-foundation.org>,
	linux-mm <linux-mm@...ck.org>, "H. Peter Anvin" <hpa@...or.com>,
	Ingo Molnar <mingo@...nel.org>,
	Peter Zijlstra <a.p.zijlstra@...llo.nl>,
	Mel Gorman <mgorman@...e.de>, Rik van Riel <riel@...hat.com>,
	Johannes Weiner <hannes@...xchg.org>,
	Hugh Dickins <hughd@...gle.com>,
	Rusty Russell <rusty@...tcorp.com.au>,
	"Michael S. Tsirkin" <mst@...hat.com>,
	Dave Hansen <dave.hansen@...el.com>,
	Steven Rostedt <rostedt@...dmis.org>
Subject: Re: [RFC 2/2] x86_64: expand kernel stack to 16K

On Thu, May 29, 2014 at 11:30:07AM +1000, Dave Chinner wrote:
> On Wed, May 28, 2014 at 03:41:11PM -0700, Linus Torvalds wrote:
> commit a237c1c5bc5dc5c76a21be922dca4826f3eca8ca
> Author: Jens Axboe <jaxboe@...ionio.com>
> Date:   Sat Apr 16 13:27:55 2011 +0200
> 
>     block: let io_schedule() flush the plug inline
>     
>     Linus correctly observes that the most important dispatch cases
>     are now done from kblockd, this isn't ideal for latency reasons.
>     The original reason for switching dispatches out-of-line was to
>     avoid too deep a stack, so by _only_ letting the "accidental"
>     flush directly in schedule() be guarded by offload to kblockd,
>     we should be able to get the best of both worlds.
>     
>     So add a blk_schedule_flush_plug() that offloads to kblockd,
>     and only use that from the schedule() path.
>     
>     Signed-off-by: Jens Axboe <jaxboe@...ionio.com>
> 
> And now we have too deep a stack due to unplugging from io_schedule()...

So, if we make io_schedule() push the plug list off to kblockd,
as is already done for schedule()....

> > IOW, swap-out directly caused that extra 3kB of stack use in what was
> > a deep call chain (due to memory allocation). I really don't
> > understand why you are arguing anything else on a pure technicality.
> >
> > I thought you had some other argument for why swap was different, and
> > against removing that "page_is_file_cache()" special case in
> > shrink_page_list().
> 
> I've said in the past that swap is different to filesystem
> ->writepage implementations because it doesn't require significant
> stack to do block allocation and doesn't trigger IO deep in that
> allocation stack. Hence it has much lower stack overhead than the
> filesystem ->writepage implementations and so is much less likely to
> have stack issues.
> 
> This stack overflow shows us that just the memory reclaim + IO
> layers are sufficient to cause a stack overflow,

.... we solve this problem directly by being able to remove the IO
stack usage from the direct reclaim swap path.

IOW, we don't need to turn swap off at all in direct reclaim
because all the swap IO can be captured in a plug list and
dispatched via kblockd. This could be done either by io_schedule()
or a new blk_flush_plug_list() wrapper that pushes the work to
kblockd...
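
[Editor's illustration, not part of the original message.] The idea
above can be sketched as a small userspace model. This is not kernel
code: plug_model, worker_model, plug_add(), plug_flush() and
worker_run() are hypothetical names invented for the sketch, and the
"worker" is only a stand-in for kblockd. It assumes that what matters
is where dispatch happens: inline on the caller's (already deep) stack,
or later from a separate context with a fresh stack.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define MAX_REQS 16

/* Hypothetical stand-in for a plug list: IO batched by the submitter. */
struct plug_model {
    int reqs[MAX_REQS];
    size_t count;
};

/* Hypothetical stand-in for kblockd: IO deferred to another context. */
struct worker_model {
    int reqs[MAX_REQS];
    size_t count;
};

static size_t dispatched_inline;  /* dispatched on the caller's stack */
static size_t dispatched_worker;  /* dispatched from the worker's stack */

/* Queue one request on the plug; returns false if the plug is full. */
static bool plug_add(struct plug_model *plug, int req)
{
    assert(plug != NULL);
    if (plug->count >= MAX_REQS)
        return false;
    plug->reqs[plug->count++] = req;
    return true;
}

/*
 * Flush the plug. With offload == true, requests are handed to the
 * worker instead of being dispatched here, so the IO path adds nothing
 * to the caller's stack. If the worker queue is full, this model falls
 * back to inline dispatch.
 */
static void plug_flush(struct plug_model *plug, struct worker_model *worker,
                       bool offload)
{
    assert(plug != NULL && worker != NULL);
    for (size_t i = 0; i < plug->count; i++) {
        if (offload && worker->count < MAX_REQS)
            worker->reqs[worker->count++] = plug->reqs[i];
        else
            dispatched_inline++;
    }
    plug->count = 0;
}

/* Runs in the worker's context: dispatch everything that was deferred. */
static void worker_run(struct worker_model *worker)
{
    assert(worker != NULL);
    dispatched_worker += worker->count;
    worker->count = 0;
}
```

In this model, a direct reclaim path that batches swap IO with
plug_add() and then calls plug_flush() with offload set never runs the
dispatch step itself; only worker_run() does.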

Cheers,

Dave.
-- 
Dave Chinner
david@...morbit.com