Message-ID: <20100414112015.GO13327@think>
Date: Wed, 14 Apr 2010 07:20:15 -0400
From: Chris Mason <chris.mason@...cle.com>
To: Andi Kleen <andi@...stfloor.org>
Cc: Mel Gorman <mel@....ul.ie>, Dave Chinner <david@...morbit.com>,
linux-kernel@...r.kernel.org, linux-mm@...ck.org,
linux-fsdevel@...r.kernel.org
Subject: Re: [PATCH] mm: disallow direct reclaim page writeback

On Wed, Apr 14, 2010 at 12:06:36PM +0200, Andi Kleen wrote:
> Chris Mason <chris.mason@...cle.com> writes:
> >
> > Huh, 912 bytes...for select, really? From poll.h:
> >
> > /* ~832 bytes of stack space used max in sys_select/sys_poll before allocating
> > additional memory. */
> > #define MAX_STACK_ALLOC 832
> > #define FRONTEND_STACK_ALLOC 256
> > #define SELECT_STACK_ALLOC FRONTEND_STACK_ALLOC
> > #define POLL_STACK_ALLOC FRONTEND_STACK_ALLOC
> > #define WQUEUES_STACK_ALLOC (MAX_STACK_ALLOC - FRONTEND_STACK_ALLOC)
> > #define N_INLINE_POLL_ENTRIES (WQUEUES_STACK_ALLOC / sizeof(struct poll_table_entry))
> >
> > So, select is intentionally trying to use that much stack. It should be using
> > GFP_NOFS if it really wants to suck down that much stack...
>
> There are lots of other call chains which use multiple KB bytes by itself,
> so why not give select() that measly 832 bytes?
>
> You think only file systems are allowed to use stack? :)

Grin, most definitely.

>
> Basically if you cannot tolerate 1K (or more likely more) of stack
> used before your fs is called you're toast in lots of other situations
> anyways.

Well, on a 4K stack kernel, 832 bytes is a very large percentage for
just one function.

Direct reclaim is a problem because it splices together parts of the
kernel that normally aren't connected. The people who code in select see
832 bytes and say "that's teeny, I should have taken 3832 bytes."

But they don't realize their function can dive down into ecryptfs, then
the filesystem, then maybe loop, and then perhaps raid6 on top of a
network block device.
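
To put rough numbers on that, here is a small sketch (plain C, not kernel
code). The macro values are the ones quoted from poll.h above. STACK_SIZE
assumes the 4K stack configuration and ignores thread_info and whatever
frames are already on the stack, so the real headroom is smaller. The
helper names are made up for illustration.

```c
/* Values quoted from poll.h earlier in this thread. */
#define MAX_STACK_ALLOC      832
#define FRONTEND_STACK_ALLOC 256
#define WQUEUES_STACK_ALLOC  (MAX_STACK_ALLOC - FRONTEND_STACK_ALLOC)

/* Assumption: 4K kernel stack, with no other overhead counted. */
#define STACK_SIZE 4096

/* Bytes of on-stack space select sets aside for inline wait queue entries. */
static int wqueues_bytes(void)
{
	return WQUEUES_STACK_ALLOC;
}

/* Share of the stack taken by select's maximum allocation, in 1/1000ths. */
static int select_share_permille(void)
{
	return MAX_STACK_ALLOC * 1000 / STACK_SIZE;
}

/* Stack left for everything called below select, direct reclaim included. */
static int stack_left_after_select(void)
{
	return STACK_SIZE - MAX_STACK_ALLOC;
}
```

That works out to roughly a fifth of the stack gone before the allocator,
the filesystem, or any block layer below it has run a single function.
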
>
> > kernel had some sort of way to dynamically allocate ram, it could try
> > that too.
>
> It does this for large inputs, but the whole point of the stack fast
> path is to avoid it for common cases when a small number of fds is
> only needed.
>
> It's significantly slower to go to any external allocator.

Yeah, but since the call chain does eventually go into the allocator,
this function needs to be more stack friendly.

I do agree that we can't really solve this with noinline_for_stack pixie
dust; the long call chains are going to be a problem no matter what.
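
For anyone following along, the tradeoff being argued about looks roughly
like the sketch below. This is a user-space analogy, not the select code:
INLINE_BYTES, fits_inline() and sum_with_scratch() are hypothetical names,
and malloc() stands in for the kernel allocator. Small requests stay on
the stack and skip the allocator; large ones pay for the allocation.

```c
#include <stdlib.h>
#include <string.h>

/* Hypothetical inline budget, same size as FRONTEND_STACK_ALLOC. */
#define INLINE_BYTES 256

/* Returns 1 if the request fits in the on-stack buffer, 0 otherwise. */
static int fits_inline(size_t nbytes)
{
	return nbytes <= INLINE_BYTES;
}

/*
 * Sum nbytes of input through a scratch copy. Small inputs use the
 * on-stack buffer (the fast path); larger ones fall back to malloc().
 * Returns -1 if the allocation fails.
 */
static long sum_with_scratch(const unsigned char *src, size_t nbytes)
{
	unsigned char inline_buf[INLINE_BYTES];
	unsigned char *buf = inline_buf;
	long sum = 0;
	size_t i;

	if (!fits_inline(nbytes)) {
		buf = malloc(nbytes);
		if (!buf)
			return -1;
	}

	memcpy(buf, src, nbytes);
	for (i = 0; i < nbytes; i++)
		sum += buf[i];

	/* Only the slow path owns heap memory. */
	if (buf != inline_buf)
		free(buf);
	return sum;
}
```

The fast path is cheap, but every byte of inline_buf is stack that stays
pinned for the whole call chain underneath, which is the cost being
discussed in this thread.
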
Reading through all the comments so far, I think the short summary is:

Cleaning pages in direct reclaim helps the VM because it is able to make
sure that lumpy reclaim finds adjacent pages. This isn't a fast
operation, it has to wait for IO (infinitely slow compared to the CPU).

Will it be good enough for the VM if we add a hint to the bdi writeback
threads to work on a general area of the file? The filesystem will get
writepages(), the VM will get the IO it needs started.

I know Mel mentioned before he wasn't interested in waiting for helper
threads, but I don't see how we can work without it.

-chris