Message-ID: <52C2A564.4040809@linux.vnet.ibm.com>
Date: Tue, 31 Dec 2013 16:37:16 +0530
From: Raghavendra K T <raghavendra.kt@...ux.vnet.ibm.com>
To: Linus Torvalds <torvalds@...ux-foundation.org>
CC: Andrew Morton <akpm@...ux-foundation.org>, Jan Kara <jack@...e.cz>,
Fengguang Wu <fengguang.wu@...el.com>,
David Cohen <david.a.cohen@...ux.intel.com>,
Al Viro <viro@...iv.linux.org.uk>,
Damien Ramonda <damien.ramonda@...el.com>,
linux-mm <linux-mm@...ck.org>,
Linux Kernel Mailing List <linux-kernel@...r.kernel.org>
Subject: Re: [PATCH RFC] mm readahead: Fix the readahead fail in case of empty numa node
On 12/14/2013 06:09 AM, Linus Torvalds wrote:
> On Wed, Dec 11, 2013 at 3:05 PM, Andrew Morton
> <akpm@...ux-foundation.org> wrote:
>>
>> But I'm really struggling to think up an implementation! The current
>> code looks only at the caller's node and doesn't seem to make much
>> sense. Should we look at all nodes? Hard to say without prior
>> knowledge of where those pages will be coming from.
>
> I really think we want to put an upper bound on the read-ahead, and
> I'm not convinced we need to try to be excessively clever about it. We
> also probably don't want to make it too expensive to calculate,
> because afaik this ends up being called for each file we open when we
> don't have pages in the page cache yet.
>
> The current function seems reasonable on a single-node system. Let's
> not kill it entirely just because it has some odd corner-case on
> multi-node systems.
>
> In fact, for all I care, I think it would be perfectly ok to just use
> a truly stupid hard limit ("you can't read-ahead more than 16MB" or
> whatever).
>
> What we do *not* want to allow is to have people call "readahead"
> functions and basically kill the machine because you now have an
> unkillable IO that is insanely big. So I'd much rather limit it too
> much than too little. And on absolutely no sane IO subsystem does it
> make sense to read ahead insane amounts.
>
> So I'd rather limit it to something stupid and small, than to not
> limit things at all.
>
> Looking at the interface, for example, the natural thing to do for the
> "readahead()" system call is to just give it a size of
> ~0ul, and let the system limit things, because limiting things in user
> space is just not reasonable.
>
> So I really do *not* think it's fine to just remove the limit entirely.
>
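To make the "truly stupid hard limit" concrete, here is a minimal user-space sketch that clamps a readahead request, counted in pages, to a fixed 16MB worth of pages. The names READAHEAD_HARD_LIMIT_BYTES and clamp_readahead() are hypothetical (not kernel symbols), and the page size is passed in as a parameter instead of using PAGE_SIZE so the sketch can run outside the kernel.

```c
/* Hypothetical fixed cap: never read ahead more than 16MB. */
#define READAHEAD_HARD_LIMIT_BYTES (16UL << 20)

/*
 * Clamp a readahead request of nr pages to the hard limit.
 * page_size is a parameter only so this runs in user space;
 * in the kernel it would simply be PAGE_SIZE.
 */
unsigned long clamp_readahead(unsigned long nr, unsigned long page_size)
{
	unsigned long max_pages = READAHEAD_HARD_LIMIT_BYTES / page_size;

	return nr < max_pages ? nr : max_pages;
}
```

An effectively unbounded request such as ~0UL pages is cut to 4096 pages with 4K pages (256 pages with 64K pages), while a small request such as 32 pages passes through unchanged, so callers never need to do the limiting themselves.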
Very sorry for the late reply (I was on a very long vacation).
How about applying the 16MB limit only to remote readaheads (i.e. when
the local node has no memory) and keeping the rest as is? Something like below:
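#define MAX_REMOTE_READAHEAD 4096UL	/* in pages: 16MB with 4K pages */

unsigned long max_sane_readahead(unsigned long nr)
{
	/* Inactive file pages plus free pages on the local node. */
	unsigned long local_free_page =
		node_page_state(numa_node_id(), NR_INACTIVE_FILE) +
		node_page_state(numa_node_id(), NR_FREE_PAGES);
	/* Fixed cap, used only when the local node has no memory. */
	unsigned long sane_nr = min(nr, MAX_REMOTE_READAHEAD);

	/* Otherwise keep the existing half-of-local-memory limit. */
	return local_free_page ? min(nr, local_free_page / 2) : sane_nr;
}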
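The behaviour of the proposed max_sane_readahead() can be checked outside the kernel by stubbing the two per-node counters. In this sketch, struct node_stats and max_sane_readahead_sim() are hypothetical stand-ins: the counters are passed in explicitly instead of being read with node_page_state(numa_node_id(), ...), and a local MIN_UL macro replaces the kernel's min().

```c
#define MAX_REMOTE_READAHEAD 4096UL	/* in pages: 16MB with 4K pages */
#define MIN_UL(a, b) ((a) < (b) ? (a) : (b))

/* Hypothetical stand-in for the local node's page counters. */
struct node_stats {
	unsigned long nr_inactive_file;
	unsigned long nr_free_pages;
};

/* Same logic as the proposal, with the node counters passed in. */
unsigned long max_sane_readahead_sim(unsigned long nr,
				     const struct node_stats *local)
{
	unsigned long local_free_page = local->nr_inactive_file +
					local->nr_free_pages;
	unsigned long sane_nr = MIN_UL(nr, MAX_REMOTE_READAHEAD);

	/* Node with memory: half of its inactive file + free pages. */
	if (local_free_page)
		return MIN_UL(nr, local_free_page / 2);
	/* Memoryless node: fall back to the fixed cap instead of 0. */
	return sane_nr;
}
```

On a memoryless node (both counters zero) a request of ~0UL pages is cut to 4096, where a limit based only on local memory would yield 0. On a node with 1000 inactive-file plus free pages, a 10000-page request is cut to 500. Note that a node with plenty of memory is not capped at 16MB at all under this version.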
Alternatively, we could enforce the 16MB limit in all cases.
I'll send a patch accordingly.
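One possible reading of the "16MB limit in all cases" alternative is to apply MAX_REMOTE_READAHEAD before the local-memory limit, so that nodes with plenty of memory are capped too. This is a hedged sketch, again using hypothetical user-space stand-ins (struct node_stats, max_sane_readahead_capped()) for the kernel's per-node counters.

```c
#define MAX_REMOTE_READAHEAD 4096UL	/* in pages: 16MB with 4K pages */
#define MIN_UL(a, b) ((a) < (b) ? (a) : (b))

/* Hypothetical stand-in for the local node's page counters. */
struct node_stats {
	unsigned long nr_inactive_file;
	unsigned long nr_free_pages;
};

/* Variant: the fixed cap applies whether or not the node has memory. */
unsigned long max_sane_readahead_capped(unsigned long nr,
					const struct node_stats *local)
{
	unsigned long local_free_page = local->nr_inactive_file +
					local->nr_free_pages;
	unsigned long sane_nr = MIN_UL(nr, MAX_REMOTE_READAHEAD);

	/* Small nodes still get the tighter half-of-local-memory limit. */
	if (local_free_page)
		return MIN_UL(sane_nr, local_free_page / 2);
	return sane_nr;
}
```

With 1,000,000 free pages on the local node, a ~0UL request now returns 4096 instead of 500000; a node with only 1000 free pages still returns 500.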
(MAX_REMOTE_READAHEAD counts pages, so the 16MB figure assumes a 4K page
size; with larger pages the maximum readahead scales up accordingly.)
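Since the cap is a page count, the byte limit it implies is 4096 times the page size. A small sketch of that arithmetic; remote_cap_bytes() is a hypothetical helper, and 64K is used only as an example of a non-4K page size that some architectures support.

```c
#define MAX_REMOTE_READAHEAD 4096UL	/* counted in pages, not bytes */

/* Bytes of readahead allowed by the page-count cap for a given page size. */
unsigned long remote_cap_bytes(unsigned long page_size)
{
	return MAX_REMOTE_READAHEAD * page_size;
}
```

With 4K pages this gives 16MB (16 << 20 bytes); with 64K pages it grows to 256MB.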