[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <45E86BA0.50508@redhat.com>
Date: Fri, 02 Mar 2007 13:23:28 -0500
From: Rik van Riel <riel@...hat.com>
To: Christoph Lameter <clameter@...r.sgi.com>
CC: Andrew Morton <akpm@...ux-foundation.org>,
Mel Gorman <mel@...net.ie>, npiggin@...e.de, mingo@...e.hu,
jschopp@...tin.ibm.com, arjan@...radead.org,
torvalds@...ux-foundation.org, mbligh@...igh.org,
linux-mm@...ck.org, linux-kernel@...r.kernel.org
Subject: Re: The performance and behaviour of the anti-fragmentation related
patches
Christoph Lameter wrote:
> On Fri, 2 Mar 2007, Andrew Morton wrote:
>
>>> One particular case is a 32GB system with a database that takes most
>>> of memory. The amount of actually freeable page cache memory is in
>>> the hundreds of MB.
>> Where's the rest of the memory? tmpfs? mlocked? hugetlb?
>
> The memory is likely in use but there is enough memory free in unmapped
> clean pagecache pages so that we occasionally are able to free pages. Then
> the app is reading more from disk replenishing that ...
> Thus we are forever cycling through the LRU lists moving pages between
> the lists aging etc etc. Can lead to a livelock.
In this particular case, the system even has swap free.
The kernel just chooses not to use it until it has scanned
some memory, due to the way the swappiness algorithm works.
With 32 CPUs diving into the page reclaim simultaneously,
each trying to scan a fraction of memory, this is disastrous
for performance. A 256GB system should be even worse.
>>> A third scenario is where a system has way more RAM than swap, and not
>>> a whole lot of freeable page cache. In this case, the VM ends up
>>> spending WAY too much CPU time scanning and shuffling around essentially
>>> unswappable anonymous memory and tmpfs files.
>> Well we've allegedly fixed that, but it isn't going anywhere without
>> testing.
>
> We have fixed the case in which we compile the kernel without swap. Then
> anonymous pages behave like mlocked pages. Did we do more than that?
Not AFAIK.
I would like to see separate pageout selection queues
for anonymous/tmpfs and page cache backed pages. That
way we can simply scan only that what we want to scan.
There are several ways available to balance pressure
between both sets of lists.
Splitting them out will also make it possible to do
proper use-once replacement for the page cache pages.
Ie. leaving the really active page cache pages on the
page cache active list, instead of deactivating them
because they're lower priority than anonymous pages.
That way we can do a backup without losting the page
cache working set.
--
Politics is the struggle between those who want to make their country
the best in the world, and those who believe it already is. Each group
calls the other unpatriotic.
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@...r.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/
Powered by blists - more mailing lists