Message-ID: <4BCEAAC6.7070602@linux.vnet.ibm.com>
Date: Wed, 21 Apr 2010 09:35:34 +0200
From: Christian Ehrhardt <ehrhardt@...ux.vnet.ibm.com>
To: Rik van Riel <riel@...hat.com>
CC: Johannes Weiner <hannes@...xchg.org>, Mel Gorman <mel@....ul.ie>,
Andrew Morton <akpm@...ux-foundation.org>, linux-mm@...ck.org,
Nick Piggin <npiggin@...e.de>,
Chris Mason <chris.mason@...cle.com>,
Jens Axboe <jens.axboe@...cle.com>,
linux-kernel@...r.kernel.org, gregkh@...ell.com,
Corrado Zoccolo <czoccolo@...il.com>
Subject: Re: [RFC PATCH 0/3] Avoid the use of congestion_wait under zone pressure
Christian Ehrhardt wrote:
>
>
> Rik van Riel wrote:
>> On 04/20/2010 11:32 AM, Johannes Weiner wrote:
>>
>>> The idea is that it pans out on its own. If the workload changes, new
>>> pages get activated and when that set grows too large, we start
>>> shrinking
>>> it again.
>>>
>>> Of course, right now this unscanned set is way too large and we can end
>>> up wasting up to 50% of usable page cache on false active pages.
>>
>> Thing is, changing workloads often change back.
>>
>> Specifically, think of a desktop system that is doing
>> work for the user during the day and gets backed up
>> at night.
>>
>> You do not want the backup to kick the working set
>> out of memory, because when the user returns in the
>> morning the desktop should come back quickly after
>> the screensaver is unlocked.
>
> IMHO it is also fine to make sure that nightly backup job is not left
> unfinished when the user arrives in the morning just because we didn't
> give it some more cache - and e.g. a 30 sec transition from/to both
> optimized states is fine.
> But eventually I guess the point is that both behaviors are reasonable
> to aim for - depending on the user's needs.
>
> What we could do is combine all our thoughts we had so far:
> a) Rik could create an experimental patch that excludes the in-flight pages
> b) Johannes could create one for his suggestion to "always scan active
> file pages but only deactivate them when the ratio is off and otherwise
> strip buffers of clean pages"
> c) I would extend the patch from Johannes setting the ratio of
> active/inactive pages to be a userspace tunable
A first revision of patch c is attached.
I tested assigning different percentages; e.g. 50 really behaves like
before, and 25 protects ~42M of Buffers in my example, which matches the
intended behavior - see the patch for more details.
Checkpatch and some basic function tests went fine.
While it may not be perfect yet, I think it is ready for feedback now.
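
(For illustration, the core of such a tunable could look roughly like
the sketch below. It is modeled on inactive_file_is_low_global() in
mm/vmscan.c; the name vm_active_file_ratio, its default and the omitted
sysctl wiring are assumptions made for this sketch - the attached diff
is the actual proposal.)

/*
 * Sketch only: deactivate active file pages only while they exceed a
 * userspace-settable share of all file pages.  A value of 50 keeps
 * today's 1:1 active/inactive balance.
 */
int vm_active_file_ratio = 50;  /* assumed /proc/sys/vm knob, in percent */

static int inactive_file_is_low_global(struct zone *zone)
{
        unsigned long active, inactive;

        active = zone_page_state(zone, NR_ACTIVE_FILE);
        inactive = zone_page_state(zone, NR_INACTIVE_FILE);

        /* "low" tells reclaim it may start deactivating file pages */
        return active * 100 > (active + inactive) * vm_active_file_ratio;
}

With 50 this is equivalent to the current "active > inactive" check,
while e.g. 25 starts deactivation once more than a quarter of the file
pages are active.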
> a,b,a+b would then need to be tested if they achieve a better behavior.
>
> c on the other hand would be a fine tunable to let administrators
> (knowing their workloads) or distributions (e.g. different values for
> Desktop/Server defaults) adapt their installations.
>
> In theory a,b and c should work fine together in case we need all of them.
>
>> The big question is, what workload suffers from
>> having the inactive list at 50% of the page cache?
>>
>> So far the only big problem we have seen is on a
>> very unbalanced virtual machine, with 256MB RAM
>> and 4 fast disks. The disks simply have more IO
>> in flight at once than what fits in the inactive
>> list.
>
> Did I get you right that this means the write case - which would
> explain why it builds up buffers to the 50% max?
>
Thinking about it, I wondered what these Buffers are protected for.
If the intention behind saving these buffers is reuse by similar loads,
I wonder why I "need" three iozone runs to build up the 85M in my case.
Buffers start at ~0; after iozone run #1 they are at ~35M, after run #2
at ~65M, and after run #3 at ~85M.
Shouldn't the first run either allocate the 85M directly, in case that
much is needed for a single run - or, if not, shouldn't the second and
third runs just "reuse" the 35M of Buffers still held from the first
run?
Note - "1 iozone run" means "iozone ... -i 0", which sequentially writes
and then rewrites a 2GB file on 16 disks in my current case.
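
(The ~35/65/85M figures are just the Buffers line of /proc/meminfo
sampled between the runs; a plain grep does it, but as a self-contained
illustration a minimal sampler could look like this - nothing below is
from the actual test scripts:)

#include <stdio.h>
#include <string.h>

int main(void)
{
        char line[128];
        FILE *f = fopen("/proc/meminfo", "r");

        if (!f)
                return 1;
        /* print only the "Buffers:" line between iozone runs */
        while (fgets(line, sizeof(line), f))
                if (!strncmp(line, "Buffers:", 8))
                        fputs(line, stdout);
        fclose(f);
        return 0;
}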
Looking forward especially to patch b, as I'd really like to see a
kernel able to win back these buffers if they are no longer used for a
longer period, while still allowing them to grow and stay protected
while needed.
--
Grüsse / regards, Christian Ehrhardt
IBM Linux Technology Center, System z Linux Performance
Attachment: "active-inacte-ratio-tunable.diff" (text/x-patch, 4673 bytes)