Message-ID: <4BCFEAD0.4010708@linux.vnet.ibm.com>
Date: Thu, 22 Apr 2010 08:21:04 +0200
From: Christian Ehrhardt <ehrhardt@...ux.vnet.ibm.com>
To: Rik van Riel <riel@...hat.com>
CC: Johannes Weiner <hannes@...xchg.org>, Mel Gorman <mel@....ul.ie>,
Andrew Morton <akpm@...ux-foundation.org>, linux-mm@...ck.org,
Nick Piggin <npiggin@...e.de>,
Chris Mason <chris.mason@...cle.com>,
Jens Axboe <jens.axboe@...cle.com>,
linux-kernel@...r.kernel.org, gregkh@...ell.com,
Corrado Zoccolo <czoccolo@...il.com>,
Ehrhardt Christian <ehrhardt@...ux.vnet.ibm.com>
Subject: Re: [RFC PATCH 0/3] Avoid the use of congestion_wait under zone pressure
Trying to answer and consolidate all open parts of this thread down below.
Rik van Riel wrote:
> On 04/21/2010 03:35 AM, Christian Ehrhardt wrote:
>>
>>
>> Christian Ehrhardt wrote:
>>>
>>>
>>> Rik van Riel wrote:
>>>> On 04/20/2010 11:32 AM, Johannes Weiner wrote:
>>>>
>>>>> The idea is that it pans out on its own. If the workload changes, new
>>>>> pages get activated and when that set grows too large, we start
>>>>> shrinking
>>>>> it again.
>>>>>
>>>>> Of course, right now this unscanned set is way too large and we can
>>>>> end
>>>>> up wasting up to 50% of usable page cache on false active pages.
>>>>
>>>> Thing is, changing workloads often change back.
>>>>
>>>> Specifically, think of a desktop system that is doing
>>>> work for the user during the day and gets backed up
>>>> at night.
>>>>
>>>> You do not want the backup to kick the working set
>>>> out of memory, because when the user returns in the
>>>> morning the desktop should come back quickly after
>>>> the screensaver is unlocked.
>>>
>>> IMHO it is fine to prevent that nightly backup job from not being
>>> finished when the user arrives at morning because we didn't give him
>>> some more cache - and e.g. a 30 sec transition from/to both optimized
>>> states is fine.
>>> But eventually I guess the point is that both behaviors are reasonable
>>> to achieve - depending on the users needs.
>>>
>>> What we could do is combine all our thoughts we had so far:
>>> a) Rik could create an experimental patch that excludes the in flight
>>> pages
>>> b) Johannes could create one for his suggestion to "always scan active
>>> file pages but only deactivate them when the ratio is off and
>>> otherwise strip buffers of clean pages"
>
> I think you are confusing "buffer heads" with "buffers".
>
> You can strip buffer heads off pages, but that is not
> your problem.
>
> "buffers" in /proc/meminfo stands for cached metadata,
> eg. the filesystem journal, inodes, directories, etc...
> Caching such metadata is legitimate, because it reduces
> the number of disk seeks down the line.
Yeah, I mixed those up as well, thanks for the clarification (Johannes wrote
a similar response, effectively kicking b) from the list of things we could
do).
Regarding your question from thread reply #3:
> How on earth would a backup job benefit from cache?
>
> It only accesses each bit of data once, so caching the
> to-be-backed-up data is a waste of memory.
If it is a low-memory system with a lot of disks (like in my case),
giving it more cache allows e.g. larger readaheads or less cache
thrashing - but it might be ok, as it might be a rare case to hit all those
constraints at once.
But as we discussed before, on virtual servers it can happen from time to
time due to ballooning, many more disk attachments, etc.
So this is definitely not the majority of cases, but there are some corner
cases here and there that would benefit at least from making the preserved
ratio configurable if we don't find a good way to let it take the memory
back without hurting the intended preservation functionality.
For that reason - how about the patch I posted yesterday? (To consolidate
this spread-out thread I attach it here again.)
And finally, I would still like to understand why writing the same files
three times increases the active file pages each time instead of reusing
those already brought into memory by the first run.
To collect that last open thread as well I'll cite my own question here:
> Thinking about it I wondered for what these Buffers are protected.
> If the intention to save these buffers is for reuse with similar loads
> I wonder why I "need" three iozones to build up the 85M in my case.
> Buffers start at ~0, after iozone run 1 they are at ~35, then after #2
> ~65 and after run #3 ~85.
> Shouldn't that either allocate 85M for the first directly in case that
> much is needed for a single run - or if not the second and third run
> just "reuse" the 35M Buffers from the first run still held?
> Note - "1 iozone run" means "iozone ... -i 0" which sequentially
> writes and then rewrites a 2Gb file on 16 disks in my current case.
Trying to answer this question myself using your buffer details above
doesn't completely fit without further clarification, as the same
files should have the same dir, inode, ... (all ext2 in my case, so no
journal data either).
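To make the per-run growth easier to follow, here is a minimal sketch (Python, not part of the attached patch) of how one could snapshot the relevant /proc/meminfo fields between iozone runs and print the increase per run. The helper names parse_meminfo, read_meminfo and growth are made up for illustration, and the sample figures are only the rough Buffers values quoted above, not new measurements.

```python
# Illustrative sketch only; helper names are hypothetical.
FIELDS = ("Buffers", "Active(file)", "Inactive(file)")

def parse_meminfo(text, fields=FIELDS):
    """Return {field: value in kB} for the requested /proc/meminfo fields."""
    values = {}
    for line in text.splitlines():
        name, sep, rest = line.partition(":")
        if sep and name in fields:
            values[name] = int(rest.split()[0])
    return values

def read_meminfo(path="/proc/meminfo"):
    """Take one snapshot from a live Linux system."""
    with open(path) as f:
        return parse_meminfo(f.read())

def growth(snapshots, field):
    """Per-step increase of one field across consecutive snapshots."""
    series = [s[field] for s in snapshots]
    return [after - before for before, after in zip(series, series[1:])]

# Approximate Buffers values (MB) from the runs above: start, run 1, 2, 3.
approx_buffers_mb = [{"Buffers": v} for v in (0, 35, 65, 85)]
print(growth(approx_buffers_mb, "Buffers"))  # [35, 30, 20]
```

On a real system one would call read_meminfo() before the first run and after each run, then pass the list of snapshots to growth() for "Buffers" and "Active(file)".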
--
Grüsse / regards, Christian Ehrhardt
IBM Linux Technology Center, System z Linux Performance
View attachment "active-inacte-ratio-tunable.diff" of type "text/x-patch" (4673 bytes)