linux-kernel - Re: [RFC PATCH 0/3] Avoid the use of congestion

Message-ID: <4BCFEAD0.4010708@linux.vnet.ibm.com>
Date:	Thu, 22 Apr 2010 08:21:04 +0200
From:	Christian Ehrhardt <ehrhardt@...ux.vnet.ibm.com>
To:	Rik van Riel <riel@...hat.com>
CC:	Johannes Weiner <hannes@...xchg.org>, Mel Gorman <mel@....ul.ie>,
	Andrew Morton <akpm@...ux-foundation.org>, linux-mm@...ck.org,
	Nick Piggin <npiggin@...e.de>,
	Chris Mason <chris.mason@...cle.com>,
	Jens Axboe <jens.axboe@...cle.com>,
	linux-kernel@...r.kernel.org, gregkh@...ell.com,
	Corrado Zoccolo <czoccolo@...il.com>,
	Ehrhardt Christian <ehrhardt@...ux.vnet.ibm.com>
Subject: Re: [RFC PATCH 0/3] Avoid the use of congestion_wait under zone pressure

Trying to answer and consolidate all open parts of this thread down below.

Rik van Riel wrote:
> On 04/21/2010 03:35 AM, Christian Ehrhardt wrote:
>>
>>
>> Christian Ehrhardt wrote:
>>>
>>>
>>> Rik van Riel wrote:
>>>> On 04/20/2010 11:32 AM, Johannes Weiner wrote:
>>>>
>>>>> The idea is that it pans out on its own. If the workload changes, new
>>>>> pages get activated and when that set grows too large, we start
>>>>> shrinking
>>>>> it again.
>>>>>
>>>>> Of course, right now this unscanned set is way too large and we can 
>>>>> end
>>>>> up wasting up to 50% of usable page cache on false active pages.
>>>>
>>>> Thing is, changing workloads often change back.
>>>>
>>>> Specifically, think of a desktop system that is doing
>>>> work for the user during the day and gets backed up
>>>> at night.
>>>>
>>>> You do not want the backup to kick the working set
>>>> out of memory, because when the user returns in the
>>>> morning the desktop should come back quickly after
>>>> the screensaver is unlocked.
>>>
>>> IMHO it is fine to prevent that nightly backup job from not being
>>> finished when the user arrives at morning because we didn't give him
>>> some more cache - and e.g. a 30 sec transition from/to both optimized
>>> states is fine.
>>> But eventually I guess the point is that both behaviors are reasonable
>>> to achieve - depending on the users needs.
>>>
>>> What we could do is combine all our thoughts we had so far:
>>> a) Rik could create an experimental patch that excludes the in flight
>>> pages
>>> b) Johannes could create one for his suggestion to "always scan active
>>> file pages but only deactivate them when the ratio is off and
>>> otherwise strip buffers of clean pages"
> 
> I think you are confusing "buffer heads" with "buffers".
> 
> You can strip buffer heads off pages, but that is not
> your problem.
> 
> "buffers" in /proc/meminfo stands for cached metadata,
> eg. the filesystem journal, inodes, directories, etc...
> Caching such metadata is legitimate, because it reduces
> the number of disk seeks down the line.

Yeah, I mixed those up as well; thanks for the clarification. (Johannes wrote a 
similar response, effectively removing b) from the list of things we could 
do.)

Regarding your question from thread reply #3:
 > How on earth would a backup job benefit from cache?
 >
 > It only accesses each bit of data once, so caching the
 > to-be-backed-up data is a waste of memory.

On a low-memory system with a lot of disks (like in my case), 
giving the backup more cache allows e.g. larger readaheads or less cache 
thrashing. That said, it might be OK as is, since hitting all those 
constraints at once is probably rare.
But as we discussed before, on virtual servers it can happen from time to 
time due to ballooning, many more attached disks, etc.



So this is definitely not the majority of cases, but there are some corner 
cases here and there that would at least benefit from making the preserved 
ratio configurable, if we don't find a good way to let such workloads take 
the memory back without hurting the intended preservation functionality.

For that reason, how about the patch I posted yesterday? (To consolidate 
this spread-out thread I attach it here again.)
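
To illustrate what "configurable ratio" means here, a minimal sketch 
follows. It is not the attached diff and not kernel code; the function name 
and the max_active_percent parameter are made up for illustration. It 
assumes the current behaviour amounts to "shrink the active file list once 
it is larger than the inactive one", which matches the "up to 50%" figure 
Johannes mentioned, and generalizes that 50% to a tunable percentage.

```python
def active_file_list_too_large(active_pages, inactive_pages,
                               max_active_percent=50):
    """Decide whether active file pages should be deactivated.

    Hypothetical model only: max_active_percent is an assumed tunable,
    the share of file pages (active + inactive) that the active list
    may hold before it gets shrunk. With the default of 50 the check
    reduces to "active > inactive".
    """
    if not 0 <= max_active_percent <= 100:
        raise ValueError("max_active_percent must be within 0..100")
    total = active_pages + inactive_pages
    # Integer arithmetic only, so there are no rounding surprises.
    return active_pages * 100 > total * max_active_percent
```

With a lower percentage, a system like mine would start deactivating 
active file pages earlier, leaving more room for streaming I/O; with the 
default it behaves like the plain "active > inactive" comparison.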



And finally, I would still like to understand why writing the same files 
three times increases the active file pages each time instead of reusing 
those already brought into memory by the first run.
To collect that last open thread as well, I'll cite my own question here:

 > Thinking about it I wondered for what these Buffers are protected.
 > If the intention to save these buffers is for reuse with similar loads
 > I wonder why I "need" three iozones to build up the 85M in my case.
 >
 > Buffers start at ~0, after iozone run 1 they are at ~35, then after #2
 > ~65 and after run #3 ~85.
 > Shouldn't that either allocate 85M for the first directly in case that
 > much is needed for a single run - or if not the second and third run
 > just "reuse" the 35M Buffers from the first run still held?
 >
 > Note - "1 iozone run" means "iozone ... -i 0" which sequentially
 > writes and then rewrites a 2Gb file on 16 disks in my current case.

Trying to answer this question myself using your buffer details above 
doesn't completely work without further clarification, as the same files 
should have the same directories, inodes, ... (all ext2 in my case, so no 
journal data either).
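
For anyone who wants to reproduce the per-run numbers, here is a minimal 
sketch of how the counters could be sampled between iozone runs. It is not 
the tooling I used; the helper names are made up for illustration. It only 
relies on the "Name: value kB" layout of /proc/meminfo and on the Buffers, 
Cached, Active(file) and Inactive(file) fields being present.

```python
def parse_meminfo(text):
    """Parse /proc/meminfo-style text into a {field name: value} dict.

    Values are taken as-is from the file (kB for most fields).
    """
    fields = {}
    for line in text.splitlines():
        name, sep, rest = line.partition(":")
        if not sep:
            continue  # skip anything that is not "Name: value"
        parts = rest.split()
        if parts and parts[0].isdigit():
            fields[name.strip()] = int(parts[0])
    return fields


def snapshot(text, keys=("Buffers", "Cached",
                         "Active(file)", "Inactive(file)")):
    """Keep only the counters relevant to this discussion."""
    info = parse_meminfo(text)
    return {key: info[key] for key in keys if key in info}


def delta(before, after):
    """Per-counter growth between two snapshots."""
    return {key: after[key] - before[key] for key in after if key in before}


def read_snapshot(path="/proc/meminfo"):
    """Take a snapshot from the live system (Linux only)."""
    with open(path) as meminfo:
        return snapshot(meminfo.read())
```

Calling read_snapshot() before and after each "iozone ... -i 0" run and 
feeding the two results to delta() shows how much Buffers and the active 
file list grew per run.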


-- 

Grüsse / regards, Christian Ehrhardt
IBM Linux Technology Center, System z Linux Performance

View attachment "active-inacte-ratio-tunable.diff" of type "text/x-patch" (4673 bytes)