Message-ID: <4BCE7DD1.70900@linux.vnet.ibm.com>
Date:	Wed, 21 Apr 2010 06:23:45 +0200
From:	Christian Ehrhardt <ehrhardt@...ux.vnet.ibm.com>
To:	Rik van Riel <riel@...hat.com>
CC:	Johannes Weiner <hannes@...xchg.org>, Mel Gorman <mel@....ul.ie>,
	Andrew Morton <akpm@...ux-foundation.org>, linux-mm@...ck.org,
	Nick Piggin <npiggin@...e.de>,
	Chris Mason <chris.mason@...cle.com>,
	Jens Axboe <jens.axboe@...cle.com>,
	linux-kernel@...r.kernel.org, gregkh@...ell.com,
	Corrado Zoccolo <czoccolo@...il.com>
Subject: Re: [RFC PATCH 0/3] Avoid the use of congestion_wait under zone pressure



Rik van Riel wrote:
> On 04/20/2010 11:32 AM, Johannes Weiner wrote:
> 
>> The idea is that it pans out on its own.  If the workload changes, new
>> pages get activated and when that set grows too large, we start shrinking
>> it again.
>>
>> Of course, right now this unscanned set is way too large and we can end
>> up wasting up to 50% of usable page cache on false active pages.
> 
> Thing is, changing workloads often change back.
> 
> Specifically, think of a desktop system that is doing
> work for the user during the day and gets backed up
> at night.
> 
> You do not want the backup to kick the working set
> out of memory, because when the user returns in the
> morning the desktop should come back quickly after
> the screensaver is unlocked.

IMHO it is just as important to prevent the nightly backup job from still 
being unfinished when the user arrives in the morning because we didn't 
give it some more cache - and e.g. a 30 sec transition from one optimized 
state to the other is fine.
But in the end I guess the point is that both behaviors are reasonable 
goals - which one to favor depends on the user's needs.

What we could do is combine all the thoughts we have had so far:
a) Rik could create an experimental patch that excludes the in-flight pages
b) Johannes could create one for his suggestion to "always scan active 
file pages but only deactivate them when the ratio is off and otherwise 
strip buffers of clean pages"
c) I would extend Johannes' patch by making the ratio of active/inactive 
pages a userspace tunable (rough sketch below)
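
To make (c) a bit more concrete, here is roughly what I have in mind - 
only an untested sketch; the sysctl name (active_file_ratio) and its 
default are made up by me, only the surrounding helper is the existing 
one from mm/vmscan.c. With the default of 50 it degenerates to the 
current "active > inactive" check, i.e. the active file list stays 
capped at half the file pages:

	/* percentage of file pages the active list may grow to */
	int vm_active_file_ratio __read_mostly = 50;

	static int inactive_file_is_low_global(struct zone *zone)
	{
		unsigned long active, inactive;

		active = zone_page_state(zone, NR_ACTIVE_FILE);
		inactive = zone_page_state(zone, NR_INACTIVE_FILE);

		/* start deactivating once active exceeds the configured share */
		return active * 100 > (active + inactive) * vm_active_file_ratio;
	}

plus a matching entry in the vm_table of kernel/sysctl.c so it shows up 
as /proc/sys/vm/active_file_ratio:

	{
		.procname	= "active_file_ratio",
		.data		= &vm_active_file_ratio,
		.maxlen		= sizeof(vm_active_file_ratio),
		.mode		= 0644,
		.proc_handler	= proc_dointvec_minmax,
		.extra1		= &zero,
		.extra2		= &one_hundred,
	},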

a, b, and a+b would then need to be tested to see whether they achieve 
better behavior.

c, on the other hand, would be a useful tunable that lets administrators 
(who know their workloads) or distributions (e.g. different defaults for 
Desktop/Server) adapt their installations.

In theory, a, b and c should work fine together in case we need all of them.

> The big question is, what workload suffers from
> having the inactive list at 50% of the page cache?
> 
> So far the only big problem we have seen is on a
> very unbalanced virtual machine, with 256MB RAM
> and 4 fast disks.  The disks simply have more IO
> in flight at once than what fits in the inactive
> list.

Did I get you right that this means the write case - which would explain 
why it is building up buffers to the 50% max?

Note: it even uses up to 64 disks, with 1 disk per thread, so e.g. 16 
threads => 16 disks.
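
Just to get a feeling for the magnitudes (queue depth and request size 
below are pure assumptions of mine, not measured values):

	16 disks * 32 requests in flight * 128 KiB =  64 MiB
	64 disks * 32 requests in flight * 128 KiB = 256 MiB

On a 256 MiB guest the inactive file list can never be much more than 
half of the page cache, i.e. well under 128 MiB - so with enough disks 
the in-flight pages alone can outgrow the whole inactive list, which 
would match your observation.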

Regarding "unbalanced": over the years I have learned that virtualized 
systems sometimes end up looking that way without it ever being intended 
- it happens by adding more and more guests and letting guest memory 
ballooning take care of it.

> This is a very untypical situation, and we can
> probably solve it by excluding the in-flight pages
> from the active/inactive file calculation.
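
In case it helps the discussion, this is how I would read "excluding the 
in-flight pages" in code - again just an untested sketch; NR_WRITEBACK is 
an existing zone counter, but whether it is the right measure of "in 
flight" here is exactly what the experiment in (a) would have to show:

	static int inactive_file_is_low_global(struct zone *zone)
	{
		unsigned long active, inactive, inflight;

		active = zone_page_state(zone, NR_ACTIVE_FILE);
		inactive = zone_page_state(zone, NR_INACTIVE_FILE);
		inflight = zone_page_state(zone, NR_WRITEBACK);

		/*
		 * Pages stuck in IO are not reclaimable right now anyway,
		 * so do not let them make the inactive list look bigger
		 * than it effectively is.
		 */
		inactive -= min(inactive, inflight);

		return active > inactive;
	}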

-- 

Grüsse / regards, Christian Ehrhardt
IBM Linux Technology Center, System z Linux Performance