linux-kernel - Re: [PATCH] RFC: vmscan: add min_filelist_kbytes sysctl for protecting the working set

Message-ID: <4CCF8151.3010202@redhat.com>
Date:	Mon, 01 Nov 2010 23:11:13 -0400
From:	Rik van Riel <riel@...hat.com>
To:	Mandeep Singh Baines <msb@...omium.org>
CC:	KOSAKI Motohiro <kosaki.motohiro@...fujitsu.com>,
	Andrew Morton <akpm@...ux-foundation.org>,
	Mel Gorman <mel@....ul.ie>,
	Minchan Kim <minchan.kim@...il.com>,
	Johannes Weiner <hannes@...xchg.org>,
	linux-kernel@...r.kernel.org, linux-mm@...ck.org, wad@...omium.org,
	olofj@...omium.org, hughd@...omium.org
Subject: Re: [PATCH] RFC: vmscan: add min_filelist_kbytes sysctl for protecting
 the working set

On 11/01/2010 03:43 PM, Mandeep Singh Baines wrote:

> Yes, this prevents you from reclaiming the active list all at once. But if
> the memory pressure doesn't go away, you'll start to reclaim the active list
> little by little. First you'll empty the inactive list, and then you'll start
> scanning the active list and moving pages from active to inactive. The
> problem is that there is no minimum time a page will sit on the inactive
> list before it is reclaimed. That depends only on the scan rate, which does
> not depend on time.
>
> In my experiments, I saw the active list get smaller and smaller over time
> until eventually it was only a few MB, at which point the system came
> grinding to a halt due to thrashing.

I believe that changing the active/inactive ratio has other
potential thrashing issues.  Specifically, when the inactive
list is too small, pages may not stick around long enough to
be accessed multiple times and get promoted to the active
list, even when they are in active use.
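The inactive-list concern can be shown with a similarly simplified model. A sketch under stated assumptions: the inactive list is a FIFO of fixed capacity, and a page is promoted if it is touched a second time while still on that list (the kernel's referenced-bit handling is more involved). `struct inactive_fifo`, `fifo_insert()`, `fifo_contains()` and `hot_page_promoted()` are hypothetical names. One page is touched every `period` accesses, while every other access faults in a new streaming page.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define MAX_INACTIVE 4096

/* Hypothetical toy inactive list: a ring of page ids, oldest at start. */
struct inactive_fifo {
	long   pages[MAX_INACTIVE];
	size_t cap;    /* configured capacity, <= MAX_INACTIVE */
	size_t start;  /* index of the oldest page */
	size_t len;    /* pages currently on the list */
};

bool fifo_contains(const struct inactive_fifo *f, long id)
{
	for (size_t i = 0; i < f->len; i++)
		if (f->pages[(f->start + i) % f->cap] == id)
			return true;
	return false;
}

/* Add a page at the head; if the list is full, evict the oldest page. */
void fifo_insert(struct inactive_fifo *f, long id)
{
	if (f->len == f->cap) {
		f->start = (f->start + 1) % f->cap;
		f->len--;
	}
	f->pages[(f->start + f->len) % f->cap] = id;
	f->len++;
}

/*
 * Page 0 is touched every `period` accesses; all other accesses fault in
 * brand-new pages. Returns true if page 0 is touched again while it is
 * still on the inactive list, i.e. it would be promoted to the active list.
 */
bool hot_page_promoted(size_t cap, long period, long total_accesses)
{
	struct inactive_fifo f = { .cap = cap };
	long next_new = 1;

	assert(cap > 0 && cap <= MAX_INACTIVE && period > 0);
	for (long t = 0; t < total_accesses; t++) {
		if (t % period == 0) {
			if (fifo_contains(&f, 0))
				return true;    /* second touch while inactive */
			fifo_insert(&f, 0);
		} else {
			fifo_insert(&f, next_new++);
		}
	}
	return false;
}
```

With a capacity of 8 pages, a page reused every 8 accesses is promoted, but one reused every 9 accesses is evicted before each next touch and never reaches the active list, even though it is in steady use. Raising the capacity to 64 lets the same page be promoted.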

I would prefer a more flexible solution that automatically
does the right thing.

The problem you see is that the file list gets reclaimed
very quickly, even when it is already very small.

I wonder if a possible solution would be to limit how fast
file pages get reclaimed when the page cache is very small.
Say, inactive_file + active_file < 2 * zone->pages_high?
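Read as a sum of the two file LRU list sizes, so that both sides of the comparison are page counts, the proposed threshold is a one-line predicate. This is a sketch assuming the caller supplies the per-zone counts; `file_cache_is_small()` is a hypothetical name, not an existing kernel function. In kernels of this period the zone's high watermark is read with `high_wmark_pages(zone)` rather than a `pages_high` field.

```c
#include <assert.h>
#include <limits.h>
#include <stdbool.h>

/*
 * Hypothetical helper, not an existing kernel function.
 * Returns true when the file LRU lists together hold fewer than twice
 * the zone's high watermark. All arguments are page counts.
 */
bool file_cache_is_small(unsigned long inactive_file,
			 unsigned long active_file,
			 unsigned long pages_high)
{
	/* Neither side may wrap around for the comparison to be meaningful. */
	assert(inactive_file <= ULONG_MAX - active_file);
	assert(pages_high <= ULONG_MAX / 2);

	return inactive_file + active_file < 2 * pages_high;
}
```

For a zone whose high watermark is 100 pages, 150 file pages count as small and 200 do not.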

At that point, maybe we could slow the reclaiming of page
cache pages down to a rate significantly lower than the rate
at which the disk can refill them. Maybe 100 pages a second;
even an actual spinning metal disk can refill that without
the use of readahead.

That can be rounded up to one batch of SWAP_CLUSTER_MAX
file pages every 1/4 second, when the number of page cache
pages is very low.

This way HPC and virtual machine hosting nodes can still
get rid of totally unused page cache, but on any system
that actually uses page cache, some minimal amount of
cache will be protected under heavy memory pressure.

Does this sound like a reasonable approach?

I realize the threshold may have to be tweaked...

The big question is, how do we integrate this with the
OOM killer?  Do we pretend we are out of memory when
we've hit our file cache eviction quota and kill something?

Would there be any downsides to this approach?

Are there any volunteers for implementing this idea?
(Maybe someone who needs the feature?)

-- 
All rights reversed