Date:	Mon, 12 Aug 2013 18:15:24 -0400
From:	Johannes Weiner <hannes@...xchg.org>
To:	Andrew Morton <akpm@...ux-foundation.org>
Cc:	linux-mm@...ck.org, Andi Kleen <andi@...stfloor.org>,
	Andrea Arcangeli <aarcange@...hat.com>,
	Greg Thelen <gthelen@...gle.com>,
	Christoph Hellwig <hch@...radead.org>,
	Hugh Dickins <hughd@...gle.com>, Jan Kara <jack@...e.cz>,
	KOSAKI Motohiro <kosaki.motohiro@...fujitsu.com>,
	Mel Gorman <mgorman@...e.de>,
	Minchan Kim <minchan.kim@...il.com>,
	Peter Zijlstra <peterz@...radead.org>,
	Rik van Riel <riel@...hat.com>,
	Michel Lespinasse <walken@...gle.com>,
	Seth Jennings <sjenning@...ux.vnet.ibm.com>,
	Roman Gushchin <klamm@...dex-team.ru>,
	Ozgun Erdogan <ozgun@...usdata.com>,
	Metin Doslu <metin@...usdata.com>,
	linux-fsdevel@...r.kernel.org, linux-kernel@...r.kernel.org
Subject: Re: [patch 0/9] mm: thrash detection-based file cache sizing v3

On Fri, Aug 09, 2013 at 03:53:09PM -0700, Andrew Morton wrote:
> On Tue,  6 Aug 2013 18:44:01 -0400 Johannes Weiner <hannes@...xchg.org> wrote:
> 
> > This series solves the problem by maintaining a history of pages
> > evicted from the inactive list, enabling the VM to tell streaming IO
> > from thrashing and rebalance the page cache lists when appropriate.
> 
> Looks nice. The lack of testing results is conspicuous ;)
> 
> It only really solves the problem in the case where
> 
> 	size-of-inactive-list < size-of-working-set < size-of-total-memory
> 
> yes?  In fact less than that, because the active list presumably
> doesn't get shrunk to zero (how far *can* it go?).

It can theoretically shrink to 0 if the replacing working set needs
exactly 100% of the available memory, is perfectly sequential, and
the page allocator is 100% fair.  In practice, it probably never
will.

More likely, once some active pages have been deactivated and pushed
out of memory, new pages get a chance to be activated, so there will
always be some pages on the active list.

> I wonder how many workloads fit into those constraints in the real
> world.

If the working set exceeds memory and the reference frequency is the
same for each page in the set, there is nothing we can reasonably do
to cache.

If the working set exceeds memory and all reference distances are
bigger than memory but not all equal to each other, it would be great
to be able to detect the more frequently used pages and prefer
reclaiming the others over them.  But I don't think that's actually
possible without a true LRU algorithm (as opposed to our
approximation) because we would need to know about reference distances
in the active page list and compare them to the refault distances.

So yes, this algorithm is limited to interpreting reference distances
up to memory size.

The development of this was kicked off by actual bug reports, and I'm
working with the reporters to get these patches tested in the
production environments that exhibited the problem.  The reporters
always had use cases where the working set should have fit into memory
but wasn't cached even after repeatedly referencing it; that's why
they complained in the first place.  So it's hard to tell how many
environments fall into this category, but they certainly do exist,
they are not unreasonable setups, and the behavior is pretty abysmal
(most accesses are major faults when everything should fit in memory).
