Message-ID: <20130603181202.GI15576@cmpxchg.org>
Date: Mon, 3 Jun 2013 14:12:02 -0400
From: Johannes Weiner <hannes@...xchg.org>
To: Peter Zijlstra <peterz@...radead.org>
Cc: linux-mm@...ck.org, Andi Kleen <andi@...stfloor.org>,
Andrea Arcangeli <aarcange@...hat.com>,
Andrew Morton <akpm@...ux-foundation.org>,
Greg Thelen <gthelen@...gle.com>,
Christoph Hellwig <hch@...radead.org>,
Hugh Dickins <hughd@...gle.com>, Jan Kara <jack@...e.cz>,
KOSAKI Motohiro <kosaki.motohiro@...fujitsu.com>,
Mel Gorman <mgorman@...e.de>,
Minchan Kim <minchan.kim@...il.com>,
Rik van Riel <riel@...hat.com>,
Michel Lespinasse <walken@...gle.com>,
Seth Jennings <sjenning@...ux.vnet.ibm.com>,
Roman Gushchin <klamm@...dex-team.ru>,
metin d <metdos@...oo.com>, linux-kernel@...r.kernel.org,
linux-fsdevel@...r.kernel.org
Subject: Re: [patch 10/10] mm: workingset: keep shadow entries in check
On Mon, Jun 03, 2013 at 07:15:58PM +0200, Peter Zijlstra wrote:
> On Mon, Jun 03, 2013 at 11:20:32AM -0400, Johannes Weiner wrote:
> > On Mon, Jun 03, 2013 at 10:25:33AM +0200, Peter Zijlstra wrote:
> > > On Thu, May 30, 2013 at 02:04:06PM -0400, Johannes Weiner wrote:
> > > > Previously, page cache radix tree nodes were freed after reclaim
> > > > emptied out their page pointers. But now reclaim stores shadow
> > > > entries in their place, which are only reclaimed when the inodes
> > > > themselves are reclaimed. This is problematic for bigger files
> > > > that remain in use after a significant amount of their cache has
> > > > been reclaimed, without any of those pages actually refaulting:
> > > > the shadow entries just sit there and waste memory. In the worst
> > > > case, they accumulate until the machine runs out of memory.
> > > >
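For reference, the mechanism from 9/10 that leads to this amounts to
roughly the following on the reclaim path (a minimal sketch; locking,
lookup failure handling, and the encoding of the shadow value are all
omitted):

        void **slot;

        /*
         * Keep the slot allocated and remember eviction information
         * in place of the page pointer instead of deleting the entry.
         */
        slot = radix_tree_lookup_slot(&mapping->page_tree, page->index);
        radix_tree_replace_slot(slot, shadow);

The radix tree nodes backing those slots then stay allocated and only
go away with the inode, hence the accumulation.
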
> > >
> > > Can't we simply prune all refault entries that have a distance larger
> > > than the memory size? Then we must assume that no refault entry means
> > > it's too old, which I think is a fair assumption.
> >
> > Two workloads bound to two nodes might not push pages through the LRUs
> > at the same pace, so a distance might be bigger than memory due to the
> > faster-moving node, yet still be a hit in the slower-moving one. We
> > can't really know until we evaluate it on a per-zone basis.
>
> But wasn't patch 1 of this series about making sure each zone is scanned
> proportionally to its size?
Only within any given zonelist. It's just so that pages used together
are aged fairly. But if the tasks are isolated from each other, their
pages may age at different paces without that being unfair, since the
tasks do not contend for the same memory.
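
To make that concrete, the check has to be made against the zone a page
was evicted from, not against a global memory size. A hypothetical
sketch of what that test looks like (the helper name is made up, but
zone_page_state() and the file LRU counters are the existing ones):

        static bool refault_within_workingset(struct zone *zone,
                                              unsigned long refault_distance)
        {
                unsigned long file_pages;

                /* Size of this zone's file LRUs, not of global memory */
                file_pages = zone_page_state(zone, NR_ACTIVE_FILE) +
                             zone_page_state(zone, NR_INACTIVE_FILE);

                return refault_distance <= file_pages;
        }

The same distance can then be stale on a fast-aging node while still
being a hit on a slow-aging one.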
> But given that, sure, maybe 1 memory size is a bit strict, but surely we
> can put a limit on things at about 2 memory sizes?
That's what this 10/10 patch does (prune everything older than 2 *
global_dirtyable_memory()), so I think we're talking past each other.
Maybe the wording of the changelog was confusing? The paragraph you
quoted above describes the problem that 9/10 introduces and that this
patch 10/10 fixes.
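
Concretely, the cutoff amounts to something like this (a hypothetical
sketch; the helper name and the eviction/now clock values are
illustrative, and it assumes global_dirtyable_memory() from
mm/page-writeback.c is visible here, as the patch arranges):

        static bool shadow_entry_prunable(unsigned long eviction,
                                          unsigned long now)
        {
                /*
                 * Once an entry has aged past twice the dirtyable memory,
                 * its refault distance can never indicate a workingset
                 * page again, so the entry is safe to prune.
                 */
                return now - eviction > 2 * global_dirtyable_memory();
        }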