Message-ID: <20090816054107.GA15320@localhost>
Date: Sun, 16 Aug 2009 13:41:07 +0800
From: Wu Fengguang <fengguang.wu@...el.com>
To: Balbir Singh <balbir@...ux.vnet.ibm.com>
Cc: Rik van Riel <riel@...hat.com>,
Johannes Weiner <hannes@...xchg.org>,
Avi Kivity <avi@...hat.com>,
KOSAKI Motohiro <kosaki.motohiro@...fujitsu.com>,
Andrea Arcangeli <aarcange@...hat.com>,
"Dike, Jeffrey G" <jeffrey.g.dike@...el.com>,
"Yu, Wilfred" <wilfred.yu@...el.com>,
"Kleen, Andi" <andi.kleen@...el.com>,
Hugh Dickins <hugh.dickins@...cali.co.uk>,
Andrew Morton <akpm@...ux-foundation.org>,
Christoph Lameter <cl@...ux-foundation.org>,
Mel Gorman <mel@....ul.ie>,
LKML <linux-kernel@...r.kernel.org>,
linux-mm <linux-mm@...ck.org>
Subject: Re: [RFC] respect the referenced bit of KVM guest pages?

On Sun, Aug 16, 2009 at 01:09:03PM +0800, Balbir Singh wrote:
> * Wu Fengguang <fengguang.wu@...el.com> [2009-08-15 13:45:24]:
>
> > On Fri, Aug 14, 2009 at 09:19:35PM +0800, Rik van Riel wrote:
> > > Wu Fengguang wrote:
> > > > On Fri, Aug 14, 2009 at 05:10:55PM +0800, Johannes Weiner wrote:
> > >
> > > >> So even with the active list being a FIFO, we keep usage information
> > > >> gathered from the inactive list. If we deactivate pages in arbitrary
> > > >> list intervals, we throw this away.
> > > >
> > > > > We do risk plain FIFO behavior if the inactive list is small enough
> > > > > that (unconditionally) deactivated pages quickly get reclaimed and
> > > > > their life window in the inactive list is too short to be useful.
> > >
> > > This is one of the reasons why we unconditionally deactivate
> > > the active anon pages, and do background scanning of the
> > > active anon list when reclaiming page cache pages.
> > >
> > > We want to always move some pages to the inactive anon
> > > list, so it does not get too small.
> >
> > Right, the current code tries to pull the inactive list out of a
> > smallish state as long as there is vmscan activity.
> >
> > However there is a possible (and tricky) hole: mem cgroups
> > don't do batched vmscan. shrink_zone() may call shrink_list()
> > with nr_to_scan=1, in which case shrink_list() _still_ calls
> > isolate_pages() with the much larger SWAP_CLUSTER_MAX.
> >
> > It effectively scales up the inactive list scan rate by 10 times while
> > the list is still small, and may thus prevent it from ever growing.
> >
>
> I think we need to possibly export some scanning data under DEBUG_VM
> to cross verify.

Maybe we can write more general debugging code, but here is a quick patch
for examining the cgroup case. Note that even for the global zones,
max_scan may well not be a multiple of SWAP_CLUSTER_MAX, so
shrink_inactive_list() will scan a little more in its last loop.
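
To illustrate that overshoot, here is a rough userspace sketch (not kernel
code). It assumes the loop isolates a full SWAP_CLUSTER_MAX batch per pass
and only stops once at least max_scan pages have been scanned.
model_scanned() and its list_len parameter are hypothetical names used only
for this illustration.

```c
/* Batch size used by the model; the kernel defines SWAP_CLUSTER_MAX as 32. */
#define SWAP_CLUSTER_MAX 32UL

/*
 * Hypothetical model of a batched scan loop: every pass takes a full
 * batch (or whatever is left on the list), and the loop ends only once
 * nr_scanned has reached max_scan. Returns the number of pages scanned.
 */
static unsigned long model_scanned(unsigned long max_scan,
				   unsigned long list_len)
{
	unsigned long nr_scanned = 0;

	do {
		unsigned long nr_scan = SWAP_CLUSTER_MAX;

		/* Never scan more pages than remain on the list. */
		if (nr_scan > list_len - nr_scanned)
			nr_scan = list_len - nr_scanned;
		if (!nr_scan)
			break;		/* list exhausted */
		nr_scanned += nr_scan;
	} while (nr_scanned < max_scan);

	return nr_scanned;
}
```

With a 1000-page list, model_scanned(1, 1000) returns 32 and
model_scanned(33, 1000) returns 64, so a request for a single page still
costs a whole batch in this model.
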
---
 mm/vmscan.c |    7 +++++++
 1 file changed, 7 insertions(+)

--- linux.orig/mm/vmscan.c	2009-08-16 13:24:25.000000000 +0800
+++ linux/mm/vmscan.c	2009-08-16 13:38:32.000000000 +0800
@@ -1043,6 +1043,13 @@ static unsigned long shrink_inactive_lis
 	struct zone_reclaim_stat *reclaim_stat = get_reclaim_stat(zone, sc);
 	int lumpy_reclaim = 0;
 
+	if (!scanning_global_lru(sc))
+		printk("shrink inactive %s count=%lu scan=%lu\n",
+		       file ? "file" : "anon",
+		       mem_cgroup_zone_nr_pages(sc->mem_cgroup, zone,
+				LRU_INACTIVE_ANON + LRU_FILE * !!file),
+		       max_scan);
+
 	/*
 	 * If we need a large contiguous chunk of memory, or have
 	 * trouble getting a small set of contiguous pages, we
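
For reference, the list index passed to mem_cgroup_zone_nr_pages() has to
name the inactive list of the right type. Below is a minimal sketch of that
arithmetic, assuming the enum layout found in include/linux/mmzone.h of this
period (copied here so it builds in userspace). inactive_lru() is a
hypothetical helper for illustration, not a kernel function.

```c
/* LRU list indices, reproduced as an assumption from mmzone.h of that era. */
#define LRU_BASE   0
#define LRU_ACTIVE 1
#define LRU_FILE   2

enum lru_list {
	LRU_INACTIVE_ANON = LRU_BASE,
	LRU_ACTIVE_ANON   = LRU_BASE + LRU_ACTIVE,
	LRU_INACTIVE_FILE = LRU_BASE + LRU_FILE,
	LRU_ACTIVE_FILE   = LRU_BASE + LRU_FILE + LRU_ACTIVE,
	LRU_UNEVICTABLE,
	NR_LRU_LISTS
};

/*
 * Hypothetical helper: index of the inactive list for anon pages
 * (file == 0) or page cache pages (file != 0). The file lists sit
 * LRU_FILE slots above the anon ones, so the offset is LRU_FILE, not 1.
 */
static enum lru_list inactive_lru(int file)
{
	return LRU_INACTIVE_ANON + LRU_FILE * !!file;
}
```

Under this layout an offset of 1 from LRU_INACTIVE_ANON lands on
LRU_ACTIVE_ANON, which is why the file case needs the LRU_FILE stride.
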