Date:	Tue, 24 Jan 2012 18:08:21 +0900
From:	KAMEZAWA Hiroyuki <kamezawa.hiroyu@...fujitsu.com>
To:	Johannes Weiner <hannes@...xchg.org>
Cc:	Hillf Danton <dhillf@...il.com>, linux-mm@...ck.org,
	Michal Hocko <mhocko@...e.cz>, Ying Han <yinghan@...gle.com>,
	Hugh Dickins <hughd@...gle.com>,
	Andrew Morton <akpm@...ux-foundation.org>,
	LKML <linux-kernel@...r.kernel.org>
Subject: Re: [PATCH] mm: vmscan: check mem cgroup over reclaimed

On Tue, 24 Jan 2012 09:33:47 +0100
Johannes Weiner <hannes@...xchg.org> wrote:

> On Mon, Jan 23, 2012 at 08:30:42PM +0800, Hillf Danton wrote:
> > On Mon, Jan 23, 2012 at 6:47 PM, Johannes Weiner <hannes@...xchg.org> wrote:
> > > On Mon, Jan 23, 2012 at 09:55:07AM +0800, Hillf Danton wrote:
> > >> To avoid degrading the reclaimee's performance, a check for over-reclaim
> > >> is added after shrinking each lru list, when pages are reclaimed from a
> > >> mem cgroup.
> > >>
> > >> If over-reclaim occurs, shrinking the remaining lru lists is skipped, and
> > >> no further reclaim is done for reclaim/compaction.
> > >>
> > >> Signed-off-by: Hillf Danton <dhillf@...il.com>
> > >> ---
> > >>
> > >> --- a/mm/vmscan.c     Mon Jan 23 00:23:10 2012
> > >> +++ b/mm/vmscan.c     Mon Jan 23 09:57:20 2012
> > >> @@ -2086,6 +2086,7 @@ static void shrink_mem_cgroup_zone(int p
> > >>       unsigned long nr_reclaimed, nr_scanned;
> > >>       unsigned long nr_to_reclaim = sc->nr_to_reclaim;
> > >>       struct blk_plug plug;
> > >> +     bool memcg_over_reclaimed = false;
> > >>
> > >>  restart:
> > >>       nr_reclaimed = 0;
> > >> @@ -2103,6 +2104,11 @@ restart:
> > >>
> > >>                               nr_reclaimed += shrink_list(lru, nr_to_scan,
> > >>                                                           mz, sc, priority);
> > >> +
> > >> +                             memcg_over_reclaimed = !scanning_global_lru(mz)
> > >> +                                     && (nr_reclaimed >= nr_to_reclaim);
> > >> +                             if (memcg_over_reclaimed)
> > >> +                                     goto out;
> > >
> > > Since this merge window, scanning_global_lru() is always false when
> > > the memory controller is enabled, i.e. in most common configurations
> > > and distribution kernels.
> > >
> > > This will quite likely have bad effects on zone balancing,
> > > pressure balancing between anon/file lru etc., while you haven't shown
> > > that any workloads actually benefit from this.
> > >
> > Hi Johannes
> > 
> > Thanks for your comment, first.
> > 
> > Impact on zone balance and lru-list balance is indeed introduced, but I
> > don't think the patch is solely responsible for that balance, because the
> > soft limit, embedded in the mem cgroup, is set up by users according to
> > whatever tastes they have.
> > 
> > Though there is room for the patch to be fine-tuned in this direction or
> > that, over-reclaim should not be neglected entirely, but be avoided as
> > much as we can, or users are forced to set up soft limits with great care
> > so as not to mess up zone balance.
> 
> Overreclaim is absolutely horrible with soft limits, but I think there
> are more direct reasons than checking nr_to_reclaim only after a full
> zone scan; for example, soft limit reclaim is invoked on zones that
> are totally fine.
> 


IIUC, the full zone scan is done because:
 - the whole zonelist is visited by alloc_pages(), so _all_ zones in the
   zonelist are assumed to be under memory shortage, and
 - it takes care of zone/node balancing.

I know this 'full zone scan' hurts the latency of alloc_pages() when the
number of nodes is big.

IMHO, in the case of direct reclaim caused by a memcg's limit, we should
avoid the full zone scan, because that reclaim is not caused by any memory
shortage in the zonelist.

In the case of global memory reclaim, kswapd doesn't use the zonelist.

So only global direct reclaim is a problem here. I think doing the full zone
scan will reduce future calls of try_to_free_pages() and may reduce lock
contention, but it puts too much penalty on a single thread.

In a typical case, on a 4-node x86-64 NUMA machine, a GFP_HIGHUSER_MOVABLE
allocation failure will reclaim from 4*ZONE_NORMAL + ZONE_DMA32 = 5 zones,
at SWAP_CLUSTER_MAX = 32 pages each, i.e. 160 pages per scan.

With 16 nodes, it will be 16*ZONE_NORMAL + ZONE_DMA32 = 17 zones, i.e. 544
pages per scan.

32 pages may be too small, but don't we need some threshold to quit the
full-zone scan?
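
For example, such a threshold could look like the sketch below, inside the
shrink_zones() loop (only to illustrate the idea; the "2 *" factor is an
arbitrary example value, not a tested proposal):

==
	for_each_zone_zonelist_nodemask(zone, z, zonelist,
					gfp_zone(sc->gfp_mask), sc->nodemask) {
		shrink_zone(priority, zone, sc);
		/* sketch: quit the full zone scan once we have reclaimed
		 * well past what this allocation needs */
		if (sc->nr_reclaimed >= 2 * sc->nr_to_reclaim)
			break;
	}
==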

Here, the topic is soft limit reclaim. I think...

1. A follow-up for the following comment (*) is required; a possible sketch
   follows the excerpt below.
==
                        nr_soft_scanned = 0;
                        nr_soft_reclaimed = mem_cgroup_soft_limit_reclaim(zone,
                                                sc->order, sc->gfp_mask,
                                                &nr_soft_scanned);
                        sc->nr_reclaimed += nr_soft_reclaimed;
                        sc->nr_scanned += nr_soft_scanned;
                        /* need some check for avoid more shrink_zone() */ <----(*)
==
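
One possible shape for that check, only as a sketch (the condition and its
placement are illustrative, not a tested patch):

==
			/* sketch: if soft limit reclaim already freed what
			 * this allocation needs, skip the regular
			 * shrink_zone() for this zone */
			if (nr_soft_reclaimed >= sc->nr_to_reclaim)
				continue;
==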

2. Some threshold for avoiding the full zone scan may be good.
   (But this may need deeper discussion...)

3. About the patch: I think it will not break zone balancing if (*) is
   handled in a good way.

   This check is not good:

+				memcg_over_reclaimed = !scanning_global_lru(mz)
+					&& (nr_reclaimed >= nr_to_reclaim);

   
  I would like something like the following:

	if (we_are_doing_soft_limit_reclaim_for_global_direct_reclaim &&
	    res_counter_soft_limit_excess(&memcg->res))
		memcg_over_reclaimed = true;
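
Spelled out as C, that would look roughly like the sketch below. This assumes
a hypothetical new "soft_limit" flag in struct scan_control, set by
mem_cgroup_soft_limit_reclaim() for global direct reclaim; no such flag
exists today, and memcg here stands for mz->mem_cgroup:

==
	/* sketch: sc->soft_limit is a hypothetical flag; stop shrinking
	 * this memcg and let the soft limit loop pick the next one */
	if (sc->soft_limit &&
	    res_counter_soft_limit_excess(&memcg->res)) {
		memcg_over_reclaimed = true;
		goto out;
	}
==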

Then another memcg will be picked up and soft-limit-reclaim() will continue.

Thanks,
-Kame
