lists.openwall.net   lists  /  announce  owl-users  owl-dev  john-users  john-dev  passwdqc-users  yescrypt  popa3d-users  /  oss-security  kernel-hardening  musl  sabotage  tlsify  passwords  /  crypt-dev  xvendor  /  Bugtraq  Full-Disclosure  linux-kernel  linux-netdev  linux-ext4  linux-hardening  linux-cve-announce  PHC 
Open Source and information security mailing list archives
 
Hash Suite: Windows password security audit tool. GUI, reports in PDF.
[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <CALWz4iwJfyWRineMy+W02YBvS0Y=Pv1y8Rb=8i5R=vUCfrO+iQ@mail.gmail.com>
Date:	Mon, 29 Aug 2011 00:15:57 -0700
From:	Ying Han <yinghan@...gle.com>
To:	Johannes Weiner <hannes@...xchg.org>
Cc:	KAMEZAWA Hiroyuki <kamezawa.hiroyu@...fujitsu.com>,
	Daisuke Nishimura <nishimura@....nes.nec.co.jp>,
	Balbir Singh <balbir@...ux.vnet.ibm.com>,
	Michal Hocko <mhocko@...e.cz>,
	Andrew Morton <akpm@...ux-foundation.org>,
	Rik van Riel <riel@...hat.com>,
	Minchan Kim <minchan.kim@...il.com>,
	KOSAKI Motohiro <kosaki.motohiro@...fujitsu.com>,
	Mel Gorman <mgorman@...e.de>, Greg Thelen <gthelen@...gle.com>,
	Michel Lespinasse <walken@...gle.com>, linux-mm@...ck.org,
	linux-kernel@...r.kernel.org, Hugh Dickins <hughd@...gle.com>
Subject: Re: [patch 2/8] mm: memcg-aware global reclaim

On Thu, Aug 11, 2011 at 2:09 PM, Johannes Weiner <hannes@...xchg.org> wrote:
>
> On Thu, Aug 11, 2011 at 01:39:45PM -0700, Ying Han wrote:
> > Please consider including the following patch for the next post. It causes
> > crash on some of the tests where sc->mem_cgroup is NULL (global kswapd).
> >
> > diff --git a/mm/vmscan.c b/mm/vmscan.c
> > index b72a844..12ab25d 100644
> > --- a/mm/vmscan.c
> > +++ b/mm/vmscan.c
> > @@ -2768,7 +2768,8 @@ loop_again:
> >                          * Do some background aging of the anon list, to
> > give
> >                          * pages a chance to be referenced before
> > reclaiming.
> >                          */
> > -                       if (inactive_anon_is_low(zone, &sc))
> > +                       if (scanning_global_lru(&sc) &&
> > +                                       inactive_anon_is_low(zone, &sc))
> >                                 shrink_active_list(SWAP_CLUSTER_MAX, zone,
> >                                                         &sc, priority, 0);
>
> Thanks!  I completely overlooked this one and only noticed it after
> changing the arguments to shrink_active_list().
>
> On memcg configurations, scanning_global_lru() will essentially never
> be true again, so I moved the anon pre-aging to a separate function
> that also does a hierarchy loop to preage the per-memcg anon lists.
>
> I hope to send out the next revision soon.

Also, please consider to fold in the following patch as well. It fixes
the root cgroup lru accounting and we could easily trigger OOM while
doing some swapoff test w/o it.

mm:fix the lru accounting for root cgroup.

This patch is applied on top of:
"
mm: memcg-aware global reclaim
Signed-off-by: Johannes Weiner <hannes@...xchg.org>
"

This patch fixes the lru accounting for root cgroup.

After the "memcg-aware global reclaim" patch, one of the changes is to have
lru pages linked back to root. Under the global memory pressure, we start from
the root cgroup lru and walk through the memcg hierarchy of the system. For
each memcg, we reclaim pages based on the its lru size.

However for root cgroup, we used not having a seperate lru and only counting
the pages charged to root as part of root lru size. Without this patch, all
the pages which are linked to root lru but not charged to root like swapcache
readahead are not visible to page reclaim code and we are easily to get OOM.

After this patch, all the pages linked under root lru are counted in the lru
size, including Used and !Used.

Signed-off-by: Hugh Dickins <hughd@...gle.com>
Signed-off-by: Ying Han <yinghan@...gle.com>

diff --git a/mm/memcontrol.c b/mm/memcontrol.c
index 5518f54..f6c5f29 100644
--- a/mm/memcontrol.c
+++ b/mm/memcontrol.c
@@ -888,19 +888,21 @@ void mem_cgroup_del_lru_list(struct page *page,
enum lru_list lru)
 {
 >------struct page_cgroup *pc;
 >------struct mem_cgroup_per_zone *mz;
+>------struct mem_cgroup *mem;
·
 >------if (mem_cgroup_disabled())
 >------>-------return;
 >------pc = lookup_page_cgroup(page);
->------/* can happen while we handle swapcache. */
->------if (!TestClearPageCgroupAcctLRU(pc))
->------>-------return;
->------VM_BUG_ON(!pc->mem_cgroup);
->------/*
->------ * We don't check PCG_USED bit. It's cleared when the "page" is finally
->------ * removed from global LRU.
->------ */
->------mz = page_cgroup_zoneinfo(pc->mem_cgroup, page);
+
+>------if (TestClearPageCgroupAcctLRU(pc) || PageCgroupUsed(pc)) {
+>------>-------VM_BUG_ON(!pc->mem_cgroup);
+>------>-------mem = pc->mem_cgroup;
+>------} else {
+>------>-------/* can happen while we handle swapcache. */
+>------>-------mem = root_mem_cgroup;
+>------}
+
+>------mz = page_cgroup_zoneinfo(mem, page);
 >------MEM_CGROUP_ZSTAT(mz, lru) -= 1;
 >------VM_BUG_ON(list_empty(&pc->lru));
 >------list_del_init(&pc->lru);
@@ -961,22 +963,31 @@ void mem_cgroup_add_lru_list(struct page *page,
enum lru_list lru)
 {
 >------struct page_cgroup *pc;
 >------struct mem_cgroup_per_zone *mz;
+>------struct mem_cgroup *mem;
·
 >------if (mem_cgroup_disabled())
 >------>-------return;
 >------pc = lookup_page_cgroup(page);
 >------VM_BUG_ON(PageCgroupAcctLRU(pc));
->------/*
->------ * Used bit is set without atomic ops but after smp_wmb().
->------ * For making pc->mem_cgroup visible, insert smp_rmb() here.
->------ */
->------smp_rmb();
->------if (!PageCgroupUsed(pc))
->------>-------return;
·
->------mz = page_cgroup_zoneinfo(pc->mem_cgroup, page);
+>------if (PageCgroupUsed(pc)) {
+>------>-------/* Ensure pc->mem_cgroup is visible after reading PCG_USED. */
+>------>-------smp_rmb();
+>------>-------mem = pc->mem_cgroup;
+>------>-------SetPageCgroupAcctLRU(pc);
+>------} else {
+>------>-------/*
+>------>------- * If the page is no longer charged, add it to the
+>------>------- * root memcg's lru.  Either it will be freed soon, or
+>------>------- * it will get charged again and the charger will
+>------>------- * relink it to the right list.
+>------>-------mem = root_mem_cgroup;
+>------}
+
+>------mz = page_cgroup_zoneinfo(mem, page);
 >------MEM_CGROUP_ZSTAT(mz, lru) += 1;
->------SetPageCgroupAcctLRU(pc);
+
 >------list_add(&pc->lru, &mz->lists[lru]);
 }

--Ying
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@...r.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

Powered by blists - more mailing lists

Powered by Openwall GNU/*/Linux Powered by OpenVZ