Message-Id: <20110117092529.0708bc97.kamezawa.hiroyu@jp.fujitsu.com>
Date:	Mon, 17 Jan 2011 09:25:29 +0900
From:	KAMEZAWA Hiroyuki <kamezawa.hiroyu@...fujitsu.com>
To:	Daisuke Nishimura <nishimura@....nes.nec.co.jp>
Cc:	"linux-mm@...ck.org" <linux-mm@...ck.org>,
	"linux-kernel@...r.kernel.org" <linux-kernel@...r.kernel.org>,
	"balbir@...ux.vnet.ibm.com" <balbir@...ux.vnet.ibm.com>,
	Greg Thelen <gthelen@...gle.com>, hannes@...xchg.org,
	aarcange@...hat.com,
	"akpm@...ux-foundation.org" <akpm@...ux-foundation.org>
Subject: Re: [PATCH 4/4] [BUGFIX] fix account leak at force_empty, rmdir
 with THP

On Mon, 17 Jan 2011 09:15:33 +0900
Daisuke Nishimura <nishimura@....nes.nec.co.jp> wrote:

> Hi, thank you for your great work!
> 
> I've not read this series in detail, but I have one quick comment on move_parent.
> 
> > @@ -2245,6 +2253,7 @@ static int mem_cgroup_move_parent(struct
> >  	struct cgroup *cg = child->css.cgroup;
> >  	struct cgroup *pcg = cg->parent;
> >  	struct mem_cgroup *parent;
> > +	int charge_size = PAGE_SIZE;
> >  	int ret;
> >  
> >  	/* Is ROOT ? */
> > @@ -2256,16 +2265,19 @@ static int mem_cgroup_move_parent(struct
> >  		goto out;
> >  	if (isolate_lru_page(page))
> >  		goto put;
> > +	/* The page is isolated from LRU and we have no race with splitting */
> > +	if (PageTransHuge(page))
> > +		charge_size = PAGE_SIZE << compound_order(page);
> >  
> >  	parent = mem_cgroup_from_cont(pcg);
> >  	ret = __mem_cgroup_try_charge(NULL, gfp_mask, &parent, false,
> > -				      PAGE_SIZE);
> > +				      charge_size);
> >  	if (ret || !parent)
> >  		goto put_back;
> >  
> > -	ret = mem_cgroup_move_account(pc, child, parent, true);
> > +	ret = mem_cgroup_move_account(pc, child, parent, true, charge_size);
> >  	if (ret)
> > -		mem_cgroup_cancel_charge(parent, PAGE_SIZE);
> > +		mem_cgroup_cancel_charge(parent, charge_size);
> >  put_back:
> >  	putback_lru_page(page);
> >  put:
> I think there is a possibility that the page is split after the
> "if (PageTransHuge(page))" check.
> 
> In RHEL6, this part looks like:
> 
> 	if (PageTransHuge(page))
> 		page_size = PAGE_SIZE << compound_order(page);
> 
> 	ret = __mem_cgroup_try_charge(NULL, gfp_mask, &parent, false, page,
> 				      page_size);
> 	if (ret || !parent)
> 		return ret;
> 
> 	if (!get_page_unless_zero(page)) {
> 		ret = -EBUSY;
> 		goto uncharge;
> 	}
> 
> 	ret = isolate_lru_page(page);
> 
> 	if (ret)
> 		goto cancel;
> 
> 	compound_lock_irqsave(page, &flags);
> 	ret = mem_cgroup_move_account(pc, child, parent, page_size);
> 	compound_unlock_irqrestore(page, flags);
> 
> 
> In fact, I found a res_counter underflow bug around here, and I've already sent
> a patch to Red Hat.
> 

Okay, I'll take care of that in the next version.

Thanks,
-Kame


> ===
> From: Daisuke Nishimura <nishimura@....nes.nec.co.jp>
> 
> In mem_cgroup_move_parent(), the page can be split by another context after we
> check PageTransHuge() and before we take the compound_lock of the page.
> 
> This means a race can happen like:
> 
> 	__split_huge_page_refcount()		mem_cgroup_move_parent()
>     ---------------------------------------------------------------------------
> 						if (PageTransHuge())
> 						-> true
> 						-> set "page_size" to huge page
> 						   size.
> 						__mem_cgroup_try_charge()
> 						-> charge "page_size" to the
> 						   parent.
> 	compound_lock()
> 	mem_cgroup_split_hugepage_commit()
> 	-> commit all the tail pages to the
> 	   "current"(i.e. child) cgroup.
> 	   iow, pc->mem_cgroup of tail pages
> 	   point to the child.
> 	ClearPageCompound()
> 	compound_unlock()
> 						compound_lock()
> 						mem_cgroup_move_account()
> 						-> make pc->mem_cgroup of the
> 						   head page point to the parent.
> 						-> uncharge "page_size" from
> 						   the child.
> 						compound_unlock()
> 
> This can cause at least 2 problems.
> 
> 1. Tail pages are linked to the LRU of the child, even though their usage
>    (res_counter) has already been uncharged from the child. This causes a
>    res_counter underflow when the child directory is removed.
> 2. Usage of the parent is increased by the huge page size at moving charge of
>    the head page, but usage will be decreased only by the normal page size when
>    the head page is uncharged later because it is not PageTransHuge() anymore.
>    This means the parent doesn't have enough pages on its LRU to decrease the
>    usage to 0 and it cannot be rmdir'ed.
> 
> This patch fixes this problem by re-checking PageTransHuge() under the
> compound_lock.
> 
> Signed-off-by: Daisuke Nishimura <nishimura@....nes.nec.co.jp>
> 
> diff -uprN linux-2.6.32.x86_64.org/mm/memcontrol.c linux-2.6.32.x86_64/mm/memcontrol.c
> --- linux-2.6.32.x86_64.org/mm/memcontrol.c	2010-07-15 16:44:57.000000000 +0900
> +++ linux-2.6.32.x86_64/mm/memcontrol.c	2010-07-15 17:34:12.000000000 +0900
> @@ -1608,6 +1608,17 @@ static int mem_cgroup_move_parent(struct
>  		goto cancel;
>  
>  	compound_lock_irqsave(page, &flags);
> +	/* re-check under compound_lock because the page might be split */
> +	if (unlikely(page_size != PAGE_SIZE && !PageTransHuge(page))) {
> +		unsigned long extra = page_size - PAGE_SIZE;
> +		/* uncharge extra charges from parent */
> +		if (!mem_cgroup_is_root(parent)) {
> +			res_counter_uncharge(&parent->res, extra);
> +			if (do_swap_account)
> +				res_counter_uncharge(&parent->memsw, extra);
> +		}
> +		page_size = PAGE_SIZE;
> +	}
>  	ret = mem_cgroup_move_account(pc, child, parent, page_size);
>  	compound_unlock_irqrestore(page, flags);
>  
> 
