linux-kernel - Re: [PATCH -mm 2/2] memcg: fix css reference leak and endless loop in mem_cgroup

lists.openwall.net		lists / announce owl-users owl-dev john-users john-dev passwdqc-users yescrypt popa3d-users / oss-security kernel-hardening musl sabotage tlsify passwords / crypt-dev xvendor / Bugtraq Full-Disclosure linux-kernel linux-netdev linux-ext4 linux-hardening linux-cve-announce PHC
Open Source and information security mailing list archives

Hash Suite: Windows password security audit tool. GUI, reports in PDF.

[<prev] [next>] [<thread-prev] [day] [month] [year] [list]

Message-ID: <alpine.LSU.2.11.1401230451490.1563@eggly.anvils>
Date:	Thu, 23 Jan 2014 04:53:16 -0800 (PST)
From:	Hugh Dickins <hughd@...gle.com>
To:	Michal Hocko <mhocko@...e.cz>
cc:	Andrew Morton <akpm@...ux-foundation.org>,
	Johannes Weiner <hannes@...xchg.org>,
	Greg Thelen <gthelen@...gle.com>, linux-mm@...ck.org,
	LKML <linux-kernel@...r.kernel.org>
Subject: Re: [PATCH -mm 2/2] memcg: fix css reference leak and endless loop
 in mem_cgroup_iter

On Thu, 23 Jan 2014, Michal Hocko wrote:
> On Thu 23-01-14 02:42:58, Hugh Dickins wrote:
> > > 
> > > Actually both patches are needed. If we had only 2/2 then we wouldn't
> > > endless loop inside mem_cgroup_iter but we could still return root to
> > > caller all the time because mem_cgroup_iter_load would return NULL on
> > > css_tryget failure on the cached root. Or am I missing something that
> > > would prevent that?
> > 
> > In theory I agree with you; and if you're missing something, I can't see
> > it either.  But in practice, all my earlier testing of 3.12 and 3.13 was
> > just with 2/2, and I've tried without your 1/2 since: whereas I have hung
> > on 3.12 and 3.13 a convincing number of times without 2/2, I have never
> > hung on them with 2/2 without 1/2.  In practice 1/2 appears essential
> > for 3.10 and 3.11, but irrelevant for 3.12 and 3.13.
> > 
> > That could be easy to explain if there were a difference at the calling
> > end, shrink_zone(), between those releases: but I don't see that.  Odd.
> > Either we're both missing something, or my testing is even less reliable
> > than I'd thought.  But since I certainly don't dispute 1/2, it is merely
> > academic.  Though still bothersome.
> 
> I would assume that it is (sc->nr_reclaimed >= sc->nr_to_reclaim) that
> helps us to back off. SWAP_CLUSTER_MAX shouldn't be that hard to get to
> before css_offline racing part reparents all the memory.

But wouldn't explain why I could see it on 3.10,11 but not on 3.12,13.

Perhaps the 2/2 problem is a lot easier to hit than the 1/2 problem,
and I mistakenly expected to see the 1/2 problem in the timescale that
I saw the 2/2 problem; but I don't really think either is the case.

> 
> Anyway, I would feel safer if this was pushed fixed although you haven't
> reporoduced it.

Absolutely.

> > Before Andrew sends these all off to Linus, I should admit that there's
> > probably a refinement still to come to the CSS_ONLINE one.  I'm ashamed
> > to admit that I overlooked a much earlier comment from Greg Thelen, who
> > suggested that a memory barrier might be required.
> 
> I was thinking about mem barrier while reviewing your patch but then I
> convinced myself that we should be safe also without using one when
> checking CSS_ONLINE.
> We have basically two situations.
> 	- online_css when we can miss it being set which is OK because
> 	  we would miss a new empty group.
> 	- offline_css when we could still see the flag being set but
> 	  then css_tryget would be already failing.
> 
> So while all this is subtle and relies on cgroup core heavily I think we
> should be safe wrt. memory barriers.
> 
> Or did you mean something else here?

Something else.  My CSS_OFFLINE patch claims to prevent the iterator
from returning an uninitialized struct mem_cgroup: if that is to be
relied upon, then it needs to make sure that the initialization of
the mem_cgroup is visible to the caller before the CSS_OFFLINE flag.

kernel/cgroup.c online_css() does nowadays have an smp_wmb() buried
in its rcu_assign_pointer(); but it's not in the right place to
make this particular guarantee.  And an smp_rmb() needed somewhere
too, if it doesn't already come for free somehow.

Hugh
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@...r.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/