linux-kernel - Re: [PATCH v2] mm: memcontrol: switch to rcu protection in drain_all


Message-ID: <20190805195047.GA16917@tower.DHCP.thefacebook.com>
Date:   Mon, 5 Aug 2019 19:50:52 +0000
From:   Roman Gushchin <guro@...com>
To:     Michal Hocko <mhocko@...nel.org>
CC:     Andrew Morton <akpm@...ux-foundation.org>,
        "linux-mm@...ck.org" <linux-mm@...ck.org>,
        Johannes Weiner <hannes@...xchg.org>,
        "linux-kernel@...r.kernel.org" <linux-kernel@...r.kernel.org>,
        Kernel Team <Kernel-team@...com>,
        Hillf Danton <hdanton@...a.com>
Subject: Re: [PATCH v2] mm: memcontrol: switch to rcu protection in
 drain_all_stock()

On Mon, Aug 05, 2019 at 01:11:35PM +0200, Michal Hocko wrote:
> On Fri 02-08-19 12:22:41, Roman Gushchin wrote:
> > Commit 72f0184c8a00 ("mm, memcg: remove hotplug locking from try_charge")
> > introduced css_tryget()/css_put() calls in drain_all_stock(),
> > which are supposed to protect the target memory cgroup from being
> > released during the mem_cgroup_is_descendant() call.
> > 
> > However, it's not completely safe. In theory, memcg can go away
> > between reading stock->cached pointer and calling css_tryget().
> > 
> > This can happen if drain_all_stock() races with drain_local_stock()
> > running on a remote cpu as a work item scheduled by a previous
> > invocation of drain_all_stock().
> 
> Maybe I am still missing something but I do not see how 72f0184c8a00
> changed the existing race. get_online_cpus doesn't prevent the same race
> right? If this is the case then it would be great to clarify that. I
> know that you are mostly after clarifying that css_tryget is
> insufficient but the above sounds like 72f0184c8a00 has introduced a
> regression.

Yeah, I'm not blaming 72f0184c8a00 for the race, which, as I said,
is barely reproducible at all. There is no "Fixes" tag, and I don't think
we need to backport it to stable.
Let's treat this as a refactoring patch that makes the code cleaner.

> 
> > The race is a bit theoretical and there is little chance of triggering
> > it, but the current code looks a bit confusing, so it makes sense
> > to fix it anyway. The code looks as if css_tryget() and
> > css_put() are used to protect stock draining. That's not necessary
> > because stocked pages are holding references to the cached cgroup.
> > And it obviously won't work for work items scheduled on other cpus.
> > 
> > So, let's read the stock->cached pointer and evaluate the memory
> > cgroup inside a rcu read section, and get rid of
> > css_tryget()/css_put() calls.
> > 
> > v2: added some explanations to the commit message, no code changes
> > 
> > Signed-off-by: Roman Gushchin <guro@...com>
> > Cc: Michal Hocko <mhocko@...e.com>
> > Cc: Hillf Danton <hdanton@...a.com>
> 
> Other than that.
> Acked-by: Michal Hocko <mhocko@...e.com>

Thanks!
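
For readers following the thread, here is a minimal userspace sketch of the
pattern the commit message describes: read stock->cached and evaluate the
memory cgroup inside an rcu read section, without css_tryget()/css_put().
It is an illustration under assumptions, not the patch itself. The
rcu_read_lock()/rcu_read_unlock() stand-ins are counters rather than the
kernel primitives, the two structs are reduced to the fields needed here,
and is_descendant() and stock_needs_drain() are hypothetical helper names.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Userspace stand-ins (assumption): the real primitives live in the kernel. */
static int rcu_depth;
static void rcu_read_lock(void)   { rcu_depth++; }
static void rcu_read_unlock(void) { rcu_depth--; }

/* Reduced to the one field this sketch needs. */
struct mem_cgroup {
	struct mem_cgroup *parent;
};

/* Simplified per-cpu stock: cached cgroup plus the number of stocked pages. */
struct memcg_stock {
	struct mem_cgroup *cached;
	unsigned int nr_pages;
};

/* Walk up the hierarchy; a cgroup counts as a descendant of itself. */
static bool is_descendant(const struct mem_cgroup *memcg,
			  const struct mem_cgroup *root)
{
	assert(root != NULL);
	for (; memcg; memcg = memcg->parent)
		if (memcg == root)
			return true;
	return false;
}

/*
 * Decide whether a stock should be drained for @root. The cached pointer is
 * read and evaluated entirely inside the read section, so no reference is
 * taken on the cgroup.
 */
static bool stock_needs_drain(const struct memcg_stock *stock,
			      const struct mem_cgroup *root)
{
	const struct mem_cgroup *memcg;
	bool flush = false;

	rcu_read_lock();
	memcg = stock->cached;
	if (memcg && stock->nr_pages && is_descendant(memcg, root))
		flush = true;
	rcu_read_unlock();

	return flush;
}
```

The point of the shape is that the result is only a boolean: nothing derived
from the cached pointer is used after rcu_read_unlock(), so there is no need
to pin the cgroup with a reference.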