Message-ID: <20131209152256.GD3597@dhcp22.suse.cz>
Date: Mon, 9 Dec 2013 16:22:56 +0100
From: Michal Hocko <mhocko@...e.cz>
To: Vladimir Davydov <vdavydov@...allels.com>
Cc: Glauber Costa <glommer@...il.com>,
LKML <linux-kernel@...r.kernel.org>, cgroups@...r.kernel.org,
devel@...nvz.org, Johannes Weiner <hannes@...xchg.org>,
Balbir Singh <bsingharora@...il.com>,
KAMEZAWA Hiroyuki <kamezawa.hiroyu@...fujitsu.com>
Subject: Re: [PATCH] memcg: remove KMEM_ACCOUNTED_ACTIVATED
On Wed 04-12-13 15:56:51, Vladimir Davydov wrote:
> On 12/04/2013 02:08 PM, Glauber Costa wrote:
> >>> Could you do something clever with just one flag? Probably yes. But I
> >>> doubt it would be that much cleaner; this is just the way that site
> >>> patching works.
> >> Thank you for spending your time to listen to me.
> >>
> > Don't worry! I thank you for carrying this forward.
> >
> >> Let me try to explain what is bothering me.
> >>
> >> We have two state bits for each memcg, 'active' and 'activated'. There
> >> are two scenarios where the bits can be modified:
> >>
> >> 1) The kmem limit is set on a memcg for the first time -
> >> memcg_update_kmem_limit(). Here we call memcg_update_cache_sizes(),
> >> which sets the 'activated' bit on success, then we update the static
> >> branching, and then set the 'active' bit. All three actions are done
> >> atomically with respect to other tasks setting the limit, thanks to
> >> set_limit_mutex. After both bits are set, they never get cleared for
> >> the memcg.
> >>
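[For reference, the ordering described above is roughly the following - a
paraphrased sketch of the memcg_update_kmem_limit() path, not the exact
code:

    mutex_lock(&set_limit_mutex);
    /* sets KMEM_ACCOUNTED_ACTIVATED on success */
    ret = memcg_update_cache_sizes(memcg);
    if (!ret) {
        /* patch the call sites... */
        static_key_slow_inc(&memcg_kmem_enabled_key);
        /* ...and only then set KMEM_ACCOUNTED_ACTIVE */
        memcg_kmem_set_active(memcg);
    }
    mutex_unlock(&set_limit_mutex);
]
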
> > So far so good. But again, note how you yourself describe it:
> > the actions are done atomically *with respect to other tasks setting
> > the limit*.
> >
> > But there are also tasks that are just running their course naturally
> > and allocating memory. For those, some call sites will be on and some
> > will be off. We need to make sure that *none* of them uses a patched
> > site until *all* of the sites are patched.
> > This has nothing to do with updates, this is all about the readers.
> >
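[The readers here are the charging fast paths; roughly, and again
paraphrased rather than quoted:

    /*
     * The static key is the only gate an unpatched call site ever sees;
     * the per-memcg 'active' bit is looked at only after the key check.
     */
    if (!memcg_kmem_enabled())          /* static_key_false() underneath */
        return true;                    /* no kmem accounting enabled yet */
    if (!memcg_kmem_is_active(memcg))   /* this memcg does not account */
        return true;
]
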
> >> 2) When a subgroup of a kmem-active cgroup is created -
> >> memcg_propagate_kmem(). Here we copy kmem_account_flags from the
> >> parent, then increase the static branching refcounter, then call
> >> memcg_update_cache_sizes() for the new memcg, which may clear the
> >> 'activated' bit on failure. After successful execution, the state bits
> >> never get cleared for the new memcg.
> >>
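[I.e. roughly, sketching memcg_propagate_kmem() from memory rather than
quoting it:

    /* copy both bits from the parent */
    memcg->kmem_account_flags = parent->kmem_account_flags;
    if (!memcg_kmem_is_active(parent))
        return 0;
    /* bump the static branching refcounter */
    static_key_slow_inc(&memcg_kmem_enabled_key);
    mutex_lock(&set_limit_mutex);
    /* may clear the 'activated' bit on failure */
    ret = memcg_update_cache_sizes(memcg);
    mutex_unlock(&set_limit_mutex);
]
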
> >> In scenario 2 there is no need to bother about the order in which the
> >> flags are set, because there are no tasks in the cgroup yet - tasks
> >> can only be moved in after css_online finishes, by which point both
> >> bits are set and the static branching is enabled. Actually, we already
> >> do not bother about it, because both bits are set before the cgroup is
> >> fully initialized (when memcg_update_cache_sizes() is called).
> >>
> > Yes, after the first cgroup is set none of that matters. But it is just
> > easier and less error prone to follow the same path every time. As I
> > have said, if you can come up with a cleverer way to deal with the
> > problem above that doesn't involve the double flag - and you can prove
> > it works - I am definitely fine with it. But this is subtle code, and
> > in the past - Michal can attest to this - we've changed it, sure that
> > it would work, only to see it explode in our faces.
> >
> > So although I am willing to review every patch for correctness on that
> > front (I never said I liked the 2-flags scheme...), unless you have a
> > bug or a real problem with it, I would advise against changing it if
> > it's just to make it more readable.
> >
> > But again, don't take me too seriously on this. If you and Michal think you can
> > come up with something better, I'm all for it.
>
> All right, I finally get you :-)
>
> Although I still don't think we need the second flag, I now understand
> that it's better not to change code that works fine, especially since
> the change makes it neither more readable nor more efficient. Since I
> may be mistaken about the flags usage (which is not at all unlikely),
> it's better to leave it as is rather than risk catching spurious hangs
> that might be caused by this modification.
>
> Thanks for the detailed explanation!
It would be really great if we could push some of that into the
comments, please?

Anyway, reading this thread again, I guess I finally got what you meant,
Vladimir.

You are basically saying that the two-stage enabling can be done with
static_key_slow_inc in the first step and memcg_kmem_set_active in the
second step, without an additional flag - assuming that the writers
cannot race (they currently cannot, because they are serialized by
set_limit_mutex and memcg_create_mutex) and that the readers (the
charging paths) _always_ check the static key before checking the
active flag?
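
Something like the following is how I read it - just a sketch of the
ordering with the helpers we already have, minus the second flag:

    /* writer, serialized by set_limit_mutex / memcg_create_mutex */
    static_key_slow_inc(&memcg_kmem_enabled_key);  /* 1) patch the call sites */
    memcg_kmem_set_active(memcg);                  /* 2) single 'active' bit */

with the readers relying on memcg_kmem_enabled() being checked before the
'active' bit is ever looked at, as in the charging path sketch above.
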
I guess this should work. But it would require a deep audit to verify
that the above holds in all places. For example, we do not bother to
check the static key on the offline/free paths. I guess that should be
harmless as it is, but who knows...

I would rather see a more detailed description of the current state
first.
--
Michal Hocko
SUSE Labs