lists.openwall.net   lists  /  announce  owl-users  owl-dev  john-users  john-dev  passwdqc-users  yescrypt  popa3d-users  /  oss-security  kernel-hardening  musl  sabotage  tlsify  passwords  /  crypt-dev  xvendor  /  Bugtraq  Full-Disclosure  linux-kernel  linux-netdev  linux-ext4  linux-hardening  linux-cve-announce  PHC 
Open Source and information security mailing list archives
 
Hash Suite: Windows password security audit tool. GUI, reports in PDF.
[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Date:   Sun, 16 Jun 2019 14:30:00 +0800
From:   Xunlei Pang <xlpang@...ux.alibaba.com>
To:     Chris Down <chris@...isdown.name>
Cc:     Roman Gushchin <guro@...com>, Michal Hocko <mhocko@...nel.org>,
        Johannes Weiner <hannes@...xchg.org>,
        linux-kernel@...r.kernel.org, linux-mm@...ck.org
Subject: Re: [PATCH] memcg: Ignore unprotected parent in
 mem_cgroup_protected()

Hi Chirs,

On 2019/6/16 AM 12:08, Chris Down wrote:
> Hi Xunlei,
> 
> Xunlei Pang writes:
>> Currently memory.min|low implementation requires the whole
>> hierarchy has the settings, otherwise the protection will
>> be broken.
>>
>> Our hierarchy is kind of like(memory.min value in brackets),
>>
>>               root
>>                |
>>             docker(0)
>>              /    \
>>         c1(max)   c2(0)
>>
>> Note that "docker" doesn't set memory.min. When kswapd runs,
>> mem_cgroup_protected() returns "0" emin for "c1" due to "0"
>> @parent_emin of "docker", as a result "c1" gets reclaimed.
>>
>> But it's hard to maintain parent's "memory.min" when there're
>> uncertain protected children because only some important types
>> of containers need the protection.  Further, control tasks
>> belonging to parent constantly reproduce trivial memory which
>> should not be protected at all.  It makes sense to ignore
>> unprotected parent in this scenario to achieve the flexibility.
> 
> I'm really confused by this, why don't you just set memory.{min,low} in
> the docker cgroup and only propagate it to the children that want it?
> 
> If you only want some children to have the protection, only request it
> in those children, or create an additional intermediate layer of the
> cgroup hierarchy with protections further limited if you don't trust the
> task to request the right amount.
> 
> Breaking the requirement for hierarchical propagation of protections
> seems like a really questionable API change, not least because it makes
> it harder to set systemwide policies about the constraints of
> protections within a subtree.

docker and various types(different memory capacity) of containers
are managed by k8s, it's a burden for k8s to maintain those dynamic
figures, simply set "max" to key containers is always welcome.

Set "max" to docker also protects docker cgroup memory(as docker
itself has tasks) unnecessarily.

This patch doesn't take effect on any intermediate layer with
positive memory.min set, it requires all the ancestors having
0 memory.min to work.

Nothing special change, but more flexible to business deployment...

Powered by blists - more mailing lists