linux-kernel - Re: [PATCH] memcg: Ignore unprotected parent in mem_cgroup

lists.openwall.net		lists / announce owl-users owl-dev john-users john-dev passwdqc-users yescrypt popa3d-users / oss-security kernel-hardening musl sabotage tlsify passwords / crypt-dev xvendor / Bugtraq Full-Disclosure linux-kernel linux-netdev linux-ext4 linux-hardening linux-cve-announce PHC
Open Source and information security mailing list archives

Hash Suite: Windows password security audit tool. GUI, reports in PDF.

[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]

Message-ID: <20190615160820.GB1307@chrisdown.name>
Date:   Sun, 16 Jun 2019 00:08:20 +0800
From:   Chris Down <chris@...isdown.name>
To:     Xunlei Pang <xlpang@...ux.alibaba.com>
Cc:     Roman Gushchin <guro@...com>, Michal Hocko <mhocko@...nel.org>,
        Johannes Weiner <hannes@...xchg.org>,
        linux-kernel@...r.kernel.org, linux-mm@...ck.org
Subject: Re: [PATCH] memcg: Ignore unprotected parent in
 mem_cgroup_protected()

Hi Xunlei,

Xunlei Pang writes:
>Currently memory.min|low implementation requires the whole
>hierarchy has the settings, otherwise the protection will
>be broken.
>
>Our hierarchy is kind of like(memory.min value in brackets),
>
>               root
>                |
>             docker(0)
>              /    \
>         c1(max)   c2(0)
>
>Note that "docker" doesn't set memory.min. When kswapd runs,
>mem_cgroup_protected() returns "0" emin for "c1" due to "0"
>@parent_emin of "docker", as a result "c1" gets reclaimed.
>
>But it's hard to maintain parent's "memory.min" when there're
>uncertain protected children because only some important types
>of containers need the protection.  Further, control tasks
>belonging to parent constantly reproduce trivial memory which
>should not be protected at all.  It makes sense to ignore
>unprotected parent in this scenario to achieve the flexibility.

I'm really confused by this, why don't you just set memory.{min,low} in the 
docker cgroup and only propagate it to the children that want it?

If you only want some children to have the protection, only request it in those 
children, or create an additional intermediate layer of the cgroup hierarchy 
with protections further limited if you don't trust the task to request the 
right amount.

Breaking the requirement for hierarchical propagation of protections seems like 
a really questionable API change, not least because it makes it harder to set 
systemwide policies about the constraints of protections within a subtree.