Message-ID: <20190616103745.GA2117@chrisdown.name>
Date:   Sun, 16 Jun 2019 18:37:45 +0800
From:   Chris Down <chris@...isdown.name>
To:     Xunlei Pang <xlpang@...ux.alibaba.com>
Cc:     Roman Gushchin <guro@...com>, Michal Hocko <mhocko@...nel.org>,
        Johannes Weiner <hannes@...xchg.org>,
        linux-kernel@...r.kernel.org, linux-mm@...ck.org
Subject: Re: [PATCH] memcg: Ignore unprotected parent in mem_cgroup_protected()

Hi Xunlei,

Xunlei Pang writes:
>docker and various types(different memory capacity) of containers
>are managed by k8s, it's a burden for k8s to maintain those dynamic
>figures, simply set "max" to key containers is always welcome.

Right, setting "max" is generally a fine way of going about it.

>Set "max" to docker also protects docker cgroup memory(as docker
>itself has tasks) unnecessarily.

That's not correct -- leaf memcgs have to _explicitly_ request memory 
protection. From the documentation:

    memory.low

    [...]

    Best-effort memory protection.  If the memory usages of a
    cgroup and all its ancestors are below their low boundaries,
    the cgroup's memory won't be reclaimed unless memory can be
    reclaimed from unprotected cgroups.

Note that the cgroup itself must also be within its own low boundary. That is
not implied simply by having ancestors that would permit propagation of
protection.

In this case, Docker just shouldn't request protection for those Docker-related
tasks, and they won't get any. That seems a lot simpler and more intuitive than
special-casing "0" in ancestors.
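
To make the rule concrete, here is a toy Python model of the documentation
sentence quoted above. It is only a sketch: it checks "usage within memory.low
for the cgroup and every ancestor" and ignores the kernel's actual
effective-protection calculation. The Memcg class, the within_low() helper, the
cgroup names and the usage figures are all made up for illustration.

```python
from dataclasses import dataclass
from typing import Optional

MAX = float("inf")  # stands in for writing "max" to memory.low


@dataclass
class Memcg:
    name: str
    low: float                        # memory.low in bytes; 0 = none requested
    usage: int                        # current usage in bytes
    parent: Optional["Memcg"] = None  # None = direct child of the root cgroup


def within_low(cg: Memcg) -> bool:
    """True only if cg and all of its ancestors are within memory.low."""
    node: Optional[Memcg] = cg
    while node is not None:
        if node.usage > node.low:
            return False
        node = node.parent
    return True


GIB = 1 << 30

# Intermediate cgroup set to "max"; leaves opt in (or not) individually.
docker = Memcg("docker", low=MAX, usage=6 * GIB)
daemon = Memcg("docker/daemon", low=0, usage=1 * GIB, parent=docker)
key = Memcg("docker/key-container", low=MAX, usage=4 * GIB, parent=docker)

# An ancestor at 0 blocks propagation, whatever the leaf asks for.
other = Memcg("other", low=0, usage=2 * GIB)
child = Memcg("other/child", low=MAX, usage=1 * GIB, parent=other)

for cg in (daemon, key, child):
    print(cg.name, within_low(cg))
```

In this model only docker/key-container comes out as protected. The daemon's
leaf never asked for protection, and other/child sits under an ancestor with 0,
which is the case the patch would special-case.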

>This patch doesn't take effect on any intermediate layer with
>positive memory.min set, it requires all the ancestors having
>0 memory.min to work.
>
>Nothing special change, but more flexible to business deployment...

Not so; this change is extremely "special". It violates the basic expectation
that 0 means no possibility of propagation of protection, and I still don't see
a compelling argument why Docker can't just set "max" in the intermediate
cgroup and not request protection in the leaf memcgs that it doesn't want
protected.
