lists.openwall.net   lists  /  announce  owl-users  owl-dev  john-users  john-dev  passwdqc-users  yescrypt  popa3d-users  /  oss-security  kernel-hardening  musl  sabotage  tlsify  passwords  /  crypt-dev  xvendor  /  Bugtraq  Full-Disclosure  linux-kernel  linux-netdev  linux-ext4  linux-hardening  linux-cve-announce  PHC 
Open Source and information security mailing list archives
 
Hash Suite: Windows password security audit tool. GUI, reports in PDF.
[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <alpine.DEB.2.10.1801171420330.86895@chino.kir.corp.google.com>
Date:   Wed, 17 Jan 2018 14:31:57 -0800 (PST)
From:   David Rientjes <rientjes@...gle.com>
To:     Roman Gushchin <guro@...com>
cc:     Andrew Morton <akpm@...ux-foundation.org>,
        Michal Hocko <mhocko@...nel.org>,
        Vladimir Davydov <vdavydov.dev@...il.com>,
        Johannes Weiner <hannes@...xchg.org>,
        Tetsuo Handa <penguin-kernel@...ove.sakura.ne.jp>,
        Tejun Heo <tj@...nel.org>, kernel-team@...com,
        cgroups@...r.kernel.org, linux-doc@...r.kernel.org,
        linux-kernel@...r.kernel.org, linux-mm@...ck.org
Subject: Re: [patch -mm 0/4] mm, memcg: introduce oom policies

On Wed, 17 Jan 2018, Roman Gushchin wrote:

> You're introducing a new oom_policy knob, which has two separate sets
> of possible values for the root and non-root cgroups. I don't think
> it aligns with the existing cgroup v2 design.
> 

The root mem cgroup can use "none" or "cgroup" to either disable or 
enable, respectively, the cgroup aware oom killer.  These are available to 
non root mem cgroups as well, with the addition of hierarchical usage 
comparison to prevent the ability to completely evade the oom killer by 
the user by creating sub cgroups.  Just as we have files that are made 
available on the root cgroup and not others, I think it's fine to allow 
only certain policies on the root mem cgroup.  As I wrote to Tejun, I'm 
fine with the policy being separated from the mechanism.  That can be 
resolved in that email thread, but the overall point of this patchset is 
to allow hierarchical comparison on some subtrees and not on others.  We 
can avoid the mount option in the same manner.

> For the root cgroup it works exactly as mount option, and both "none"
> and "cgroup" values have no meaning outside of the root cgroup. We can
> discuss if a knob on root cgroup is better than a mount option, or not
> (I don't think so), but it has nothing to do with oom policy as you
> define it for non-root cgroups.
> 

It certainly does have value outside of the root cgroup: for system oom 
conditions it is possible to choose the largest process, largest single 
cgroup, or largest subtree to kill from.  For memcg oom conditions it's 
possible for a consumer in a subtree to define whether it wants the 
largest memory hogging process to be oom killed or the largest of its 
child sub cgroups.  This would be needed for a job scheduler in its own 
subtree, for example, that creates its own subtrees to limit jobs 
scheduled by individual users who have the power over their subtree but 
not their limit.  This is a real-world example.  Michal also provided his 
own example concerning top-level /admins and /students.  They want to use 
"cgroup" themselves and then /students/roman would be forced to "tree".

Keep in mind that this patchset alone makes the interface extensible and 
addresses one of my big three concerns, but the comparison of the root mem 
cgroup compared to other individual cgroups and cgroups maintaining a 
subtree needs to be fixed separately so that we don't completely evade all 
of these oom policies by using /proc/pid/oom_score_adj in the root mem 
cgroup.  That's a separate issue that needs to be addressed.  My last 
concern, about userspace influence, can probably be addressed after this 
is merged by yet another tunable since it's vital that important cgroups 
can be protected.  It does make sense for an important cgroup to be able 
to use 50% of memory without being oom killed, and that's impossible if 
cgroup v2 has been mounted with your mount option.

Powered by blists - more mailing lists

Powered by Openwall GNU/*/Linux Powered by OpenVZ