linux-kernel - Re: [PATCH] mm: memcontrol: protect the memory in cgroup from being oom killed

Message-ID: <Y4ihyRqQzyFFLqh6@dhcp22.suse.cz>
Date:   Thu, 1 Dec 2022 13:44:57 +0100
From:   Michal Hocko <mhocko@...e.com>
To:     程垲涛 Chengkaitao Cheng 
        <chengkaitao@...iglobal.com>
Cc:     Tao pilgrim <pilgrimtao@...il.com>,
        "tj@...nel.org" <tj@...nel.org>,
        "lizefan.x@...edance.com" <lizefan.x@...edance.com>,
        "hannes@...xchg.org" <hannes@...xchg.org>,
        "corbet@....net" <corbet@....net>,
        "roman.gushchin@...ux.dev" <roman.gushchin@...ux.dev>,
        "shakeelb@...gle.com" <shakeelb@...gle.com>,
        "akpm@...ux-foundation.org" <akpm@...ux-foundation.org>,
        "songmuchun@...edance.com" <songmuchun@...edance.com>,
        "cgel.zte@...il.com" <cgel.zte@...il.com>,
        "ran.xiaokai@....com.cn" <ran.xiaokai@....com.cn>,
        "viro@...iv.linux.org.uk" <viro@...iv.linux.org.uk>,
        "zhengqi.arch@...edance.com" <zhengqi.arch@...edance.com>,
        "ebiederm@...ssion.com" <ebiederm@...ssion.com>,
        "Liam.Howlett@...cle.com" <Liam.Howlett@...cle.com>,
        "chengzhihao1@...wei.com" <chengzhihao1@...wei.com>,
        "haolee.swjtu@...il.com" <haolee.swjtu@...il.com>,
        "yuzhao@...gle.com" <yuzhao@...gle.com>,
        "willy@...radead.org" <willy@...radead.org>,
        "vasily.averin@...ux.dev" <vasily.averin@...ux.dev>,
        "vbabka@...e.cz" <vbabka@...e.cz>,
        "surenb@...gle.com" <surenb@...gle.com>,
        "sfr@...b.auug.org.au" <sfr@...b.auug.org.au>,
        "mcgrof@...nel.org" <mcgrof@...nel.org>,
        "sujiaxun@...ontech.com" <sujiaxun@...ontech.com>,
        "feng.tang@...el.com" <feng.tang@...el.com>,
        "cgroups@...r.kernel.org" <cgroups@...r.kernel.org>,
        "linux-doc@...r.kernel.org" <linux-doc@...r.kernel.org>,
        "linux-kernel@...r.kernel.org" <linux-kernel@...r.kernel.org>,
        "linux-fsdevel@...r.kernel.org" <linux-fsdevel@...r.kernel.org>,
        Bagas Sanjaya <bagasdotme@...il.com>,
        "linux-mm@...ck.org" <linux-mm@...ck.org>,
        Greg Kroah-Hartman <gregkh@...uxfoundation.org>
Subject: Re: [PATCH] mm: memcontrol: protect the memory in cgroup from being
 oom killed

On Thu 01-12-22 10:52:35, 程垲涛 Chengkaitao Cheng wrote:
> At 2022-12-01 16:49:27, "Michal Hocko" <mhocko@...e.com> wrote:
> >On Thu 01-12-22 04:52:27, 程垲涛 Chengkaitao Cheng wrote:
> >> At 2022-12-01 00:27:54, "Michal Hocko" <mhocko@...e.com> wrote:
> >> >On Wed 30-11-22 15:46:19, 程垲涛 Chengkaitao Cheng wrote:
> >> >> On 2022-11-30 21:15:06, "Michal Hocko" <mhocko@...e.com> wrote:
> >> >> > On Wed 30-11-22 15:01:58, chengkaitao wrote:
> >> >> > > From: chengkaitao <pilgrimtao@...il.com>
> >> >> > >
> >> >> > > We created a new interface <memory.oom.protect> for memory. If the
> >> >> > > OOM killer is invoked under a parent memory cgroup, and the memory usage
> >> >> > > of a child cgroup is within its effective oom.protect boundary, the cgroup's
> >> >> > > tasks won't be OOM killed unless there are no unprotected tasks in other
> >> >> > > children cgroups. It draws on the logic of <memory.min/low> in the
> >> >> > > inheritance relationship.
> >> >> >
> >> >> > Could you be more specific about usecases?
> >> >
> >> >This is a very important question to answer.
> >> 
> >> Use case 1: Users say that they want to protect an important process
> >> with high memory consumption from being killed by the OOM killer in case
> >> of docker container failure, so as to retain more critical on-site
> >> information or a self-recovery mechanism. For this, they suggest
> >> setting the score_adj of this process to -1000, but I don't agree with
> >> it, because this docker container is no more important than the other
> >> docker containers on the same physical machine. If score_adj of the process
> >> is set to -1000, the probability of OOM kills in other containers' processes
> >> will increase.
> >> 
> >> Use case 2: There are many business processes and agent processes
> >> mixed together on a physical machine, and they need to be classified
> >> and protected. However, some agents are the parents of business
> >> processes, and some business processes are the parents of agent
> >> processes, so it is troublesome to set different score_adj values for
> >> them. Business processes and agents cannot determine which level their
> >> score_adj should be at. If we create another agent to set the score_adj
> >> of all processes, we have to cycle through all the processes on the
> >> physical machine regularly, which looks stupid.
> >
> >I do agree that oom_score_adj is far from an ideal tool for these usecases.
> >But I also agree with Roman that these could be addressed by an oom
> >killer implementation in the userspace which can have much better
> >tailored policies. OOM protection limits would require tuning and also
> >regular revisions (e.g. memory consumption by any workload might change
> >with different kernel versions) to provide what you are looking for.
> 
> There is a misunderstanding: oom.protect does not replace the user's
> tailored policies. Its purpose is to make it easier and more efficient for
> users to customize policies, or to avoid users completely abandoning
> the oom score when formulating new policies.

Then you should focus on explaining how this makes those policies
easier and more efficient. I do not see it.

[...]

> >Why cannot you simply discount the protection from all processes
> >equally? I do not follow why the task_usage has to play any role in
> >that.
> 
> If all processes are protected equally, the oom protection of the cgroup
> is meaningless. For example, if there are more processes in the cgroup,
> the cgroup can protect more memory, which is unfair to cgroups with fewer
> processes. So we need to keep the total amount of memory that all
> processes in the cgroup need to protect consistent with the value of
> eoom.protect.

You are mixing two different concepts together I am afraid. The per
memcg protection should protect the cgroup (i.e. all processes in that
cgroup) while you want it to be also process aware. This results in a
very unclear runtime behavior when a process from a more protected memcg
is selected based on its individual memory usage.
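
To make the distinction concrete, here is a minimal sketch of the two
discounting schemes discussed above. It is an illustration only and is not
taken from the patch: the function names, the badness formula and the numbers
are all hypothetical, and "protect" simply stands in for a cgroup's effective
oom.protect value.

```python
# Illustrative model only: names, formula and numbers are assumptions,
# not the patch's implementation.

def proportional_badness(usages, protect):
    """Split the cgroup's protection across tasks in proportion to usage."""
    total = sum(usages)
    if total == 0:
        return [0] * len(usages)
    protect = min(protect, total)  # cannot protect more than is used
    return [u - protect * u // total for u in usages]

def flat_badness(usages, protect):
    """Discount every task by the same amount, regardless of its usage."""
    return [max(0, u - protect) for u in usages]

def pick_victim(cgroups, badness):
    """Return (cgroup, task index, score) of the task with the highest score."""
    best = None
    for name, (usages, protect) in cgroups.items():
        for idx, score in enumerate(badness(usages, protect)):
            if best is None or score > best[2]:
                best = (name, idx, score)
    return best

# Cgroup A is far more protected than B, yet holds the largest task.
cgroups = {"A": ([800, 200], 600), "B": ([300], 100)}

print(pick_victim(cgroups, proportional_badness))  # ('A', 0, 320)
print(flat_badness(*cgroups["A"]))                 # [200, 0]
```

In this toy model the proportional scheme selects a task from the more
protected cgroup A because of that task's individual usage, which is the
unclear behavior described above. The flat scheme instead shields 600 + 200 =
800 units in A with a protection value of 600, which is the fairness concern
raised in the quoted reply.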
-- 
Michal Hocko
SUSE Labs