Message-ID: <Y4jFnY7kMdB8ReSW@dhcp22.suse.cz>
Date: Thu, 1 Dec 2022 16:17:49 +0100
From: Michal Hocko <mhocko@...e.com>
To: 程垲涛 Chengkaitao Cheng
<chengkaitao@...iglobal.com>
Cc: Tao pilgrim <pilgrimtao@...il.com>,
"tj@...nel.org" <tj@...nel.org>,
"lizefan.x@...edance.com" <lizefan.x@...edance.com>,
"hannes@...xchg.org" <hannes@...xchg.org>,
"corbet@....net" <corbet@....net>,
"roman.gushchin@...ux.dev" <roman.gushchin@...ux.dev>,
"shakeelb@...gle.com" <shakeelb@...gle.com>,
"akpm@...ux-foundation.org" <akpm@...ux-foundation.org>,
"songmuchun@...edance.com" <songmuchun@...edance.com>,
"cgel.zte@...il.com" <cgel.zte@...il.com>,
"ran.xiaokai@....com.cn" <ran.xiaokai@....com.cn>,
"viro@...iv.linux.org.uk" <viro@...iv.linux.org.uk>,
"zhengqi.arch@...edance.com" <zhengqi.arch@...edance.com>,
"ebiederm@...ssion.com" <ebiederm@...ssion.com>,
"Liam.Howlett@...cle.com" <Liam.Howlett@...cle.com>,
"chengzhihao1@...wei.com" <chengzhihao1@...wei.com>,
"haolee.swjtu@...il.com" <haolee.swjtu@...il.com>,
"yuzhao@...gle.com" <yuzhao@...gle.com>,
"willy@...radead.org" <willy@...radead.org>,
"vasily.averin@...ux.dev" <vasily.averin@...ux.dev>,
"vbabka@...e.cz" <vbabka@...e.cz>,
"surenb@...gle.com" <surenb@...gle.com>,
"sfr@...b.auug.org.au" <sfr@...b.auug.org.au>,
"mcgrof@...nel.org" <mcgrof@...nel.org>,
"sujiaxun@...ontech.com" <sujiaxun@...ontech.com>,
"feng.tang@...el.com" <feng.tang@...el.com>,
"cgroups@...r.kernel.org" <cgroups@...r.kernel.org>,
"linux-doc@...r.kernel.org" <linux-doc@...r.kernel.org>,
"linux-kernel@...r.kernel.org" <linux-kernel@...r.kernel.org>,
"linux-fsdevel@...r.kernel.org" <linux-fsdevel@...r.kernel.org>,
Bagas Sanjaya <bagasdotme@...il.com>,
"linux-mm@...ck.org" <linux-mm@...ck.org>,
Greg Kroah-Hartman <gregkh@...uxfoundation.org>
Subject: Re: [PATCH] mm: memcontrol: protect the memory in cgroup from being
oom killed
On Thu 01-12-22 14:30:11, 程垲涛 Chengkaitao Cheng wrote:
> At 2022-12-01 21:08:26, "Michal Hocko" <mhocko@...e.com> wrote:
> >On Thu 01-12-22 13:44:58, Michal Hocko wrote:
> >> On Thu 01-12-22 10:52:35, 程垲涛 Chengkaitao Cheng wrote:
> >> > At 2022-12-01 16:49:27, "Michal Hocko" <mhocko@...e.com> wrote:
> >[...]
> >> There is a misunderstanding: oom.protect does not replace the user's
> >> tailored policies. Its purpose is to make it easier and more efficient
> >> for users to customize policies, and to keep users from abandoning the
> >> oom score entirely in order to formulate new policies.
> >
> > Then you should focus on explaining how this makes those policies
> > easier and more efficient. I do not see it.
>
> In fact, there are some relevant contents in the previous chat records.
> If oom.protect is applied, it will have the following benefits
> 1. Users only need to focus on the management of the local cgroup, not the
> impact on other users' cgroups.
Protection-based balancing cannot really work in isolation.
> 2. Users and the system do not need to spend extra time on complicated
> and repeated scanning and configuration. They just need to configure the
> oom.protect of specific cgroups, which is a one-time task.
This will not work, for the same reason that memory reclaim protection
cannot work in isolation at the memcg level.
> >> > >Why cannot you simply discount the protection from all processes
> >> > >equally? I do not follow why the task_usage has to play any role in
> >> > >that.
> >> >
> >> > If all processes are protected equally, the oom protection of a cgroup
> >> > is meaningless. For example, if there are more processes in the cgroup,
> >> > the cgroup can protect more memory, which is unfair to cgroups with
> >> > fewer processes. So we need to keep the total amount of memory that all
> >> > processes in the cgroup protect consistent with the value of
> >> > oom.protect.
> >>
> >> You are mixing two different concepts together I am afraid. The per
> >> memcg protection should protect the cgroup (i.e. all processes in that
> >> cgroup) while you want it to be also process aware. This results in a
> >> very unclear runtime behavior when a process from a more protected memcg
> >> is selected based on its individual memory usage.
> >
> The correct statement here should be that each memcg's protection should
> cover the amount of memory specified by oom.protect. For example, if a
> cgroup's usage is 6G and its oom.protect is 2G, then when an oom kill
> occurs, in the worst case we will only reduce the memory used by this
> cgroup to 2G through the oom killer.
I do not see how that could be guaranteed. Please keep in mind that a
non-trivial amount of memory resources can be completely independent
of any process life time (just consider tmpfs as a trivial example).
> >Let me be more specific here. Although processes are the primary source
> >of memcg charges, the memory accounted for oom badness purposes is not
> >really comparable to the overall memcg-charged memory. Kernel memory and
> >non-mapped memory can all generate rather interesting corner cases.
>
> Sorry, I did not think carefully enough about some of these special
> memory statistics. I will fix this in the next version.
Let me just emphasise that we are talking about a fundamental disconnect.
Rss-based accounting has been used for OOM killer selection because
the memory gets unmapped and _potentially_ freed when the process goes
away. Memcg charges are bound to object lifetimes and, as said, in
many cases there is no direct relation with any process life time.
Hope that clarifies.
--
Michal Hocko
SUSE Labs