Message-ID: <Y5LxAbOB2AYp42hi@dhcp22.suse.cz>
Date: Fri, 9 Dec 2022 09:25:37 +0100
From: Michal Hocko <mhocko@...e.com>
To: 程垲涛 Chengkaitao Cheng
<chengkaitao@...iglobal.com>
Cc: chengkaitao <pilgrimtao@...il.com>,
"tj@...nel.org" <tj@...nel.org>,
"lizefan.x@...edance.com" <lizefan.x@...edance.com>,
"hannes@...xchg.org" <hannes@...xchg.org>,
"corbet@....net" <corbet@....net>,
"roman.gushchin@...ux.dev" <roman.gushchin@...ux.dev>,
"shakeelb@...gle.com" <shakeelb@...gle.com>,
"akpm@...ux-foundation.org" <akpm@...ux-foundation.org>,
"songmuchun@...edance.com" <songmuchun@...edance.com>,
"viro@...iv.linux.org.uk" <viro@...iv.linux.org.uk>,
"zhengqi.arch@...edance.com" <zhengqi.arch@...edance.com>,
"ebiederm@...ssion.com" <ebiederm@...ssion.com>,
"Liam.Howlett@...cle.com" <Liam.Howlett@...cle.com>,
"chengzhihao1@...wei.com" <chengzhihao1@...wei.com>,
"haolee.swjtu@...il.com" <haolee.swjtu@...il.com>,
"yuzhao@...gle.com" <yuzhao@...gle.com>,
"willy@...radead.org" <willy@...radead.org>,
"vasily.averin@...ux.dev" <vasily.averin@...ux.dev>,
"vbabka@...e.cz" <vbabka@...e.cz>,
"surenb@...gle.com" <surenb@...gle.com>,
"sfr@...b.auug.org.au" <sfr@...b.auug.org.au>,
"mcgrof@...nel.org" <mcgrof@...nel.org>,
"sujiaxun@...ontech.com" <sujiaxun@...ontech.com>,
"feng.tang@...el.com" <feng.tang@...el.com>,
"cgroups@...r.kernel.org" <cgroups@...r.kernel.org>,
"linux-doc@...r.kernel.org" <linux-doc@...r.kernel.org>,
"linux-kernel@...r.kernel.org" <linux-kernel@...r.kernel.org>,
"linux-fsdevel@...r.kernel.org" <linux-fsdevel@...r.kernel.org>,
"linux-mm@...ck.org" <linux-mm@...ck.org>
Subject: Re: [PATCH v2] mm: memcontrol: protect the memory in cgroup from
being oom killed
On Fri 09-12-22 05:07:15, 程垲涛 Chengkaitao Cheng wrote:
> At 2022-12-08 22:23:56, "Michal Hocko" <mhocko@...e.com> wrote:
[...]
> >oom killer is a memory reclaim of the last resort. So yes, there is some
> >difference but fundamentally it is about releasing some memory. And long
> >term we have learned that the more clever it tries to be the more likely
> >corner cases can happen. It is simply impossible to know the best
> >candidate so this is just a best effort. We try to aim for
> >predictability at least.
>
> Is the current oom_score strategy predictable? I don't think so. The score_adj
> has broken the predictability of oom_score (it is no longer simply killing the
> process that uses the most memory).

oom_score as reported to userspace already takes oom_score_adj into
account, which means that you can compare processes and get a reasonable
guess at what the current oom victim would be. There is a certain level
of fuzz because this is not atomic and also there is no clear candidate
when multiple processes have an equal score. So yes, it is not 100%
predictable. memory.reclaim as you propose doesn't change that though.
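
To make the comparison above concrete, here is a minimal sketch of how
userspace could take a snapshot of /proc/<pid>/oom_score and guess the
current victim. The helper names (read_oom_scores, likely_victims) and the
proc_root parameter are made up for illustration; this is not how the
kernel selects a victim, only a rough userspace approximation of it.

```python
import os


def read_oom_scores(proc_root="/proc"):
    """Return (score, pid) pairs for every process readable right now."""
    scores = []
    for entry in os.listdir(proc_root):
        if not entry.isdigit():
            continue  # skip non-process entries such as "self" or "meminfo"
        try:
            with open(os.path.join(proc_root, entry, "oom_score")) as f:
                scores.append((int(f.read().strip()), int(entry)))
        except (OSError, ValueError):
            # The process exited or is unreadable; the snapshot is not atomic.
            continue
    return scores


def likely_victims(scores):
    """Return every pid that shares the highest score (ties are possible)."""
    if not scores:
        return []
    top = max(score for score, _ in scores)
    return sorted(pid for score, pid in scores if score == top)


if __name__ == "__main__":
    print(likely_victims(read_oom_scores()))
```

A result with more than one pid is exactly the "no clear candidate" case,
and any pid in the list may already be stale by the time it is printed.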

Is oom_score_adj a good interface? No, not really. If I could go back in
time I would nack it but here we are. We have an interface that
promises quite a lot but essentially it only allows two use cases
(OOM_SCORE_ADJ_MIN, OOM_SCORE_ADJ_MAX) reliably. Everything in between
is clumsy at best because a real userspace oom policy would have to
re-evaluate the whole oom domain (be it global or memcg oom) as the
memory consumption evolves over time. I am really worried that your
memory.oom.protection is heading down a very similar path because
protection really needs to consider other memcgs to balance properly.
[...]
> > But I am really open
> >to be convinced otherwise and this is in fact what I have been asking
> >for since the beginning. I would love to see some examples on the
> >reasonable configuration for a practical usecase.
>
> Here is a simple example. In a docker container, users can divide all processes
> into two categories (important and normal), and put them in different cgroups.
> One cgroup's oom.protect is set to "max", the other is set to "0". In this way,
> important processes in the container can be protected.

That is effectively oom_score_adj = OOM_SCORE_ADJ_MIN - 1 for all
processes in the important group. I would argue you can achieve a very
similar result by having the process launcher set oom_score_adj, which
is then inherited by all processes in that important container. You do
not need any memcg tunable for that. I am really much more interested in
examples where the protection needs to be fine-tuned.
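
A minimal sketch of the launcher approach, assuming the launcher is allowed
to lower its own oom_score_adj (that requires CAP_SYS_RESOURCE). The value
is inherited across fork() and preserved across exec, so everything the
launched command spawns starts with the same adjustment. The function names
and the path parameter are hypothetical, and the value passed in the
__main__ block is only a placeholder.

```python
import os
import sys

# Limits as defined in include/uapi/linux/oom.h
OOM_SCORE_ADJ_MIN = -1000
OOM_SCORE_ADJ_MAX = 1000


def set_oom_score_adj(value, path="/proc/self/oom_score_adj"):
    """Write an adjustment for the calling process; children inherit it."""
    if not OOM_SCORE_ADJ_MIN <= value <= OOM_SCORE_ADJ_MAX:
        raise ValueError("oom_score_adj out of range: %d" % value)
    with open(path, "w") as f:
        f.write(str(value))


def launch_protected(argv, value):
    """Set the adjustment, then replace this process with the command."""
    set_oom_score_adj(value)
    os.execvp(argv[0], argv)


if __name__ == "__main__":
    if len(sys.argv) > 1:
        # Placeholder value: OOM_SCORE_ADJ_MIN exempts the command from the
        # oom killer entirely.
        launch_protected(sys.argv[1:], OOM_SCORE_ADJ_MIN)
```

This covers only the all-or-nothing case from the example above, with no
memcg tunable involved.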
--
Michal Hocko
SUSE Labs