linux-kernel - Re: [PATCH v4 0/2] memcontrol: support cgroup level OOM protection


Message-ID: <CAJD7tkbHKQBoz7kn6ZjMTMoxLKYs7x9w4uRGWLvuyOogmBkZ_g@mail.gmail.com>
Date:   Wed, 17 May 2023 01:09:50 -0700
From:   Yosry Ahmed <yosryahmed@...gle.com>
To:     程垲涛 Chengkaitao Cheng 
        <chengkaitao@...iglobal.com>
Cc:     "tj@...nel.org" <tj@...nel.org>,
        "lizefan.x@...edance.com" <lizefan.x@...edance.com>,
        "hannes@...xchg.org" <hannes@...xchg.org>,
        "corbet@....net" <corbet@....net>,
        "mhocko@...nel.org" <mhocko@...nel.org>,
        "roman.gushchin@...ux.dev" <roman.gushchin@...ux.dev>,
        "shakeelb@...gle.com" <shakeelb@...gle.com>,
        "akpm@...ux-foundation.org" <akpm@...ux-foundation.org>,
        "brauner@...nel.org" <brauner@...nel.org>,
        "muchun.song@...ux.dev" <muchun.song@...ux.dev>,
        "viro@...iv.linux.org.uk" <viro@...iv.linux.org.uk>,
        "zhengqi.arch@...edance.com" <zhengqi.arch@...edance.com>,
        "ebiederm@...ssion.com" <ebiederm@...ssion.com>,
        "Liam.Howlett@...cle.com" <Liam.Howlett@...cle.com>,
        "chengzhihao1@...wei.com" <chengzhihao1@...wei.com>,
        "pilgrimtao@...il.com" <pilgrimtao@...il.com>,
        "haolee.swjtu@...il.com" <haolee.swjtu@...il.com>,
        "yuzhao@...gle.com" <yuzhao@...gle.com>,
        "willy@...radead.org" <willy@...radead.org>,
        "vasily.averin@...ux.dev" <vasily.averin@...ux.dev>,
        "vbabka@...e.cz" <vbabka@...e.cz>,
        "surenb@...gle.com" <surenb@...gle.com>,
        "sfr@...b.auug.org.au" <sfr@...b.auug.org.au>,
        "mcgrof@...nel.org" <mcgrof@...nel.org>,
        "feng.tang@...el.com" <feng.tang@...el.com>,
        "cgroups@...r.kernel.org" <cgroups@...r.kernel.org>,
        "linux-doc@...r.kernel.org" <linux-doc@...r.kernel.org>,
        "linux-kernel@...r.kernel.org" <linux-kernel@...r.kernel.org>,
        "linux-fsdevel@...r.kernel.org" <linux-fsdevel@...r.kernel.org>,
        "linux-mm@...ck.org" <linux-mm@...ck.org>,
        David Rientjes <rientjes@...gle.com>
Subject: Re: [PATCH v4 0/2] memcontrol: support cgroup level OOM protection

On Wed, May 17, 2023 at 1:01 AM 程垲涛 Chengkaitao Cheng
<chengkaitao@...iglobal.com> wrote:
>
> At 2023-05-17 14:59:06, "Yosry Ahmed" <yosryahmed@...gle.com> wrote:
> >+David Rientjes
> >
> >On Tue, May 16, 2023 at 8:20 PM chengkaitao <chengkaitao@...iglobal.com> wrote:
> >>
> >> Establish a new OOM score algorithm that supports a cgroup-level OOM
> >> protection mechanism. When a global/memcg OOM event occurs, we treat
> >> all processes in the cgroup as a whole, and the OOM killer needs to
> >> select the process to kill based on the protection quota of the cgroup.
> >>
> >
> >Perhaps this is only slightly relevant, but at Google we do have a
> >different per-memcg approach to protect from OOM kills, or more
> >specifically tell the kernel how we would like the OOM killer to
> >behave.
> >
> >We define an interface called memory.oom_score_badness, and we also
> >allow it to be specified per-process through a procfs interface,
> >similar to oom_score_adj.
> >
> >These scores essentially tell the OOM killer the order in which we
> >prefer memcgs to be OOM'd, and the order in which we want processes in
> >the memcg to be OOM'd. By default, all processes and memcgs start with
> >the same score. Ties are broken based on the rss of the process or the
> >usage of the memcg (prefer to kill the process/memcg that will free
> >more memory) -- similar to the current OOM killer.
>
> Thank you for providing a new application scenario. You have described a
> new per-memcg approach, but a simple introduction cannot explain the
> details of your approach clearly. If you could compare and analyze my
> patches for possible defects, or if your new approach has advantages
> that my patches do not have, I would greatly appreciate it.

Sorry if I was not clear. I am not implying in any way that the
approach I am describing is better than your patches, and I admit I
have not conducted the proper analysis you are requesting.

I just saw the thread and thought it might be interesting to you or
others to know the approach that we have been using for years in
production. I guess the goal is the same: being able to tell the OOM
killer which memcgs/processes are more important to protect. The
fundamental difference is that instead of tuning this based on the
memory usage of the memcg (your approach), we essentially give the OOM
killer the order in which we want memcgs/processes to be OOM
killed. This maps to job priorities.
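
To make the ordering idea concrete, here is a minimal userspace sketch
in Python. It is not the kernel implementation and it makes several
assumptions that the description above does not settle: the names
Proc, Memcg and pick_victim are hypothetical, a higher badness value is
assumed to mean "kill earlier", and memcg usage is simplified to the
sum of its processes' rss. Only the tie-breaking rule (prefer whatever
frees more memory) comes from the description.

```python
from dataclasses import dataclass, field


@dataclass
class Proc:
    pid: int
    badness: int = 0  # hypothetical per-process score; all start equal
    rss: int = 0      # resident set size, in pages


@dataclass
class Memcg:
    name: str
    badness: int = 0  # hypothetical per-memcg score; all start equal
    procs: list = field(default_factory=list)

    @property
    def usage(self):
        # Simplification: treat memcg usage as the sum of process rss.
        return sum(p.rss for p in self.procs)


def pick_victim(memcgs):
    """Pick (memcg, process) to kill, or None if nothing is killable.

    Assumption: a higher badness is killed first. Ties are broken by
    memory usage (memcg) or rss (process), larger first.
    """
    candidates = [m for m in memcgs if m.procs]
    if not candidates:
        return None
    memcg = max(candidates, key=lambda m: (m.badness, m.usage))
    proc = max(memcg.procs, key=lambda p: (p.badness, p.rss))
    return memcg, proc


batch = Memcg("batch", badness=10, procs=[Proc(1, rss=100), Proc(2, rss=300)])
serving = Memcg("serving", badness=0, procs=[Proc(3, rss=5000)])
victim_memcg, victim_proc = pick_victim([batch, serving])
print(victim_memcg.name, victim_proc.pid)
```

In this sketch the low-priority "batch" memcg is chosen even though
"serving" uses far more memory, which is the difference from a purely
usage-based selection.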

If this approach works for you (or any other audience), that's great.
I can share more details, and perhaps we can reach something that we
can both use :)

>
> >This has been brought up before in other discussions without much
> >interest [1], but just thought it may be relevant here.
> >
> >[1] https://lore.kernel.org/lkml/CAHS8izN3ej1mqUpnNQ8c-1Bx5EeO7q5NOkh0qrY_4PLqc8rkHA@mail.gmail.com/#t
>
> --
> Thanks for your comment!
> chengkaitao
>