linux-kernel - Re: [RFC PATCH 0/5] mm: Select victim memcg using BPF_OOM

lists.openwall.net		lists / announce owl-users owl-dev john-users john-dev passwdqc-users yescrypt popa3d-users / oss-security kernel-hardening musl sabotage tlsify passwords / crypt-dev xvendor / Bugtraq Full-Disclosure linux-kernel linux-netdev linux-ext4 linux-hardening linux-cve-announce PHC
Open Source and information security mailing list archives

Hash Suite: Windows password security audit tool. GUI, reports in PDF.

[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]

Message-ID: <7347aad5-f25c-6b76-9db5-9f1be3a9f303@bytedance.com>
Date:   Thu, 27 Jul 2023 20:12:01 +0800
From:   Chuyi Zhou <zhouchuyi@...edance.com>
To:     Michal Hocko <mhocko@...e.com>
Cc:     hannes@...xchg.org, roman.gushchin@...ux.dev, ast@...nel.org,
        daniel@...earbox.net, andrii@...nel.org, bpf@...r.kernel.org,
        linux-kernel@...r.kernel.org, wuyun.abel@...edance.com,
        robin.lu@...edance.com
Subject: Re: [RFC PATCH 0/5] mm: Select victim memcg using BPF_OOM_POLICY

在 2023/7/27 16:15, Michal Hocko 写道:
> On Thu 27-07-23 15:36:27, Chuyi Zhou wrote:
>> This patchset tries to add a new bpf prog type and use it to select
>> a victim memcg when global OOM is invoked. The mainly motivation is
>> the need to customizable OOM victim selection functionality so that
>> we can protect more important app from OOM killer.
> 
> This is rather modest to give an idea how the whole thing is supposed to
> work. I have looked through patches very quickly but there is no overall
> design described anywhere either.
> 
> Please could you give us a high level design description and reasoning
> why certain decisions have been made? e.g. why is this limited to the
> global oom sitation, why is the BPF program forced to operate on memcgs
> as entities etc...
> Also it would be very helpful to call out limitations of the BPF
> program, if there are any.
> 
> Thanks!

Hi,

Thanks for your advice.

The global/memcg OOM victim selection uses process as the base search 
granularity. However, we can see a need for cgroup level protection and 
there's been some discussion[1]. It seems reasonable to consider using 
memcg as a search granularity in victim selection algorithm.

Besides, it seems pretty well fit for offloading policy decisions to a 
BPF program, since BPF is scalable and flexible. That's why the new BPF
program operate on memcgs as entities.

The idea is to let user choose which leaf in the memcg tree should be 
selected as the victim. At the first layer, if we choose A, then it 
protects the memcg under the B, C, and D subtrees.

         root
      /   ｜ \  \
     A    B  C  D
    /\
   E F

Using the BPF prog, we are allowed to compare the OOM priority between
two siblings so that we can choose the best victim in each layer.
For example:

run_prog(B, C) -> choose B
run_prog(B, D) -> choose D
run_prog(A, D) -> choose A

Once we select A as the victim in the first layer, the victim in next 
layer would be selected among A's children. Finally, we select a leaf 
memcg as victim.

In our scenarios, the impact caused by global OOM's is much more common, 
so we only considered global in this patchset. But it seems that the 
idea can also be applied to memcg OOM.

[1]https://lore.kernel.org/lkml/ZIgodGWoC%2FR07eak@dhcp22.suse.cz/

Thanks!
-- 
Chuyi Zhou