Message-ID: <eb764131-6d2f-c088-5481-99d605a67349@bytedance.com>
Date: Mon, 31 Jul 2023 14:00:22 +0800
From: Chuyi Zhou <zhouchuyi@...edance.com>
To: Michal Hocko <mhocko@...e.com>
Cc: hannes@...xchg.org, roman.gushchin@...ux.dev, ast@...nel.org,
daniel@...earbox.net, andrii@...nel.org, bpf@...r.kernel.org,
linux-kernel@...r.kernel.org, wuyun.abel@...edance.com,
robin.lu@...edance.com, muchun.song@...ux.dev,
zhengqi.arch@...edance.com
Subject: Re: [RFC PATCH 0/5] mm: Select victim memcg using BPF_OOM_POLICY
Hello, Michal
On 2023/7/28 01:23, Michal Hocko wrote:
> On Thu 27-07-23 20:12:01, Chuyi Zhou wrote:
>>
>>
>> On 2023/7/27 16:15, Michal Hocko wrote:
>>> On Thu 27-07-23 15:36:27, Chuyi Zhou wrote:
>>>> This patchset tries to add a new bpf prog type and use it to select
>>>> a victim memcg when global OOM is invoked. The main motivation is
>>>> the need for customizable OOM victim selection so that we can
>>>> protect more important apps from the OOM killer.
>>>
>>> This is rather too modest to give an idea of how the whole thing is
>>> supposed to work. I have looked through the patches very quickly, but
>>> there is no overall design described anywhere either.
>>>
>>> Please could you give us a high level design description and reasoning
>>> why certain decisions have been made? e.g. why is this limited to the
>>> global oom situation, why is the BPF program forced to operate on memcgs
>>> as entities etc...
>>> Also it would be very helpful to call out limitations of the BPF
>>> program, if there are any.
>>>
>>> Thanks!
>>
>> Hi,
>>
>> Thanks for your advice.
>>
>> The global/memcg OOM victim selection uses the process as the base search
>> granularity. However, we can see a need for cgroup-level protection, and
>> there has been some discussion[1]. It seems reasonable to consider using
>> memcg as the search granularity in the victim selection algorithm.
>
> Yes, it can be reasonable for some policies but making it central to the
> design is very limiting.
>
>> Besides, it seems a pretty good fit for offloading policy decisions to a BPF
>> program, since BPF is scalable and flexible. That's why the new BPF
>> program operates on memcgs as entities.
>
> I do not follow your line of argumentation here. The same could be
> argued for processes or beans.
>
>> The idea is to let the user choose which leaf in the memcg tree should be
>> selected as the victim. At the first layer, if we choose A, then we protect
>> the memcgs under the B, C, and D subtrees.
>>
>>           root
>>         /  |  \  \
>>        A   B   C   D
>>       / \
>>      E   F
>>
>>
>> The BPF prog allows us to compare the OOM priority between two siblings
>> so that we can choose the best victim at each layer.
>
> How is the priority defined and communicated to userspace?
>
>> For example:
>>
>> run_prog(B, C) -> choose B
>> run_prog(B, D) -> choose D
>> run_prog(A, D) -> choose A
>>
>> Once we select A as the victim in the first layer, the victim in the next
>> layer would be selected among A's children. Finally, we select a leaf memcg
>> as the victim.
>
> This sounds like a very specific oom policy and that is fine. But the
> interface shouldn't be bound to any concepts like priorities let alone
> be bound to memcg based selection. Ideally the BPF program should get
> the oom_control as an input and either get a hook to kill a process or, if
> that is not possible, return an entity to kill (either a process or
> set of processes).
Here are two interfaces I can think of. I was wondering if you could
give me some feedback.
1. Add a new hook in select_bad_process(); we could attach to it and return a
set of pids or cgroup_ids pre-selected by a user-defined policy, as suggested
by Roman. Then we could use oom_evaluate_task() to find the final victim
among them. It is user-friendly, and we can offload the OOM policy to
userspace.
2. Add a new hook in oom_evaluate_task() and return a score to override
the default oom_badness() return value. The simplest way to use this is to
protect certain processes by returning the minimum score for them; a rough
sketch of what this could look like is below.
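
To make interface 2 a bit more concrete, here is a minimal sketch of the BPF
side, assuming a hypothetical attach point and context struct. None of the
names below exist upstream; they only illustrate the shape of the interface:

#include "vmlinux.h"
#include <bpf/bpf_helpers.h>

/* Hypothetical context the new hook in oom_evaluate_task() would pass in. */
struct bpf_oom_ctx {
	struct task_struct *task;	/* candidate task being evaluated */
	long badness;			/* default oom_badness() points */
};

SEC("oom_policy")			/* hypothetical program/attach type */
long bpf_oom_score(struct bpf_oom_ctx *ctx)
{
	/* Protect one known-important workload by returning the lowest
	 * possible score for it (0 in this hypothetical interface);
	 * everything else keeps the kernel's default points.
	 * 1234 is just an example tgid. */
	if (ctx->task->tgid == 1234)
		return 0;

	return ctx->badness;
}

char LICENSE[] SEC("license") = "GPL";

The kernel side would simply prefer the returned score over the default
oom_badness() value whenever such a program is attached.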
Of course, if you have a better idea, please let me know.
Thanks!
---
Chuyi Zhou
>
>> In our scenarios, the impact caused by global OOMs is much more common, so
>> we only considered global OOM in this patchset. But it seems that the idea
>> can also be applied to memcg OOM.
>
> The global and memcg OOMs shouldn't have a different interface. If a
> specific BPF program wants to implement a different policy for global
> vs. memcg OOM, then so be it, but this should be a decision of the said
> program, not an inherent limitation of the interface.
>
>>
>> [1] https://lore.kernel.org/lkml/ZIgodGWoC%2FR07eak@dhcp22.suse.cz/
>>
>> Thanks!
>> --
>> Chuyi Zhou
>