linux-kernel - Re: Re: [RFC PATCH 1/2] mm, oom: Introduce bpf_select

lists.openwall.net		lists / announce owl-users owl-dev john-users john-dev passwdqc-users yescrypt popa3d-users / oss-security kernel-hardening musl sabotage tlsify passwords / crypt-dev xvendor / Bugtraq Full-Disclosure linux-kernel linux-netdev linux-ext4 linux-hardening linux-cve-announce PHC
Open Source and information security mailing list archives

Hash Suite: Windows password security audit tool. GUI, reports in PDF.

[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]

Message-ID: <fdec0f4c-a65f-df16-b4ee-7cfd977c8d7f@bytedance.com>
Date:   Thu, 10 Aug 2023 12:00:36 +0800
From:   Abel Wu <wuyun.abel@...edance.com>
To:     Michal Hocko <mhocko@...e.com>,
        Roman Gushchin <roman.gushchin@...ux.dev>
Cc:     Chuyi Zhou <zhouchuyi@...edance.com>, hannes@...xchg.org,
        ast@...nel.org, daniel@...earbox.net, andrii@...nel.org,
        muchun.song@...ux.dev, bpf@...r.kernel.org,
        linux-kernel@...r.kernel.org, robin.lu@...edance.com
Subject: Re: Re: [RFC PATCH 1/2] mm, oom: Introduce bpf_select_task

On 8/9/23 3:53 PM, Michal Hocko wrote:
> On Tue 08-08-23 14:41:17, Roman Gushchin wrote:
>> It would be also nice to come up with some practical examples of bpf programs.
>> What are meaningful scenarios which can be covered with the proposed approach
>> and are not covered now with oom_score_adj.
> 
> Agreed here as well. This RFC serves purpose of brainstorming on all of
> this.
> 
> There is a fundamental question whether we need BPF for this task in the
> first place. Are there any huge advantages to export the callback and
> allow a kernel module to hook into it?

The ancient oom-killer largely depends on memory usage when choosing
victims, which might not fit the need of modern scenarios. It's common
nowadays that multiple workloads (tenants) with different 'priorities'
run together, and the decisions made by the oom-killer doesn't always
obey the service level agreements.

While the oom_score_adj only adjusts the usage-based decisions, so it
can hardly be translated into 'priority' semantic. How can we properly
configure it given that we don't know how much memory the workloads
will use? It's really hard for a static strategy to deal with dynamic
provision. IMHO the oom_score_adj is just another demon.

Reworking the oom-killer's internal algorithm or patching some random
metrics may satisfy the immediate needs, but for the next 10 years? I
doubt it. So I think we do need the flexibility to bypass the legacy
usage-based algorithm, through bpf or pre-select interfaces.

Regards,
	Abel