linux-kernel - Re: [PATCH bpf-next v3 00/17] mm: BPF OOM

Message-ID: <aXnDdkYmtipGwSjK@tiehlicka>
Date: Wed, 28 Jan 2026 09:06:14 +0100
From: Michal Hocko <mhocko@...e.com>
To: Roman Gushchin <roman.gushchin@...ux.dev>
Cc: bpf@...r.kernel.org, Alexei Starovoitov <ast@...nel.org>,
	Matt Bobrowski <mattbobrowski@...gle.com>,
	Shakeel Butt <shakeel.butt@...ux.dev>,
	JP Kobryn <inwardvessel@...il.com>, linux-kernel@...r.kernel.org,
	linux-mm@...ck.org, Suren Baghdasaryan <surenb@...gle.com>,
	Johannes Weiner <hannes@...xchg.org>,
	Andrew Morton <akpm@...ux-foundation.org>
Subject: Re: [PATCH bpf-next v3 00/17] mm: BPF OOM

On Tue 27-01-26 21:01:48, Roman Gushchin wrote:
> Michal Hocko <mhocko@...e.com> writes:
> 
> > On Mon 26-01-26 18:44:03, Roman Gushchin wrote:
> >> This patchset adds an ability to customize the out of memory
> >> handling using bpf.
> >> 
> >> It focuses on two parts:
> >> 1) OOM handling policy,
> >> 2) PSI-based OOM invocation.
> >> 
> >> The idea to use bpf for customizing the OOM handling is not new, but
> >> unlike the previous proposal [1], which augmented the existing task
> >> ranking policy, this one tries to be as generic as possible and
> >> leverage the full power of the modern bpf.
> >> 
> >> It provides a generic interface which is called before the existing OOM
> >> killer code and allows implementing any policy, e.g. picking a victim
> >> task or memory cgroup or potentially even releasing memory in other
> >> ways, e.g. deleting tmpfs files (the last one might require some
> >> additional but relatively simple changes).
> >
> > Are you planning to write any high-level documentation on how to use the
> > existing infrastructure to implement proper/correct OOM handlers with
> > these generic interfaces?
> 
> What do you expect from such a document, can you, please, elaborate?

Sure. Essentially the expected structure of the handler: what API it
can use, what it has to do, and what it must not do. In short, a
single place you can read and get enough information to start developing
your oom handler.

> I'm asking because the main promise of bpf is to provide some sort
> of a safe playground, so anyone can experiment with writing their
> bpf implementations (like sched_ext schedulers or bpf oom policies)
> with minimum risk. Yes, it might work sub-optimally and kill too many
> tasks, but it won't crash or deadlock the system.
> So in a way I don't want to prescribe the "right way" of writing
> an oom handler, but it totally makes sense to provide an example.
> 
> As of now the best way to get an example of a bpf handler is to look
> into the commit "[PATCH bpf-next v3 12/17] bpf: selftests: BPF OOM
> struct ops test".

Examples are really great, but having a central place that documents the
available API is much more helpful IMHO. The generally scattered nature
of BPF hooks makes it really hard to even know what is available for oom
handlers to use.

> Another viable idea (also suggested by Andrew Morton) is to develop
> a production ready memcg-aware OOM killer in BPF, put the source code
> into the kernel tree and make it loadable by default (obviously under a
> config option). Myself or one of my colleagues will try to explore it a
> bit later: the tricky part is this by-default loading because there are
> no existing precedents.

It certainly makes sense to have a trusted implementation of a commonly
requested oom policy that we couldn't implement in the kernel because its
specific nature doesn't really apply to many users, and to have that in
the tree. I am not thrilled about auto-loading because this could easily
be done by simple tooling.

-- 
Michal Hocko
SUSE Labs