Message-ID: <aYPbubIPIsfFiMhD@google.com>
Date: Wed, 4 Feb 2026 23:52:25 +0000
From: Matt Bobrowski <mattbobrowski@...gle.com>
To: Alexei Starovoitov <alexei.starovoitov@...il.com>
Cc: Michal Hocko <mhocko@...e.com>,
Roman Gushchin <roman.gushchin@...ux.dev>,
bpf <bpf@...r.kernel.org>, Alexei Starovoitov <ast@...nel.org>,
Shakeel Butt <shakeel.butt@...ux.dev>,
JP Kobryn <inwardvessel@...il.com>,
LKML <linux-kernel@...r.kernel.org>, linux-mm <linux-mm@...ck.org>,
Suren Baghdasaryan <surenb@...gle.com>,
Johannes Weiner <hannes@...xchg.org>,
Andrew Morton <akpm@...ux-foundation.org>,
Josh Don <joshdon@...gle.com>
Subject: Re: [PATCH bpf-next v3 00/17] mm: BPF OOM

On Mon, Feb 02, 2026 at 09:50:05AM -0800, Alexei Starovoitov wrote:
> On Sun, Feb 1, 2026 at 7:26 PM Matt Bobrowski <mattbobrowski@...gle.com> wrote:
> >
> > On Wed, Jan 28, 2026 at 08:59:34AM -0800, Alexei Starovoitov wrote:
> > > On Wed, Jan 28, 2026 at 12:06 AM Michal Hocko <mhocko@...e.com> wrote:
> > > >
> > > >
> > > > > Another viable idea (also suggested by Andrew Morton) is to develop
> > > > > a production ready memcg-aware OOM killer in BPF, put the source code
> > > > > into the kernel tree and make it loadable by default (obviously under a
> > > > > config option). Myself or one of my colleagues will try to explore it a
> > > > > bit later: the tricky part is this by-default loading because there are
> > > > > no existing precedents.
> > > >
> > > > It certainly makes sense to have trusted implementation of a commonly
> > > > requested oom policy that we couldn't implement due to specific nature
> > > > that doesn't really apply to many users. And have that in the tree. I am
> > > > not thrilled about auto-loading because this could be easily done by a
> > > > simple tooling.
> > >
> > > Production ready bpf-oom program(s) must be part of this set.
> > > We've seen enough attempts to add bpf st_ops in various parts of
> > > the kernel without providing realistic bpf progs that will drive
> > > those hooks. It's great to have flexibility and people need
> > > to have the freedom to develop their own bpf-oom policy, but
> > > the author of the patch set who's advocating for the new
> > > bpf hooks must provide their real production progs and
> > > share their real use case with the community.
> > > It's not cool to hide it.
> > > In that sense enabling auto-loading without requiring an end user
> > > to install the toolchain and build bpf programs/rust/whatnot
> > > is necessary too.
> > > bpf-oom can be a self contained part of vmlinux binary.
> > > We already have a mechanism to do that.
> > > This way the end user doesn't need to be a bpf expert, doesn't need
> > > to install clang, build the tools, etc.
> > > They can just enable fancy new bpf-oom policy and see whether
> > > it's helping their apps or not while knowing nothing about bpf.
> >
> > For the auto-loading capability you speak of here, I'm currently
> > interpreting it as being some form of conceptually similar extension
> > to the BPF preload functionality. Have I understood this correctly? If
> > so, I feel as though something like this would be a completely
> > independent stream of work, orthogonal to this BPF OOM feature, right?
> > Or, is it that you'd like this new auto-loading capability completed as a
> > hard prerequisite before pulling in the BPF OOM feature?
>
> It's not a hard prerequisite, but it has to be thought through.
> bpf side is ready today. bpf preload is an example of it.
> The oom side needs to design an interface to do it.
> sysctl to enable builtin bpf-oom policy is probably too rigid.
> Maybe a file in cgroupfs? Writing a name of bpf-oom policy would
> trigger load and attach to that cgroup.
> Or you can plug it exactly like bpf preload:
> when bpffs is mounted all builtin bpf progs get loaded and create
> ".debug" files in bpffs.
>
> I recall we discussed an ability to create files in bpffs from
> tracepoints. This way bpffs can replicate cgroupfs directory
> structure without user space involvement. New cgroup -> new directory
> in cgroupfs -> tracepoint -> bpf prog -> new directory in bpffs
> -> create "enable_bpf_oom.debug" file in there.
> Writing to that file we trigger bpf prog that will attach bpf-oom
> prog to that cgroup.
> Could be any combination of the above or something else,
> but needs to be designed and agreed upon.
> Otherwise, I'm afraid, we will have bpf-oom progs in selftests
> and users who want to experiment with it would need kernel source
> code, clang, etc to try it. We need to lower the barrier to use it.

OK, I see what you're saying here. I'll have a chat with Roman about
this and see what his thoughts are on it.