Message-ID: <87jyx1zhjf.fsf@linux.dev>
Date: Wed, 28 Jan 2026 11:03:16 -0800
From: Roman Gushchin <roman.gushchin@...ux.dev>
To: Josh Don <joshdon@...gle.com>
Cc: bpf@...r.kernel.org, Michal Hocko <mhocko@...e.com>, Alexei
Starovoitov <ast@...nel.org>, Matt Bobrowski <mattbobrowski@...gle.com>,
Shakeel Butt <shakeel.butt@...ux.dev>, JP Kobryn
<inwardvessel@...il.com>, linux-kernel@...r.kernel.org,
linux-mm@...ck.org, Suren Baghdasaryan <surenb@...gle.com>, Johannes
Weiner <hannes@...xchg.org>, Andrew Morton <akpm@...ux-foundation.org>
Subject: Re: [PATCH bpf-next v3 07/17] mm: introduce BPF OOM struct ops

Josh Don <joshdon@...gle.com> writes:
> Thanks Roman!
>
> On Mon, Jan 26, 2026 at 6:51 PM Roman Gushchin <roman.gushchin@...ux.dev> wrote:
>>
>> Introduce a bpf struct ops for implementing custom OOM handling
>> policies.
>>
>> +bool bpf_handle_oom(struct oom_control *oc)
>> +{
>> +	struct bpf_struct_ops_link *st_link;
>> +	struct bpf_oom_ops *bpf_oom_ops;
>> +	struct mem_cgroup *memcg;
>> +	struct bpf_map *map;
>> +	int ret = 0;
>> +
>> +	/*
>> +	 * System-wide OOMs are handled by the struct ops attached
>> +	 * to the root memory cgroup
>> +	 */
>> +	memcg = oc->memcg ? oc->memcg : root_mem_cgroup;
>> +
>> +	rcu_read_lock_trace();
>> +
>> +	/* Find the nearest bpf_oom_ops traversing the cgroup tree upwards */
>> +	for (; memcg; memcg = parent_mem_cgroup(memcg)) {
>> +		st_link = rcu_dereference_check(memcg->css.cgroup->bpf.bpf_oom_link,
>> +						rcu_read_lock_trace_held());
>> +		if (!st_link)
>> +			continue;
>> +
>> +		map = rcu_dereference_check(st_link->map,
>> +					    rcu_read_lock_trace_held());
>> +		if (!map)
>> +			continue;
>> +
>> +		/* Call BPF OOM handler */
>> +		bpf_oom_ops = bpf_struct_ops_data(map);
>> +		ret = bpf_ops_handle_oom(bpf_oom_ops, st_link, oc);
>> +		if (ret && oc->bpf_memory_freed)
>> +			break;
>> +		ret = 0;
>> +	}
>> +
>> +	rcu_read_unlock_trace();
>> +
>> +	return ret && oc->bpf_memory_freed;
>
> If bpf claims to have freed memory but didn't actually do so, that
> seems like something potentially worth alerting to. Perhaps something
> to add to the oom header output?
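
Surfacing that in the oom report would be cheap. One simple variant
is to warn when a handler returns success without having set
oc->bpf_memory_freed; a sketch against the loop above (the check and
the message are illustrative only, not something the patch does):

	ret = bpf_ops_handle_oom(bpf_oom_ops, st_link, oc);
	if (ret && oc->bpf_memory_freed)
		break;
	/* Illustrative: handler claimed progress, flag never set */
	if (ret)
		pr_warn("bpf oom handler in map %s returned success but freed no memory\n",
			map->name);
	ret = 0;

The same information could also be folded into the oom header output,
as you suggest.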
That said, Michal pointed out a more fundamental problem: if a bpf
handler has performed some action (e.g. killed a process), how do we
safely let subsequent handlers bail out without performing redundant
destructive operations?

Right now this works by marking victim processes, so that subsequent
kernel oom handlers simply bail out when they see a marked process.
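
For reference, the kernel-side bail-out looks roughly like this
(simplified from oom_evaluate_task() in mm/oom_kill.c):

	/*
	 * If a victim was already selected and is still exiting, abort
	 * the scan instead of killing another process. MMF_OOM_SKIP is
	 * set once the oom reaper is done with the victim's mm.
	 */
	if (!is_sysrq_oom(oc) && tsk_is_oom_victim(task)) {
		if (test_bit(MMF_OOM_SKIP, &task->signal->oom_mm->flags))
			goto next;
		goto abort;
	}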
I don't know how to extend this to generic actions. E.g. we could
attach an atomic counter to the bpf oom instance (link) and bump it
when performing a destructive operation, but it's not clear when to
clear it. A rough sketch of the idea (every name here is
hypothetical):
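
	/*
	 * Hypothetical per-link state; nothing like this exists in the
	 * patch. Counts destructive actions taken for this OOM event.
	 */
	struct bpf_oom_link_state {
		atomic_t destructive_ops;
	};

	/* Bumped from e.g. a process-killing kfunc: */
	atomic_inc(&state->destructive_ops);

	/* A later handler could then skip destructive work: */
	if (atomic_read(&state->destructive_ops))
		return 0;

	/* ...but there is no obvious point at which to safely reset it. */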
So maybe it's not worth it at all, and it's better to drop this
protection mechanism altogether.
Thanks!