linux-kernel - Re: [PATCH rfc 10/12] mm: introduce bpf_out_of

lists.openwall.net		lists / announce owl-users owl-dev john-users john-dev passwdqc-users yescrypt popa3d-users / oss-security kernel-hardening musl sabotage tlsify passwords / crypt-dev xvendor / Bugtraq Full-Disclosure linux-kernel linux-netdev linux-ext4 linux-hardening linux-cve-announce PHC
Open Source and information security mailing list archives

Hash Suite: Windows password security audit tool. GUI, reports in PDF.

[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]

Message-ID: <aBHQ69_rCqjnDaDl@tiehlicka>
Date: Wed, 30 Apr 2025 09:27:39 +0200
From: Michal Hocko <mhocko@...e.com>
To: Roman Gushchin <roman.gushchin@...ux.dev>
Cc: linux-kernel@...r.kernel.org, Andrew Morton <akpm@...ux-foundation.org>,
	Alexei Starovoitov <ast@...nel.org>,
	Johannes Weiner <hannes@...xchg.org>,
	Shakeel Butt <shakeel.butt@...ux.dev>,
	Suren Baghdasaryan <surenb@...gle.com>,
	David Rientjes <rientjes@...gle.com>, Josh Don <joshdon@...gle.com>,
	Chuyi Zhou <zhouchuyi@...edance.com>, cgroups@...r.kernel.org,
	linux-mm@...ck.org, bpf@...r.kernel.org
Subject: Re: [PATCH rfc 10/12] mm: introduce bpf_out_of_memory() bpf kfunc

On Tue 29-04-25 21:31:35, Roman Gushchin wrote:
> On Tue, Apr 29, 2025 at 01:46:07PM +0200, Michal Hocko wrote:
> > On Mon 28-04-25 03:36:15, Roman Gushchin wrote:
> > > Introduce bpf_out_of_memory() bpf kfunc, which allows to declare
> > > an out of memory events and trigger the corresponding kernel OOM
> > > handling mechanism.
> > > 
> > > It takes a trusted memcg pointer (or NULL for system-wide OOMs)
> > > as an argument, as well as the page order.
> > > 
> > > Only one OOM can be declared and handled in the system at once,
> > > so if the function is called in parallel to another OOM handling,
> > > it bails out with -EBUSY.
> > 
> > This makes sense for the global OOM handler because concurrent handlers
> > are cooperative. But is this really correct for memcg ooms which could
> > happen for different hierarchies? Currently we do block on oom_lock in
> > that case to make sure one oom doesn't starve others. Do we want the
> > same behavior for custom OOM handlers?
> 
> It's a good point and I had similar thoughts when I was working on it.
> But I think it's orthogonal to the customization of the oom handling.
> Even for the existing oom killer it makes no sense to serialize memcg ooms
> in independent memcg subtrees. But I'm worried about the dmesg reporting,
> it can become really messy for 2+ concurrent OOMs.
> 
> Also, some memory can be shared, so one OOM can eliminate a need for another
> OOM, even if they look independent.
> 
> So my conclusion here is to leave things as they are until we'll get signs
> of real world problems with the (lack of) concurrency between ooms.

How do we learn about that happening though? I do not think we have any
counters to watch to suspect that some oom handlers cannot run.
-- 
Michal Hocko
SUSE Labs