Message-ID: <oogiwonpxtqurkad7rt2zxc3ffgeujtilivno3ibcybzucsliw@ym7jm6r5kdil>
Date: Tue, 26 Aug 2025 15:26:37 -0700
From: Shakeel Butt <shakeel.butt@...ux.dev>
To: "Liam R. Howlett" <Liam.Howlett@...cle.com>,
Lorenzo Stoakes <lorenzo.stoakes@...cle.com>, zhongjinji <zhongjinji@...or.com>, mhocko@...e.com,
rientjes@...gle.com, akpm@...ux-foundation.org, linux-mm@...ck.org,
linux-kernel@...r.kernel.org, tglx@...utronix.de, liulu.liu@...or.com, feng.han@...or.com
Subject: Re: [PATCH v5 2/2] mm/oom_kill: Have the OOM reaper and exit_mmap()
traverse the maple tree in opposite order
On Tue, Aug 26, 2025 at 11:21:13AM -0400, Liam R. Howlett wrote:
> * Lorenzo Stoakes <lorenzo.stoakes@...cle.com> [250826 09:50]:
> > On Tue, Aug 26, 2025 at 09:37:22AM -0400, Liam R. Howlett wrote:
> > > I really don't think this is worth doing. We're avoiding a race between
> > > oom and a task unmap - the MMF bits should be used to avoid this race -
> > > or at least mitigate it.
> >
> > Yes for sure, as explored at length in previous discussions this feels like
> > we're papering over cracks here.
> >
> > _However_, I'm sort of ok with a minimalistic fix that solves the proximate
> > issue even if it is that, as long as it doesn't cause issues in doing so.
> >
> > So this is my take on the below and why I'm open to it!
> >
> > >
> > > They are probably both under the read lock, but considering how rare it
> > > would be, would a racy flag check be enough - it is hardly critical to
> > > get right. Either would reduce the probability.
> >
> > Zhongjinji - I'm still not sure that you've really indicated _why_ you're
> > seeing such a tight and unusual race. Presumably some truly massive number
> > of tasks being OOM'd and unmapping but... yeah that seems odd anyway.
> >
> > But again, if we can safely fix this in a way that doesn't hurt stuff too
> > much I'm ok with it (of course, these are famous last words in the kernel
> > often...!)
> >
> > Liam - are you open to a solution on the basis above, or do you feel we
> > ought simply to fix the underlying issue here?
>
> At least this is a benign race.
Is this really a race, or rather contention? IIUC exit_mmap() and the oom
reaper are both trying to unmap the address space of the oom-killed process
and can compete for the page table locks. If both run concurrently on two
cpus, the contention can persist across the whole address space and slow
down the actual memory freeing. Making the oom reaper traverse in the
opposite direction can drastically reduce the contention and speed up
memory freeing.
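
As a rough illustration of the contention argument above, here is a toy
lock-step model in Python. It is not kernel code: the function name
simulate(), the per-region lock, and the costs (4 ticks to unmap a populated
region, 1 tick to walk an already-empty one) are all assumptions made up for
the sketch. It only shows why two walkers going the same way keep colliding,
while opposite directions let both make progress until they meet.

```python
def simulate(n_regions, reaper_reversed, full_cost=4, empty_cost=1):
    """Toy model: two walkers unmap the same regions, one lock per region.

    Returns (total_ticks, ticks_spent_waiting_on_a_lock).
    All costs are hypothetical; this is not how the kernel measures anything.
    """
    unmapped = [False] * n_regions
    owner = [None] * n_regions  # which walker currently holds each lock
    forward = list(range(n_regions))
    orders = {
        "exit_mmap": forward,
        "oom_reaper": forward[::-1] if reaper_reversed else forward,
    }
    pos = {w: 0 for w in orders}   # index into each walker's order
    busy = {w: 0 for w in orders}  # ticks left holding the current lock
    waited = 0
    ticks = 0

    while any(pos[w] < n_regions for w in orders):
        for w in orders:  # fixed iteration order keeps the model deterministic
            if pos[w] >= n_regions:
                continue
            region = orders[w][pos[w]]
            if busy[w] == 0:
                if owner[region] is not None:
                    waited += 1  # lock held by the other walker
                    continue
                owner[region] = w
                busy[w] = empty_cost if unmapped[region] else full_cost
                unmapped[region] = True
            busy[w] -= 1
            if busy[w] == 0:
                owner[region] = None  # release and move to the next region
                pos[w] += 1
        ticks += 1
    return ticks, waited


print(simulate(8, reaper_reversed=False))  # (32, 24)
print(simulate(8, reaper_reversed=True))   # (20, 0)
```

In this model the same-direction reaper spends 24 of 32 ticks waiting and
never unmaps a populated region, so it adds nothing. With the reaper
reversed, each walker unmaps half of the regions and the run finishes in 20
ticks with no waiting.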
> I'd think using MMF_ to reduce the race
> would achieve the same goal with less risk - which is why I bring it up.
>
With the MMF_ flag, are you suggesting that the oom reaper skip unmapping
the oom-killed process?
> Really, both methods should be low risk, so I'm fine with either way.
>
> But I am interested in hearing how this race is happening enough to
> necessitate a fix. Reversing the iterator is a one-spot fix - if this
> happens elsewhere then we're out of options. Using the MMF_ flags is
> more of a scalable fix, if it achieves the same results.
On the question of whether this is a rare situation and worth the patch: I
would say this scenario is not that rare, particularly on low-memory
devices and on highly utilized, overcommitted systems. Memory pressure
and oom-kills are the norm on such systems. The point of the oom reaper is to
bring the system out of the oom situation quickly, and having two cpus
unmapping the oom-killed process can potentially bring the system out of
the oom situation faster.

I think the patch (with your suggestions) is simple enough and I don't
see any risk in including it.