[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <87wnfwf4e5.ffs@tglx>
Date: Mon, 11 Apr 2022 09:47:14 +0200
From: Thomas Gleixner <tglx@...utronix.de>
To: Michal Hocko <mhocko@...e.com>
Cc: Joel Savitz <jsavitz@...hat.com>, Nico Pache <npache@...hat.com>,
Peter Zijlstra <peterz@...radead.org>, linux-mm@...ck.org,
linux-kernel <linux-kernel@...r.kernel.org>,
Rafael Aquini <aquini@...hat.com>,
Waiman Long <longman@...hat.com>, Baoquan He <bhe@...hat.com>,
Christoph von Recklinghausen <crecklin@...hat.com>,
Don Dutile <ddutile@...hat.com>,
"Herton R . Krzesinski" <herton@...hat.com>,
David Rientjes <rientjes@...gle.com>,
Andrea Arcangeli <aarcange@...hat.com>,
Andrew Morton <akpm@...ux-foundation.org>,
Davidlohr Bueso <dave@...olabs.net>,
Ingo Molnar <mingo@...hat.com>,
Darren Hart <dvhart@...radead.org>, stable@...nel.org
Subject: Re: [PATCH v8] oom_kill.c: futex: Don't OOM reap the VMA containing
the robust_list_head
Michal,
On Mon, Apr 11 2022 at 08:48, Michal Hocko wrote:
> On Fri 08-04-22 23:41:11, Thomas Gleixner wrote:
>> So why would a process private robust mutex be any different from a
>> process shared one?
>
> Purely from the OOM POV they are slightly different because the OOM
> killer always kills all threads which share the mm with the selected
> victim (with an exception of the global init - see __oom_kill_process).
> Note that this is including those threads which are not sharing signals
> handling.
> So clobbering private locks shouldn't be observable to an alive thread
> unless I am missing something.
Yes, it kills everything, but the reaper also reaps non-shared VMAs. So
if the process private futex sits in a reaped VMA the shared one becomes
unreachable.
> On the other hand I do agree that delayed oom_reaper execution is a
> reasonable workaround and the most simplistic one.
I think it's more than a workaround. It's a reasonable expectation that
the kernel side of the user space threads can mop up the mess the user
space part created. So even if one of of N threads is stuck in a place
where it can't, then N-1 can still reach do_exit() and mop their mess
up.
The oom reaper is the last resort to resolve the situation in case of a
stuck task. No?
> If I understand your example code then we would need to evaluate the
> whole robust list and that is simply not feasible because that would
> require a #PF in general case.
Right. The robust list exit code does the user access with pagefaults
disabled and if it fails, it terminates the list walk. Bad luck :)
Thanks,
tglx
Powered by blists - more mailing lists