[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <ee07a31c-c514-4a88-599f-14a30e93f32e@redhat.com>
Date: Fri, 8 Apr 2022 04:41:27 -0400
From: Nico Pache <npache@...hat.com>
To: Peter Zijlstra <peterz@...radead.org>
Cc: linux-mm@...ck.org, linux-kernel@...r.kernel.org,
Rafael Aquini <aquini@...hat.com>,
Waiman Long <longman@...hat.com>, Baoquan He <bhe@...hat.com>,
Christoph von Recklinghausen <crecklin@...hat.com>,
Don Dutile <ddutile@...hat.com>,
"Herton R . Krzesinski" <herton@...hat.com>,
David Rientjes <rientjes@...gle.com>,
Michal Hocko <mhocko@...e.com>,
Andrea Arcangeli <aarcange@...hat.com>,
Andrew Morton <akpm@...ux-foundation.org>,
Davidlohr Bueso <dave@...olabs.net>,
Thomas Gleixner <tglx@...utronix.de>,
Ingo Molnar <mingo@...hat.com>,
Joel Savitz <jsavitz@...hat.com>,
Darren Hart <dvhart@...radead.org>, stable@...nel.org
Subject: Re: [PATCH v8] oom_kill.c: futex: Don't OOM reap the VMA containing
the robust_list_head
On 4/8/22 04:15, Peter Zijlstra wrote:
> On Thu, Apr 07, 2022 at 11:28:09PM -0400, Nico Pache wrote:
>> The pthread struct is allocated on PRIVATE|ANONYMOUS memory [1] which can
>> be targeted by the oom reaper. This mapping is used to store the futex
>> robust list head; the kernel does not keep a copy of the robust list and
>> instead references a userspace address to maintain the robustness during
>> a process death. A race can occur between exit_mm and the oom reaper that
>> allows the oom reaper to free the memory of the futex robust list before
>> the exit path has handled the futex death:
>>
>> CPU1 CPU2
>> ------------------------------------------------------------------------
>> page_fault
>> do_exit "signal"
>> wake_oom_reaper
>> oom_reaper
>> oom_reap_task_mm (invalidates mm)
>> exit_mm
>> exit_mm_release
>> futex_exit_release
>> futex_cleanup
>> exit_robust_list
>> get_user (EFAULT- can't access memory)
>>
>> If the get_user EFAULT's, the kernel will be unable to recover the
>> waiters on the robust_list, leaving userspace mutexes hung indefinitely.
>>
>> Use the robust_list address stored in the kernel to skip the VMA that holds
>> it, allowing a successful futex_cleanup.
>>
>> Theoretically a failure can still occur if there are locks mapped as
>> PRIVATE|ANON; however, the robust futexes are a best-effort approach.
>> This patch only strengthens that best-effort.
>>
>> The following case can still fail:
>> robust head (skipped) -> private lock (reaped) -> shared lock (skipped)
>
> This is still all sorts of confused.. it's a list head, the entries can
> be in any random other VMA. You must not remove *any* user memory before
> doing the robust thing. Not removing the VMA that contains the head is
> pointless in the extreme.
Not sure how its pointless if it fixes all the different reproducers we've
written for it. As for the private lock case we stated here, we havent been able
to reproduce it, but I could see how it can be a potential issue (which is why
its noted).
>
> Did you not read the previous discussion?
I did... Thats why I added the blurb about best-effort and the case that can
theoretically still fail.
The oom reaper doesn't reap shared memory but WITHOUT this change it was reaping
the head (PRIVATE|ANON) which is needed to get to those shared mappings; so
shared locks are safe with this added change.
If a user implements private locks they can only be used within that process. If
that process is OOMing then it doesnt really matter what happens to the
futexes... all of those threads running under that process will terminate anyways.
Perhaps I'm misunderstanding you.
-- Nico
Powered by blists - more mailing lists