Message-ID: <20220408081549.GM2731@worktop.programming.kicks-ass.net>
Date: Fri, 8 Apr 2022 10:15:49 +0200
From: Peter Zijlstra <peterz@...radead.org>
To: Nico Pache <npache@...hat.com>
Cc: linux-mm@...ck.org, linux-kernel@...r.kernel.org,
Rafael Aquini <aquini@...hat.com>,
Waiman Long <longman@...hat.com>, Baoquan He <bhe@...hat.com>,
Christoph von Recklinghausen <crecklin@...hat.com>,
Don Dutile <ddutile@...hat.com>,
"Herton R . Krzesinski" <herton@...hat.com>,
David Rientjes <rientjes@...gle.com>,
Michal Hocko <mhocko@...e.com>,
Andrea Arcangeli <aarcange@...hat.com>,
Andrew Morton <akpm@...ux-foundation.org>,
Davidlohr Bueso <dave@...olabs.net>,
Thomas Gleixner <tglx@...utronix.de>,
Ingo Molnar <mingo@...hat.com>,
Joel Savitz <jsavitz@...hat.com>,
Darren Hart <dvhart@...radead.org>, stable@...nel.org
Subject: Re: [PATCH v8] oom_kill.c: futex: Don't OOM reap the VMA containing
the robust_list_head

On Thu, Apr 07, 2022 at 11:28:09PM -0400, Nico Pache wrote:
> The pthread struct is allocated on PRIVATE|ANONYMOUS memory [1] which can
> be targeted by the oom reaper. This mapping is used to store the futex
> robust list head; the kernel does not keep a copy of the robust list and
> instead references a userspace address to maintain the robustness during
> a process death. A race can occur between exit_mm and the oom reaper that
> allows the oom reaper to free the memory of the futex robust list before
> the exit path has handled the futex death:
>
>     CPU1                               CPU2
> ------------------------------------------------------------------------
>     page_fault
>     do_exit "signal"
>     wake_oom_reaper
>                                        oom_reaper
>                                        oom_reap_task_mm (invalidates mm)
>     exit_mm
>     exit_mm_release
>     futex_exit_release
>     futex_cleanup
>     exit_robust_list
>     get_user (EFAULT- can't access memory)
>
> If the get_user EFAULT's, the kernel will be unable to recover the
> waiters on the robust_list, leaving userspace mutexes hung indefinitely.
>
> Use the robust_list address stored in the kernel to skip the VMA that holds
> it, allowing a successful futex_cleanup.
>
> Theoretically a failure can still occur if there are locks mapped as
> PRIVATE|ANON; however, the robust futexes are a best-effort approach.
> This patch only strengthens that best-effort.
>
> The following case can still fail:
> robust head (skipped) -> private lock (reaped) -> shared lock (skipped)

This is still all sorts of confused.. it's a list head, the entries can
be in any random other VMA. You must not remove *any* user memory before
doing the robust thing. Not removing the VMA that contains the head is
pointless in the extreme.

Did you not read the previous discussion?
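
To make the objection concrete, here is a minimal userspace model of the
walk, not kernel code. The names model_entry, model_walk and model_case
are hypothetical, and the "reaped" flag merely stands in for get_user()
failing with -EFAULT on memory the oom reaper has already freed. It
sketches the quoted failure case: the head survives, but a reaped entry
in another VMA still cuts the walk short.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical model of one list node living in its own mapping. */
struct model_entry {
    struct model_entry *next; /* circular: last entry points back to the head */
    bool reaped;              /* true if this mapping was already freed */
};

/*
 * Walk the list the way an exit-time cleanup would: stop at the first
 * entry whose memory is gone (this stands in for get_user() failing
 * with -EFAULT). Returns the number of lock entries visited.
 */
static int model_walk(const struct model_entry *head)
{
    int visited = 0;

    assert(head != NULL);
    if (head->reaped)
        return 0; /* the head itself is unreadable */

    for (const struct model_entry *e = head->next; e != head; e = e->next) {
        if (e->reaped)
            break; /* everything after this entry is unreachable */
        visited++;
    }
    return visited;
}

/* head -> private lock -> shared lock -> head, as in the quoted case. */
static int model_case(bool private_lock_reaped)
{
    struct model_entry head = { 0 }, priv = { 0 }, shared = { 0 };

    head.next = &priv;
    priv.next = &shared;
    shared.next = &head;
    priv.reaped = private_lock_reaped;

    return model_walk(&head);
}
```

In this model, model_case(false) visits both locks, while
model_case(true) visits none: the shared lock is intact but can no
longer be reached, even though the head was preserved.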