linux-kernel - Re: [PATCH] mm/oom_kill: wake futex waiters before annihilating victim shared mutex

Message-ID: <CAL1p7m7mWxLE-7Qf_QjmREJ2AvfSexPvybPyHvxTUugxsPPxjQ@mail.gmail.com>
Date:   Fri, 14 Jan 2022 09:39:55 -0500
From:   Joel Savitz <jsavitz@...hat.com>
To:     Michal Hocko <mhocko@...e.com>
Cc:     Andrew Morton <akpm@...ux-foundation.org>,
        linux-kernel <linux-kernel@...r.kernel.org>,
        Waiman Long <longman@...hat.com>, linux-mm@...ck.org,
        Nico Pache <npache@...hat.com>,
        Peter Zijlstra <peterz@...radead.org>,
        Thomas Gleixner <tglx@...utronix.de>,
        Ingo Molnar <mingo@...hat.com>,
        Darren Hart <dvhart@...radead.org>,
        Davidlohr Bueso <dave@...olabs.net>,
        André Almeida <andrealmeid@...labora.com>
Subject: Re: [PATCH] mm/oom_kill: wake futex waiters before annihilating
 victim shared mutex

> What has happened to the oom victim and why it has never exited?

What appears to happen is that the oom victim is sent SIGKILL by the
process that triggers the oom while also being marked as an oom
victim.

As you mention in your patchset introducing the oom reaper in commit
aac4536355496 ("mm, oom: introduce oom reaper"), the purpose of the
oom reaper is to free memory more quickly than it otherwise would be,
on the assumption that anonymous or swapped-out pages won't be needed
in the exit path because the owner is already dying.
However, this assumption is violated by the futex_cleanup() path,
which needs access to userspace memory in fetch_robust_entry() when
it is called from exit_robust_list(). trace_printk() calls placed in
this failure path reveal an apparent race between the oom reaper
thread reaping the victim's mm and the futex_cleanup() path. There may
be other ways that this race manifests, but this is the one we have
been able to trace most consistently.
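
For readers less familiar with robust futexes, here is a minimal
userspace sketch of the behaviour that exit_robust_list() provides. It
is not the reproducer for this bug. The function name
lock_after_owner_death() is made up for illustration, and the sketch
assumes glibc on Linux. A child takes a process-shared robust mutex and
exits without unlocking it; the kernel then walks the child's robust
list (which lives in the child's own userspace memory) so that the
parent's lock attempt returns EOWNERDEAD instead of blocking forever.

```c
#define _GNU_SOURCE
#include <errno.h>
#include <pthread.h>
#include <sys/mman.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/*
 * Hypothetical helper: returns what pthread_mutex_lock() reports to the
 * parent after the child died holding the mutex, or -1 on setup failure.
 */
int lock_after_owner_death(void)
{
	pthread_mutexattr_t attr;
	pthread_mutex_t *m;
	pid_t pid;
	int ret;

	/* The mutex must live in memory shared between parent and child. */
	m = mmap(NULL, sizeof(*m), PROT_READ | PROT_WRITE,
		 MAP_SHARED | MAP_ANONYMOUS, -1, 0);
	if (m == MAP_FAILED)
		return -1;

	if (pthread_mutexattr_init(&attr) ||
	    pthread_mutexattr_setpshared(&attr, PTHREAD_PROCESS_SHARED) ||
	    pthread_mutexattr_setrobust(&attr, PTHREAD_MUTEX_ROBUST) ||
	    pthread_mutex_init(m, &attr)) {
		munmap(m, sizeof(*m));
		return -1;
	}
	pthread_mutexattr_destroy(&attr);

	pid = fork();
	if (pid < 0) {
		munmap(m, sizeof(*m));
		return -1;
	}
	if (pid == 0) {
		/* Child: take the lock and die while still holding it. */
		pthread_mutex_lock(m);
		_exit(0);
	}
	waitpid(pid, NULL, 0);

	/* Parent: the dead owner is reported instead of a deadlock. */
	ret = pthread_mutex_lock(m);
	if (ret == EOWNERDEAD) {
		pthread_mutex_consistent(m);
		pthread_mutex_unlock(m);
	} else if (ret == 0) {
		pthread_mutex_unlock(m);
	}

	pthread_mutex_destroy(m);
	munmap(m, sizeof(*m));
	return ret;
}
```

Calling lock_after_owner_death() from a small main() and printing
strerror() of the result is enough to observe this. If the kernel could
not read the robust list at exit time, which is the situation described
above, the owner-died hand-off would not happen and a waiter could
block indefinitely.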

Since the core assumption of the oom reaper is violated when an oom
victim uses robust futexes, we propose to solve this problem by either
canceling or delaying the wakeup of the oom reaper thread in
wake_oom_reaper() when tsk->robust_list is non-NULL.

For example, the bug does not reproduce with this patch (from npache@...hat.com):

diff --git a/mm/oom_kill.c b/mm/oom_kill.c
index 989f35a2bbb1..b8c518fdcf4d 100644
--- a/mm/oom_kill.c
+++ b/mm/oom_kill.c
@@ -665,6 +665,19 @@ static void wake_oom_reaper(struct task_struct *tsk)
        if (test_and_set_bit(MMF_OOM_REAP_QUEUED, &tsk->signal->oom_mm->flags))
                return;

+#ifdef CONFIG_FUTEX
+       /*
+        * don't wake the oom_reaper thread if we still have a robust list to handle
+        * This will then rely on the sigkill to handle the cleanup of memory
+        */
+       if(tsk->robust_list)
+               return;
+#ifdef CONFIG_COMPAT
+       if(tsk->compat_robust_list)
+               return;
+#endif
+#endif
+
        get_task_struct(tsk);

        spin_lock(&oom_reaper_lock);

Best,
Joel Savitz