Open Source and information security mailing list archives
Date: Mon, 21 Mar 2022 09:55:57 +0100
From: Michal Hocko <mhocko@...e.com>
To: Nico Pache <npache@...hat.com>
Cc: linux-mm@...ck.org, linux-kernel@...r.kernel.org, Rafael Aquini <aquini@...hat.com>, Waiman Long <longman@...hat.com>, Baoquan He <bhe@...hat.com>, Christoph von Recklinghausen <crecklin@...hat.com>, Don Dutile <ddutile@...hat.com>, "Herton R . Krzesinski" <herton@...hat.com>, Thomas Gleixner <tglx@...utronix.de>, Ingo Molnar <mingo@...hat.com>, Peter Zijlstra <peterz@...radead.org>, Darren Hart <dvhart@...radead.org>, Davidlohr Bueso <dave@...olabs.net>, Andre Almeida <andrealmeid@...labora.com>, David Rientjes <rientjes@...gle.com>, Andrea Arcangeli <aarcange@...hat.com>, Andrew Morton <akpm@...ux-foundation.org>, Joel Savitz <jsavitz@...hat.com>
Subject: Re: [PATCH v5] mm/oom_kill.c: futex: Close a race between do_exit and the oom_reaper

On Thu 17-03-22 21:36:21, Nico Pache wrote:
> The pthread struct is allocated on PRIVATE|ANONYMOUS memory [1] which can
> be targeted by the oom reaper. This mapping is used to store the futex
> robust list; the kernel does not keep a copy of the robust list and instead
> references a userspace address to maintain the robustness during a process
> death. A race can occur between exit_mm and the oom reaper that allows
> the oom reaper to free the memory of the futex robust list before the
> exit path has handled the futex death:
>
>     CPU1                               CPU2
> ------------------------------------------------------------------------
>     page_fault
>     out_of_memory
>     do_exit "signal"
>                                        wake_oom_reaper
>                                        oom_reaper
>                                        oom_reap_task_mm (invalidates mm)
>     exit_mm
>     exit_mm_release
>     futex_exit_release
>     futex_cleanup
>     exit_robust_list
>     get_user (EFAULT- can't access memory)

I still think it is useful to be explicit about the consequences of the
EFAULT here. You may want to mention that a failing get_user in this path
results in a hang, because nobody is woken up when the current holder of
the lock terminates.
> While in the oom reaper thread, we must handle the futex cleanup without
> sleeping. To achieve this, add the boolean `try` to futex_exit_begin().
> This will control whether or not we use a trylock. Also introduce
> try_futex_exit_release() which will utilize the trylock version of
> futex_cleanup_begin(). Also call kthread_use_mm in this context to assure
> the get_user call in futex_cleanup() does not return with EFAULT.

This alone is not sufficient. get_user can sleep in the #PF handler path
(e.g. by waiting for swap in). Or is there any guarantee that the page is
never swapped out? If we cannot rule out #PF then this is not a viable way
to address the problem, I am afraid.

[...]

> @@ -587,6 +588,18 @@ static bool oom_reap_task_mm(struct task_struct *tsk, struct mm_struct *mm)
> 		goto out_unlock;
> 	}
>
> +	/* We can't reap a process holding a robust_list; the pthread
> +	 * struct is allocated in userspace using PRIVATE | ANONYMOUS
> +	 * memory which when reaped before futex_cleanup() can leave
> +	 * the waiting process stuck. Try to perform the futex_cleanup,
> +	 * and if unsuccessful, skip the OOM reaping.
> +	 */
> +	if (task_has_robust_list(tsk) && !try_futex_exit_release(tsk)) {
> +		trace_skip_task_reaping(tsk->pid);
> +		pr_info("oom_reaper: skipping task as it contains a robust list");
> +		goto out_finish;
> +	}
> +
> 	trace_start_task_reaping(tsk->pid);
>
> 	/* failed to reap part of the address space. Try again later */

Please also note that this all is done after mmap_lock has already been
taken, so a page fault could deadlock on the mmap_lock.

The more I am thinking about this the more I am getting convinced that we
should rather approach this differently and skip over vmas which can be
holding the list. Have you considered this option?

--
Michal Hocko
SUSE Labs
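[Editorial aside, not part of the mail: the alternative Michal floats at the end, skipping only the vmas that can hold the list, might look roughly like the following pseudocode inside the reaper's vma loop. This is a sketch only, not from the posted patch; the predicate and its use are hypothetical, and it only covers the list head that the task registered via set_robust_list, not entries chained elsewhere in the mapping.]

```c
/* Sketch (hypothetical): skip any vma covering the task's registered
 * robust list head, so exit_robust_list() can still read it after the
 * rest of the address space has been reaped. */
static bool vma_covers_robust_list(struct task_struct *tsk,
				   struct vm_area_struct *vma)
{
	unsigned long head = (unsigned long)tsk->robust_list;

	return head >= vma->vm_start && head < vma->vm_end;
}

	/* ... in the reap loop ... */
	if (vma_covers_robust_list(tsk, vma))
		continue;	/* leave this vma for the exit path */
```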