linux-kernel - Re: [PATCH] [BUGFIX] update mm->owner even if no next owner.

lists.openwall.net		lists / announce owl-users owl-dev john-users john-dev passwdqc-users yescrypt popa3d-users / oss-security kernel-hardening musl sabotage tlsify passwords / crypt-dev xvendor / Bugtraq Full-Disclosure linux-kernel linux-netdev linux-ext4 linux-hardening linux-cve-announce PHC
Open Source and information security mailing list archives
Hash Suite: Windows password security audit tool. GUI, reports in PDF.
[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <alpine.LSU.2.00.1106121828220.31463@sister.anvils>
Date:	Sun, 12 Jun 2011 18:41:58 -0700 (PDT)
From:	Hugh Dickins <hughd@...gle.com>
To:	Johannes Weiner <hannes@...xchg.org>
cc:	KAMEZAWA Hiroyuki <kamezawa.hiroyu@...fujitsu.com>,
	Andrea Arcangeli <aarcange@...hat.com>,
	Ying Han <yinghan@...gle.com>, Dave Jones <davej@...hat.com>,
	Linux Kernel <linux-kernel@...r.kernel.org>,
	"linux-mm@...ck.org" <linux-mm@...ck.org>,
	Oleg Nesterov <oleg@...hat.com>,
	Andrew Morton <akpm@...ux-foundation.org>
Subject: Re: [PATCH] [BUGFIX] update mm->owner even if no next owner.

On Sat, 11 Jun 2011, Johannes Weiner wrote:
> On Sat, Jun 11, 2011 at 01:54:42AM +0200, Johannes Weiner wrote:
> > On Fri, Jun 10, 2011 at 02:49:35PM -0700, Hugh Dickins wrote:
> > > On Fri, 10 Jun 2011, KAMEZAWA Hiroyuki wrote:
> > > > 
> > > > I think this can be a fix. 
> > > 
> > > Sorry, I think not: I've not digested your rationale,
> > > but three things stand out:
> > > 
> > > 1. Why has this only just started happening?  I may not have run that
> > >    test on 3.0-rc1, but surely I ran it for hours with 2.6.39;
> > >    maybe not with khugepaged, but certainly with ksmd.
> > > 
> > > 2. Your hunk below:
> > > > -	if (!mm_need_new_owner(mm, p))
> > > > +	if (!mm_need_new_owner(mm, p)) {
> > > > +		rcu_assign_pointer(mm->owner, NULL);
> > >    is now setting mm->owner to NULL at times when we were sure it did not
> > >    need updating before (task is not the owner): you're damaging mm->owner.
> 
> This is a problem with the patch, but I think Kame's analysis and
> approach to fix it are still correct.

Yes, I was looking at his patch, when I should have spent more time
reading his comments: you're right, the analysis is fine, and I too
dislike stale pointers.

> 
> mm_update_next_owner() does not set mm->owner to NULL when the last
> possible owner goes away, but leaves it pointing to a possibly stale
> task struct.
> 
> Noone cared before khugepaged, and up to Andrea's patch khugepaged
> prevented the last possible owner from exiting until the call into the
> memory controller had finished.
> 
> Here is a revised version of Kame's fix.

It seems to be strangely difficult to get right!
I have no idea what your
	if (atomic_read(&mm->mm_users <= 1)) {
actually ends up doing, I'm surprised it only gives compiler warnings
rather than an error.

The version I've signed off and am actually testing is below;
but I've not had enough time to spare on the machine which reproduced
it before, and another I thought I'd delegate it to last night,
failed to reproduce without the patch.  Try again tonight.

Thought I'd better respond despite inadequate testing, given the flaw
in the posted patch.  Hope the one below is flawless.

Hugh

> 
> ---

From: KAMEZAWA Hiroyuki <kamezawa.hiroyu@...fujitsu.com>
Subject: [PATCH] [BUGFIX] mm: clear mm->owner when last possible owner leaves

The following crash was reported:

> Call Trace:
>  [<ffffffff81139792>] mem_cgroup_from_task+0x15/0x17
>  [<ffffffff8113a75a>] __mem_cgroup_try_charge+0x148/0x4b4
>  [<ffffffff810493f3>] ? need_resched+0x23/0x2d
>  [<ffffffff814cbf43>] ? preempt_schedule+0x46/0x4f
>  [<ffffffff8113afe8>] mem_cgroup_charge_common+0x9a/0xce
>  [<ffffffff8113b6d1>] mem_cgroup_newpage_charge+0x5d/0x5f
>  [<ffffffff81134024>] khugepaged+0x5da/0xfaf
>  [<ffffffff81078ea0>] ? __init_waitqueue_head+0x4b/0x4b
>  [<ffffffff81133a4a>] ? add_mm_counter.constprop.5+0x13/0x13
>  [<ffffffff81078625>] kthread+0xa8/0xb0
>  [<ffffffff814d13e8>] ? sub_preempt_count+0xa1/0xb4
>  [<ffffffff814d5664>] kernel_thread_helper+0x4/0x10
>  [<ffffffff814ce858>] ? retint_restore_args+0x13/0x13
>  [<ffffffff8107857d>] ? __init_kthread_worker+0x5a/0x5a

What happens is that khugepaged tries to charge a huge page against an
mm whose last possible owner has already exited, and the memory
controller crashes when the stale mm->owner is used to look up the
cgroup to charge.

mm->owner has never been set to NULL with the last owner going away,
but nobody cared until khugepaged came along.

Even then it wasn't a problem because the final mmput() on an mm was
forced to acquire and release mmap_sem in write-mode, preventing an
exiting owner to go away while the mmap_sem was held, and until
"692e0b3 mm: thp: optimize memcg charge in khugepaged", the memory
cgroup charge was protected by mmap_sem in read-mode.

Instead of going back to relying on the mmap_sem to enforce lifetime
of a task, this patch ensures that mm->owner is properly set to NULL
when the last possible owner is exiting, which the memory controller
can handle just fine.

Reported-by: Hugh Dickins <hughd@...gle.com>
Reported-by: Dave Jones <davej@...hat.com>
Signed-off-by: KAMEZAWA Hiroyuki <kamezawa.hiroyu@...fujitsu.com>
Signed-off-by: Johannes Weiner <hannes@...xchg.org>
Reviewed-by: Andrea Arcangeli <aarcange@...hat.com>
Signed-off-by: Hugh Dickins <hughd@...gle.com>
---

diff --git a/kernel/exit.c b/kernel/exit.c
index 20a4064..ef8ff79 100644
--- a/kernel/exit.c
+++ b/kernel/exit.c
@@ -563,27 +563,27 @@ void exit_files(struct task_struct *tsk)
 /*
  * Task p is exiting and it owned mm, lets find a new owner for it
  */
-static inline int
-mm_need_new_owner(struct mm_struct *mm, struct task_struct *p)
-{
-	/*
-	 * If there are other users of the mm and the owner (us) is exiting
-	 * we need to find a new owner to take on the responsibility.
-	 */
-	if (atomic_read(&mm->mm_users) <= 1)
-		return 0;
-	if (mm->owner != p)
-		return 0;
-	return 1;
-}
-
 void mm_update_next_owner(struct mm_struct *mm)
 {
 	struct task_struct *c, *g, *p = current;
 
 retry:
-	if (!mm_need_new_owner(mm, p))
+	/*
+	 * If the exiting or execing task is not the owner, it's
+	 * someone else's problem.
+	 */
+	if (mm->owner != p)
+		return;
+
+	/*
+	 * The current owner is exiting/execing and there are no other
+	 * candidates.  Do not leave the mm pointing to a possibly
+	 * freed task structure.
+	 */
+	if (atomic_read(&mm->mm_users) <= 1) {
+		mm->owner = NULL;
 		return;
+	}
 
 	read_lock(&tasklist_lock);
 	/*