lists.openwall.net   lists  /  announce  owl-users  owl-dev  john-users  john-dev  passwdqc-users  yescrypt  popa3d-users  /  oss-security  kernel-hardening  musl  sabotage  tlsify  passwords  /  crypt-dev  xvendor  /  Bugtraq  Full-Disclosure  linux-kernel  linux-netdev  linux-ext4  linux-hardening  linux-cve-announce  PHC 
Open Source and information security mailing list archives
 
Hash Suite: Windows password security audit tool. GUI, reports in PDF.
[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <20120530000315.GW21339@redhat.com>
Date:	Wed, 30 May 2012 02:03:15 +0200
From:	Andrea Arcangeli <aarcange@...hat.com>
To:	"Kirill A. Shutemov" <kirill@...temov.name>
Cc:	linux-kernel@...r.kernel.org, linux-mm@...ck.org,
	Hillf Danton <dhillf@...il.com>, Dan Smith <danms@...ibm.com>,
	Peter Zijlstra <a.p.zijlstra@...llo.nl>,
	Linus Torvalds <torvalds@...ux-foundation.org>,
	Andrew Morton <akpm@...ux-foundation.org>,
	Thomas Gleixner <tglx@...utronix.de>,
	Ingo Molnar <mingo@...e.hu>, Paul Turner <pjt@...gle.com>,
	Suresh Siddha <suresh.b.siddha@...el.com>,
	Mike Galbraith <efault@....de>,
	"Paul E. McKenney" <paulmck@...ux.vnet.ibm.com>,
	Lai Jiangshan <laijs@...fujitsu.com>,
	Bharata B Rao <bharata.rao@...il.com>,
	Lee Schermerhorn <Lee.Schermerhorn@...com>,
	Rik van Riel <riel@...hat.com>,
	Johannes Weiner <hannes@...xchg.org>,
	Srivatsa Vaddagiri <vatsa@...ux.vnet.ibm.com>,
	Christoph Lameter <cl@...ux.com>
Subject: Re: [PATCH 23/35] autonuma: core

On Tue, May 29, 2012 at 02:45:54PM +0300, Kirill A. Shutemov wrote:
> On Fri, May 25, 2012 at 07:02:27PM +0200, Andrea Arcangeli wrote:
> 
> > +static int knumad_do_scan(void)
> > +{
> 
> ...
> 
> > +	if (knumad_test_exit(mm) || !vma) {
> > +		mm_autonuma = mm->mm_autonuma;
> > +		if (mm_autonuma->mm_node.next != &knumad_scan.mm_head) {
> > +			mm_autonuma = list_entry(mm_autonuma->mm_node.next,
> > +						 struct mm_autonuma, mm_node);
> > +			knumad_scan.mm = mm_autonuma->mm;
> > +			atomic_inc(&knumad_scan.mm->mm_count);
> > +			knumad_scan.address = 0;
> > +			knumad_scan.mm->mm_autonuma->numa_fault_pass++;
> > +		} else
> > +			knumad_scan.mm = NULL;
> 
> knumad_scan.mm should be nulled only after list_del otherwise you will
> have race with autonuma_exit():

Thanks for noticing I managed to reproduce it by setting
knuma_scand/scan_sleep_millisecs and
knuma_scand/scan_sleep_pass_millisecs both to 0 and running a loop of
"while :; do memhog -r10 10m &>/dev/null; done".

So the problem was that if knuma_scand would change the knumad_scan.mm
after the mm->mm_users was 0 but before autonuma_exit run,
autonuma_exit wouldn't notice that the mm->mm_auotnuma was already
unlinked and it would unlink again.

autonuma_exit itself doesn't need to tell anything to knuma_scand,
because if it notices knuma_scand.mm == mm, it will do nothing and it
_always_ relies on knumad_scan to unlink it.

And if instead knuma_scand.mm is != mm, then autonuma_exit knows the
knuma_scand daemon will never have a chance to see the "mm" in the
list again if it arrived first (setting mm_autonuma->mm = NULL there
is just a debug tweak according to the comment).

The "serialize" event is there only to wait the knuma_scand main loop
before taking down the mm (it's not related to the list management).

The mm_autonuma->mm is useless after the "mm_autonuma" has been
unlinked so it's ok to use that to track if knuma_scand arrives first.

The exit path of the kernel daemon also forgot to check for
knumad_test_exit(mm) before unlinking, but that only runs if
kthread_should_stop() is true, and nobody calls kthread_stop so it's
only a theoretical improvement.

So this seems to fix it.

diff --git a/mm/autonuma.c b/mm/autonuma.c
index c2a5a82..768250a 100644
--- a/mm/autonuma.c
+++ b/mm/autonuma.c
@@ -679,9 +679,12 @@ static int knumad_do_scan(void)
 		} else
 			knumad_scan.mm = NULL;
 
-		if (knumad_test_exit(mm))
+		if (knumad_test_exit(mm)) {
 			list_del(&mm->mm_autonuma->mm_node);
-		else
+			/* tell autonuma_exit not to list_del */
+			VM_BUG_ON(mm->mm_autonuma->mm != mm);
+			mm->mm_autonuma->mm = NULL;
+		} else
 			mm_numa_fault_flush(mm);
 
 		mmdrop(mm);
@@ -770,8 +773,12 @@ static int knuma_scand(void *none)
 
 	mm = knumad_scan.mm;
 	knumad_scan.mm = NULL;
-	if (mm)
+	if (mm && knumad_test_exit(mm)) {
 		list_del(&mm->mm_autonuma->mm_node);
+		/* tell autonuma_exit not to list_del */
+		VM_BUG_ON(mm->mm_autonuma->mm != mm);
+		mm->mm_autonuma->mm = NULL;
+	}
 	mutex_unlock(&knumad_mm_mutex);
 
 	if (mm)
@@ -996,11 +1003,15 @@ void autonuma_exit(struct mm_struct *mm)
 	mutex_lock(&knumad_mm_mutex);
 	if (knumad_scan.mm == mm)
 		serialize = true;
-	else
+	else if (mm->mm_autonuma->mm) {
+		VM_BUG_ON(mm->mm_autonuma->mm != mm);
+		mm->mm_autonuma->mm = NULL; /* debug */
 		list_del(&mm->mm_autonuma->mm_node);
+	}
 	mutex_unlock(&knumad_mm_mutex);
 
 	if (serialize) {
+		/* prevent the mm to go away under knumad_do_scan main loop */
 		down_write(&mm->mmap_sem);
 		up_write(&mm->mmap_sem);
 	}
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@...r.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

Powered by blists - more mailing lists

Powered by Openwall GNU/*/Linux Powered by OpenVZ