Date:	Mon, 10 Aug 2009 16:55:18 +0100
From:	Catalin Marinas <catalin.marinas@....com>
To:	Ingo Molnar <mingo@...e.hu>
Cc:	Linus Torvalds <torvalds@...ux-foundation.org>,
	Andrew Morton <akpm@...ux-foundation.org>,
	linux-kernel@...r.kernel.org
Subject: Re: kmemleak: Protect the seq start/next/stop sequence by
	rcu_read_lock()

Hi Ingo,

On Sun, 2009-08-02 at 13:14 +0200, Ingo Molnar wrote:
> hm, some recent kmemleak patch is causing frequent hard and 
> soft lockups in -tip testing (-rc5 based).

Thanks for reporting this. The patch mentioned in the subject shouldn't
be the cause: it only affects reading the seq file, which doesn't seem
to be what is happening here.

Would enabling CONFIG_PREEMPT make a difference?

> The pattern is similar: the kmemleak thread keeps spinning 
> in scan_objects() and never seems to finish:
> 
> [  177.093253]  <NMI>  [<ffffffff82d2cc90>] nmi_watchdog_tick+0xe8/0x200
> [  177.093253]  [<ffffffff810c76c8>] ? notify_die+0x3d/0x53
> [  177.093253]  [<ffffffff82d2bf4a>] default_do_nmi+0x84/0x22b
> [  177.093253]  [<ffffffff82d2c164>] do_nmi+0x73/0xcc
> [  177.093253]  [<ffffffff82d2b8a0>] nmi+0x20/0x39
> [  177.093253]  [<ffffffff82d2b560>] ? page_fault+0x0/0x30
> [  177.093253]  <<EOE>>  [<ffffffff8118bd42>] ? scan_block+0x40/0x123
> [  177.093253]  [<ffffffff82d2ac48>] ? _spin_lock_irqsave+0x8a/0xac
> [  177.093253]  [<ffffffff8118c17e>] kmemleak_scan+0x359/0x61e
> [  177.093253]  [<ffffffff8118be25>] ? kmemleak_scan+0x0/0x61e
> [  177.093253]  [<ffffffff8118cbed>] ? kmemleak_scan_thread+0x0/0xd0
> [  177.093253]  [<ffffffff8118cc62>] kmemleak_scan_thread+0x75/0xd0
> [  177.093253]  [<ffffffff810c157c>] kthread+0xa8/0xb0

I'm not sure exactly which scan_block() call (or calls) is locked up.
Scanning the task stacks can take a significant amount of time with
tasklist_lock held. You can disable it by echoing stack=off to the
/sys/kernel/debug/kmemleak file. The kmemleak branch currently merged in
-next avoids this problem by treating task stacks like any other
allocated object (top two commits at
http://www.linux-arm.org/git?p=linux-2.6.git;a=shortlog;h=kmemleak and
maybe the one called "Allow rescheduling during an object scanning").
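
For scripted test runs, the stack=off write could also be done from a
small C helper. This is only a sketch: kmemleak_write_cmd() and the
KMEMLEAK_CTRL macro are names made up for illustration, and it assumes
debugfs is mounted at /sys/kernel/debug and the caller has permission to
write the file.

```c
#include <assert.h>
#include <errno.h>
#include <stdio.h>
#include <string.h>

/* Assumed location of the kmemleak control file (debugfs mounted at
 * /sys/kernel/debug). */
#define KMEMLEAK_CTRL "/sys/kernel/debug/kmemleak"

/*
 * Write one command string (e.g. "stack=off") to a kmemleak control file.
 * 'path' is a parameter only so the helper can be tried on a scratch file.
 * Returns 0 on success or a negative errno-style value on failure.
 */
static int kmemleak_write_cmd(const char *path, const char *cmd)
{
	FILE *f;
	int ret = 0;

	assert(path != NULL && cmd != NULL);

	f = fopen(path, "w");
	if (!f)
		return -errno;

	if (fputs(cmd, f) == EOF)
		ret = -EIO;
	/* The write may only reach the file on close, so check it too. */
	if (fclose(f) == EOF && ret == 0)
		ret = -errno;
	return ret;
}
```

Calling kmemleak_write_cmd(KMEMLEAK_CTRL, "stack=off") as root should
have the same effect as the echo.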

There is also commit 2587362eaf5c, which scans newly allocated objects
several times, but it has cond_resched() calls and shouldn't look like a
lockup unless some list gets corrupted and becomes circular. Does the
patch below make any difference?

diff --git a/mm/kmemleak.c b/mm/kmemleak.c
index 4872673..c192c57 100644
--- a/mm/kmemleak.c
+++ b/mm/kmemleak.c
@@ -1076,8 +1076,7 @@ repeat:
 		object = tmp;
 	}
 
-	if (scan_should_stop() || ++gray_list_pass >= GRAY_LIST_PASSES)
-		goto scan_end;
+	goto scan_end;
 
 	/*
 	 * Check for new objects allocated during this scanning and add them
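
To check for the corrupted-list case mentioned above, something like the
userspace model below could be adapted as a debugging aid. It is a
sketch, not kmemleak code: struct node stands in for struct list_head,
and list_walk_never_ends() is a hypothetical helper. It uses Floyd's
tortoise-and-hare walk to tell a healthy list (circular through its
head) from one whose ->next chain loops without ever returning to the
head.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Minimal stand-in for the kernel's struct list_head (userspace model). */
struct node {
	struct node *next;
};

/*
 * Return true if walking ->next from 'head' gets trapped in a loop that
 * never comes back to 'head'. Reaching 'head' again (or NULL in this
 * model) means the walk terminates, so the list is considered sane.
 * 'fast' advances two steps per iteration, 'slow' one; they can only
 * meet if there is a loop that does not contain 'head'.
 */
static bool list_walk_never_ends(const struct node *head)
{
	const struct node *slow, *fast;

	assert(head != NULL);
	slow = fast = head->next;

	while (fast && fast != head) {
		fast = fast->next;
		if (!fast || fast == head)
			return false;
		fast = fast->next;
		slow = slow->next;
		if (fast == slow && fast != head)
			return true;
	}
	return false;
}
```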

> Yesterday i let one of the testboxes run overnight in this 
> state and it never recovered from the lockup.

What other tests are running on the testbox when kmemleak locks up? Are
lots of processes being created, or modules loaded/unloaded frequently?

Sorry for asking more questions than providing solutions, but I cannot
currently reproduce the lockup (short lockups yes, but not a permanent
one). If you have time, maybe you could just merge the "kmemleak" branch
from git://linux-arm.org/linux-2.6.git and see whether it improves
things.

Thanks.

-- 
Catalin
