Message-Id: <200911081929.53418.kernel@kolivas.org>
Date:	Sun, 8 Nov 2009 19:29:53 +1100
From:	Con Kolivas <kernel@...ivas.org>
To:	Peter Zijlstra <peterz@...radead.org>
Cc:	Kevin Winchester <kjwinchester@...il.com>,
	Mike Galbraith <efault@....de>, Ingo Molnar <mingo@...e.hu>,
	LKML <linux-kernel@...r.kernel.org>,
	"Rafael J. Wysocki" <rjw@...k.pl>,
	Steven Rostedt <rostedt@...dmis.org>,
	Andrew Morton <akpm@...ux-foundation.org>,
	"Paul E. McKenney <paulmck"@linux.vnet.i
Subject: Re: Intermittent early panic in try_to_wake_up

On Sun, 8 Nov 2009 03:35:54 Peter Zijlstra wrote:
> On Sat, 2009-11-07 at 12:24 -0400, Kevin Winchester wrote:
> > Mike Galbraith wrote:
> > > On Fri, 2009-11-06 at 19:49 -0400, Kevin Winchester wrote:
> > >> The patch below does not apply to mainline, unless I'm doing something
> > >> wrong. It's against -tip, I assume?  Is it just as applicable to
> > >> mainline?
> > >
> > > It was mainline, but I had the scheduler pull request and another in
> > > for testing as well.  Linus has pulled, so it'll apply now, with
> > > offsets.
> >
> > It did end up applying, but did not have any effect.  Looking at the
> > patch again, I see that it appears to only affect CONFIG_SMP, which I am
> > not running (and in fact it adds a build warning for the !SMP case).  So
> > there was not much chance of it fixing anything, I suppose.
> >
> > Any other ideas?  I don't have a serial console, and the trace scrolls
> > off my console, so I don't know if any debug printks would help.  Would
> > it help if I copied the entire panic message entirely, including the Code
> > section? I can try that the next time it happens.
> 
> Use vga=ask boot_delay=100 select the highest res possible.
> 
> Possibly you could use a digital (video) camera to record the output.
> 

For what it's worth, I've seen this on BFS and assumed it was a BFS issue until 
I spotted this thread. I'll tell you what I discovered while investigating it, 
though unfortunately I did not find the root cause.

Incredibly, the bug happened in try_to_wake_up, where the task struct passed 
in to the function (p) gets dereferenced before the rq lock is grabbed. Then, 
when it tries to grab the rq lock, it has no p to reference.

Further investigation showed it was always ksoftirqd spawning at bootup, and 
never any other situation. The common factor was that a conditional resched 
always occurred, and that's how it would get lost. I tried stepping through 
the boot process on kvm but always came up stumped as to how on earth it even 
happened. The only common variable was that it -only- ever happened with 
voluntary preempt enabled, and not with full preempt or no-preempt. By that 
stage of the boot sequence cond_resched has been called 2 or 3 times via 
might_sleep, but if I removed each might_sleep one at a time it would just 
happen from a different might_sleep, suggesting we weren't sleeping when we 
shouldn't. Since I'm an anti-fan of voluntary preempt, I gave up trying to 
find the root cause and put this nonsense workaround in __cond_resched:

static void __cond_resched(void)
{
	if (unlikely(system_state != SYSTEM_RUNNING))
		return;

It's still there in BFS, and it does fix the problem, in case someone wants 
to use voluntary preempt with BFS. I've long since lost the config that 
reliably caused the problem and can't guarantee that it's the same thing 
happening on mainline, but figured the information might be helpful.
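
To spell out what that guard does, here is a small userspace model, not kernel 
code. The names system_state and SYSTEM_RUNNING mirror the kernel's, but the 
two-value enum is a simplification, and model_cond_resched, fake_schedule and 
resched_count are made up purely for illustration. It is only a sketch of the 
early-return idea, not the actual __cond_resched body.

```c
/* Userspace model only: names mirror the kernel's, nothing here is kernel code. */
#include <stdbool.h>

/* Simplified: the real enum has more states than these two. */
enum system_states { SYSTEM_BOOTING, SYSTEM_RUNNING };

static enum system_states system_state = SYSTEM_BOOTING;

/* Hypothetical counter recording how often we "rescheduled". */
static int resched_count;

/* Hypothetical stand-in for the real reschedule; it only counts calls. */
static void fake_schedule(void)
{
	resched_count++;
}

/*
 * Guarded voluntary preemption point: does nothing until boot has
 * finished. Returns true only if a reschedule actually happened.
 */
static bool model_cond_resched(bool need_resched)
{
	if (system_state != SYSTEM_RUNNING)
		return false;	/* still booting: skip the preemption point */
	if (!need_resched)
		return false;	/* nothing is waiting to run */
	fake_schedule();
	return true;
}
```

In this model, a call made while system_state is still SYSTEM_BOOTING never 
reaches fake_schedule, which is the whole effect of the workaround: every 
might_sleep during boot becomes a no-op as far as rescheduling goes.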

Regards,
-- 
-ck
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@...r.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/
