Message-ID: <1357881760.5716.31.camel@marge.simpson.net>
Date:	Fri, 11 Jan 2013 06:22:40 +0100
From:	Mike Galbraith <bitbucket@...ine.de>
To:	Shawn Bohrer <sbohrer@...advisors.com>
Cc:	Steven Rostedt <rostedt@...dmis.org>, linux-kernel@...r.kernel.org,
	mingo@...e.hu, peterz@...radead.org
Subject: Re: kernel BUG at kernel/sched_rt.c:493!

On Thu, 2013-01-10 at 13:58 -0600, Shawn Bohrer wrote:

> Here is the output:
> 
> [   81.278842] SysRq : Changing Loglevel
> [   81.279027] Loglevel set to 9
> [   83.285456] Initial want: 50000000 rt_b->rt_runtime: 950000000 rt_rq->rt_runtime: 900000000
> [   85.286452] Initial want: 50000000 rt_b->rt_runtime: 950000000 rt_rq->rt_runtime: 900000000
> [   85.289625] Initial want: 50000000 rt_b->rt_runtime: 950000000 rt_rq->rt_runtime: 900000000
> [   87.287435] Initial want: 100000000 rt_b->rt_runtime: 950000000 rt_rq->rt_runtime: 850000000
> [   87.290718] Initial want: 50000000 rt_b->rt_runtime: 950000000 rt_rq->rt_runtime: 900000000
> [   89.288469] Initial want: -50000000 rt_b->rt_runtime: 950000000 rt_rq->rt_runtime: 1000000000
> [   89.291550] Initial want: 150000000 rt_b->rt_runtime: 950000000 rt_rq->rt_runtime: 800000000
> [   89.292940] Initial want: 100000000 rt_b->rt_runtime: 950000000 rt_rq->rt_runtime: 850000000
> [   89.294082] Initial want: 100000000 rt_b->rt_runtime: 950000000 rt_rq->rt_runtime: 850000000
> [   89.295194] Initial want: 50000000 rt_b->rt_runtime: 950000000 rt_rq->rt_runtime: 900000000
> [   89.296274] Initial want: 50000000 rt_b->rt_runtime: 950000000 rt_rq->rt_runtime: 900000000
> [   90.959004] [sched_delayed] sched: RT throttling activated
> [   91.289470] Initial want: 200000000 rt_b->rt_runtime: 950000000 rt_rq->rt_runtime: 750000000
> [   91.292767] Initial want: 200000000 rt_b->rt_runtime: 950000000 rt_rq->rt_runtime: 750000000
> [   91.294037] Initial want: 200000000 rt_b->rt_runtime: 950000000 rt_rq->rt_runtime: 750000000
> [   91.295364] Initial want: 200000000 rt_b->rt_runtime: 950000000 rt_rq->rt_runtime: 750000000
> [   91.296355] BUG triggered, want: 200000000
> [   91.296355] 
> [   91.296355] rt_rq[7]:
> [   91.296355]   .rt_nr_running                 : 0
> [   91.296355]   .rt_throttled                  : 0
> [   91.296355]   .rt_time                       : 0.000000
> [   91.296355]   .rt_runtime                    : 750.000000
> [   91.307332] Initial want: -50000000 rt_b->rt_runtime: 950000000 rt_rq->rt_runtime: 1000000000
> [   91.308440] Initial want: -100000000 rt_b->rt_runtime: 950000000 rt_rq->rt_runtime: 1050000000
> [   91.309586] Initial want: -150000000 rt_b->rt_runtime: 950000000 rt_rq->rt_runtime: 1100000000
> [   91.310716] Initial want: -200000000 rt_b->rt_runtime: 950000000 rt_rq->rt_runtime: 1150000000
> [   91.311707] BUG triggered, want: -200000000
> [   91.311707] 
> [   91.311707] rt_rq[6]:
> [   91.311707]   .rt_nr_running                 : 1
> [   91.311707]   .rt_throttled                  : 0
> [   91.311707]   .rt_time                       : 307.209987
> [   91.311707]   .rt_runtime                    : 1150.000000
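
For anyone decoding those numbers: "want" is rt_b->rt_runtime minus
rt_rq->rt_runtime, i.e. how much runtime this rt_rq lent out (positive)
or borrowed (negative) and now has to settle up before the cpu goes
away.  A toy userspace model of the greedy reclaim arithmetic (my
paraphrase with made-up names, not the actual sched_rt.c code)
reproduces the "BUG triggered" line above when the traversed span is
empty:

#include <stdio.h>
#include <stdint.h>

#define NR_CPUS	8

int main(void)
{
	int64_t rt_b_runtime = 950000000;	/* rt_b->rt_runtime */
	/* per-cpu rt_rq->rt_runtime after borrowing, as in the log */
	int64_t rt_runtime[NR_CPUS] = {
		900000000, 900000000, 850000000, 900000000,
		1000000000, 800000000, 1150000000, 750000000,
	};
	int dying = 7;			/* cpu being offlined */
	uint8_t span = 1 << dying;	/* rd->span gone empty under us, the
					 * suspected race; make it 0xff and
					 * everything balances fine */
	int64_t want = rt_b_runtime - rt_runtime[dying];
	int i;

	printf("Initial want: %lld\n", (long long)want);

	/* greedy reclaim: pull runtime back from every cpu in the span */
	for (i = 0; i < NR_CPUS && want; i++) {
		int64_t diff;

		if (i == dying || !(span & (1 << i)))
			continue;
		diff = rt_runtime[i] < want ? rt_runtime[i] : want;
		rt_runtime[i] -= diff;
		want -= diff;
	}

	/* kernel/sched_rt.c:493 is effectively BUG_ON(want) */
	if (want)
		printf("BUG triggered, want: %lld\n", (long long)want);
	return 0;
}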

That makes about as much sense as my crash did.  There is no leak, but
the traversal found nada to reclaim on any cpu.  So rd/span is changing
on us?  We saw nada as we traversed, released the locks, and poof, it's
there.  In my dump, at crash time, rd/span was fine, but at the top of
the stack I found a mighty suspicious reference to def_root_domain.span.
At _crash_ time, both it and rq->rd.span had pilfer-able goodies, but
whatever it was that we traversed had nada just a wee bit earlier.
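
Hand-waving sketch of the interleaving I suspect, cpu hotplug / domain
rebuild racing with the reclaim loop (rq_attach_root() is what swaps
rq->rd and clears the cpu out of the old rd->span):

	cpu offlining reclaim		domain rebuild
	---------------------		--------------
	rd = rq->rd;
	for_each_cpu(i, rd->span)	rq_attach_root(rq, new_rd);
	  -> sees (nearly) empty	  old rd->span emptied, the
	     span, reclaims nada	  runtime now hangs off new_rd
	BUG_ON(want);  <- boom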

If the darn thing would trigger for me again, I'd use trace_printk():
check rd and cpumask_weight(rd->span) before/after the traversal, print
cpu and rt_runtime as we traverse, set /proc/sys/kernel/ftrace_dump_on_oops
to 1... and hope the bug doesn't like to play heisenbug games.
Something like the sketch below.
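
Untested sketch of that instrumentation, against __disable_runtime() in
kernel/sched_rt.c (context will differ a bit depending on tree):

	struct root_domain *rd = rq->rd;
	...
	trace_printk("pre: rd=%p weight=%u\n", rd, cpumask_weight(rd->span));

	for_each_cpu(i, rd->span) {
		struct rt_rq *iter = sched_rt_period_rt_rq(rt_b, i);

		trace_printk("cpu%d rt_runtime=%llu want=%lld\n", i,
			     (unsigned long long)iter->rt_runtime,
			     (long long)want);
		...
	}

	trace_printk("post: rd=%p weight=%u want=%lld\n",
		     rd, cpumask_weight(rd->span), (long long)want);

If rd or the weight changes between pre and post, that's the smoking
gun.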

-Mike 


