Message-ID: <20090904092742.GA11014@elte.hu>
Date: Fri, 4 Sep 2009 11:27:42 +0200
From: Ingo Molnar <mingo@...e.hu>
To: Andreas Herrmann <andreas.herrmann3@....com>
Cc: Peter Zijlstra <a.p.zijlstra@...llo.nl>,
linux-kernel@...r.kernel.org, Gautham R Shenoy <ego@...ibm.com>,
Balbir Singh <balbir@...ibm.com>
Subject: [crash] Re: [RFC][PATCH 0/8] load-balancing and cpu_power -v2
I've queued up Peter's patches, with your and Gautham's fixes
embedded. It works mostly fine, except on two larger boxes, where
-tip stress-testing triggered this crash:
aldebaran login: [ 1774.088275] divide error: 0000 [#1] SMP
[ 1774.092293] last sysfs file: /sys/devices/system/cpu/cpu15/cache/index2/shared_cpu_map
[ 1774.100355] CPU 13
[ 1774.102498] Modules linked in:
[ 1774.105631] Pid: 30881, comm: hackbench Not tainted 2.6.31-rc8-tip-01308-g484d664-dirty #1629 X8DTN
[ 1774.114807] RIP: 0010:[<ffffffff81041c38>] [<ffffffff81041c38>] sched_balance_self+0x19b/0x2d4
[ 1774.123676] RSP: 0018:ffff880306c1fd58 EFLAGS: 00010246
[ 1774.129037] RAX: 0000000000000000 RBX: 0000000000000000 RCX: 0000000000000000
[ 1774.136287] RDX: 0000000000000000 RSI: 0000000000000040 RDI: 0000000000000040
[ 1774.143554] RBP: ffff880306c1fde8 R08: 0000000000000000 R09: ffffc9000140f4c8
[ 1774.150748] R10: ffff88031288c650 R11: ffff880306c1fe08 R12: ffffc90001a0f3a0
[ 1774.158007] R13: ffffc9000140f4b0 R14: 0000000000000000 R15: 0000000000014f00
[ 1774.165248] FS: 0000000000000000(0000) GS:ffffc90001a00000(0063) knlGS:00000000f7f156c0
[ 1774.173473] CS: 0010 DS: 002b ES: 002b CR0: 000000008005003b
[ 1774.179320] CR2: 000000004822c0ac CR3: 000000031357b000 CR4: 00000000000006e0
[ 1774.186586] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
[ 1774.193826] DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400
[ 1774.201101] Process hackbench (pid: 30881, threadinfo ffff880306c1e000, task ffff88030b4aa710)
[ 1774.209836] Stack:
[ 1774.211861] 0000000000014f00 0000000000014f00 0000000000014f00 0000000000014f10
[ 1774.219300] <0> 0000000000014f10 0000000d00000008 ffff88030b4aa710 ffffc9000140f4c0
[ 1774.227129] <0> 0000000006c1fdd8 0000007d00000001 ffff88030b4aa900 ffffc9000100f4b0
[ 1774.235225] Call Trace:
[ 1774.237704] [<ffffffff810444c4>] sched_fork+0x2c/0x15f
[ 1774.243032] [<ffffffff8104942d>] copy_process+0x407/0xda7
[ 1774.248608] [<ffffffff81049f16>] do_fork+0x149/0x309
[ 1774.253809] [<ffffffff8120ff12>] ? __up_read+0x9e/0xa8
[ 1774.259101] [<ffffffff8106df32>] ? up_read+0xe/0x10
[ 1774.264129] [<ffffffff81565d66>] ? do_page_fault+0x291/0x2c3
[ 1774.269968] [<ffffffff81035028>] sys32_clone+0x2c/0x2e
[ 1774.275267] [<ffffffff81034d05>] ia32_ptregs_common+0x25/0x4c
[ 1774.281200] Code: cb 48 8b 7d a8 ff c2 be 40 00 00 00 48 63 d2 e8 73 9b 1c 00 3b 05 19 2e 8c 00 89 c2 7c 8d 41 8b 4d 08 48 c1 e3 0a 31 d2 48 89 d8 <48> f7 f1 83 7d b4 00 48 89 c1 75 16 4c 39 f0 73 0d 49 89 c6 48
[ 1774.301423] RIP [<ffffffff81041c38>] sched_balance_self+0x19b/0x2d4
[ 1774.307903] RSP <ffff880306c1fd58>
[ 1774.311474] ---[ end trace a56b661c1598b0fc ]---
[ 1774.316202] Kernel panic - not syncing: Fatal exception
[ 1774.321497] Pid: 30881, comm: hackbench Tainted: G D 2.6.31-rc8-tip-01308-g484d664-dirty #1629
[ 1774.330925] Call Trace:
[ 1774.333402] [<ffffffff81561aaa>] panic+0x7a/0x125
[ 1774.338269] [<ffffffff815645d2>] oops_end+0xaa/0xba
[ 1774.343321] [<ffffffff8100f4f1>] die+0x5a/0x63
[ 1774.347887] [<ffffffff81563ff6>] do_trap+0x110/0x11f
[ 1774.353052] [<ffffffff8100d8ab>] do_divide_error+0x90/0x99
[ 1774.358691] [<ffffffff81041c38>] ? sched_balance_self+0x19b/0x2d4
[ 1774.364966] [<ffffffff810d8021>] ? zone_statistics+0x65/0x6a
[ 1774.370831] [<ffffffff810cb2ef>] ? get_page_from_freelist+0x4a2/0x675
[ 1774.377487] [<ffffffff8100cad5>] divide_error+0x15/0x20
[ 1774.382894] [<ffffffff81041c38>] ? sched_balance_self+0x19b/0x2d4
[ 1774.389173] [<ffffffff81041c21>] ? sched_balance_self+0x184/0x2d4
[ 1774.395479] [<ffffffff810444c4>] sched_fork+0x2c/0x15f
[ 1774.400792] [<ffffffff8104942d>] copy_process+0x407/0xda7
[ 1774.406397] [<ffffffff81049f16>] do_fork+0x149/0x309
[ 1774.411562] [<ffffffff8120ff12>] ? __up_read+0x9e/0xa8
[ 1774.416897] [<ffffffff8106df32>] ? up_read+0xe/0x10
[ 1774.421957] [<ffffffff81565d66>] ? do_page_fault+0x291/0x2c3
[ 1774.427815] [<ffffffff81035028>] sys32_clone+0x2c/0x2e
[ 1774.433124] [<ffffffff81034d05>] ia32_ptregs_common+0x25/0x4c
config attached as well.
the domain setup is this:
SD flag: 4717
+ 1: SD_LOAD_BALANCE: Do load balancing on this domain
- 2: SD_BALANCE_NEWIDLE: Balance when about to become idle
+ 4: SD_BALANCE_EXEC: Balance on exec
+ 8: SD_BALANCE_FORK: Balance on fork, clone
- 16: SD_WAKE_IDLE: Wake to idle CPU on task wakeup
+ 32: SD_WAKE_AFFINE: Wake task to waking CPU
+ 64: SD_WAKE_BALANCE: Perform balancing at task wakeup
+ 512: SD_SHARE_PKG_RESOURCES: Domain members share cpu pkg resources
current val on cpu0/domain1:
SD flag: 1133
+ 1: SD_LOAD_BALANCE: Do load balancing on this domain
- 2: SD_BALANCE_NEWIDLE: Balance when about to become idle
+ 4: SD_BALANCE_EXEC: Balance on exec
+ 8: SD_BALANCE_FORK: Balance on fork, clone
- 16: SD_WAKE_IDLE: Wake to idle CPU on task wakeup
+ 32: SD_WAKE_AFFINE: Wake task to waking CPU
+ 64: SD_WAKE_BALANCE: Perform balancing at task wakeup
+1024: SD_SERIALIZE: Only a single load balancing instance
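The two flag words above can be cross-checked against the listed bit values with a small decoder. This is a minimal sketch: the bit values and names are copied from the listings, while `sd_flag_names` and `sd_decode` are hypothetical names introduced only for illustration. The function prints a `+` or `-` line for each known flag and returns whatever bits have no name in the table.

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>

/* Bit values and names exactly as they appear in the listings above. */
struct sd_flag_name {
    unsigned int bit;
    const char *name;
};

static const struct sd_flag_name sd_flag_names[] = {
    {    1, "SD_LOAD_BALANCE" },
    {    2, "SD_BALANCE_NEWIDLE" },
    {    4, "SD_BALANCE_EXEC" },
    {    8, "SD_BALANCE_FORK" },
    {   16, "SD_WAKE_IDLE" },
    {   32, "SD_WAKE_AFFINE" },
    {   64, "SD_WAKE_BALANCE" },
    {  512, "SD_SHARE_PKG_RESOURCES" },
    { 1024, "SD_SERIALIZE" },
};

/* Print one +/- line per known flag; return the bits left unnamed. */
unsigned int sd_decode(unsigned int flags, FILE *out)
{
    unsigned int unnamed = flags;
    size_t i;

    assert(out != NULL);
    for (i = 0; i < sizeof(sd_flag_names) / sizeof(sd_flag_names[0]); i++) {
        int set = (flags & sd_flag_names[i].bit) != 0;

        fprintf(out, "%c%4u: %s\n", set ? '+' : '-',
                sd_flag_names[i].bit, sd_flag_names[i].name);
        unnamed &= ~sd_flag_names[i].bit;
    }
    return unnamed;
}
```

With these tables, `sd_decode(1133, stdout)` returns 0, so the second listing accounts for every set bit. `sd_decode(4717, stdout)` returns 4096: that bit is set in the first flag word but is not named in the listing.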
it's a 4x4 Opteron testbox.
Ingo
View attachment "config" of type "text/plain" (65209 bytes)