
Message-ID: <20090904092742.GA11014@elte.hu>
Date:	Fri, 4 Sep 2009 11:27:42 +0200
From:	Ingo Molnar <mingo@...e.hu>
To:	Andreas Herrmann <andreas.herrmann3@....com>
Cc:	Peter Zijlstra <a.p.zijlstra@...llo.nl>,
	linux-kernel@...r.kernel.org, Gautham R Shenoy <ego@...ibm.com>,
	Balbir Singh <balbir@...ibm.com>
Subject: [crash] Re: [RFC][PATCH 0/8] load-balancing and cpu_power -v2


I've queued up Peter's patches with your and Gautham's fixes 
embedded. It mostly works fine, except on two larger boxes, where 
-tip stress-testing triggered this crash:

aldebaran login: [ 1774.088275] divide error: 0000 [#1] SMP 
[ 1774.092293] last sysfs file: /sys/devices/system/cpu/cpu15/cache/index2/shared_cpu_map
[ 1774.100355] CPU 13 
[ 1774.102498] Modules linked in:
[ 1774.105631] Pid: 30881, comm: hackbench Not tainted 2.6.31-rc8-tip-01308-g484d664-dirty #1629 X8DTN
[ 1774.114807] RIP: 0010:[<ffffffff81041c38>]  [<ffffffff81041c38>] sched_balance_self+0x19b/0x2d4
[ 1774.123676] RSP: 0018:ffff880306c1fd58  EFLAGS: 00010246
[ 1774.129037] RAX: 0000000000000000 RBX: 0000000000000000 RCX: 0000000000000000
[ 1774.136287] RDX: 0000000000000000 RSI: 0000000000000040 RDI: 0000000000000040
[ 1774.143554] RBP: ffff880306c1fde8 R08: 0000000000000000 R09: ffffc9000140f4c8
[ 1774.150748] R10: ffff88031288c650 R11: ffff880306c1fe08 R12: ffffc90001a0f3a0
[ 1774.158007] R13: ffffc9000140f4b0 R14: 0000000000000000 R15: 0000000000014f00
[ 1774.165248] FS:  0000000000000000(0000) GS:ffffc90001a00000(0063) knlGS:00000000f7f156c0
[ 1774.173473] CS:  0010 DS: 002b ES: 002b CR0: 000000008005003b
[ 1774.179320] CR2: 000000004822c0ac CR3: 000000031357b000 CR4: 00000000000006e0
[ 1774.186586] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
[ 1774.193826] DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400
[ 1774.201101] Process hackbench (pid: 30881, threadinfo ffff880306c1e000, task ffff88030b4aa710)
[ 1774.209836] Stack:
[ 1774.211861]  0000000000014f00 0000000000014f00 0000000000014f00 0000000000014f10
[ 1774.219300] <0> 0000000000014f10 0000000d00000008 ffff88030b4aa710 ffffc9000140f4c0
[ 1774.227129] <0> 0000000006c1fdd8 0000007d00000001 ffff88030b4aa900 ffffc9000100f4b0
[ 1774.235225] Call Trace:
[ 1774.237704]  [<ffffffff810444c4>] sched_fork+0x2c/0x15f
[ 1774.243032]  [<ffffffff8104942d>] copy_process+0x407/0xda7
[ 1774.248608]  [<ffffffff81049f16>] do_fork+0x149/0x309
[ 1774.253809]  [<ffffffff8120ff12>] ? __up_read+0x9e/0xa8
[ 1774.259101]  [<ffffffff8106df32>] ? up_read+0xe/0x10
[ 1774.264129]  [<ffffffff81565d66>] ? do_page_fault+0x291/0x2c3
[ 1774.269968]  [<ffffffff81035028>] sys32_clone+0x2c/0x2e
[ 1774.275267]  [<ffffffff81034d05>] ia32_ptregs_common+0x25/0x4c
[ 1774.281200] Code: cb 48 8b 7d a8 ff c2 be 40 00 00 00 48 63 d2 e8 73 9b 1c 00 3b 05 19 2e 8c 00 89 c2 7c 8d 41 8b 4d 08 48 c1 e3 0a 31 d2 48 89 d8 <48> f7 f1 83 7d b4 00 48 89 c1 75 16 4c 39 f0 73 0d 49 89 c6 48 
[ 1774.301423] RIP  [<ffffffff81041c38>] sched_balance_self+0x19b/0x2d4
[ 1774.307903]  RSP <ffff880306c1fd58>
[ 1774.311474] ---[ end trace a56b661c1598b0fc ]---
[ 1774.316202] Kernel panic - not syncing: Fatal exception
[ 1774.321497] Pid: 30881, comm: hackbench Tainted: G      D    2.6.31-rc8-tip-01308-g484d664-dirty #1629
[ 1774.330925] Call Trace:
[ 1774.333402]  [<ffffffff81561aaa>] panic+0x7a/0x125
[ 1774.338269]  [<ffffffff815645d2>] oops_end+0xaa/0xba
[ 1774.343321]  [<ffffffff8100f4f1>] die+0x5a/0x63
[ 1774.347887]  [<ffffffff81563ff6>] do_trap+0x110/0x11f
[ 1774.353052]  [<ffffffff8100d8ab>] do_divide_error+0x90/0x99
[ 1774.358691]  [<ffffffff81041c38>] ? sched_balance_self+0x19b/0x2d4
[ 1774.364966]  [<ffffffff810d8021>] ? zone_statistics+0x65/0x6a
[ 1774.370831]  [<ffffffff810cb2ef>] ? get_page_from_freelist+0x4a2/0x675
[ 1774.377487]  [<ffffffff8100cad5>] divide_error+0x15/0x20
[ 1774.382894]  [<ffffffff81041c38>] ? sched_balance_self+0x19b/0x2d4
[ 1774.389173]  [<ffffffff81041c21>] ? sched_balance_self+0x184/0x2d4
[ 1774.395479]  [<ffffffff810444c4>] sched_fork+0x2c/0x15f
[ 1774.400792]  [<ffffffff8104942d>] copy_process+0x407/0xda7
[ 1774.406397]  [<ffffffff81049f16>] do_fork+0x149/0x309
[ 1774.411562]  [<ffffffff8120ff12>] ? __up_read+0x9e/0xa8
[ 1774.416897]  [<ffffffff8106df32>] ? up_read+0xe/0x10
[ 1774.421957]  [<ffffffff81565d66>] ? do_page_fault+0x291/0x2c3
[ 1774.427815]  [<ffffffff81035028>] sys32_clone+0x2c/0x2e
[ 1774.433124]  [<ffffffff81034d05>] ia32_ptregs_common+0x25/0x4c

Config attached as well.

The domain setup is as follows:

 SD flag: 4717
 +   1: SD_LOAD_BALANCE:          Do load balancing on this domain
 -   2: SD_BALANCE_NEWIDLE:       Balance when about to become idle
 +   4: SD_BALANCE_EXEC:          Balance on exec
 +   8: SD_BALANCE_FORK:          Balance on fork, clone
 -  16: SD_WAKE_IDLE:             Wake to idle CPU on task wakeup
 +  32: SD_WAKE_AFFINE:           Wake task to waking CPU
 +  64: SD_WAKE_BALANCE:          Perform balancing at task wakeup
 + 512: SD_SHARE_PKG_RESOURCES:   Domain members share cpu pkg resources
 current val on cpu0/domain1:
 SD flag: 1133
 +   1: SD_LOAD_BALANCE:          Do load balancing on this domain
 -   2: SD_BALANCE_NEWIDLE:       Balance when about to become idle
 +   4: SD_BALANCE_EXEC:          Balance on exec
 +   8: SD_BALANCE_FORK:          Balance on fork, clone
 -  16: SD_WAKE_IDLE:             Wake to idle CPU on task wakeup
 +  32: SD_WAKE_AFFINE:           Wake task to waking CPU
 +  64: SD_WAKE_BALANCE:          Perform balancing at task wakeup
 +1024: SD_SERIALIZE:             Only a single load balancing instance

It's a 4x4 Opteron testbox.

	Ingo

View attachment "config" of type "text/plain" (65209 bytes)