Date:	Thu, 10 Feb 2011 13:39:37 +0100
From:	Ingo Molnar <mingo@...e.hu>
To:	raz ben yehuda <raz@...lemp.com>
Cc:	linux-kernel@...r.kernel.org, mingo@...hat.com,
	Peter Zijlstra <a.p.zijlstra@...llo.nl>,
	Mike Galbraith <efault@....de>, Jack Steiner <steiner@....com>,
	Cliff Wickman <cpw@....com>, Mike Travis <travis@....com>,
	Thomas Gleixner <tglx@...utronix.de>,
	"H. Peter Anvin" <hpa@...or.com>
Subject: Re: [BUG] soft lockup while booting machine with more than 700 cores


* raz ben yehuda <raz@...lemp.com> wrote:

> Hello Mingo,
> 
> Below is the boot log of a 2.6.32.19 kernel on a machine with more than 700 cores.
> It fails to boot because of a soft lockup in the rebalance_domains area. I did not
> find anything related in mainline git or in the kernel bugzilla.
> 
> Thank you,
> Raz
> 
> 
>  [  929.614315] TCP cubic registered 
>  [  929.614577] NET: Registered protocol family 17 
>  [  930.785915] Bridge firewalling registered 
>  [  930.928396] Freeing unused kernel memory: 1380k freed 
>  =============================================================================== 
>  Running /disklessrc 
>  Mounting /proc 
>  Creating /dev 
>  Creating initial device nodes 
>  [  931.327841] usb 5-1: configuration #1 chosen from 1 choice
>  [  931.657469] input: HP Virtual Keyboard as /class/input/input0
>  [  931.671560] generic-usb 0003:03F0:1027.0001: input: USB HID v1.01 Keyboard [HP Virtual Keyboard] on usb-0000:01:04.0-1/input0
>  [  931.911480] input: HP Virtual Keyboard as /class/input/input1 
>  [  931.926135] generic-usb 0003:03F0:1027.0002: input: USB HID v1.01 Mouse [HP Virtual Keyboard] on usb-0000:01:04.0-1/input1
>  [  932.247432] scsi 0:0:0:0: Direct-Access     Generic  USB Flash Disk   0.00 PQ: 0 ANSI: 2
>  [  932.301626] sd 0:0:0:0: Attached scsi generic sg0 type 0 
>  [  932.416279] sd 0:0:0:0: [sda] 7892992 512-byte logical blocks: (4.04 GB/3.76 GiB)
>  [  932.559424] sd 0:0:0:0: [sda] Write Protect is off 
>  [  932.563238] sd 0:0:0:0: [sda] Assuming drive cache: write through 
>  [  932.802006] sd 0:0:0:0: [sda] Assuming drive cache: write through 
>  [  932.805070]  sda: sda1 
>  [  934.315071] sd 0:0:0:0: [sda] Assuming drive cache: write through 
>  [  934.318055] sd 0:0:0:0: [sda] Attached SCSI removable disk 
>  Loading nfs module...
>  [ 1011.681028] BUG: soft lockup - CPU#240 stuck for 62s! [swapper:0]
>  [ 1011.744482] Modules linked in: sunrpc(+) 
>  [ 1011.789117] CPU 240: 
>  [ 1011.828757] Modules linked in: sunrpc(+) 
>  [ 1011.874003] Pid: 0, comm: swapper Not tainted 2.6.32.19-3.vSMP #2 vSMP 3.5 
>  [ 1011.935843] RIP: 0010:[<ffffffff8105ac32>]  [<ffffffff8105ac32>] weighted_cpuload+0x12/0x20
>  [ 1012.051597] RSP: 0018:ffff89468e803c88  EFLAGS: 00010286 
>  [ 1012.115020] RAX: 00000000000115c0 RBX: 0000000000000002 RCX: 000000000000001d
>  [ 1012.162897] RDX: ffff8acd2e840000 RSI: 0000000000000002 RDI: 000000000000021d
>  [ 1012.243858] RBP: ffffffff81033133 R08: 0000000000000200 R09: ffff894f0ca3d450
>  [ 1012.309760] R10: 0000000000000000 R11: ffff89468e803dc0 R12: ffff89468e803c00
>  [ 1012.358023] R13: 00000000000115c0 R14: ffffffff8104b6dc R15: ffffffff81046ea6
>  [ 1012.417072] FS:  0000000000000000(0000) GS:ffff89468e800000(0000) knlGS:0000000000000000
>  [ 1012.494488] CS:  0010 DS: 0018 ES: 0018 CR0: 000000008005003b 
>  [ 1012.559412] CR2: 00000000008d3988 CR3: 0000000001001000 CR4: 00000000000026e0
>  [ 1012.619828] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
>  [ 1012.675491] DR3: 0000000000000000 DR6: 0000000000000000 DR7: 0000000000000000
>  [ 1012.739386] Call Trace: 
>  [ 1012.790082]  <IRQ>  [<ffffffff81039705>] ? sched_clock+0x5/0x10 
>  [ 1012.868687]  [<ffffffff8105ac6b>] ? source_load+0x2b/0x70 
>  [ 1012.923473]  [<ffffffff810602d5>] ? find_busiest_group+0x1b5/0xa30 
>  [ 1012.973482]  [<ffffffff81063487>] ? rebalance_domains+0x117/0x470 
>  [ 1013.031838]  [<ffffffff81065a4e>] ? run_rebalance_domains+0x3e/0xe0 
>  [ 1013.081837]  [<ffffffff8106fbbe>] ? __do_softirq+0xae/0x140 
>  [ 1013.134496]  [<ffffffff81085da0>] ? ktime_get+0x50/0xd0 
>  [ 1013.182834]  [<ffffffff8103374c>] ? call_softirq+0x1c/0x30
>  [ 1013.246263]  [<ffffffff81035745>] ? do_softirq+0x65/0xa0 
>  [ 1013.314801]  [<ffffffff8106fb0c>] ? irq_exit+0x7c/0x80 
>  [ 1013.355605]  [<ffffffff81046eab>] ? smp_apic_timer_interrupt+0x6b/0xa0 
>  [ 1013.391166]  [<ffffffff8104b6dc>] ? native_apic_msr_write+0x2c/0x40 
>  [ 1013.391166]  [<ffffffff81033133>] ? apic_timer_interrupt+0x13/0x20 
>  [ 1013.478307]  <EOI>  [<ffffffff8104dc92>] ? native_safe_halt+0x2/0x10 
>  [ 1013.515916]  [<ffffffff8103a481>] ? default_idle+0x21/0x40 
>  [ 1013.572168]  [<ffffffff81031537>] ? cpu_idle+0x57/0x90 
>  [ 1112.445978] BUG: soft lockup - CPU#240 stuck for 62s! [swapper:0] 
>  [ 1112.445978] Modules linked in: sunrpc(+) 

Interesting.

Could you boot with just few enough cores that it does not lock up, then run perf top
and see where the overhead is?

Thanks,

	Ingo
