Date:	Thu, 22 Dec 2011 11:21:18 +1100
From:	Stephen Rothwell <sfr@...b.auug.org.au>
To:	LKML <linux-kernel@...r.kernel.org>
Cc:	linux-next@...r.kernel.org,
	ppc-dev <linuxppc-dev@...ts.ozlabs.org>,
	Alan Cox <alan@...rguk.ukuu.org.uk>
Subject: linux-next: boot failure for next-20111216

Hi all,

next-20111216 (and every later linux-next release so far) fails to boot
on a Power7 blade server with the following output:

calling  .serial8250_init+0x0/0x1e0 @ 1
Serial: 8250/16550 driver, 4 ports, IRQ sharing disabled
INFO: rcu_sched detected stalls on CPUs/tasks: { 3 4 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31} (detected by 5, t=6002 jiffies)
Call Trace:
[c0000003fd9e74e0] [c000000000015054] .show_stack+0x74/0x1c0 (unreliable)
[c0000003fd9e7590] [c00000000012ad2c] .__rcu_pending+0x49c/0x4b0
[c0000003fd9e7650] [c00000000012ae94] .rcu_check_callbacks+0x154/0x250
[c0000003fd9e76f0] [c0000000000b8ab4] .update_process_times+0x44/0xa0
[c0000003fd9e7780] [c0000000000fcdbc] .tick_sched_timer+0x7c/0x100
[c0000003fd9e7820] [c0000000000d5b38] .__run_hrtimer+0xb8/0x270
[c0000003fd9e78d0] [c0000000000d6068] .hrtimer_interrupt+0x138/0x2d0
[c0000003fd9e79e0] [c00000000001e4d0] .timer_interrupt+0x130/0x300
[c0000003fd9e7a90] [c0000000000038e4] decrementer_common+0x164/0x180
--- Exception: 901 at .arch_local_irq_restore+0xd8/0x140
    LR = .cpu_idle+0x180/0x220
[c0000003fd9e7d80] [c000000000c7c368] wireless_seq_fops+0x109a8/0x2f418 (unreliable)
[c0000003fd9e7e20] [c000000000017970] .cpu_idle+0x180/0x220
[c0000003fd9e7ed0] [c0000000008369e8] .start_secondary+0x3a4/0x3b0
[c0000003fd9e7f90] [c0000000000093f4] .start_secondary_prolog+0x10/0x14

There are quite a few more rcu stalls after this and the boot never
completes (at least not before our automated test system steps in and
reboots the server).
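
When many such stall reports pile up in a boot log, it can help to pull
the stalled CPU list out of each one mechanically. The following is only
a minimal sketch, assuming the message format shown in the log above;
the helper name parse_rcu_stall and the regular expression are
hypothetical and not part of the kernel or of any existing tool.

```python
import re

# Hypothetical pattern, written against the single stall line quoted in
# this report; other kernel versions may format the message differently.
STALL_RE = re.compile(
    r"INFO: (?P<flavor>\w+) detected stalls on CPUs/tasks: "
    r"\{(?P<cpus>[^}]*)\} "
    r"\(detected by (?P<by>\d+), t=(?P<t>\d+) jiffies\)"
)


def parse_rcu_stall(line):
    """Return the fields of one RCU stall line, or None if it does not match."""
    m = STALL_RE.search(line)
    if m is None:
        return None
    tokens = m.group("cpus").split()
    return {
        "flavor": m.group("flavor"),
        # Keep only purely numeric tokens (CPU numbers); anything else
        # in the braces is skipped rather than guessed at.
        "stalled_cpus": [int(t) for t in tokens if t.isdigit()],
        "detected_by": int(m.group("by")),
        "jiffies": int(m.group("t")),
    }


line = (
    "INFO: rcu_sched detected stalls on CPUs/tasks: { 3 4 6 7 8 9 10 11 12 "
    "13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31} "
    "(detected by 5, t=6002 jiffies)"
)
report = parse_rcu_stall(line)
print(len(report["stalled_cpus"]), "CPUs reported stalled,",
      "detected by CPU", report["detected_by"])
```

Run over a whole console log, one line at a time, this gives a quick
view of which CPUs are reported in each successive stall message.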

So the obvious first place to point the finger would be changes to the
8250 code ... but there is nothing too obvious there.

-- 
Cheers,
Stephen Rothwell                    sfr@...b.auug.org.au
http://www.canb.auug.org.au/~sfr/
