linux-kernel - Re: frequent lockups in 3.18rc4

lists.openwall.net		lists / announce owl-users owl-dev john-users john-dev passwdqc-users yescrypt popa3d-users / oss-security kernel-hardening musl sabotage tlsify passwords / crypt-dev xvendor / Bugtraq Full-Disclosure linux-kernel linux-netdev linux-ext4 linux-hardening linux-cve-announce PHC
Open Source and information security mailing list archives
Hash Suite: Windows password security audit tool. GUI, reports in PDF.
[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <20141120161055.GA8309@redhat.com>
Date:	Thu, 20 Nov 2014 11:10:55 -0500
From:	Dave Jones <davej@...hat.com>
To:	Vivek Goyal <vgoyal@...hat.com>
Cc:	Don Zickus <dzickus@...hat.com>,
	Thomas Gleixner <tglx@...utronix.de>,
	Linus Torvalds <torvalds@...ux-foundation.org>,
	Linux Kernel <linux-kernel@...r.kernel.org>,
	the arch/x86 maintainers <x86@...nel.org>,
	WANG Chao <chaowang@...hat.com>, Baoquan He <bhe@...hat.com>,
	Dave Young <dyoung@...hat.com>
Subject: Re: frequent lockups in 3.18rc4

On Wed, Nov 19, 2014 at 11:28:06AM -0500, Vivek Goyal wrote:
 
 > I am wondering may be in some cases we panic in second kernel and sit
 > there. Probably we should append a kernel command line automatically
 > say "panic=1" so that it reboots itself if second kernel panics.
 > 
 > By any chance, have you enabled "CONFIG_RANDOMIZE_BASE"? If yes, please
 > disable that as currently kexec/kdump stuff does not work with it. And
 > it hangs very early in the boot process and I had to hook serial console
 > to get following message on console.

I did have that enabled. (Perhaps the kconfig should conflict?)

After rebuilding without it, this..

 > > dracut: *** Stripping files done ***
 > > dracut: *** Store current command line parameters ***
 > > dracut: *** Creating image file ***
 > > dracut: *** Creating image file done ***
 > > kdumpctl: cat: write error: Broken pipe
 > > kdumpctl: kexec: failed to load kdump kernel
 > > kdumpctl: Starting kdump: [FAILED]
 
went away. It generated the image, and things looked good.
I did echo c > /proc/sysrq-trigger and got this..

SysRq : Trigger a crash
BUG: sleeping function called from invalid context at arch/x86/mm/fault.c:1192
in_atomic(): 0, irqs_disabled(): 0, pid: 8860, name: bash
3 locks held by bash/8860:
 #0:  (sb_writers#5){......}, at: [<ffffffff811eac13>] vfs_write+0x1b3/0x1f0
 #1:  (rcu_read_lock){......}, at: [<ffffffff8144a435>] __handle_sysrq+0x5/0x1b0
 #2:  (&mm->mmap_sem){......}, at: [<ffffffff8103cb20>] __do_page_fault+0x140/0x600
Preemption disabled at:[<ffffffff817ca332>] printk+0x5c/0x72

CPU: 1 PID: 8860 Comm: bash Not tainted 3.18.0-rc5+ #95 [loadavg: 0.54 0.24 0.09 2/143 8909]
 00000000000004a8 00000000e1f75c1b ffff880236473c28 ffffffff817ce5c7
 0000000000000000 0000000000000000 ffff880236473c58 ffffffff8109af8a
 ffff880236473c58 0000000000000029 0000000000000000 ffff880236473d88
Call Trace:
 [<ffffffff817ce5c7>] dump_stack+0x4f/0x7c
 [<ffffffff8109af8a>] __might_sleep+0x12a/0x190
 [<ffffffff8103cb3b>] __do_page_fault+0x15b/0x600
 [<ffffffff811613b2>] ? irq_work_queue+0x62/0xd0
 [<ffffffff8137ad7d>] ? trace_hardirqs_off_thunk+0x3a/0x3f
 [<ffffffff8103cfec>] do_page_fault+0xc/0x10
 [<ffffffff817dbcf2>] page_fault+0x22/0x30
 [<ffffffff817ca332>] ? printk+0x5c/0x72
 [<ffffffff81449ce6>] ? sysrq_handle_crash+0x16/0x20
 [<ffffffff8144a567>] __handle_sysrq+0x137/0x1b0
 [<ffffffff8144a435>] ? __handle_sysrq+0x5/0x1b0
 [<ffffffff8144aa4a>] write_sysrq_trigger+0x4a/0x50
 [<ffffffff81259f2d>] proc_reg_write+0x3d/0x80
 [<ffffffff811eab1a>] vfs_write+0xba/0x1f0
 [<ffffffff811eb628>] SyS_write+0x58/0xd0
 [<ffffffff817da052>] system_call_fastpath+0x12/0x17
Oops: 0002 [#1] PREEMPT SMP DEBUG_PAGEALLOC
CPU: 1 PID: 8860 Comm: bash Not tainted 3.18.0-rc5+ #95 [loadavg: 0.54 0.24 0.09 1/143 8909]
task: ffff8800a1a60000 ti: ffff880236470000 task.ti: ffff880236470000
RIP: 0010:[<ffffffff81449ce6>]  [<ffffffff81449ce6>] sysrq_handle_crash+0x16/0x20
RSP: 0018:ffff880236473e38  EFLAGS: 00010246
RAX: 000000000000000f RBX: ffffffff81cb4a00 RCX: 0000000000000000
RDX: 0000000000000000 RSI: ffffffff817ca332 RDI: 0000000000000063
RBP: ffff880236473e38 R08: 0000000000000001 R09: 0000000000000001
R10: 0000000000000358 R11: 0000000000000357 R12: 0000000000000063
R13: 0000000000000000 R14: 0000000000000007 R15: 0000000000000000
FS:  00007fc652f4e740(0000) GS:ffff880244200000(0000) knlGS:0000000000000000
CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
CR2: 0000000000000000 CR3: 000000023a3b2000 CR4: 00000000001407e0
DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
Stack:
 ffff880236473e78 ffffffff8144a567 ffffffff8144a435 0000000000000002
 0000000000000002 00007fc652f51000 0000000000000002 ffff880236473f48
 ffff880236473ea8 ffffffff8144aa4a 0000000000000002 00007fc652f51000
Call Trace:
 [<ffffffff8144a567>] __handle_sysrq+0x137/0x1b0
 [<ffffffff8144a435>] ? __handle_sysrq+0x5/0x1b0
 [<ffffffff8144aa4a>] write_sysrq_trigger+0x4a/0x50
 [<ffffffff81259f2d>] proc_reg_write+0x3d/0x80
 [<ffffffff811eab1a>] vfs_write+0xba/0x1f0
 [<ffffffff811eb628>] SyS_write+0x58/0xd0
 [<ffffffff817da052>] system_call_fastpath+0x12/0x17
Code: 01 f4 45 39 a5 b4 00 00 00 75 e2 4c 89 ef e8 d2 f7 ff ff eb d8 0f 1f 44 00 00 55 c7 05 08 b7 7e 00 01 00 00 00 48 89 e5 0f ae f8 <c6> 04 25 00 00 00 00 01 5d c3 0f 1f 44 00 00 55 31 c0 48 89 e5 
RIP  [<ffffffff81449ce6>] sysrq_handle_crash+0x16/0x20
 RSP <ffff880236473e38>
CR2: 0000000000000000

Which, asides from the sleeping while atomic thing which isn't important,
does what I expected.  Shortly later, it rebooted.

And then /var/crash was empty.

	Dave

--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@...r.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/