lists.openwall.net   lists  /  announce  owl-users  owl-dev  john-users  john-dev  passwdqc-users  yescrypt  popa3d-users  /  oss-security  kernel-hardening  musl  sabotage  tlsify  passwords  /  crypt-dev  xvendor  /  Bugtraq  Full-Disclosure  linux-kernel  linux-netdev  linux-ext4  linux-hardening  linux-cve-announce  PHC 
Open Source and information security mailing list archives
 
Hash Suite for Android: free password hash cracker in your pocket
[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Date:	Mon, 4 Jun 2012 20:35:30 +0800
From:	Fengguang Wu <fengguang.wu@...el.com>
To:	Gleb Natapov <gleb@...hat.com>
Cc:	"kvm@...r.kernel.org" <kvm@...r.kernel.org>,
	Linux Memory Management List <linux-mm@...ck.org>,
	LKML <linux-kernel@...r.kernel.org>
Subject: Re: kvm segfaults and bad page state in 3.4.0

Hi Gleb,

On Mon, Jun 04, 2012 at 02:56:50PM +0300, Gleb Natapov wrote:
> On Mon, Jun 04, 2012 at 07:46:03PM +0800, Fengguang Wu wrote:
> > Hi,
> > 
> > I'm running lots of kvm instances for doing kernel boot tests.
> > Unfortunately the test system itself is not stable enough, I got scary
> > errors in both kvm and the host kernel. Like this. 
> > 
> What do you mean by "in both kvm and the host kernel". Do you have

I mean the host side's kvm user space process and kernel seem to have problems.

> similar Oopses inside your guests? If yes can you post one?

There are all kinds of problems in the guest kernel, too. Probably I
built in too many modules (take a debian config and s/=m/=y/) and
enabled too many debug options. Many of the bugs I ran into have
already been reported by others in LKML. Here are more weird things.

Kernel freeze in the rcu-torture test:

(x86_64-nfsroot config)

[    4.584367] Initializing RT-Tester: OK                                                          
[    4.598820] audit: initializing netlink socket (disabled)
[    4.616364] type=2000 audit(1338804588.615:1): initialized
[    4.695469] Kprobe smoke test started                     
[    4.726854] Kprobe smoke test passed successfully
[    4.745166] rcu-torture:--- Start of test: nreaders=2 nfakewriters=4 stat_interval=0 verbose=0 test_no_idle_hz=0 shuffle_interval=3 stutter=5
+irqreader=1 fqs_duration=0 fqs_holdoff=0 fqs_stutter=3 test_boost=1/0 test_boost_interval=7 test_boost_duration=4 shutdown_secs=0 onoff_interval=0
+onoff_holdoff=0   

iput_final() on mtdchar_close(), somehow after unclean umount:

(x86_64-allyesdebian config)

[  277.167130] VFS: Busy inodes after unmount of mtd_inodefs. Self-destruct in 5 seconds.  Have a nice day...
[  277.167202] VFS: Busy inodes after unmount of mtd_inodefs. Self-destruct in 5 seconds.  Have a nice day...
[  277.175814] BUG: unable to handle kernel paging request at ffff88000ad0c820
[  277.175818] IP: [<ffffffff811a6fe2>] iput_final+0x1d/0x17c
[  277.175825] PGD 4015063 PUD 4019063 PMD 17f91067 PTE ad0c160
[  277.175830] Oops: 0000 [#1] SMP DEBUG_PAGEALLOC
[  277.175834] CPU 0 
[  277.175835] Modules linked in:
[  277.175837] 
[  277.175839] Pid: 3071, comm: mtd_probe Not tainted 3.4.0-rc1+ #1 Bochs Bochs
[  277.175843] RIP: 0010:[<ffffffff811a6fe2>]  [<ffffffff811a6fe2>] iput_final+0x1d/0x17c
[  277.175847] RSP: 0018:ffff880005ee3d38  EFLAGS: 00010246
[  277.175849] RAX: 0000000000000001 RBX: ffff8800087b7be8 RCX: 0000000000000001
[  277.175864] RDX: 8c6318c6318c6301 RSI: ffff8800087b7c80 RDI: ffff8800087b7be8
[  277.175866] RBP: ffff880005ee3d58 R08: 0000000000000000 R09: 0000000000004c86
[  277.175868] R10: ffffffff816ce939 R11: ffff8800087b7c80 R12: ffff88000ad0c7f0
[  277.175870] R13: ffff88000dd347f0 R14: ffff88000ae56be8 R15: ffff880009b95ee8
[  277.175873] FS:  00007f7b52261700(0000) GS:ffff880017400000(0000) knlGS:0000000000000000
[  277.175876] CS:  0010 DS: 0000 ES: 0000 CR0: 000000008005003b
[  277.175878] CR2: ffff88000ad0c820 CR3: 0000000007c1a000 CR4: 00000000000006f0
[  277.175884] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
[  277.175889] DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400
[  277.175892] Process mtd_probe (pid: 3071, threadinfo ffff880005ee2000, task ffff8800004d8740)
[  277.175894] Stack:
[  277.175895]  0000000000000001 ffff8800087b7be8 ffff880005e4ae90 ffff88000dd347f0
[  277.175900]  ffff880005ee3d78 ffffffff811a717e 0000000000000000 ffff880007cac730
[  277.175904]  ffff880005ee3da8 ffffffff81ccf9ea ffff88000ae56be8 ffff880005e4ae90
[  277.175908] Call Trace:
[  277.175912]  [<ffffffff811a717e>] iput+0x3d/0x41
[  277.175918]  [<ffffffff81ccf9ea>] mtdchar_close+0x3f/0x78
[  277.175922]  [<ffffffff81192d7b>] __fput+0x117/0x1b8
[  277.175925]  [<ffffffff81192e36>] fput+0x1a/0x1c
[  277.175929]  [<ffffffff8118fba9>] filp_close+0x70/0x7b
[  277.175933]  [<ffffffff810958c4>] close_files+0xc6/0xff
[  277.175936]  [<ffffffff81095960>] put_files_struct.part.8+0x14/0xa9
[  277.175940]  [<ffffffff810967fb>] put_files_struct+0x1f/0x23
[  277.175943]  [<ffffffff81096899>] exit_files+0x48/0x50
[  277.175946]  [<ffffffff81096e2b>] do_exit+0x2c2/0x463
[  277.175951]  [<ffffffff810e2d60>] ? trace_hardirqs_on_caller+0x12d/0x164
[  277.175955]  [<ffffffff8109719b>] do_group_exit+0x88/0xb6
[  277.175958]  [<ffffffff810971e0>] sys_exit_group+0x17/0x17
[  277.175963]  [<ffffffff82f28e29>] system_call_fastpath+0x16/0x1b
[  277.175965] Code: df e8 98 fd ff ff 58 5b 41 5c 41 5d 5d c3 55 48 89 e5 41 55 41 54 53 50 66 66 66 66 90 f6 87 e0 00 00 00 08 4c 8b 67 28 48 89 fb <4d> 8b 6c 24 30 74 11 be 61 05 00 00 48 c7 c7 9e f6 a6 83 e8 1b 
[  277.176016] RIP  [<ffffffff811a6fe2>] iput_final+0x1d/0x17c
[  277.176020]  RSP <ffff880005ee3d38>
[  277.176020] CR2: ffff88000ad0c820
[  277.176020] ---[ end trace 7383ee643f887e33 ]---
[  277.176020] Fixing recursive fault but reboot is needed!
[  277.176020] BUG: scheduling while atomic: mtd_probe/3071/0x00000002
[  277.176020] INFO: lockdep is turned off.
[  277.176020] Modules linked in:
[  277.176020] irq event stamp: 3696
[  277.176020] hardirqs last  enabled at (3695): [<ffffffff811111d6>] __call_rcu+0x2fe/0x316
[  277.176020] hardirqs last disabled at (3696): [<ffffffff82f220f6>] error_sti+0x5/0x6
[  277.176020] softirqs last  enabled at (2044): [<ffffffff810992fe>] __do_softirq+0x239/0x24f
[  277.176020] softirqs last disabled at (1965): [<ffffffff82f2a36c>] call_softirq+0x1c/0x30
[  277.176020] Pid: 3071, comm: mtd_probe Tainted: G      D      3.4.0-rc1+ #1
[  277.176020] Call Trace:
[  277.176020]  [<ffffffff810e1cb7>] ? print_irqtrace_events+0x9d/0xa1
[  277.176020]  [<ffffffff82e754f4>] __schedule_bug+0x66/0x6a
[  277.176020]  [<ffffffff82f1fb6c>] __schedule+0x8c/0x59a
[  277.176020]  [<ffffffff82f20303>] schedule+0x64/0x66
[  277.176020]  [<ffffffff81096c7a>] do_exit+0x111/0x463
[  277.176020]  [<ffffffff82f22965>] oops_end+0xbe/0xc6
[  277.176020]  [<ffffffff82e73d70>] no_context+0x1ae/0x1bd
[  277.176020]  [<ffffffff82e73f5d>] __bad_area_nosemaphore+0x1de/0x1ff
[  277.176020]  [<ffffffff82e73548>] ? pte_offset_kernel+0x14/0x3a
[  277.176020]  [<ffffffff82e73f91>] bad_area_nosemaphore+0x13/0x15
[  277.176020]  [<ffffffff82f24f5b>] do_page_fault+0x1bc/0x390
[  277.176020]  [<ffffffff810c4a39>] ? local_clock+0x36/0x4d
[  277.176020]  [<ffffffff82f220f6>] ? error_sti+0x5/0x6
[  277.176020]  [<ffffffff81127c5d>] ? time_hardirqs_off+0x26/0x29
[  277.176020]  [<ffffffff82f220f6>] ? error_sti+0x5/0x6
[  277.176020]  [<ffffffff810df07b>] ? trace_hardirqs_off_caller+0x3f/0x9c
[  277.176020]  [<ffffffff816da04d>] ? trace_hardirqs_off_thunk+0x3a/0x3c
[  277.176020]  [<ffffffff82f24a33>] do_async_page_fault+0x31/0x4f
[  277.176020]  [<ffffffff82f21ee5>] async_page_fault+0x25/0x30
[  277.176020]  [<ffffffff816ce939>] ? _atomic_dec_and_lock+0x2d/0x4c
[  277.176020]  [<ffffffff811a6fe2>] ? iput_final+0x1d/0x17c
[  277.176020]  [<ffffffff811a717e>] iput+0x3d/0x41
[  277.176020]  [<ffffffff81ccf9ea>] mtdchar_close+0x3f/0x78
[  277.176020]  [<ffffffff81192d7b>] __fput+0x117/0x1b8
[  277.176020]  [<ffffffff81192e36>] fput+0x1a/0x1c
[  277.176020]  [<ffffffff8118fba9>] filp_close+0x70/0x7b
[  277.176020]  [<ffffffff810958c4>] close_files+0xc6/0xff
[  277.176020]  [<ffffffff81095960>] put_files_struct.part.8+0x14/0xa9
[  277.176020]  [<ffffffff810967fb>] put_files_struct+0x1f/0x23
[  277.176020]  [<ffffffff81096899>] exit_files+0x48/0x50
[  277.176020]  [<ffffffff81096e2b>] do_exit+0x2c2/0x463
[  277.176020]  [<ffffffff810e2d60>] ? trace_hardirqs_on_caller+0x12d/0x164
[  277.176020]  [<ffffffff8109719b>] do_group_exit+0x88/0xb6
[  277.176020]  [<ffffffff810971e0>] sys_exit_group+0x17/0x17
[  277.176020]  [<ffffffff82f28e29>] system_call_fastpath+0x16/0x1b
[  277.176265] VFS: Busy inodes after unmount of mtd_inodefs. Self-destruct in 5 seconds.  Have a nice day...
[  277.184584] BUG: unable to handle kernel paging request at ffff88000ac7d820
[  277.184588] IP: [<ffffffff811a6fe2>] iput_final+0x1d/0x17c
[  277.184594] PGD 4015063 PUD 4019063 PMD 17f91067 PTE ac7d160

This looks like a real bug in some device driver:

(x86_64-allyesdebian config)

[  503.061353] BUG: unable to handle kernel NULL pointer dereference at 0000000000000015
[  503.065177] IP: [<ffffffff81fb6325>] usb_hcd_flush_endpoint+0x2f/0xec
[  503.065177] PGD 0 
[  503.065177] Oops: 0000 [#1] SMP 
[  503.065177] CPU 0 
[  503.065177] Modules linked in:
[  503.065177] 
[  503.065177] Pid: 20, comm: kworker/0:1 Not tainted 3.4.0-rc7+ #320 Bochs Bochs
[  503.065177] RIP: 0010:[<ffffffff81fb6325>]  [<ffffffff81fb6325>] usb_hcd_flush_endpoint+0x2f/0xec
[  503.065177] RSP: 0018:ffff88000e013d20  EFLAGS: 00010046
[  503.065177] RAX: 0000000000380038 RBX: ffff88000ace0000 RCX: 0000000000000000
[  503.065177] RDX: 0000000000000038 RSI: ffff88000ad001c0 RDI: ffffffff83e1d86c
[  503.065177] RBP: 0000000000000005 R08: ffff88000e012000 R09: 0000000000000007
[  503.065177] R10: 000000000000086e R11: 0000000000000000 R12: 0000000000000015
[  503.065177] R13: 0000000600000005 R14: 0000000000000001 R15: 0000000000000400
[  503.065177] FS:  0000000000000000(0000) GS:ffff88000f400000(0000) knlGS:0000000000000000
[  503.065177] CS:  0010 DS: 0000 ES: 0000 CR0: 000000008005003b
[  503.065177] CR2: 0000000000000015 CR3: 0000000003813000 CR4: 00000000000006f0
[  503.065177] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
[  503.065177] DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400
[  503.065177] Process kworker/0:1 (pid: 20, threadinfo ffff88000e012000, task ffff88000f3ea140)
[  503.065177] Stack:
[  503.065177]  ffff88000ace0000 ffff88000ace0000 ffff88000ace0008 ffff88000ace0080
[  503.065177]  0000000000000400 ffffffff81fb969d 000000000000001d ffff88000ace8400
[  503.065177]  0000000000000720 ffff88000acea888 ffff88000acea800 ffff88000e013df8
[  503.065177] Call Trace:
[  503.065177]  [<ffffffff81fb969d>] ? usb_suspend_both+0x155/0x184
[  503.065177]  [<ffffffff81fba2be>] ? usb_runtime_suspend+0x27/0x54
[  503.065177]  [<ffffffff81fba297>] ? usb_probe_device+0x33/0x33
[  503.065177]  [<ffffffff818bfe62>] ? __rpm_callback+0x2b/0x4f
[  503.065177]  [<ffffffff818c034b>] ? rpm_suspend+0x2f1/0x47c
[  503.065177]  [<ffffffff81086379>] ? mod_timer+0x7b/0x89
[  503.065177]  [<ffffffff818c109c>] ? pm_schedule_suspend+0xab/0xab
[  503.065177]  [<ffffffff818c110d>] ? pm_runtime_work+0x71/0x8c
[  503.065177]  [<ffffffff8108f6c3>] ? process_one_work+0x125/0x296
[  503.065177]  [<ffffffff8108fa8b>] ? worker_thread+0xc4/0x164
[  503.065177]  [<ffffffff8108f9c7>] ? rescuer_thread+0x16e/0x16e
[  503.065177]  [<ffffffff8109340f>] ? kthread+0x7d/0x85
[  503.065177]  [<ffffffff82afac64>] ? kernel_thread_helper+0x4/0x10
[  503.065177]  [<ffffffff81093392>] ? flush_kthread_worker+0x92/0x92
[  503.065177]  [<ffffffff82afac60>] ? gs_change+0x13/0x13
[  503.065177] Code: f6 41 54 55 48 89 f5 53 53 0f 84 d0 00 00 00 4c 8d 65 10 48 89 fb e8 b2 d9 b3 00 48 c7 c7 6c d8 e1 83 4c 8b 6b 40 e8 28 e4 b3 00 <48> 8b 5d 10 eb 0a 83 7b 18 00 74 0a 48 8b 5b 20 48 83 eb 20 eb 
[  503.065177] RIP  [<ffffffff81fb6325>] usb_hcd_flush_endpoint+0x2f/0xec
[  503.065177]  RSP <ffff88000e013d20>
[  503.065177] CR2: 0000000000000015
[  503.065177] ---[ end trace e6f1441a2b05b7c3 ]---

Another trace:

[  109.687421] intel_oaktrail: Platform not recognized (You could try the module's force-parameter)
[  109.687421] ieee802154hardmac ieee802154hardmac: Added ieee802154 HardMAC hardware
[  109.974115] vhci_hcd: changed 0
[  110.046770] general protection fault: 0000 [#1] SMP DEBUG_PAGEALLOC
[  110.050713] CPU 0
[  110.050713] Modules linked in:
[  110.050713]
[  110.050713] Pid: 20, comm: kworker/0:1 Not tainted 3.5.0-rc1+ #13 Bochs Bochs
[  110.050713] RIP: 0010:[<ffffffff822b1d35>]  [<ffffffff822b1d35>] usb_hcd_flush_endpoint+0x4c/0x10b
[  110.050713] RSP: 0018:ffff880015811c40  EFLAGS: 00010046
[  110.050713] RAX: ffffffff822b1d35 RBX: ffff88000bde0000 RCX: 0000000000000001
[  110.050713] RDX: ffff88000bde0000 RSI: ffffffff841d76e8 RDI: 0000000000000046
[  110.050713] RBP: ffff880015811c60 R08: ffffffff84a434b0 R09: ffffffff84a434b0
[  110.050713] R10: ffffffff822b1d35 R11: ffffffff841d76e8 R12: 5a5a0030305f7065
[  110.050713] R13: 5a5a0030305f7075 R14: fffffffdfffffffd R15: 0000000000000001
[  110.050713] FS:  0000000000000000(0000) GS:ffff880017400000(0000) knlGS:0000000000000000
[  110.050713] CS:  0010 DS: 0000 ES: 0000 CR0: 000000008005003b
[  110.050713] CR2: 0000000000000000 CR3: 0000000004014000 CR4: 00000000000006f0
[  110.050713] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
[  110.050713] DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400
[  110.050713] Process kworker/0:1 (pid: 20, threadinfo ffff880015810000, task ffff88001580c500)
[  110.050713] Stack:
[  110.050713]  ffff88000bde0000 ffff88000bde0000 ffff88000bde0080 0000000000000400
[  110.050713]  ffff880015811cb0 ffffffff822b5b46 ffff88000be2d7f0 0000040015811d40
[  110.050713]  ffff880015811ca0 ffff88000bde2878 ffff88000bde27f0 ffff880015811d28
[  110.050713] Call Trace:
[  110.050713] Call Trace:
[  110.050713]  [<ffffffff822b5b46>] usb_suspend_both+0x16f/0x19d
[  110.050713]  [<ffffffff822b64b2>] usb_runtime_suspend+0x36/0x5d
[  110.050713]  [<ffffffff822b647c>] ? usb_probe_device+0x37/0x37
[  110.050713]  [<ffffffff81a14ca5>] __rpm_callback+0x34/0x5b
[  110.050713]  [<ffffffff81a1538a>] rpm_suspend+0x2ff/0x49d
[  110.050713]  [<ffffffff810e1a7b>] ? lock_acquired+0x5f/0x68
[  110.050713]  [<ffffffff81a16283>] ? pm_runtime_work+0x1e/0xa1
[  110.050713]  [<ffffffff81a162ed>] pm_runtime_work+0x88/0xa1
[  110.050713]  [<ffffffff810ac446>] process_one_work+0x240/0x401
[  110.050713]  [<ffffffff810ac36b>] ? process_one_work+0x165/0x401
[  110.050713]  [<ffffffff810ad05c>] ? worker_thread+0x38/0x15f
[  110.050713]  [<ffffffff81a16265>] ? pm_schedule_suspend+0xb1/0xb1
[  110.050713]  [<ffffffff810ad0ff>] worker_thread+0xdb/0x15f
[  110.050713]  [<ffffffff810ad024>] ? manage_workers.isra.24+0xaa/0xaa
[  110.050713]  [<ffffffff810b3f67>] kthread+0xaf/0xb7
[  110.050713]  [<ffffffff816f4b5e>] ? trace_hardirqs_on_thunk+0x3a/0x3f
[  110.050713]  [<ffffffff82f8af74>] kernel_thread_helper+0x4/0x10
[  110.050713]  [<ffffffff82f828b0>] ? retint_restore_args+0x13/0x13
[  110.050713]  [<ffffffff810b3eb8>] ? __init_kthread_worker+0x5a/0x5a
[  110.050713]  [<ffffffff82f8af70>] ? gs_change+0x13/0x13
[  110.050713] Code: be 5b 06 00 00 48 c7 c7 81 40 cd 83 4d 8d 6c 24 10 e8 1c e1 e0 fe e8 0f f0 cc 00 48 c7 c7 d0 76 1d 84 4c 8b 73 40 e8 a6 fe cc 00 <49> 8b 5c 24 10 eb 0a 83 7b 18 00 74 13 48 8b 5b 20 48 83 eb 20
[  110.050713] RIP  [<ffffffff822b1d35>] usb_hcd_flush_endpoint+0x4c/0x10b
[  110.050713]  RSP <ffff880015811c40>
[  110.050713] ---[ end trace 3ffa21620df1b13a ]---
[  110.050713] note: kworker/0:1[20] exited with preempt_count 1
[  114.071820] BUG: scheduling while atomic: kworker/0:1/20/0x10000002
[  114.132169] INFO: lockdep is turned off.
[  114.192540] Modules linked in:
[  114.252134] Pid: 20, comm: kworker/0:1 Tainted: G      D      3.5.0-rc1+ #13
[  114.315119] Call Trace:
[  114.376805]  [<ffffffff82ed365d>] __schedule_bug+0x66/0x75
[  114.440581]  [<ffffffff82f807e4>] __schedule+0x8c/0x599
[  114.504267]  [<ffffffff811ba97a>] ? exit_fs+0x3a/0x75
[  114.568691]  [<ffffffff810c252b>] __cond_resched+0x27/0x32
[  114.633880]  [<ffffffff82f80d4d>] _cond_resched+0x19/0x22
[  114.698817]  [<ffffffff82f8000b>] mutex_lock_nested+0x2a/0x45
[  114.764387]  [<ffffffff811407d8>] perf_event_exit_task+0x2a/0x9d
[  114.830649]  [<ffffffff81097a12>] do_exit+0x360/0x477
[  114.896676]  [<ffffffff82f83627>] oops_end+0xbe/0xc6
[  114.962852]  [<ffffffff8104d072>] die+0x5a/0x64
[  115.028695]  [<ffffffff82f83228>] do_general_protection+0x131/0x13a
[  115.096531]  [<ffffffff82f82b25>] general_protection+0x25/0x30
[  115.164060]  [<ffffffff822b1d35>] ? usb_hcd_flush_endpoint+0x4c/0x10b
[  115.232190]  [<ffffffff822b1d35>] ? usb_hcd_flush_endpoint+0x4c/0x10b
[  115.298747]  [<ffffffff822b1d35>] ? usb_hcd_flush_endpoint+0x4c/0x10b
[  115.364073]  [<ffffffff822b5b46>] usb_suspend_both+0x16f/0x19d
[  115.428984]  [<ffffffff822b64b2>] usb_runtime_suspend+0x36/0x5d
[  115.494194]  [<ffffffff822b647c>] ? usb_probe_device+0x37/0x37
[  115.558063]  [<ffffffff81a14ca5>] __rpm_callback+0x34/0x5b
[  115.620646]  [<ffffffff81a1538a>] rpm_suspend+0x2ff/0x49d
[  115.679620]  [<ffffffff810e1a7b>] ? lock_acquired+0x5f/0x68
[  115.737314]  [<ffffffff81a16283>] ? pm_runtime_work+0x1e/0xa1
[  115.794345]  [<ffffffff81a162ed>] pm_runtime_work+0x88/0xa1
[  115.851304]  [<ffffffff810ac446>] process_one_work+0x240/0x401
[  115.908428]  [<ffffffff810ac36b>] ? process_one_work+0x165/0x401
[  115.964883]  [<ffffffff810ad05c>] ? worker_thread+0x38/0x15f
[  116.020499]  [<ffffffff81a16265>] ? pm_schedule_suspend+0xb1/0xb1
[  116.077215]  [<ffffffff810ad0ff>] worker_thread+0xdb/0x15f
[  116.131873]  [<ffffffff810ad024>] ? manage_workers.isra.24+0xaa/0xaa
[  116.185794]  [<ffffffff810b3f67>] kthread+0xaf/0xb7
[  116.238823]  [<ffffffff816f4b5e>] ? trace_hardirqs_on_thunk+0x3a/0x3f
[  116.293488]  [<ffffffff82f8af74>] kernel_thread_helper+0x4/0x10
[  116.348421]  [<ffffffff82f828b0>] ? retint_restore_args+0x13/0x13
[  116.403763]  [<ffffffff810b3eb8>] ? __init_kthread_worker+0x5a/0x5a
[  116.459768]  [<ffffffff82f8af70>] ? gs_change+0x13/0x13

A funny bug triggered at kernel panic time:

(x86_64-allyesdebian config)

[ 1526.520230] ===============================                         
[ 1526.520230] [ INFO: suspicious RCU usage. ]                         
[ 1526.520230] 3.5.0-rc1+ #12 Not tainted                              
[ 1526.520230] -------------------------------                         
[ 1526.520230] /c/kernel-tests/mm/include/linux/rcupdate.h:436 Illegal context switch in RCU read-side critical section!
[ 1526.520230]
[ 1526.520230] other info that might help us debug this:
[ 1526.520230]
[ 1526.520230]
[ 1526.520230] rcu_scheduler_active = 1, debug_locks = 0
[ 1526.520230] 3 locks held by net.agent/3279:
[ 1526.520230]  #0:  (&mm->mmap_sem){++++++}, at: [<ffffffff82f85962>] do_page_fault+0x193/0x390
[ 1526.520230]  #1:  (panic_lock){+.+...}, at: [<ffffffff82ed2830>] panic+0x37/0x1d3
[ 1526.520230]  #2:  (rcu_read_lock){.+.+..}, at: [<ffffffff810b9b28>] rcu_lock_acquire+0x0/0x29
[ 1526.520230]
[ 1526.520230] stack backtrace:
[ 1526.520230] Pid: 3279, comm: net.agent Not tainted 3.5.0-rc1+ #12
[ 1526.520230] Call Trace:
[ 1526.520230]  [<ffffffff810e1570>] lockdep_rcu_suspicious+0x109/0x112
[ 1526.520230]  [<ffffffff810bfe3a>] rcu_preempt_sleep_check+0x45/0x47
[ 1526.520230]  [<ffffffff810bfe5a>] __might_sleep+0x1e/0x19a
[ 1526.520230]  [<ffffffff82f8010e>] down_write+0x26/0x81
[ 1526.520230]  [<ffffffff8276a966>] led_trigger_unregister+0x1f/0x9c
[ 1526.520230]  [<ffffffff8276def5>] heartbeat_reboot_notifier+0x15/0x19
[ 1526.520230]  [<ffffffff82f85bf5>] notifier_call_chain+0x96/0xcd
[ 1526.520230]  [<ffffffff82f85cba>] __atomic_notifier_call_chain+0x8e/0xff
[ 1526.520230]  [<ffffffff81094b7c>] ? kmsg_dump+0x37/0x1eb
[ 1526.520230]  [<ffffffff82f85d3f>] atomic_notifier_call_chain+0x14/0x16
[ 1526.520230]  [<ffffffff82ed28e1>] panic+0xe8/0x1d3
[ 1526.520230]  [<ffffffff811473e2>] out_of_memory+0x15d/0x1d3
[ 1526.520230]  [<ffffffff8114be63>] __alloc_pages_nodemask+0x5c2/0x762
[ 1526.520230]  [<ffffffff8117b479>] alloc_pages_current+0xd2/0xf3
[ 1526.520230]  [<ffffffff81143cf8>] __page_cache_alloc+0xa1/0xad
[ 1526.520230]  [<ffffffff81145d92>] filemap_fault+0x1f9/0x2f6
[ 1526.520230]  [<ffffffff8112a549>] ? time_hardirqs_off+0x26/0x29
[ 1526.520230]  [<ffffffff810e022a>] ? trace_hardirqs_off_caller+0x1f/0x9c
[ 1526.520230]  [<ffffffff811631ad>] __do_fault+0xc8/0x37d
[ 1526.520230]  [<ffffffff810e0841>] ? lock_release_holdtime.part.23+0x4e/0x55
[ 1526.520230]  [<ffffffff81165d21>] handle_pte_fault+0xf1/0x196
[ 1526.520230]  [<ffffffff81162acd>] ? pmd_offset+0x14/0x3a
[ 1526.520230]  [<ffffffff8116611f>] handle_mm_fault+0x1b1/0x1cb
[ 1526.520230]  [<ffffffff82f85b3a>] do_page_fault+0x36b/0x390
[ 1526.520230]  [<ffffffff816f485e>] ? trace_hardirqs_on_thunk+0x3a/0x3f
[ 1526.520230]  [<ffffffff81165d54>] ? handle_pte_fault+0x124/0x196
[ 1526.520230]  [<ffffffff81165d54>] ? handle_pte_fault+0x124/0x196
[ 1526.520230]  [<ffffffff82f82aa6>] ? error_sti+0x5/0x6
[ 1526.520230]  [<ffffffff8112a549>] ? time_hardirqs_off+0x26/0x29
[ 1526.520230]  [<ffffffff82f82aa6>] ? error_sti+0x5/0x6
[ 1526.520230]  [<ffffffff810e024a>] ? trace_hardirqs_off_caller+0x3f/0x9c
[ 1526.520230]  [<ffffffff816f489d>] ? trace_hardirqs_off_thunk+0x3a/0x3c
[ 1526.520230]  [<ffffffff82f85454>] do_async_page_fault+0x31/0x5e
[ 1526.520230]  [<ffffffff82f82885>] async_page_fault+0x25/0x30

Thanks,
Fengguang

View attachment "config-nfsroot" of type "text/plain" (88606 bytes)

View attachment "config-allyesdebian" of type "text/plain" (137149 bytes)

Powered by blists - more mailing lists

Powered by Openwall GNU/*/Linux Powered by OpenVZ