Message-ID: <20120604123530.GA7262@localhost>
Date: Mon, 4 Jun 2012 20:35:30 +0800
From: Fengguang Wu <fengguang.wu@...el.com>
To: Gleb Natapov <gleb@...hat.com>
Cc: "kvm@...r.kernel.org" <kvm@...r.kernel.org>,
Linux Memory Management List <linux-mm@...ck.org>,
LKML <linux-kernel@...r.kernel.org>
Subject: Re: kvm segfaults and bad page state in 3.4.0
Hi Gleb,
On Mon, Jun 04, 2012 at 02:56:50PM +0300, Gleb Natapov wrote:
> On Mon, Jun 04, 2012 at 07:46:03PM +0800, Fengguang Wu wrote:
> > Hi,
> >
> > I'm running lots of kvm instances for doing kernel boot tests.
> > Unfortunately the test system itself is not stable enough, I got scary
> > errors in both kvm and the host kernel. Like this.
> >
> What do you mean by "in both kvm and the host kernel". Do you have
I mean that both the kvm user space process and the kernel on the host side seem to have problems.
> similar Oopses inside your guests? If yes can you post one?
There are all kinds of problems in the guest kernel, too. Probably that's
because I built in too many modules (I took a Debian config and ran
s/=m/=y/ on it) and enabled too many debug options. Many of the bugs I ran
into have already been reported by others on LKML. Here are some of the
weirder ones.
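For reference, the s/=m/=y/ step on the config could be sketched roughly as below. This is only an illustration: the function name and the sample config lines are made up here, and the exact command used for these test kernels is not shown in this mail. The pattern is anchored to whole `CONFIG_*=m` lines so that string-valued options containing "=m" are left alone.

```python
import re

def builtin_all_modules(config_text: str) -> str:
    """Rewrite every CONFIG_*=m line to CONFIG_*=y; leave all other lines untouched."""
    # MULTILINE makes ^ and $ match at each line, so only complete "=m" lines change.
    return re.sub(r"^(CONFIG_\w+)=m$", r"\1=y", config_text, flags=re.MULTILINE)

# Hypothetical sample lines, not taken from the attached configs.
sample = (
    "CONFIG_USB=m\n"
    "CONFIG_MTD=y\n"
    "# CONFIG_FOO is not set\n"
    'CONFIG_CMDLINE="a=m"\n'
)

if __name__ == "__main__":
    print(builtin_all_modules(sample), end="")
```

Running `make oldconfig` afterwards would still be needed, since some options cannot be built in or have dependencies that change.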
Kernel freeze in the rcu-torture test:
(x86_64-nfsroot config)
[ 4.584367] Initializing RT-Tester: OK
[ 4.598820] audit: initializing netlink socket (disabled)
[ 4.616364] type=2000 audit(1338804588.615:1): initialized
[ 4.695469] Kprobe smoke test started
[ 4.726854] Kprobe smoke test passed successfully
[ 4.745166] rcu-torture:--- Start of test: nreaders=2 nfakewriters=4 stat_interval=0 verbose=0 test_no_idle_hz=0 shuffle_interval=3 stutter=5 irqreader=1 fqs_duration=0 fqs_holdoff=0 fqs_stutter=3 test_boost=1/0 test_boost_interval=7 test_boost_duration=4 shutdown_secs=0 onoff_interval=0 onoff_holdoff=0
Oops in iput_final(), called from mtdchar_close(), apparently after an unclean umount:
(x86_64-allyesdebian config)
[ 277.167130] VFS: Busy inodes after unmount of mtd_inodefs. Self-destruct in 5 seconds. Have a nice day...
[ 277.167202] VFS: Busy inodes after unmount of mtd_inodefs. Self-destruct in 5 seconds. Have a nice day...
[ 277.175814] BUG: unable to handle kernel paging request at ffff88000ad0c820
[ 277.175818] IP: [<ffffffff811a6fe2>] iput_final+0x1d/0x17c
[ 277.175825] PGD 4015063 PUD 4019063 PMD 17f91067 PTE ad0c160
[ 277.175830] Oops: 0000 [#1] SMP DEBUG_PAGEALLOC
[ 277.175834] CPU 0
[ 277.175835] Modules linked in:
[ 277.175837]
[ 277.175839] Pid: 3071, comm: mtd_probe Not tainted 3.4.0-rc1+ #1 Bochs Bochs
[ 277.175843] RIP: 0010:[<ffffffff811a6fe2>] [<ffffffff811a6fe2>] iput_final+0x1d/0x17c
[ 277.175847] RSP: 0018:ffff880005ee3d38 EFLAGS: 00010246
[ 277.175849] RAX: 0000000000000001 RBX: ffff8800087b7be8 RCX: 0000000000000001
[ 277.175864] RDX: 8c6318c6318c6301 RSI: ffff8800087b7c80 RDI: ffff8800087b7be8
[ 277.175866] RBP: ffff880005ee3d58 R08: 0000000000000000 R09: 0000000000004c86
[ 277.175868] R10: ffffffff816ce939 R11: ffff8800087b7c80 R12: ffff88000ad0c7f0
[ 277.175870] R13: ffff88000dd347f0 R14: ffff88000ae56be8 R15: ffff880009b95ee8
[ 277.175873] FS: 00007f7b52261700(0000) GS:ffff880017400000(0000) knlGS:0000000000000000
[ 277.175876] CS: 0010 DS: 0000 ES: 0000 CR0: 000000008005003b
[ 277.175878] CR2: ffff88000ad0c820 CR3: 0000000007c1a000 CR4: 00000000000006f0
[ 277.175884] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
[ 277.175889] DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400
[ 277.175892] Process mtd_probe (pid: 3071, threadinfo ffff880005ee2000, task ffff8800004d8740)
[ 277.175894] Stack:
[ 277.175895] 0000000000000001 ffff8800087b7be8 ffff880005e4ae90 ffff88000dd347f0
[ 277.175900] ffff880005ee3d78 ffffffff811a717e 0000000000000000 ffff880007cac730
[ 277.175904] ffff880005ee3da8 ffffffff81ccf9ea ffff88000ae56be8 ffff880005e4ae90
[ 277.175908] Call Trace:
[ 277.175912] [<ffffffff811a717e>] iput+0x3d/0x41
[ 277.175918] [<ffffffff81ccf9ea>] mtdchar_close+0x3f/0x78
[ 277.175922] [<ffffffff81192d7b>] __fput+0x117/0x1b8
[ 277.175925] [<ffffffff81192e36>] fput+0x1a/0x1c
[ 277.175929] [<ffffffff8118fba9>] filp_close+0x70/0x7b
[ 277.175933] [<ffffffff810958c4>] close_files+0xc6/0xff
[ 277.175936] [<ffffffff81095960>] put_files_struct.part.8+0x14/0xa9
[ 277.175940] [<ffffffff810967fb>] put_files_struct+0x1f/0x23
[ 277.175943] [<ffffffff81096899>] exit_files+0x48/0x50
[ 277.175946] [<ffffffff81096e2b>] do_exit+0x2c2/0x463
[ 277.175951] [<ffffffff810e2d60>] ? trace_hardirqs_on_caller+0x12d/0x164
[ 277.175955] [<ffffffff8109719b>] do_group_exit+0x88/0xb6
[ 277.175958] [<ffffffff810971e0>] sys_exit_group+0x17/0x17
[ 277.175963] [<ffffffff82f28e29>] system_call_fastpath+0x16/0x1b
[ 277.175965] Code: df e8 98 fd ff ff 58 5b 41 5c 41 5d 5d c3 55 48 89 e5 41 55 41 54 53 50 66 66 66 66 90 f6 87 e0 00 00 00 08 4c 8b 67 28 48 89 fb <4d> 8b 6c 24 30 74 11 be 61 05 00 00 48 c7 c7 9e f6 a6 83 e8 1b
[ 277.176016] RIP [<ffffffff811a6fe2>] iput_final+0x1d/0x17c
[ 277.176020] RSP <ffff880005ee3d38>
[ 277.176020] CR2: ffff88000ad0c820
[ 277.176020] ---[ end trace 7383ee643f887e33 ]---
[ 277.176020] Fixing recursive fault but reboot is needed!
[ 277.176020] BUG: scheduling while atomic: mtd_probe/3071/0x00000002
[ 277.176020] INFO: lockdep is turned off.
[ 277.176020] Modules linked in:
[ 277.176020] irq event stamp: 3696
[ 277.176020] hardirqs last enabled at (3695): [<ffffffff811111d6>] __call_rcu+0x2fe/0x316
[ 277.176020] hardirqs last disabled at (3696): [<ffffffff82f220f6>] error_sti+0x5/0x6
[ 277.176020] softirqs last enabled at (2044): [<ffffffff810992fe>] __do_softirq+0x239/0x24f
[ 277.176020] softirqs last disabled at (1965): [<ffffffff82f2a36c>] call_softirq+0x1c/0x30
[ 277.176020] Pid: 3071, comm: mtd_probe Tainted: G D 3.4.0-rc1+ #1
[ 277.176020] Call Trace:
[ 277.176020] [<ffffffff810e1cb7>] ? print_irqtrace_events+0x9d/0xa1
[ 277.176020] [<ffffffff82e754f4>] __schedule_bug+0x66/0x6a
[ 277.176020] [<ffffffff82f1fb6c>] __schedule+0x8c/0x59a
[ 277.176020] [<ffffffff82f20303>] schedule+0x64/0x66
[ 277.176020] [<ffffffff81096c7a>] do_exit+0x111/0x463
[ 277.176020] [<ffffffff82f22965>] oops_end+0xbe/0xc6
[ 277.176020] [<ffffffff82e73d70>] no_context+0x1ae/0x1bd
[ 277.176020] [<ffffffff82e73f5d>] __bad_area_nosemaphore+0x1de/0x1ff
[ 277.176020] [<ffffffff82e73548>] ? pte_offset_kernel+0x14/0x3a
[ 277.176020] [<ffffffff82e73f91>] bad_area_nosemaphore+0x13/0x15
[ 277.176020] [<ffffffff82f24f5b>] do_page_fault+0x1bc/0x390
[ 277.176020] [<ffffffff810c4a39>] ? local_clock+0x36/0x4d
[ 277.176020] [<ffffffff82f220f6>] ? error_sti+0x5/0x6
[ 277.176020] [<ffffffff81127c5d>] ? time_hardirqs_off+0x26/0x29
[ 277.176020] [<ffffffff82f220f6>] ? error_sti+0x5/0x6
[ 277.176020] [<ffffffff810df07b>] ? trace_hardirqs_off_caller+0x3f/0x9c
[ 277.176020] [<ffffffff816da04d>] ? trace_hardirqs_off_thunk+0x3a/0x3c
[ 277.176020] [<ffffffff82f24a33>] do_async_page_fault+0x31/0x4f
[ 277.176020] [<ffffffff82f21ee5>] async_page_fault+0x25/0x30
[ 277.176020] [<ffffffff816ce939>] ? _atomic_dec_and_lock+0x2d/0x4c
[ 277.176020] [<ffffffff811a6fe2>] ? iput_final+0x1d/0x17c
[ 277.176020] [<ffffffff811a717e>] iput+0x3d/0x41
[ 277.176020] [<ffffffff81ccf9ea>] mtdchar_close+0x3f/0x78
[ 277.176020] [<ffffffff81192d7b>] __fput+0x117/0x1b8
[ 277.176020] [<ffffffff81192e36>] fput+0x1a/0x1c
[ 277.176020] [<ffffffff8118fba9>] filp_close+0x70/0x7b
[ 277.176020] [<ffffffff810958c4>] close_files+0xc6/0xff
[ 277.176020] [<ffffffff81095960>] put_files_struct.part.8+0x14/0xa9
[ 277.176020] [<ffffffff810967fb>] put_files_struct+0x1f/0x23
[ 277.176020] [<ffffffff81096899>] exit_files+0x48/0x50
[ 277.176020] [<ffffffff81096e2b>] do_exit+0x2c2/0x463
[ 277.176020] [<ffffffff810e2d60>] ? trace_hardirqs_on_caller+0x12d/0x164
[ 277.176020] [<ffffffff8109719b>] do_group_exit+0x88/0xb6
[ 277.176020] [<ffffffff810971e0>] sys_exit_group+0x17/0x17
[ 277.176020] [<ffffffff82f28e29>] system_call_fastpath+0x16/0x1b
[ 277.176265] VFS: Busy inodes after unmount of mtd_inodefs. Self-destruct in 5 seconds. Have a nice day...
[ 277.184584] BUG: unable to handle kernel paging request at ffff88000ac7d820
[ 277.184588] IP: [<ffffffff811a6fe2>] iput_final+0x1d/0x17c
[ 277.184594] PGD 4015063 PUD 4019063 PMD 17f91067 PTE ac7d160
This looks like a real bug in some device driver:
(x86_64-allyesdebian config)
[ 503.061353] BUG: unable to handle kernel NULL pointer dereference at 0000000000000015
[ 503.065177] IP: [<ffffffff81fb6325>] usb_hcd_flush_endpoint+0x2f/0xec
[ 503.065177] PGD 0
[ 503.065177] Oops: 0000 [#1] SMP
[ 503.065177] CPU 0
[ 503.065177] Modules linked in:
[ 503.065177]
[ 503.065177] Pid: 20, comm: kworker/0:1 Not tainted 3.4.0-rc7+ #320 Bochs Bochs
[ 503.065177] RIP: 0010:[<ffffffff81fb6325>] [<ffffffff81fb6325>] usb_hcd_flush_endpoint+0x2f/0xec
[ 503.065177] RSP: 0018:ffff88000e013d20 EFLAGS: 00010046
[ 503.065177] RAX: 0000000000380038 RBX: ffff88000ace0000 RCX: 0000000000000000
[ 503.065177] RDX: 0000000000000038 RSI: ffff88000ad001c0 RDI: ffffffff83e1d86c
[ 503.065177] RBP: 0000000000000005 R08: ffff88000e012000 R09: 0000000000000007
[ 503.065177] R10: 000000000000086e R11: 0000000000000000 R12: 0000000000000015
[ 503.065177] R13: 0000000600000005 R14: 0000000000000001 R15: 0000000000000400
[ 503.065177] FS: 0000000000000000(0000) GS:ffff88000f400000(0000) knlGS:0000000000000000
[ 503.065177] CS: 0010 DS: 0000 ES: 0000 CR0: 000000008005003b
[ 503.065177] CR2: 0000000000000015 CR3: 0000000003813000 CR4: 00000000000006f0
[ 503.065177] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
[ 503.065177] DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400
[ 503.065177] Process kworker/0:1 (pid: 20, threadinfo ffff88000e012000, task ffff88000f3ea140)
[ 503.065177] Stack:
[ 503.065177] ffff88000ace0000 ffff88000ace0000 ffff88000ace0008 ffff88000ace0080
[ 503.065177] 0000000000000400 ffffffff81fb969d 000000000000001d ffff88000ace8400
[ 503.065177] 0000000000000720 ffff88000acea888 ffff88000acea800 ffff88000e013df8
[ 503.065177] Call Trace:
[ 503.065177] [<ffffffff81fb969d>] ? usb_suspend_both+0x155/0x184
[ 503.065177] [<ffffffff81fba2be>] ? usb_runtime_suspend+0x27/0x54
[ 503.065177] [<ffffffff81fba297>] ? usb_probe_device+0x33/0x33
[ 503.065177] [<ffffffff818bfe62>] ? __rpm_callback+0x2b/0x4f
[ 503.065177] [<ffffffff818c034b>] ? rpm_suspend+0x2f1/0x47c
[ 503.065177] [<ffffffff81086379>] ? mod_timer+0x7b/0x89
[ 503.065177] [<ffffffff818c109c>] ? pm_schedule_suspend+0xab/0xab
[ 503.065177] [<ffffffff818c110d>] ? pm_runtime_work+0x71/0x8c
[ 503.065177] [<ffffffff8108f6c3>] ? process_one_work+0x125/0x296
[ 503.065177] [<ffffffff8108fa8b>] ? worker_thread+0xc4/0x164
[ 503.065177] [<ffffffff8108f9c7>] ? rescuer_thread+0x16e/0x16e
[ 503.065177] [<ffffffff8109340f>] ? kthread+0x7d/0x85
[ 503.065177] [<ffffffff82afac64>] ? kernel_thread_helper+0x4/0x10
[ 503.065177] [<ffffffff81093392>] ? flush_kthread_worker+0x92/0x92
[ 503.065177] [<ffffffff82afac60>] ? gs_change+0x13/0x13
[ 503.065177] Code: f6 41 54 55 48 89 f5 53 53 0f 84 d0 00 00 00 4c 8d 65 10 48 89 fb e8 b2 d9 b3 00 48 c7 c7 6c d8 e1 83 4c 8b 6b 40 e8 28 e4 b3 00 <48> 8b 5d 10 eb 0a 83 7b 18 00 74 0a 48 8b 5b 20 48 83 eb 20 eb
[ 503.065177] RIP [<ffffffff81fb6325>] usb_hcd_flush_endpoint+0x2f/0xec
[ 503.065177] RSP <ffff88000e013d20>
[ 503.065177] CR2: 0000000000000015
[ 503.065177] ---[ end trace e6f1441a2b05b7c3 ]---
Another trace in usb_hcd_flush_endpoint(), this time a general protection fault on 3.5.0-rc1+:
[ 109.687421] intel_oaktrail: Platform not recognized (You could try the module's force-parameter)
[ 109.687421] ieee802154hardmac ieee802154hardmac: Added ieee802154 HardMAC hardware
[ 109.974115] vhci_hcd: changed 0
[ 110.046770] general protection fault: 0000 [#1] SMP DEBUG_PAGEALLOC
[ 110.050713] CPU 0
[ 110.050713] Modules linked in:
[ 110.050713]
[ 110.050713] Pid: 20, comm: kworker/0:1 Not tainted 3.5.0-rc1+ #13 Bochs Bochs
[ 110.050713] RIP: 0010:[<ffffffff822b1d35>] [<ffffffff822b1d35>] usb_hcd_flush_endpoint+0x4c/0x10b
[ 110.050713] RSP: 0018:ffff880015811c40 EFLAGS: 00010046
[ 110.050713] RAX: ffffffff822b1d35 RBX: ffff88000bde0000 RCX: 0000000000000001
[ 110.050713] RDX: ffff88000bde0000 RSI: ffffffff841d76e8 RDI: 0000000000000046
[ 110.050713] RBP: ffff880015811c60 R08: ffffffff84a434b0 R09: ffffffff84a434b0
[ 110.050713] R10: ffffffff822b1d35 R11: ffffffff841d76e8 R12: 5a5a0030305f7065
[ 110.050713] R13: 5a5a0030305f7075 R14: fffffffdfffffffd R15: 0000000000000001
[ 110.050713] FS: 0000000000000000(0000) GS:ffff880017400000(0000) knlGS:0000000000000000
[ 110.050713] CS: 0010 DS: 0000 ES: 0000 CR0: 000000008005003b
[ 110.050713] CR2: 0000000000000000 CR3: 0000000004014000 CR4: 00000000000006f0
[ 110.050713] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
[ 110.050713] DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400
[ 110.050713] Process kworker/0:1 (pid: 20, threadinfo ffff880015810000, task ffff88001580c500)
[ 110.050713] Stack:
[ 110.050713] ffff88000bde0000 ffff88000bde0000 ffff88000bde0080 0000000000000400
[ 110.050713] ffff880015811cb0 ffffffff822b5b46 ffff88000be2d7f0 0000040015811d40
[ 110.050713] ffff880015811ca0 ffff88000bde2878 ffff88000bde27f0 ffff880015811d28
[ 110.050713] Call Trace:
[ 110.050713] Call Trace:
[ 110.050713] [<ffffffff822b5b46>] usb_suspend_both+0x16f/0x19d
[ 110.050713] [<ffffffff822b64b2>] usb_runtime_suspend+0x36/0x5d
[ 110.050713] [<ffffffff822b647c>] ? usb_probe_device+0x37/0x37
[ 110.050713] [<ffffffff81a14ca5>] __rpm_callback+0x34/0x5b
[ 110.050713] [<ffffffff81a1538a>] rpm_suspend+0x2ff/0x49d
[ 110.050713] [<ffffffff810e1a7b>] ? lock_acquired+0x5f/0x68
[ 110.050713] [<ffffffff81a16283>] ? pm_runtime_work+0x1e/0xa1
[ 110.050713] [<ffffffff81a162ed>] pm_runtime_work+0x88/0xa1
[ 110.050713] [<ffffffff810ac446>] process_one_work+0x240/0x401
[ 110.050713] [<ffffffff810ac36b>] ? process_one_work+0x165/0x401
[ 110.050713] [<ffffffff810ad05c>] ? worker_thread+0x38/0x15f
[ 110.050713] [<ffffffff81a16265>] ? pm_schedule_suspend+0xb1/0xb1
[ 110.050713] [<ffffffff810ad0ff>] worker_thread+0xdb/0x15f
[ 110.050713] [<ffffffff810ad024>] ? manage_workers.isra.24+0xaa/0xaa
[ 110.050713] [<ffffffff810b3f67>] kthread+0xaf/0xb7
[ 110.050713] [<ffffffff816f4b5e>] ? trace_hardirqs_on_thunk+0x3a/0x3f
[ 110.050713] [<ffffffff82f8af74>] kernel_thread_helper+0x4/0x10
[ 110.050713] [<ffffffff82f828b0>] ? retint_restore_args+0x13/0x13
[ 110.050713] [<ffffffff810b3eb8>] ? __init_kthread_worker+0x5a/0x5a
[ 110.050713] [<ffffffff82f8af70>] ? gs_change+0x13/0x13
[ 110.050713] Code: be 5b 06 00 00 48 c7 c7 81 40 cd 83 4d 8d 6c 24 10 e8 1c e1 e0 fe e8 0f f0 cc 00 48 c7 c7 d0 76 1d 84 4c 8b 73 40 e8 a6 fe cc 00 <49> 8b 5c 24 10 eb 0a 83 7b 18 00 74 13 48 8b 5b 20 48 83 eb 20
[ 110.050713] RIP [<ffffffff822b1d35>] usb_hcd_flush_endpoint+0x4c/0x10b
[ 110.050713] RSP <ffff880015811c40>
[ 110.050713] ---[ end trace 3ffa21620df1b13a ]---
[ 110.050713] note: kworker/0:1[20] exited with preempt_count 1
[ 114.071820] BUG: scheduling while atomic: kworker/0:1/20/0x10000002
[ 114.132169] INFO: lockdep is turned off.
[ 114.192540] Modules linked in:
[ 114.252134] Pid: 20, comm: kworker/0:1 Tainted: G D 3.5.0-rc1+ #13
[ 114.315119] Call Trace:
[ 114.376805] [<ffffffff82ed365d>] __schedule_bug+0x66/0x75
[ 114.440581] [<ffffffff82f807e4>] __schedule+0x8c/0x599
[ 114.504267] [<ffffffff811ba97a>] ? exit_fs+0x3a/0x75
[ 114.568691] [<ffffffff810c252b>] __cond_resched+0x27/0x32
[ 114.633880] [<ffffffff82f80d4d>] _cond_resched+0x19/0x22
[ 114.698817] [<ffffffff82f8000b>] mutex_lock_nested+0x2a/0x45
[ 114.764387] [<ffffffff811407d8>] perf_event_exit_task+0x2a/0x9d
[ 114.830649] [<ffffffff81097a12>] do_exit+0x360/0x477
[ 114.896676] [<ffffffff82f83627>] oops_end+0xbe/0xc6
[ 114.962852] [<ffffffff8104d072>] die+0x5a/0x64
[ 115.028695] [<ffffffff82f83228>] do_general_protection+0x131/0x13a
[ 115.096531] [<ffffffff82f82b25>] general_protection+0x25/0x30
[ 115.164060] [<ffffffff822b1d35>] ? usb_hcd_flush_endpoint+0x4c/0x10b
[ 115.232190] [<ffffffff822b1d35>] ? usb_hcd_flush_endpoint+0x4c/0x10b
[ 115.298747] [<ffffffff822b1d35>] ? usb_hcd_flush_endpoint+0x4c/0x10b
[ 115.364073] [<ffffffff822b5b46>] usb_suspend_both+0x16f/0x19d
[ 115.428984] [<ffffffff822b64b2>] usb_runtime_suspend+0x36/0x5d
[ 115.494194] [<ffffffff822b647c>] ? usb_probe_device+0x37/0x37
[ 115.558063] [<ffffffff81a14ca5>] __rpm_callback+0x34/0x5b
[ 115.620646] [<ffffffff81a1538a>] rpm_suspend+0x2ff/0x49d
[ 115.679620] [<ffffffff810e1a7b>] ? lock_acquired+0x5f/0x68
[ 115.737314] [<ffffffff81a16283>] ? pm_runtime_work+0x1e/0xa1
[ 115.794345] [<ffffffff81a162ed>] pm_runtime_work+0x88/0xa1
[ 115.851304] [<ffffffff810ac446>] process_one_work+0x240/0x401
[ 115.908428] [<ffffffff810ac36b>] ? process_one_work+0x165/0x401
[ 115.964883] [<ffffffff810ad05c>] ? worker_thread+0x38/0x15f
[ 116.020499] [<ffffffff81a16265>] ? pm_schedule_suspend+0xb1/0xb1
[ 116.077215] [<ffffffff810ad0ff>] worker_thread+0xdb/0x15f
[ 116.131873] [<ffffffff810ad024>] ? manage_workers.isra.24+0xaa/0xaa
[ 116.185794] [<ffffffff810b3f67>] kthread+0xaf/0xb7
[ 116.238823] [<ffffffff816f4b5e>] ? trace_hardirqs_on_thunk+0x3a/0x3f
[ 116.293488] [<ffffffff82f8af74>] kernel_thread_helper+0x4/0x10
[ 116.348421] [<ffffffff82f828b0>] ? retint_restore_args+0x13/0x13
[ 116.403763] [<ffffffff810b3eb8>] ? __init_kthread_worker+0x5a/0x5a
[ 116.459768] [<ffffffff82f8af70>] ? gs_change+0x13/0x13
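One possible reading of the general protection fault above, not confirmed anywhere in this thread: the faulting instruction bytes `<49> 8b 5c 24 10` decode to a load from 0x10(%r12), and R12 holds 5a5a0030305f7065, which is not a canonical x86-64 address. Viewed as bytes in memory order it looks like ASCII text followed by 0x5a bytes (0x5a is the slab POISON_INUSE pattern), so the pointer may have been read from memory that held a string rather than a valid structure. A small sketch to check that arithmetic, with helper names invented for this illustration:

```python
import struct

def register_as_bytes(value: int) -> bytes:
    """Return the 8 bytes of a 64-bit register in little-endian (x86-64 memory) order."""
    return struct.pack("<Q", value)

def is_canonical(addr: int) -> bool:
    """True if bits 63..47 are all zeros or all ones (48-bit canonical address)."""
    top = addr >> 47
    return top == 0 or top == 0x1FFFF

# Register values copied from the 3.5.0-rc1+ oops above.
r12 = 0x5A5A0030305F7065
r13 = 0x5A5A0030305F7075
rbx = 0xFFFF88000BDE0000

if __name__ == "__main__":
    print(register_as_bytes(r12))   # bytes of R12 as they would sit in memory
    print(is_canonical(r12))        # a non-canonical address faults with #GP
    print(is_canonical(rbx))        # an ordinary kernel pointer, for comparison
    print(hex(r13 - r12))           # R13 was computed as R12 + 0x10
```

If this reading is right, it would point to use of a freed or overwritten endpoint structure rather than a plain NULL pointer, but that is only a guess from the register dump.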
A funny bug triggered at kernel panic time:
(x86_64-allyesdebian config)
[ 1526.520230] ===============================
[ 1526.520230] [ INFO: suspicious RCU usage. ]
[ 1526.520230] 3.5.0-rc1+ #12 Not tainted
[ 1526.520230] -------------------------------
[ 1526.520230] /c/kernel-tests/mm/include/linux/rcupdate.h:436 Illegal context switch in RCU read-side critical section!
[ 1526.520230]
[ 1526.520230] other info that might help us debug this:
[ 1526.520230]
[ 1526.520230]
[ 1526.520230] rcu_scheduler_active = 1, debug_locks = 0
[ 1526.520230] 3 locks held by net.agent/3279:
[ 1526.520230] #0: (&mm->mmap_sem){++++++}, at: [<ffffffff82f85962>] do_page_fault+0x193/0x390
[ 1526.520230] #1: (panic_lock){+.+...}, at: [<ffffffff82ed2830>] panic+0x37/0x1d3
[ 1526.520230] #2: (rcu_read_lock){.+.+..}, at: [<ffffffff810b9b28>] rcu_lock_acquire+0x0/0x29
[ 1526.520230]
[ 1526.520230] stack backtrace:
[ 1526.520230] Pid: 3279, comm: net.agent Not tainted 3.5.0-rc1+ #12
[ 1526.520230] Call Trace:
[ 1526.520230] [<ffffffff810e1570>] lockdep_rcu_suspicious+0x109/0x112
[ 1526.520230] [<ffffffff810bfe3a>] rcu_preempt_sleep_check+0x45/0x47
[ 1526.520230] [<ffffffff810bfe5a>] __might_sleep+0x1e/0x19a
[ 1526.520230] [<ffffffff82f8010e>] down_write+0x26/0x81
[ 1526.520230] [<ffffffff8276a966>] led_trigger_unregister+0x1f/0x9c
[ 1526.520230] [<ffffffff8276def5>] heartbeat_reboot_notifier+0x15/0x19
[ 1526.520230] [<ffffffff82f85bf5>] notifier_call_chain+0x96/0xcd
[ 1526.520230] [<ffffffff82f85cba>] __atomic_notifier_call_chain+0x8e/0xff
[ 1526.520230] [<ffffffff81094b7c>] ? kmsg_dump+0x37/0x1eb
[ 1526.520230] [<ffffffff82f85d3f>] atomic_notifier_call_chain+0x14/0x16
[ 1526.520230] [<ffffffff82ed28e1>] panic+0xe8/0x1d3
[ 1526.520230] [<ffffffff811473e2>] out_of_memory+0x15d/0x1d3
[ 1526.520230] [<ffffffff8114be63>] __alloc_pages_nodemask+0x5c2/0x762
[ 1526.520230] [<ffffffff8117b479>] alloc_pages_current+0xd2/0xf3
[ 1526.520230] [<ffffffff81143cf8>] __page_cache_alloc+0xa1/0xad
[ 1526.520230] [<ffffffff81145d92>] filemap_fault+0x1f9/0x2f6
[ 1526.520230] [<ffffffff8112a549>] ? time_hardirqs_off+0x26/0x29
[ 1526.520230] [<ffffffff810e022a>] ? trace_hardirqs_off_caller+0x1f/0x9c
[ 1526.520230] [<ffffffff811631ad>] __do_fault+0xc8/0x37d
[ 1526.520230] [<ffffffff810e0841>] ? lock_release_holdtime.part.23+0x4e/0x55
[ 1526.520230] [<ffffffff81165d21>] handle_pte_fault+0xf1/0x196
[ 1526.520230] [<ffffffff81162acd>] ? pmd_offset+0x14/0x3a
[ 1526.520230] [<ffffffff8116611f>] handle_mm_fault+0x1b1/0x1cb
[ 1526.520230] [<ffffffff82f85b3a>] do_page_fault+0x36b/0x390
[ 1526.520230] [<ffffffff816f485e>] ? trace_hardirqs_on_thunk+0x3a/0x3f
[ 1526.520230] [<ffffffff81165d54>] ? handle_pte_fault+0x124/0x196
[ 1526.520230] [<ffffffff81165d54>] ? handle_pte_fault+0x124/0x196
[ 1526.520230] [<ffffffff82f82aa6>] ? error_sti+0x5/0x6
[ 1526.520230] [<ffffffff8112a549>] ? time_hardirqs_off+0x26/0x29
[ 1526.520230] [<ffffffff82f82aa6>] ? error_sti+0x5/0x6
[ 1526.520230] [<ffffffff810e024a>] ? trace_hardirqs_off_caller+0x3f/0x9c
[ 1526.520230] [<ffffffff816f489d>] ? trace_hardirqs_off_thunk+0x3a/0x3c
[ 1526.520230] [<ffffffff82f85454>] do_async_page_fault+0x31/0x5e
[ 1526.520230] [<ffffffff82f82885>] async_page_fault+0x25/0x30
Thanks,
Fengguang
View attachment "config-nfsroot" of type "text/plain" (88606 bytes)
View attachment "config-allyesdebian" of type "text/plain" (137149 bytes)