[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <20171109072222.7nybw6pqqzksp6gy@wfg-t540p.sh.intel.com>
Date: Thu, 9 Nov 2017 15:22:22 +0800
From: Fengguang Wu <fengguang.wu@...el.com>
To: Alexander Duyck <alexander.duyck@...il.com>
Cc: Linus Torvalds <torvalds@...ux-foundation.org>,
Jeff Kirsher <jeffrey.t.kirsher@...el.com>,
Network Development <netdev@...r.kernel.org>,
"David S. Miller" <davem@...emloft.net>,
Linux Kernel Mailing List <linux-kernel@...r.kernel.org>,
intel-wired-lan <intel-wired-lan@...ts.osuosl.org>,
Girish Moodalbail <girish.moodalbail@...cle.com>,
Cong Wang <xiyou.wangcong@...il.com>
Subject: Re: [vlan_device_event] BUG: unable to handle kernel paging request
at 6b6b6ccf
On Thu, Nov 09, 2017 at 12:09:45PM +0800, Fengguang Wu wrote:
>On Thu, Nov 09, 2017 at 11:12:06AM +0800, Fengguang Wu wrote:
>>Hi Alex,
>>
>>>So looking over the trace the panic seems to be happening after a
>>>decnet interface is getting deleted. Is there any chance we could try
>>>compiling the kernel without decnet support to see if that is the
>>>source of these issues? I don't know if anyone on the Intel Wired Lan
>>>team is testing with that enabled so if we can eliminate that as a
>>>possible cause that would be useful.
>>
>>Sure and thank you for the suggestion!
>>
>>It looks disabling DECNET still triggers the vlan_device_event BUG.
>>However when looking at the dmesgs, I find another warning just before
>>the vlan_device_event BUG. Not sure if it's related one or independent
>>now-fixed issue.
>>
>>Please press Enter to activate this console.
>>[ 1291.938326] Writes: Total: 2 Max/Min: 0/0 Fail: 0
>>[ 1297.731690] e1000: eth0 NIC Link is Up 1000 Mbps Full Duplex, Flow Control: RX
>>[ 1297.828227] 8021q: adding VLAN 0 to HW filter on device eth0
>>[ 1300.506245] 8021q: adding VLAN 0 to HW filter on device eth0
>>[ 1302.467460] e1000: eth0 NIC Link is Up 1000 Mbps Full Duplex, Flow Control: RX
>>LKP: HOSTNAME vm-lkp-wsx03-openwrt-i386-10, MAC , kernel 4.13.0 1, serial console /dev/ttyS0
>>[ 1304.161688] Kernel tests: Boot OK!
>>[ 1306.558532] 8021q: adding VLAN 0 to HW filter on device eth0
>>[ 1308.507499] e1000: eth0 NIC Link is Up 1000 Mbps Full Duplex, Flow Control: RX
>>[ 1310.526380] 8021q: adding VLAN 0 to HW filter on device eth0
>>[ 1311.246017] LKP: waiting for network...
>>[ 1312.543432] e1000: eth0 NIC Link is Up 1000 Mbps Full Duplex, Flow Control: RX
>>[ 1313.985807]
>>[ 1313.991541] =====================================
>>[ 1314.002398] WARNING: bad unlock balance detected!
>>[ 1314.013154] 4.13.0 #1 Not tainted
>>[ 1314.021549] -------------------------------------
>>[ 1314.032505] procd/1244 is trying to release lock (rcu_preempt_state) at:
>>[ 1314.047216] [<c10e5840>] rcu_read_unlock_special+0x580/0x5b0
>>[ 1314.059825] but there are no more locks to release!
>>[ 1314.070546]
>>[ 1314.070546] other info that might help us debug this:
>>[ 1314.085941] 2 locks held by procd/1244:
>>[ 1314.095139] #0: (&sig->cred_guard_mutex){......}, at: [<c12587b8>] prepare_bprm_creds+0x28/0xc0
>>[ 1314.114616] #1: (rcu_read_lock){......}, at: [<c1260140>] path_init+0x490/0x6f0
>>[ 1314.132155]
>>[ 1314.132155] stack backtrace:
>>[ 1314.144402] CPU: 0 PID: 1244 Comm: procd Not tainted 4.13.0 #1
>>[ 1314.160197] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.10.2-1 04/01/2014
>>[ 1314.179404] Call Trace:
>>[ 1314.186768] dump_stack+0x16/0x1c
>>[ 1314.195387] print_unlock_imbalance_bug+0xb9/0xd0
>>[ 1314.205753] ? rcu_read_unlock_special+0x580/0x5b0
>>[ 1314.216381] ? rcu_read_unlock_special+0x580/0x5b0
>>[ 1314.226982] lock_release+0x1cc/0x490
>>[ 1314.235602] ? rcu_gp_kthread_wake+0x34/0x50
>>[ 1314.245262] ? rcu_read_unlock_special+0x580/0x5b0
>>[ 1314.255724] rt_mutex_unlock+0x1e/0xb0
>>[ 1314.264610] rcu_read_unlock_special+0x580/0x5b0
>>[ 1314.274814] __rcu_read_unlock+0xa7/0xb0
>>[ 1314.283954] unlazy_walk+0xcf/0x1f0
>>[ 1314.292409] trailing_symlink+0x349/0x4e0
>>[ 1314.301583] path_openat+0x333/0x1280
>>[ 1314.310197] do_filp_open+0x67/0x140
>>[ 1314.318696] ? getname_kernel+0x23/0x1e0
>>[ 1314.327766] ? cache_alloc_debugcheck_after+0x13a/0x2a0
>>[ 1314.340076] ? getname_kernel+0x23/0x1e0
>>[ 1314.349179] do_open_execat+0xab/0x2a0
>>[ 1314.358063] open_exec+0x57/0x80
>>[ 1314.366128] load_script+0x33c/0x3d0
>>[ 1314.374556] ? kvm_sched_clock_read+0x9/0x20
>>[ 1314.384219] ? sched_clock+0x9/0x10
>>[ 1314.392611] ? sched_clock_cpu+0x1a/0x1e0
>>[ 1314.401875] ? _raw_read_unlock+0x55/0x90
>>[ 1314.411080] search_binary_handler+0xd9/0x160
>>[ 1314.420799] do_execveat_common+0x8f6/0xb10
>>[ 1314.430334] SyS_execve+0x1f/0x30
>>[ 1314.438458] do_int80_syscall_32+0x95/0x1b0
>>[ 1314.447956] entry_INT80_32+0x2f/0x2f
>>[ 1314.456606] EIP: 0xb7e9ab07
>>[ 1314.464062] EFLAGS: 00000296 CPU: 0
>>[ 1314.472421] EAX: ffffffda EBX: 0807b584 ECX: bfb0fd70 EDX: 08061250
>>[ 1314.485257] ESI: 0807b584 EDI: 00000000 EBP: bfb0fd58 ESP: bfb0fd28
>>[ 1314.498024] DS: 007b ES: 007b FS: 0000 GS: 0033 SS: 007b
>>[ 1314.613681] hotplug-call (1244) used greatest stack depth: 6384 bytes left
>>[ 1314.957636] 8021q: adding VLAN 0 to HW filter on device eth0
>>[ 1316.955154] e1000: eth0 NIC Link is Up 1000 Mbps Full Duplex, Flow Control: RX
>>[ 1318.197800] 8021q: adding VLAN 0 to HW filter on device eth0
>>[ 1320.222754] e1000: eth0 NIC Link is Up 1000 Mbps Full Duplex, Flow Control: RX
>>[ 1321.409456] BUG: unable to handle kernel paging request at 6b6b6f4f
>>[ 1321.421942] IP: vlan_device_event+0x7f5/0xa40
>>[ 1321.431239] *pde = 00000000
>
>Note that this call trace is different from the ones posted in earlier
>emails.
>
>[ 1321.431267]
>[ 1321.443356] Oops: 0000 [#1] PREEMPT
>[ 1321.451390] CPU: 0 PID: 798 Comm: netifd Not tainted 4.13.0 #1
>[ 1321.462802] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.10.2-1 04/01/2014
>[ 1321.479743] task: cf8ae5c0 task.stack: cf114000
>[ 1321.489345] EIP: vlan_device_event+0x7f5/0xa40
>[ 1321.498701] EFLAGS: 00010206 CPU: 0
>[ 1321.506683] EAX: 000000f9 EBX: 00000002 ECX: 00000001 EDX: 6b6b6b6b
>[ 1321.518893] ESI: 000014f9 EDI: cf3ec700 EBP: cf115dd8 ESP: cf115da4
>[ 1321.531044] DS: 007b ES: 007b FS: 0000 GS: 0033 SS: 0068
>[ 1321.541906] CR0: 80050033 CR2: 6b6b6f4f CR3: 10a02000 CR4: 00000690
>[ 1321.554097] Call Trace:
>[ 1321.560423] ? __local_bh_enable_ip+0xf6/0x1a0
>[ 1321.569987] ? _raw_write_unlock_bh+0x2a/0x30
>[ 1321.579393] notifier_call_chain+0x4e/0xa0
>[ 1321.588336] raw_notifier_call_chain+0xc/0x10
>[ 1321.597604] call_netdevice_notifiers_info+0x59/0x90
>[ 1321.607779] __dev_notify_flags+0xea/0x130
>[ 1321.616697] dev_change_flags+0x60/0x70
>[ 1321.625255] dev_ifsioc+0x47b/0x5a0
>[ 1321.633354] ? mutex_lock_nested+0x14/0x20
>[ 1321.642292] dev_ioctl+0x36f/0xb20
>[ 1321.650403] sock_ioctl+0x1cd/0x350
>[ 1321.658401] ? sock_fasync+0xb0/0xb0
>[ 1321.666573] vfs_ioctl+0x33/0x70
>[ 1321.674214] do_vfs_ioctl+0x8d/0xc60
>[ 1321.682384] ? kmem_cache_free+0x186/0x290
>[ 1321.691201] ? putname+0x9f/0xe0
>[ 1321.698876] ? putname+0x9f/0xe0
>[ 1321.706491] ? do_sys_open+0x28d/0x420
>[ 1321.714920] ? __fget_light+0xb7/0xf0
>[ 1321.723153] SyS_ioctl+0x98/0xb0
>[ 1321.730961] do_int80_syscall_32+0x95/0x1b0
>[ 1321.740064] entry_INT80_32+0x2f/0x2f
>[ 1321.748335] EIP: 0xb7eb9648
>[ 1321.755299] EFLAGS: 00000246 CPU: 0
>[ 1321.763209] EAX: ffffffda EBX: 00000005 ECX: 00008914 EDX: bfed0640
>[ 1321.775289] ESI: bfed0640 EDI: bfed0660 EBP: bfed0678 ESP: bfed060c
>[ 1321.787449] DS: 007b ES: 007b FS: 0000 GS: 0033 SS: 007b
>[ 1321.798394] Code: 8d f4 5d b3 c4 8d 56 01 89 14 8d f4 5d b3 c4 0f 85 03 02 00 00 89 7d d4 31 f6 8b 7d d8 e9 84 00 00 00 8d 74 26 00 25 ff 01 00 00 <8b> 1c 82 31 d2 85 db 0f 95 c2 8b 04 95 e0 5d b3 c4 83 c0 01 85
>[ 1321.833755] EIP: vlan_device_event+0x7f5/0xa40 SS:ESP: 0068:cf115da4
>[ 1321.846005] CR2: 000000006b6b6f4f
>
>>[ 1321.431267]
>>
>>Attached is the full .config and dmesg. Please don't waste time on reproducing
>>-- it's not quite possible since it seems to only happen in one of our
>>host machine.
>
>Here are the faddr2line parsed 4.11 call traces:
>
>[ 194.685204] wait for background monitors: 1164 oom-killer
>[ 196.712107] BUG: unable to handle kernel paging request at 6b6b6e73
>[ 196.725844] IP: vlan_device_event+0x725/0xa30:
>vlan_device_event at net/8021q/vlan.h:59
>[ 196.735827] *pde = 00000000
>[ 196.735855]
>[ 196.749561] Oops: 0000 [#1] PREEMPT
>[ 196.769520] CPU: 0 PID: 797 Comm: netifd Not tainted 4.11.0 #1
>[ 196.792728] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.10.2-1 04/01/2014
>[ 196.821634] task: cfcc9000 task.stack: cfc7e000
>[ 196.831969] EIP: vlan_device_event+0x725/0xa30:
>vlan_device_event at net/8021q/vlan.h:59
>[ 196.842019] EFLAGS: 00010202 CPU: 0
>[ 196.850925] EAX: 000000c2 EBX: 00000002 ECX: 00000000 EDX: 6b6b6b6b
>[ 196.863985] ESI: 000008c2 EDI: d08c0f00 EBP: cfc7fdd4 ESP: cfc7fda0
>[ 196.876463] DS: 007b ES: 007b FS: 0000 GS: 0033 SS: 0068
>[ 196.887903] CR0: 80050033 CR2: 6b6b6e73 CR3: 10f0b000 CR4: 00000690
>[ 196.900970] Call Trace:
>[ 196.907905] ? dn_dev_delete+0x18c/0x1b0:
>dn_dev_delete at net/decnet/dn_dev.c:1219
>[ 196.917749] ? dn_dev_down+0x69/0x80:
>dn_dev_down at net/decnet/dn_dev.c:1235
>[ 196.926701] notifier_call_chain+0x4e/0xa0:
>notifier_call_chain at kernel/notifier.c:95 (discriminator 1)
>--
>[ 197.042320] do_vfs_ioctl+0x91/0xcc0:
>do_vfs_ioctl at fs/ioctl.c:685
>[ 197.051621] ? kmem_cache_free+0x124/0x270:
>kmem_cache_free at include/linux/rcupdate.h:981
>[ 197.061767] ? putname+0x9f/0xe0:
>putname at fs/namei.c:258
>[ 197.080497] ? putname+0x9f/0xe0:
>putname at fs/namei.c:258
>[ 197.098679] ? do_sys_open+0x271/0x3e0:
>do_sys_open at fs/open.c:1062
>[ 197.108071] ? __fget_light+0xb7/0xf0:
>__fget_light at fs/file.c:752 (discriminator 2)
>[ 197.127385] SyS_ioctl+0x96/0xb0:
>SyS_ioctl at fs/ioctl.c:700
>[ 197.136262] do_int80_syscall_32+0x95/0x1b0:
>do_int80_syscall_32 at arch/x86/entry/common.c:323
>[ 197.146639] entry_INT80_32+0x2f/0x2f:
>restore_all at arch/x86/entry/entry_32.S:530
>[ 197.155513] EIP: 0xb7727648
>[ 197.162874] EFLAGS: 00000246 CPU: 0
>[ 197.171852] EAX: ffffffda EBX: 00000005 ECX: 00008914 EDX: bff1f5d0
>[ 197.184875] ESI: bff1f5d0 EDI: bff1f5f0 EBP: bff1f608 ESP: bff1f59c
>[ 197.198320] DS: 007b ES: 007b FS: 0000 GS: 0033 SS: 007b
>[ 197.210120] Code: 8d fc a1 8d c4 8d 56 01 89 14 8d fc a1 8d c4 0f 85 b4 02 00 00 89 7d d8 31 f6 8b 7d e0 e9 84 00 00 00 8d 74 26 00 25 ff 01 00 00 <8b> 1c 82 31 d2 85 db 0f 95 c2 8b 04 95 e8 a1 8d c4 83 c0 01 85
>[ 197.249301] EIP: vlan_device_event+0x725/0xa30:
>vlan_device_event at net/8021q/vlan.h:59 SS:ESP: 0068:cfc7fda0
>[ 197.262812] CR2: 000000006b6b6e73
>[ 199.290882] /bin/busybox wget -q http://inn:80/~lkp/cgi-bin/lkp-jobfile-append-var?job_file=/lkp/scheduled/vm-lkp-wsx03-openwrt-i386-11/trinity-300s-openwrt-i386-2016-03-16.cgz-a351e9b9fc24e982ec2f0e76379a49826036da12-20171109-13235-ab6tcc-wait_kernel-4.yaml&loadavg=5.12%202.92%201.17%204/93%201360&start_time=1510191311&end_time=1510191314&version=/lkp/lkp/.src-20171109-011142& -O /dev/null
>[ 199.356767] wget: socket(AF_INET,1,0): Address family not supported by protocol
>[ 199.474622] ---[ end trace a8f7db573cc5216e ]---
>[ 199.474631] Kernel panic - not syncing: Fatal exception
>[ 199.474660] Kernel Offset: disabled
>[ 199.534708] Rebooting in 3 seconds..
Here is yet another call trace, this time it's "NULL point dereference".
[ 748.471023] procd: - init -
Please press Enter to activate this console.
[ 753.540379] random: uci: uninitialized urandom read (6 bytes read)
[ 754.030470] random: uci: uninitialized urandom read (6 bytes read)
[ 754.245351] random: uci: uninitialized urandom read (6 bytes read)
[ 757.258323] Writes: Total: 2 Max/Min: 0/0 Fail: 0
[ 760.340612] e1000: eth0 NIC Link is Up 1000 Mbps Full Duplex, Flow Control: RX
[ 760.448681] 8021q: adding VLAN 0 to HW filter on device eth0
[ 762.403927] 8021q: adding VLAN 0 to HW filter on device eth0
[ 764.380369] e1000: eth0 NIC Link is Up 1000 Mbps Full Duplex, Flow Control: RX
LKP: HOSTNAME vm-lkp-wsx03-openwrt-i386-18, MAC , kernel 4.11.0-08060-g8d5e72d 1, serial console /dev/ttyS0
[ 765.018938] Kernel tests: Boot OK!
[ 766.118342] BUG: unable to handle kernel NULL pointer dereference at 00000639
[ 766.127471] IP: vlan_device_event+0x7f5/0xa40
[ 766.133669] *pde = 00000000
[ 766.133694]
[ 766.141020] Oops: 0000 [#1] PREEMPT
[ 766.146171] CPU: 0 PID: 832 Comm: netifd Not tainted 4.11.0-08060-g8d5e72d #1
[ 766.154925] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.10.2-1 04/01/2014
[ 766.166422] task: d1ac3000 task.stack: d2952000
[ 766.172581] EIP: vlan_device_event+0x7f5/0xa40
[ 766.178581] EFLAGS: 00010206 CPU: 0
[ 766.183537] EAX: 0000018e EBX: 00000002 ECX: 00000000 EDX: 00000001
[ 766.191471] ESI: 0000038e EDI: d1f0ed80 EBP: d2953dd8 ESP: d2953da4
[ 766.199453] DS: 007b ES: 007b FS: 0000 GS: 0033 SS: 0068
[ 766.206997] CR0: 80050033 CR2: 00000639 CR3: 1199e000 CR4: 00000690
[ 766.216394] Call Trace:
[ 766.220644] ? dn_dev_delete+0x138/0x1d0
[ 766.226253] ? dn_dev_down+0x69/0x80
[ 766.231402] notifier_call_chain+0x4e/0xa0
[ 766.237335] raw_notifier_call_chain+0xc/0x10
[ 766.243331] call_netdevice_notifiers_info+0x59/0x90
[ 766.250689] __dev_notify_flags+0xea/0x130
[ 766.257746] dev_change_flags+0x60/0x70
[ 766.265073] dev_ifsioc+0x44b/0x570
[ 766.271028] ? mutex_lock_nested+0x14/0x20
[ 766.278162] dev_ioctl+0x399/0xa70
[ 766.284317] ? slow_virt_to_phys+0x3e/0x110
[ 766.291227] sock_ioctl+0x1cd/0x350
[ 766.297515] ? sock_fasync+0xb0/0xb0
[ 766.304019] vfs_ioctl+0x33/0x70
[ 766.309894] do_vfs_ioctl+0x8d/0xc30
[ 766.316242] ? kmem_cache_free+0x186/0x290
[ 766.323504] ? putname+0x9f/0xe0
[ 766.330082] ? putname+0x9f/0xe0
[ 766.335780] ? do_sys_open+0x28d/0x420
[ 766.342393] ? __fget_light+0xb7/0xf0
[ 766.348578] SyS_ioctl+0x98/0xb0
[ 766.353281] do_int80_syscall_32+0x95/0x1b0
[ 766.359117] entry_INT80_32+0x2f/0x2f
[ 766.364295] EIP: 0xb7766648
[ 766.368512] EFLAGS: 00000246 CPU: 0
[ 766.373497] EAX: ffffffda EBX: 00000005 ECX: 00008914 EDX: bfdae180
[ 766.381443] ESI: bfdae180 EDI: bfdae1a0 EBP: bfdae1b8 ESP: bfdae14c
[ 766.389775] DS: 007b ES: 007b FS: 0000 GS: 0033 SS: 007b
[ 766.396967] Code: 8d a4 80 9b c4 8d 56 01 89 14 8d a4 80 9b c4 0f 85 03 02 00 00 89 7d d4 31 f6 8b 7d d8 e9 84 00 00 00 8d 74 26 00 25 ff 01 00 00 <8b> 1c 82 31 d2 85 db 0f 95 c2 8b 04 95 90 80 9b c4 83 c0 01 85
[ 766.420978] EIP: vlan_device_event+0x7f5/0xa40 SS:ESP: 0068:d2953da4
[ 766.428984] CR2: 0000000000000639
[ 768.045479] ---[ end trace 4eaaf76db348fd80 ]---
[ 768.105484] Kernel panic - not syncing: Fatal exception
[ 768.112938] Kernel Offset: disabled
[ 768.118386] Rebooting in 3 seconds..
kernel NULL pointer
View attachment "dmesg-vm-lkp-wsx03-openwrt-i386-18:20171108184543:i386-randconfig-b0-11061302:4.11.0-08060-g8d5e72d:1" of type "text/plain" (90115 bytes)
Powered by blists - more mailing lists