Message-ID: <20171109031206.x6ta5ysdalf3lk3s@wfg-t540p.sh.intel.com>
Date: Thu, 9 Nov 2017 11:12:06 +0800
From: Fengguang Wu <fengguang.wu@...el.com>
To: Alexander Duyck <alexander.duyck@...il.com>
Cc: Linus Torvalds <torvalds@...ux-foundation.org>,
Jeff Kirsher <jeffrey.t.kirsher@...el.com>,
Network Development <netdev@...r.kernel.org>,
"David S. Miller" <davem@...emloft.net>,
Linux Kernel Mailing List <linux-kernel@...r.kernel.org>,
intel-wired-lan <intel-wired-lan@...ts.osuosl.org>
Subject: Re: [vlan_device_event] BUG: unable to handle kernel paging request at 6b6b6ccf
Hi Alex,
>So looking over the trace the panic seems to be happening after a
>decnet interface is getting deleted. Is there any chance we could try
>compiling the kernel without decnet support to see if that is the
>source of these issues? I don't know if anyone on the Intel Wired Lan
>team is testing with that enabled so if we can eliminate that as a
>possible cause that would be useful.
Sure, and thank you for the suggestion!
It looks like disabling DECNET does not help: the vlan_device_event BUG
still triggers. However, when looking at the dmesgs, I found another
warning just before the vlan_device_event BUG. I'm not sure whether it is
related or an independent, now-fixed issue.
Please press Enter to activate this console.
[ 1291.938326] Writes: Total: 2 Max/Min: 0/0 Fail: 0
[ 1297.731690] e1000: eth0 NIC Link is Up 1000 Mbps Full Duplex, Flow Control: RX
[ 1297.828227] 8021q: adding VLAN 0 to HW filter on device eth0
[ 1300.506245] 8021q: adding VLAN 0 to HW filter on device eth0
[ 1302.467460] e1000: eth0 NIC Link is Up 1000 Mbps Full Duplex, Flow Control: RX
LKP: HOSTNAME vm-lkp-wsx03-openwrt-i386-10, MAC , kernel 4.13.0 1, serial console /dev/ttyS0
[ 1304.161688] Kernel tests: Boot OK!
[ 1306.558532] 8021q: adding VLAN 0 to HW filter on device eth0
[ 1308.507499] e1000: eth0 NIC Link is Up 1000 Mbps Full Duplex, Flow Control: RX
[ 1310.526380] 8021q: adding VLAN 0 to HW filter on device eth0
[ 1311.246017] LKP: waiting for network...
[ 1312.543432] e1000: eth0 NIC Link is Up 1000 Mbps Full Duplex, Flow Control: RX
[ 1313.985807]
[ 1313.991541] =====================================
[ 1314.002398] WARNING: bad unlock balance detected!
[ 1314.013154] 4.13.0 #1 Not tainted
[ 1314.021549] -------------------------------------
[ 1314.032505] procd/1244 is trying to release lock (rcu_preempt_state) at:
[ 1314.047216] [<c10e5840>] rcu_read_unlock_special+0x580/0x5b0
[ 1314.059825] but there are no more locks to release!
[ 1314.070546]
[ 1314.070546] other info that might help us debug this:
[ 1314.085941] 2 locks held by procd/1244:
[ 1314.095139] #0: (&sig->cred_guard_mutex){......}, at: [<c12587b8>] prepare_bprm_creds+0x28/0xc0
[ 1314.114616] #1: (rcu_read_lock){......}, at: [<c1260140>] path_init+0x490/0x6f0
[ 1314.132155]
[ 1314.132155] stack backtrace:
[ 1314.144402] CPU: 0 PID: 1244 Comm: procd Not tainted 4.13.0 #1
[ 1314.160197] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.10.2-1 04/01/2014
[ 1314.179404] Call Trace:
[ 1314.186768] dump_stack+0x16/0x1c
[ 1314.195387] print_unlock_imbalance_bug+0xb9/0xd0
[ 1314.205753] ? rcu_read_unlock_special+0x580/0x5b0
[ 1314.216381] ? rcu_read_unlock_special+0x580/0x5b0
[ 1314.226982] lock_release+0x1cc/0x490
[ 1314.235602] ? rcu_gp_kthread_wake+0x34/0x50
[ 1314.245262] ? rcu_read_unlock_special+0x580/0x5b0
[ 1314.255724] rt_mutex_unlock+0x1e/0xb0
[ 1314.264610] rcu_read_unlock_special+0x580/0x5b0
[ 1314.274814] __rcu_read_unlock+0xa7/0xb0
[ 1314.283954] unlazy_walk+0xcf/0x1f0
[ 1314.292409] trailing_symlink+0x349/0x4e0
[ 1314.301583] path_openat+0x333/0x1280
[ 1314.310197] do_filp_open+0x67/0x140
[ 1314.318696] ? getname_kernel+0x23/0x1e0
[ 1314.327766] ? cache_alloc_debugcheck_after+0x13a/0x2a0
[ 1314.340076] ? getname_kernel+0x23/0x1e0
[ 1314.349179] do_open_execat+0xab/0x2a0
[ 1314.358063] open_exec+0x57/0x80
[ 1314.366128] load_script+0x33c/0x3d0
[ 1314.374556] ? kvm_sched_clock_read+0x9/0x20
[ 1314.384219] ? sched_clock+0x9/0x10
[ 1314.392611] ? sched_clock_cpu+0x1a/0x1e0
[ 1314.401875] ? _raw_read_unlock+0x55/0x90
[ 1314.411080] search_binary_handler+0xd9/0x160
[ 1314.420799] do_execveat_common+0x8f6/0xb10
[ 1314.430334] SyS_execve+0x1f/0x30
[ 1314.438458] do_int80_syscall_32+0x95/0x1b0
[ 1314.447956] entry_INT80_32+0x2f/0x2f
[ 1314.456606] EIP: 0xb7e9ab07
[ 1314.464062] EFLAGS: 00000296 CPU: 0
[ 1314.472421] EAX: ffffffda EBX: 0807b584 ECX: bfb0fd70 EDX: 08061250
[ 1314.485257] ESI: 0807b584 EDI: 00000000 EBP: bfb0fd58 ESP: bfb0fd28
[ 1314.498024] DS: 007b ES: 007b FS: 0000 GS: 0033 SS: 007b
[ 1314.613681] hotplug-call (1244) used greatest stack depth: 6384 bytes left
[ 1314.957636] 8021q: adding VLAN 0 to HW filter on device eth0
[ 1316.955154] e1000: eth0 NIC Link is Up 1000 Mbps Full Duplex, Flow Control: RX
[ 1318.197800] 8021q: adding VLAN 0 to HW filter on device eth0
[ 1320.222754] e1000: eth0 NIC Link is Up 1000 Mbps Full Duplex, Flow Control: RX
[ 1321.409456] BUG: unable to handle kernel paging request at 6b6b6f4f
[ 1321.421942] IP: vlan_device_event+0x7f5/0xa40
[ 1321.431239] *pde = 00000000
[ 1321.431267]
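For anyone going through the attached dmesg, a rough sketch of how the ordering of the two reports could be checked is shown here. It is only an illustration: the helper name `find_reports`, the marker strings, and the three-line sample are assumptions lifted from the excerpt above, not part of any existing tool.

```python
import re

# Assumed marker strings, taken from the log excerpt above.
MARKERS = ("WARNING:", "BUG:")

# Matches a printk-style line such as "[ 1314.002398] rest of line".
LINE_RE = re.compile(r"^\[\s*(\d+\.\d+)\]\s*(.*)$")


def find_reports(dmesg_text):
    """Return (timestamp, message) for each line containing a marker."""
    reports = []
    for line in dmesg_text.splitlines():
        m = LINE_RE.match(line)
        if m and any(marker in m.group(2) for marker in MARKERS):
            reports.append((float(m.group(1)), m.group(2)))
    return reports


# Hypothetical sample: three lines copied from the excerpt above.
sample = """\
[ 1314.002398] WARNING: bad unlock balance detected!
[ 1314.957636] 8021q: adding VLAN 0 to HW filter on device eth0
[ 1321.409456] BUG: unable to handle kernel paging request at 6b6b6f4f
"""

for ts, msg in find_reports(sample):
    print(f"{ts:.6f}  {msg}")
```

Run against the full dmesg, this would list every WARNING and BUG line with its timestamp, which makes it easy to see how far apart the lockdep warning and the paging-request BUG are.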
Attached are the full .config and dmesg. Please don't waste time trying to
reproduce this; it's not really feasible, since it seems to happen on only
one of our host machines.
Thanks,
Fengguang
View attachment ".config" of type "text/plain" (126727 bytes)
View attachment "dmesg-vm-lkp-wsx03-openwrt-i386-10:20171109103638:i386-randconfig-b0-11061302-CONFIG_DECNET:4.13.0:1" of type "text/plain" (85549 bytes)