linux-kernel - Re: [vlan_device_event] BUG: unable to handle kernel paging request at 6b6b6ccf

Message-ID: <20171109031206.x6ta5ysdalf3lk3s@wfg-t540p.sh.intel.com>
Date:   Thu, 9 Nov 2017 11:12:06 +0800
From:   Fengguang Wu <fengguang.wu@...el.com>
To:     Alexander Duyck <alexander.duyck@...il.com>
Cc:     Linus Torvalds <torvalds@...ux-foundation.org>,
        Jeff Kirsher <jeffrey.t.kirsher@...el.com>,
        Network Development <netdev@...r.kernel.org>,
        "David S. Miller" <davem@...emloft.net>,
        Linux Kernel Mailing List <linux-kernel@...r.kernel.org>,
        intel-wired-lan <intel-wired-lan@...ts.osuosl.org>
Subject: Re: [vlan_device_event] BUG: unable to handle kernel paging request
 at 6b6b6ccf

Hi Alex,

>So looking over the trace the panic seems to be happening after a
>decnet interface is getting deleted. Is there any chance we could try
>compiling the kernel without decnet support to see if that is the
>source of these issues? I don't know if anyone on the Intel Wired Lan
>team is testing with that enabled so if we can eliminate that as a
>possible cause that would be useful.

Sure, and thank you for the suggestion!

It looks like disabling DECNET still triggers the vlan_device_event BUG.
However, when looking at the dmesgs, I found another warning just before
the vlan_device_event BUG. I'm not sure whether it is related or an
independent, now-fixed issue.

Please press Enter to activate this console.
[ 1291.938326] Writes:  Total: 2  Max/Min: 0/0   Fail: 0
[ 1297.731690] e1000: eth0 NIC Link is Up 1000 Mbps Full Duplex, Flow Control: RX
[ 1297.828227] 8021q: adding VLAN 0 to HW filter on device eth0
[ 1300.506245] 8021q: adding VLAN 0 to HW filter on device eth0
[ 1302.467460] e1000: eth0 NIC Link is Up 1000 Mbps Full Duplex, Flow Control: RX
LKP: HOSTNAME vm-lkp-wsx03-openwrt-i386-10, MAC , kernel 4.13.0 1, serial console /dev/ttyS0
[ 1304.161688] Kernel tests: Boot OK!
[ 1306.558532] 8021q: adding VLAN 0 to HW filter on device eth0
[ 1308.507499] e1000: eth0 NIC Link is Up 1000 Mbps Full Duplex, Flow Control: RX
[ 1310.526380] 8021q: adding VLAN 0 to HW filter on device eth0
[ 1311.246017] LKP: waiting for network...
[ 1312.543432] e1000: eth0 NIC Link is Up 1000 Mbps Full Duplex, Flow Control: RX
[ 1313.985807]
[ 1313.991541] =====================================
[ 1314.002398] WARNING: bad unlock balance detected!
[ 1314.013154] 4.13.0 #1 Not tainted
[ 1314.021549] -------------------------------------
[ 1314.032505] procd/1244 is trying to release lock (rcu_preempt_state) at:
[ 1314.047216] [<c10e5840>] rcu_read_unlock_special+0x580/0x5b0
[ 1314.059825] but there are no more locks to release!
[ 1314.070546]
[ 1314.070546] other info that might help us debug this:
[ 1314.085941] 2 locks held by procd/1244:
[ 1314.095139]  #0:  (&sig->cred_guard_mutex){......}, at: [<c12587b8>] prepare_bprm_creds+0x28/0xc0
[ 1314.114616]  #1:  (rcu_read_lock){......}, at: [<c1260140>] path_init+0x490/0x6f0
[ 1314.132155]
[ 1314.132155] stack backtrace:
[ 1314.144402] CPU: 0 PID: 1244 Comm: procd Not tainted 4.13.0 #1
[ 1314.160197] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.10.2-1 04/01/2014
[ 1314.179404] Call Trace:
[ 1314.186768]  dump_stack+0x16/0x1c
[ 1314.195387]  print_unlock_imbalance_bug+0xb9/0xd0
[ 1314.205753]  ? rcu_read_unlock_special+0x580/0x5b0
[ 1314.216381]  ? rcu_read_unlock_special+0x580/0x5b0
[ 1314.226982]  lock_release+0x1cc/0x490
[ 1314.235602]  ? rcu_gp_kthread_wake+0x34/0x50
[ 1314.245262]  ? rcu_read_unlock_special+0x580/0x5b0
[ 1314.255724]  rt_mutex_unlock+0x1e/0xb0
[ 1314.264610]  rcu_read_unlock_special+0x580/0x5b0
[ 1314.274814]  __rcu_read_unlock+0xa7/0xb0
[ 1314.283954]  unlazy_walk+0xcf/0x1f0
[ 1314.292409]  trailing_symlink+0x349/0x4e0
[ 1314.301583]  path_openat+0x333/0x1280
[ 1314.310197]  do_filp_open+0x67/0x140
[ 1314.318696]  ? getname_kernel+0x23/0x1e0
[ 1314.327766]  ? cache_alloc_debugcheck_after+0x13a/0x2a0
[ 1314.340076]  ? getname_kernel+0x23/0x1e0
[ 1314.349179]  do_open_execat+0xab/0x2a0
[ 1314.358063]  open_exec+0x57/0x80
[ 1314.366128]  load_script+0x33c/0x3d0
[ 1314.374556]  ? kvm_sched_clock_read+0x9/0x20
[ 1314.384219]  ? sched_clock+0x9/0x10
[ 1314.392611]  ? sched_clock_cpu+0x1a/0x1e0
[ 1314.401875]  ? _raw_read_unlock+0x55/0x90
[ 1314.411080]  search_binary_handler+0xd9/0x160
[ 1314.420799]  do_execveat_common+0x8f6/0xb10
[ 1314.430334]  SyS_execve+0x1f/0x30
[ 1314.438458]  do_int80_syscall_32+0x95/0x1b0
[ 1314.447956]  entry_INT80_32+0x2f/0x2f
[ 1314.456606] EIP: 0xb7e9ab07
[ 1314.464062] EFLAGS: 00000296 CPU: 0
[ 1314.472421] EAX: ffffffda EBX: 0807b584 ECX: bfb0fd70 EDX: 08061250
[ 1314.485257] ESI: 0807b584 EDI: 00000000 EBP: bfb0fd58 ESP: bfb0fd28
[ 1314.498024]  DS: 007b ES: 007b FS: 0000 GS: 0033 SS: 007b
[ 1314.613681] hotplug-call (1244) used greatest stack depth: 6384 bytes left
[ 1314.957636] 8021q: adding VLAN 0 to HW filter on device eth0
[ 1316.955154] e1000: eth0 NIC Link is Up 1000 Mbps Full Duplex, Flow Control: RX
[ 1318.197800] 8021q: adding VLAN 0 to HW filter on device eth0
[ 1320.222754] e1000: eth0 NIC Link is Up 1000 Mbps Full Duplex, Flow Control: RX
[ 1321.409456] BUG: unable to handle kernel paging request at 6b6b6f4f
[ 1321.421942] IP: vlan_device_event+0x7f5/0xa40
[ 1321.431239] *pde = 00000000
[ 1321.431267]
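
A side note on the faulting addresses: both the address in the subject line (6b6b6ccf) and the one in this dmesg (6b6b6f4f) sit a small offset above 0x6b6b6b6b. 0x6b is the kernel's POISON_FREE byte (include/linux/poison.h), so this pattern would be consistent with a pointer read out of already-freed memory, assuming slab poisoning was active in this randconfig build, which is not confirmed here. The sketch below only does the arithmetic; the helper name poison_offset and the 0x1000 cutoff are illustrative choices, not anything from the kernel.

```python
# POISON_FREE (0x6b) is the byte written over freed slab objects when slab
# poisoning/debugging is enabled (see include/linux/poison.h).
POISON_FREE = 0x6B


def poison_offset(fault_addr, width_bytes=4):
    """Return fault_addr's offset from an all-0x6b pointer, or None.

    Hypothetical helper: only small non-negative offsets (a plausible
    struct-field offset) are treated as a match. The 0x1000 cutoff is an
    arbitrary assumption made for this sketch.
    """
    base = int.from_bytes(bytes([POISON_FREE]) * width_bytes, "big")
    offset = fault_addr - base
    return offset if 0 <= offset < 0x1000 else None


# The two fault addresses reported in this thread.
for addr in (0x6B6B6F4F, 0x6B6B6CCF):
    off = poison_offset(addr)
    print(hex(addr), "->", "no match" if off is None else hex(off))
```

A match here is only a hint that a freed object was dereferenced; it does not say which object or which field was involved.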

Attached are the full .config and dmesg. Please don't spend time trying to
reproduce this -- it's not really possible, since it seems to happen on only
one of our host machines.

Thanks,
Fengguang

View attachment ".config" of type "text/plain" (126727 bytes)

View attachment "dmesg-vm-lkp-wsx03-openwrt-i386-10:20171109103638:i386-randconfig-b0-11061302-CONFIG_DECNET:4.13.0:1" of type "text/plain" (85549 bytes)