lists.openwall.net   lists  /  announce  owl-users  owl-dev  john-users  john-dev  passwdqc-users  yescrypt  popa3d-users  /  oss-security  kernel-hardening  musl  sabotage  tlsify  passwords  /  crypt-dev  xvendor  /  Bugtraq  Full-Disclosure  linux-kernel  linux-netdev  linux-ext4  linux-hardening  linux-cve-announce  PHC 
Open Source and information security mailing list archives
 
Hash Suite: Windows password security audit tool. GUI, reports in PDF.
[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <20171109072222.7nybw6pqqzksp6gy@wfg-t540p.sh.intel.com>
Date:   Thu, 9 Nov 2017 15:22:22 +0800
From:   Fengguang Wu <fengguang.wu@...el.com>
To:     Alexander Duyck <alexander.duyck@...il.com>
Cc:     Linus Torvalds <torvalds@...ux-foundation.org>,
        Jeff Kirsher <jeffrey.t.kirsher@...el.com>,
        Network Development <netdev@...r.kernel.org>,
        "David S. Miller" <davem@...emloft.net>,
        Linux Kernel Mailing List <linux-kernel@...r.kernel.org>,
        intel-wired-lan <intel-wired-lan@...ts.osuosl.org>,
        Girish Moodalbail <girish.moodalbail@...cle.com>,
        Cong Wang <xiyou.wangcong@...il.com>
Subject: Re: [vlan_device_event] BUG: unable to handle kernel paging request
 at 6b6b6ccf

On Thu, Nov 09, 2017 at 12:09:45PM +0800, Fengguang Wu wrote:
>On Thu, Nov 09, 2017 at 11:12:06AM +0800, Fengguang Wu wrote:
>>Hi Alex,
>>
>>>So looking over the trace the panic seems to be happening after a
>>>decnet interface is getting deleted. Is there any chance we could try
>>>compiling the kernel without decnet support to see if that is the
>>>source of these issues? I don't know if anyone on the Intel Wired Lan
>>>team is testing with that enabled so if we can eliminate that as a
>>>possible cause that would be useful.
>>
>>Sure and thank you for the suggestion!
>>
>>It looks disabling DECNET still triggers the vlan_device_event BUG.
>>However when looking at the dmesgs, I find another warning just before
>>the vlan_device_event BUG. Not sure if it's related one or independent
>>now-fixed issue.
>>
>>Please press Enter to activate this console.
>>[ 1291.938326] Writes:  Total: 2  Max/Min: 0/0   Fail: 0
>>[ 1297.731690] e1000: eth0 NIC Link is Up 1000 Mbps Full Duplex, Flow Control: RX
>>[ 1297.828227] 8021q: adding VLAN 0 to HW filter on device eth0
>>[ 1300.506245] 8021q: adding VLAN 0 to HW filter on device eth0
>>[ 1302.467460] e1000: eth0 NIC Link is Up 1000 Mbps Full Duplex, Flow Control: RX
>>LKP: HOSTNAME vm-lkp-wsx03-openwrt-i386-10, MAC , kernel 4.13.0 1, serial console /dev/ttyS0
>>[ 1304.161688] Kernel tests: Boot OK!
>>[ 1306.558532] 8021q: adding VLAN 0 to HW filter on device eth0
>>[ 1308.507499] e1000: eth0 NIC Link is Up 1000 Mbps Full Duplex, Flow Control: RX
>>[ 1310.526380] 8021q: adding VLAN 0 to HW filter on device eth0
>>[ 1311.246017] LKP: waiting for network...
>>[ 1312.543432] e1000: eth0 NIC Link is Up 1000 Mbps Full Duplex, Flow Control: RX
>>[ 1313.985807]
>>[ 1313.991541] =====================================
>>[ 1314.002398] WARNING: bad unlock balance detected!
>>[ 1314.013154] 4.13.0 #1 Not tainted
>>[ 1314.021549] -------------------------------------
>>[ 1314.032505] procd/1244 is trying to release lock (rcu_preempt_state) at:
>>[ 1314.047216] [<c10e5840>] rcu_read_unlock_special+0x580/0x5b0
>>[ 1314.059825] but there are no more locks to release!
>>[ 1314.070546]
>>[ 1314.070546] other info that might help us debug this:
>>[ 1314.085941] 2 locks held by procd/1244:
>>[ 1314.095139]  #0:  (&sig->cred_guard_mutex){......}, at: [<c12587b8>] prepare_bprm_creds+0x28/0xc0
>>[ 1314.114616]  #1:  (rcu_read_lock){......}, at: [<c1260140>] path_init+0x490/0x6f0
>>[ 1314.132155]
>>[ 1314.132155] stack backtrace:
>>[ 1314.144402] CPU: 0 PID: 1244 Comm: procd Not tainted 4.13.0 #1
>>[ 1314.160197] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.10.2-1 04/01/2014
>>[ 1314.179404] Call Trace:
>>[ 1314.186768]  dump_stack+0x16/0x1c
>>[ 1314.195387]  print_unlock_imbalance_bug+0xb9/0xd0
>>[ 1314.205753]  ? rcu_read_unlock_special+0x580/0x5b0
>>[ 1314.216381]  ? rcu_read_unlock_special+0x580/0x5b0
>>[ 1314.226982]  lock_release+0x1cc/0x490
>>[ 1314.235602]  ? rcu_gp_kthread_wake+0x34/0x50
>>[ 1314.245262]  ? rcu_read_unlock_special+0x580/0x5b0
>>[ 1314.255724]  rt_mutex_unlock+0x1e/0xb0
>>[ 1314.264610]  rcu_read_unlock_special+0x580/0x5b0
>>[ 1314.274814]  __rcu_read_unlock+0xa7/0xb0
>>[ 1314.283954]  unlazy_walk+0xcf/0x1f0
>>[ 1314.292409]  trailing_symlink+0x349/0x4e0
>>[ 1314.301583]  path_openat+0x333/0x1280
>>[ 1314.310197]  do_filp_open+0x67/0x140
>>[ 1314.318696]  ? getname_kernel+0x23/0x1e0
>>[ 1314.327766]  ? cache_alloc_debugcheck_after+0x13a/0x2a0
>>[ 1314.340076]  ? getname_kernel+0x23/0x1e0
>>[ 1314.349179]  do_open_execat+0xab/0x2a0
>>[ 1314.358063]  open_exec+0x57/0x80
>>[ 1314.366128]  load_script+0x33c/0x3d0
>>[ 1314.374556]  ? kvm_sched_clock_read+0x9/0x20
>>[ 1314.384219]  ? sched_clock+0x9/0x10
>>[ 1314.392611]  ? sched_clock_cpu+0x1a/0x1e0
>>[ 1314.401875]  ? _raw_read_unlock+0x55/0x90
>>[ 1314.411080]  search_binary_handler+0xd9/0x160
>>[ 1314.420799]  do_execveat_common+0x8f6/0xb10
>>[ 1314.430334]  SyS_execve+0x1f/0x30
>>[ 1314.438458]  do_int80_syscall_32+0x95/0x1b0
>>[ 1314.447956]  entry_INT80_32+0x2f/0x2f
>>[ 1314.456606] EIP: 0xb7e9ab07
>>[ 1314.464062] EFLAGS: 00000296 CPU: 0
>>[ 1314.472421] EAX: ffffffda EBX: 0807b584 ECX: bfb0fd70 EDX: 08061250
>>[ 1314.485257] ESI: 0807b584 EDI: 00000000 EBP: bfb0fd58 ESP: bfb0fd28
>>[ 1314.498024]  DS: 007b ES: 007b FS: 0000 GS: 0033 SS: 007b
>>[ 1314.613681] hotplug-call (1244) used greatest stack depth: 6384 bytes left
>>[ 1314.957636] 8021q: adding VLAN 0 to HW filter on device eth0
>>[ 1316.955154] e1000: eth0 NIC Link is Up 1000 Mbps Full Duplex, Flow Control: RX
>>[ 1318.197800] 8021q: adding VLAN 0 to HW filter on device eth0
>>[ 1320.222754] e1000: eth0 NIC Link is Up 1000 Mbps Full Duplex, Flow Control: RX
>>[ 1321.409456] BUG: unable to handle kernel paging request at 6b6b6f4f
>>[ 1321.421942] IP: vlan_device_event+0x7f5/0xa40
>>[ 1321.431239] *pde = 00000000
>
>Note that this call trace is different from the ones posted in earlier
>emails.
>
>[ 1321.431267]
>[ 1321.443356] Oops: 0000 [#1] PREEMPT
>[ 1321.451390] CPU: 0 PID: 798 Comm: netifd Not tainted 4.13.0 #1
>[ 1321.462802] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.10.2-1 04/01/2014
>[ 1321.479743] task: cf8ae5c0 task.stack: cf114000
>[ 1321.489345] EIP: vlan_device_event+0x7f5/0xa40
>[ 1321.498701] EFLAGS: 00010206 CPU: 0
>[ 1321.506683] EAX: 000000f9 EBX: 00000002 ECX: 00000001 EDX: 6b6b6b6b
>[ 1321.518893] ESI: 000014f9 EDI: cf3ec700 EBP: cf115dd8 ESP: cf115da4
>[ 1321.531044]  DS: 007b ES: 007b FS: 0000 GS: 0033 SS: 0068
>[ 1321.541906] CR0: 80050033 CR2: 6b6b6f4f CR3: 10a02000 CR4: 00000690
>[ 1321.554097] Call Trace:
>[ 1321.560423]  ? __local_bh_enable_ip+0xf6/0x1a0
>[ 1321.569987]  ? _raw_write_unlock_bh+0x2a/0x30
>[ 1321.579393]  notifier_call_chain+0x4e/0xa0
>[ 1321.588336]  raw_notifier_call_chain+0xc/0x10
>[ 1321.597604]  call_netdevice_notifiers_info+0x59/0x90
>[ 1321.607779]  __dev_notify_flags+0xea/0x130
>[ 1321.616697]  dev_change_flags+0x60/0x70
>[ 1321.625255]  dev_ifsioc+0x47b/0x5a0
>[ 1321.633354]  ? mutex_lock_nested+0x14/0x20
>[ 1321.642292]  dev_ioctl+0x36f/0xb20
>[ 1321.650403]  sock_ioctl+0x1cd/0x350
>[ 1321.658401]  ? sock_fasync+0xb0/0xb0
>[ 1321.666573]  vfs_ioctl+0x33/0x70
>[ 1321.674214]  do_vfs_ioctl+0x8d/0xc60
>[ 1321.682384]  ? kmem_cache_free+0x186/0x290
>[ 1321.691201]  ? putname+0x9f/0xe0
>[ 1321.698876]  ? putname+0x9f/0xe0
>[ 1321.706491]  ? do_sys_open+0x28d/0x420
>[ 1321.714920]  ? __fget_light+0xb7/0xf0
>[ 1321.723153]  SyS_ioctl+0x98/0xb0
>[ 1321.730961]  do_int80_syscall_32+0x95/0x1b0
>[ 1321.740064]  entry_INT80_32+0x2f/0x2f
>[ 1321.748335] EIP: 0xb7eb9648
>[ 1321.755299] EFLAGS: 00000246 CPU: 0
>[ 1321.763209] EAX: ffffffda EBX: 00000005 ECX: 00008914 EDX: bfed0640
>[ 1321.775289] ESI: bfed0640 EDI: bfed0660 EBP: bfed0678 ESP: bfed060c
>[ 1321.787449]  DS: 007b ES: 007b FS: 0000 GS: 0033 SS: 007b
>[ 1321.798394] Code: 8d f4 5d b3 c4 8d 56 01 89 14 8d f4 5d b3 c4 0f 85 03 02 00 00 89 7d d4 31 f6 8b 7d d8 e9 84 00 00 00 8d 74 26 00 25 ff 01 00 00 <8b> 1c 82 31 d2 85 db 0f 95 c2 8b 04 95 e0 5d b3 c4 83 c0 01 85
>[ 1321.833755] EIP: vlan_device_event+0x7f5/0xa40 SS:ESP: 0068:cf115da4
>[ 1321.846005] CR2: 000000006b6b6f4f
>
>>[ 1321.431267]
>>
>>Attached is the full .config and dmesg. Please don't waste time on reproducing
>>-- it's not quite possible since it seems to only happen in one of our
>>host machine.
>
>Here are the faddr2line parsed 4.11 call traces:
>
>[  194.685204] wait for background monitors: 1164 oom-killer
>[  196.712107] BUG: unable to handle kernel paging request at 6b6b6e73
>[  196.725844] IP: vlan_device_event+0x725/0xa30:
>vlan_device_event at net/8021q/vlan.h:59
>[  196.735827] *pde = 00000000
>[  196.735855]
>[  196.749561] Oops: 0000 [#1] PREEMPT
>[  196.769520] CPU: 0 PID: 797 Comm: netifd Not tainted 4.11.0 #1
>[  196.792728] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.10.2-1 04/01/2014
>[  196.821634] task: cfcc9000 task.stack: cfc7e000
>[  196.831969] EIP: vlan_device_event+0x725/0xa30:
>vlan_device_event at net/8021q/vlan.h:59
>[  196.842019] EFLAGS: 00010202 CPU: 0
>[  196.850925] EAX: 000000c2 EBX: 00000002 ECX: 00000000 EDX: 6b6b6b6b
>[  196.863985] ESI: 000008c2 EDI: d08c0f00 EBP: cfc7fdd4 ESP: cfc7fda0
>[  196.876463]  DS: 007b ES: 007b FS: 0000 GS: 0033 SS: 0068
>[  196.887903] CR0: 80050033 CR2: 6b6b6e73 CR3: 10f0b000 CR4: 00000690
>[  196.900970] Call Trace:
>[  196.907905]  ? dn_dev_delete+0x18c/0x1b0:
>dn_dev_delete at net/decnet/dn_dev.c:1219
>[  196.917749]  ? dn_dev_down+0x69/0x80:
>dn_dev_down at net/decnet/dn_dev.c:1235
>[  196.926701]  notifier_call_chain+0x4e/0xa0:
>notifier_call_chain at kernel/notifier.c:95 (discriminator 1)
>--
>[  197.042320]  do_vfs_ioctl+0x91/0xcc0:
>do_vfs_ioctl at fs/ioctl.c:685
>[  197.051621]  ? kmem_cache_free+0x124/0x270:
>kmem_cache_free at include/linux/rcupdate.h:981
>[  197.061767]  ? putname+0x9f/0xe0:
>putname at fs/namei.c:258
>[  197.080497]  ? putname+0x9f/0xe0:
>putname at fs/namei.c:258
>[  197.098679]  ? do_sys_open+0x271/0x3e0:
>do_sys_open at fs/open.c:1062
>[  197.108071]  ? __fget_light+0xb7/0xf0:
>__fget_light at fs/file.c:752 (discriminator 2)
>[  197.127385]  SyS_ioctl+0x96/0xb0:
>SyS_ioctl at fs/ioctl.c:700
>[  197.136262]  do_int80_syscall_32+0x95/0x1b0:
>do_int80_syscall_32 at arch/x86/entry/common.c:323
>[  197.146639]  entry_INT80_32+0x2f/0x2f:
>restore_all at arch/x86/entry/entry_32.S:530
>[  197.155513] EIP: 0xb7727648
>[  197.162874] EFLAGS: 00000246 CPU: 0
>[  197.171852] EAX: ffffffda EBX: 00000005 ECX: 00008914 EDX: bff1f5d0
>[  197.184875] ESI: bff1f5d0 EDI: bff1f5f0 EBP: bff1f608 ESP: bff1f59c
>[  197.198320]  DS: 007b ES: 007b FS: 0000 GS: 0033 SS: 007b
>[  197.210120] Code: 8d fc a1 8d c4 8d 56 01 89 14 8d fc a1 8d c4 0f 85 b4 02 00 00 89 7d d8 31 f6 8b 7d e0 e9 84 00 00 00 8d 74 26 00 25 ff 01 00 00 <8b> 1c 82 31 d2 85 db 0f 95 c2 8b 04 95 e8 a1 8d c4 83 c0 01 85
>[  197.249301] EIP: vlan_device_event+0x725/0xa30:
>vlan_device_event at net/8021q/vlan.h:59 SS:ESP: 0068:cfc7fda0
>[  197.262812] CR2: 000000006b6b6e73
>[  199.290882] /bin/busybox wget -q http://inn:80/~lkp/cgi-bin/lkp-jobfile-append-var?job_file=/lkp/scheduled/vm-lkp-wsx03-openwrt-i386-11/trinity-300s-openwrt-i386-2016-03-16.cgz-a351e9b9fc24e982ec2f0e76379a49826036da12-20171109-13235-ab6tcc-wait_kernel-4.yaml&loadavg=5.12%202.92%201.17%204/93%201360&start_time=1510191311&end_time=1510191314&version=/lkp/lkp/.src-20171109-011142& -O /dev/null
>[  199.356767] wget: socket(AF_INET,1,0): Address family not supported by protocol
>[  199.474622] ---[ end trace a8f7db573cc5216e ]---
>[  199.474631] Kernel panic - not syncing: Fatal exception
>[  199.474660] Kernel Offset: disabled
>[  199.534708] Rebooting in 3 seconds..

Here is yet another call trace, this time it's "NULL point dereference".

[  748.471023] procd: - init -
Please press Enter to activate this console.
[  753.540379] random: uci: uninitialized urandom read (6 bytes read)
[  754.030470] random: uci: uninitialized urandom read (6 bytes read)
[  754.245351] random: uci: uninitialized urandom read (6 bytes read)
[  757.258323] Writes:  Total: 2  Max/Min: 0/0   Fail: 0
[  760.340612] e1000: eth0 NIC Link is Up 1000 Mbps Full Duplex, Flow Control: RX
[  760.448681] 8021q: adding VLAN 0 to HW filter on device eth0
[  762.403927] 8021q: adding VLAN 0 to HW filter on device eth0
[  764.380369] e1000: eth0 NIC Link is Up 1000 Mbps Full Duplex, Flow Control: RX
LKP: HOSTNAME vm-lkp-wsx03-openwrt-i386-18, MAC , kernel 4.11.0-08060-g8d5e72d 1, serial console /dev/ttyS0
[  765.018938] Kernel tests: Boot OK!
[  766.118342] BUG: unable to handle kernel NULL pointer dereference at 00000639
[  766.127471] IP: vlan_device_event+0x7f5/0xa40
[  766.133669] *pde = 00000000
[  766.133694]
[  766.141020] Oops: 0000 [#1] PREEMPT
[  766.146171] CPU: 0 PID: 832 Comm: netifd Not tainted 4.11.0-08060-g8d5e72d #1
[  766.154925] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.10.2-1 04/01/2014
[  766.166422] task: d1ac3000 task.stack: d2952000
[  766.172581] EIP: vlan_device_event+0x7f5/0xa40
[  766.178581] EFLAGS: 00010206 CPU: 0
[  766.183537] EAX: 0000018e EBX: 00000002 ECX: 00000000 EDX: 00000001
[  766.191471] ESI: 0000038e EDI: d1f0ed80 EBP: d2953dd8 ESP: d2953da4
[  766.199453]  DS: 007b ES: 007b FS: 0000 GS: 0033 SS: 0068
[  766.206997] CR0: 80050033 CR2: 00000639 CR3: 1199e000 CR4: 00000690
[  766.216394] Call Trace:
[  766.220644]  ? dn_dev_delete+0x138/0x1d0
[  766.226253]  ? dn_dev_down+0x69/0x80
[  766.231402]  notifier_call_chain+0x4e/0xa0
[  766.237335]  raw_notifier_call_chain+0xc/0x10
[  766.243331]  call_netdevice_notifiers_info+0x59/0x90
[  766.250689]  __dev_notify_flags+0xea/0x130
[  766.257746]  dev_change_flags+0x60/0x70
[  766.265073]  dev_ifsioc+0x44b/0x570
[  766.271028]  ? mutex_lock_nested+0x14/0x20
[  766.278162]  dev_ioctl+0x399/0xa70
[  766.284317]  ? slow_virt_to_phys+0x3e/0x110
[  766.291227]  sock_ioctl+0x1cd/0x350
[  766.297515]  ? sock_fasync+0xb0/0xb0
[  766.304019]  vfs_ioctl+0x33/0x70
[  766.309894]  do_vfs_ioctl+0x8d/0xc30
[  766.316242]  ? kmem_cache_free+0x186/0x290
[  766.323504]  ? putname+0x9f/0xe0
[  766.330082]  ? putname+0x9f/0xe0
[  766.335780]  ? do_sys_open+0x28d/0x420
[  766.342393]  ? __fget_light+0xb7/0xf0
[  766.348578]  SyS_ioctl+0x98/0xb0
[  766.353281]  do_int80_syscall_32+0x95/0x1b0
[  766.359117]  entry_INT80_32+0x2f/0x2f
[  766.364295] EIP: 0xb7766648
[  766.368512] EFLAGS: 00000246 CPU: 0
[  766.373497] EAX: ffffffda EBX: 00000005 ECX: 00008914 EDX: bfdae180
[  766.381443] ESI: bfdae180 EDI: bfdae1a0 EBP: bfdae1b8 ESP: bfdae14c
[  766.389775]  DS: 007b ES: 007b FS: 0000 GS: 0033 SS: 007b
[  766.396967] Code: 8d a4 80 9b c4 8d 56 01 89 14 8d a4 80 9b c4 0f 85 03 02 00 00 89 7d d4 31 f6 8b 7d d8 e9 84 00 00 00 8d 74 26 00 25 ff 01 00 00 <8b> 1c 82 31 d2 85 db 0f 95 c2 8b 04 95 90 80 9b c4 83 c0 01 85
[  766.420978] EIP: vlan_device_event+0x7f5/0xa40 SS:ESP: 0068:d2953da4
[  766.428984] CR2: 0000000000000639
[  768.045479] ---[ end trace 4eaaf76db348fd80 ]---
[  768.105484] Kernel panic - not syncing: Fatal exception
[  768.112938] Kernel Offset: disabled
[  768.118386] Rebooting in 3 seconds..

kernel NULL pointer

View attachment "dmesg-vm-lkp-wsx03-openwrt-i386-18:20171108184543:i386-randconfig-b0-11061302:4.11.0-08060-g8d5e72d:1" of type "text/plain" (90115 bytes)

Powered by blists - more mailing lists

Powered by Openwall GNU/*/Linux Powered by OpenVZ