Message-ID: <20171108094832.qxvkawpw2snpcbvh@wfg-t540p.sh.intel.com>
Date: Wed, 8 Nov 2017 17:48:32 +0800
From: Fengguang Wu <fengguang.wu@...el.com>
To: Linus Torvalds <torvalds@...ux-foundation.org>
Cc: Network Development <netdev@...r.kernel.org>,
"David S. Miller" <davem@...emloft.net>,
Linux Kernel Mailing List <linux-kernel@...r.kernel.org>
Subject: Re: [vlan_device_event] BUG: unable to handle kernel paging request at 6b6b6ccf

On Tue, Nov 07, 2017 at 08:25:03AM -0800, Linus Torvalds wrote:
>On Tue, Nov 7, 2017 at 2:21 AM, Fengguang Wu <fengguang.wu@...el.com> wrote:
>>
>> FYI this happens in v4.14-rc8 -- it's not necessarily a new bug.
>
>Probably not.
>
>Looks like a use-after-free bug in vlan_device_event() judging by the
>base pointer:
>
> ECX: 6b6b6b6b
>
>this is one of those circumstances where having the faddr2line output
>for that EIP would make it much easier to see exactly which access it
>is that causes problems. There's lots of inlining going on, so without
>that it's a pain to figure out.
>
>The code is
>
> 0: 31 c0 xor %eax,%eax
> 2: 8d 76 00 lea 0x0(%esi),%esi
> 5: 89 c2 mov %eax,%edx
> 7: 89 c3 mov %eax,%ebx
> 9: 81 e2 ff 0f 00 00 and $0xfff,%edx
> f: 89 d1 mov %edx,%ecx
> 11: c1 fb 0c sar $0xc,%ebx
> 14: c1 e9 09 shr $0x9,%ecx
> 17: 8d 0c d9 lea (%ecx,%ebx,8),%ecx
> 1a: 8b 4c 8e 10 mov 0x10(%esi,%ecx,4),%ecx
> 1e: 85 c9 test %ecx,%ecx
> 20: 74 34 je 0x56
> 22: 81 e2 ff 01 00 00 and $0x1ff,%edx
> 28:* 8b 14 91 mov (%ecx,%edx,4),%edx <-- trapping instruction
> 2b: 85 d2 test %edx,%edx
> 2d: 74 27 je 0x56
> 2f: f6 82 30 01 00 00 01 testb $0x1,0x130(%edx)
> 36: 74 1e je 0x56
>
>and just by going by the constants in question (0xfff and 0x1ff), I
>can see that it's one of
>
> vlan_group_for_each_dev(..) {
> ...
> }
>
>things, but that's pretty much all I can tell.
>
>Apparently we'll get that faddr2line output soon. In the meantime, I
>think this is a real bug report but I don't see enough information to
>really go on.
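
For reference, the index arithmetic in the disassembly quoted above can be sketched in C. This is only an illustration inferred from the masks and shifts in that listing (and $0xfff, sar $0xc, shr $0x9, and $0x1ff, and the scale of 8 in the lea). The names N_VID, PART_LEN, SPLIT_PARTS, struct slot, decode_index() and table_slot() are made up for this sketch and are not the kernel's identifiers.

```c
#include <stdint.h>

/* Assumed constants, inferred only from the quoted disassembly. */
#define N_VID       4096u               /* and $0xfff / sar $0xc      */
#define PART_LEN    512u                /* and $0x1ff / shr $0x9      */
#define SPLIT_PARTS (N_VID / PART_LEN)  /* 8, as in lea (%ecx,%ebx,8) */

/* Hypothetical breakdown of the loop counter into its three parts. */
struct slot {
    uint32_t proto_idx; /* i >> 12           */
    uint32_t part;      /* (i & 0xfff) >> 9  */
    uint32_t offset;    /* i & 0x1ff         */
};

/* Split the loop counter i the same way the machine code does. */
static struct slot decode_index(uint32_t i)
{
    struct slot s;
    uint32_t vid = i & (N_VID - 1);

    s.proto_idx = i >> 12;
    s.part = vid >> 9;
    s.offset = vid & (PART_LEN - 1);
    return s;
}

/* Index of the second-level table pointer, i.e. what the lea computes. */
static uint32_t table_slot(uint32_t i)
{
    struct slot s = decode_index(i);

    return s.proto_idx * SPLIT_PARTS + s.part;
}
```

The trapping load then reads entry `offset` of whatever table pointer was found at `table_slot(i)`, so a stale table pointer makes that second load fault.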

I now have the faddr2line output. :)

[ 737.421306] 8021q: adding VLAN 0 to HW filter on device eth0
[ 740.106437] Writes: Total: 2 Max/Min: 0/0 Fail: 0
[ 740.613618] 8021q: adding VLAN 0 to HW filter on device eth0
[ 742.651266] e1000: eth0 NIC Link is Up 1000 Mbps Full Duplex, Flow Control: RX
LKP: HOSTNAME vm-lkp-wsx03-openwrt-i386-12, MAC , kernel 4.14.0-rc8 1, serial console /dev/ttyS0
[ 745.719623] BUG: unable to handle kernel paging request at 6b6b6f4f
[ 745.732871] IP: vlan_device_event at net/8021q/vlan.h:60
[ 745.742106] *pde = 00000000
[ 745.748587] Oops: 0000 [#1] PREEMPT
[ 745.756104] CPU: 0 PID: 786 Comm: netifd Not tainted 4.14.0-rc8 #1
[ 745.769171] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.10.2-1 04/01/2014
[ 745.786791] task: cf768780 task.stack: d187a000
[ 745.796485] EIP: vlan_device_event at net/8021q/vlan.h:60
[ 745.805877] EFLAGS: 00010206 CPU: 0
[ 745.813237] EAX: 000000f9 EBX: 00000002 ECX: 00000000 EDX: 6b6b6b6b
[ 745.825774] ESI: 000002f9 EDI: d1de3700 EBP: d187bdd8 ESP: d187bda4
[ 745.838871] DS: 007b ES: 007b FS: 0000 GS: 0033 SS: 0068
[ 745.850218] CR0: 80050033 CR2: 6b6b6f4f CR3: 0f4c8000 CR4: 00000690
[ 745.862750] Call Trace:
[ 745.868650] ? dn_dev_delete at net/decnet/dn_dev.c:1224
[ 745.876751] ? dn_dev_down at net/decnet/dn_dev.c:1240
[ 745.885084] notifier_call_chain at kernel/notifier.c:95 (discriminator 1)
[ 745.894254] raw_notifier_call_chain at kernel/notifier.c:402
[ 745.903979] call_netdevice_notifiers_info at net/core/dev.c:1672
[ 745.914670] __dev_notify_flags at net/core/dev.c:1687
[ 745.923446] dev_change_flags at net/core/dev.c:6813
[ 745.931679] dev_ifsioc at net/core/dev_ioctl.c:257
[ 745.939102] ? mutex_lock_nested at kernel/locking/mutex.c:909
[ 745.948173] dev_ioctl at net/core/dev_ioctl.c:566
[ 745.956154] sock_ioctl at net/socket.c:968
[ 745.964313] ? sock_ioctl at net/socket.c:984
[ 745.972512] vfs_ioctl at fs/ioctl.c:47
[ 745.979867] do_vfs_ioctl at fs/ioctl.c:690
[ 745.987782] ? kmem_cache_free at include/linux/rcupdate.h:777
[ 745.996138] ? putname at fs/namei.c:259
[ 746.003434] ? putname at fs/namei.c:259
[ 746.011240] ? do_sys_open at fs/open.c:1069
[ 746.019728] ? __fget_light at fs/file.c:739 (discriminator 2)
[ 746.029292] SyS_ioctl+0x98/0xb0
[ 746.036680] do_int80_syscall_32 at arch/x86/entry/common.c:329
[ 746.045685] restore_all at arch/x86/entry/entry_32.S:536
[ 746.053427] EIP: 0xb7e97648
[ 746.059907] EFLAGS: 00000246 CPU: 0
[ 746.068336] EAX: ffffffda EBX: 00000005 ECX: 00008914 EDX: bfcaa350
[ 746.081238] ESI: bfcaa350 EDI: bfcaa370 EBP: bfcaa388 ESP: bfcaa31c
[ 746.093449] DS: 007b ES: 007b FS: 0000 GS: 0033 SS: 007b
[ 746.103600] Code: 8d e0 db b4 c4 8d 56 01 89 14 8d e0 db b4 c4 0f 85 03 02 00 00 89 7d d4 31 f6 8b 7d d8 e9 84 00 00 00 8d 74 26 00 25 ff 01 00 00 <8b> 1c 82 31 d2 85 db 0f 95 c2 8b 04 95 cc db b4 c4 83 c0 01 85
[ 746.140089] EIP: vlan_device_event at net/8021q/vlan.h:60 SS:ESP: 0068:d187bda4
[ 746.153505] CR2: 000000006b6b6f4f
[ 746.413297] Kernel tests: Boot OK!
[ 748.237463] ---[ end trace 40505af7d815b57d ]---
[ 748.281157] Kernel panic - not syncing: Fatal exception
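
The registers in this oops are consistent with the use-after-free reading. The trapping bytes <8b> 1c 82 decode to mov (%edx,%eax,4),%ebx, EDX holds 6b6b6b6b (0x6b is the slab free-poison byte), and EAX is 000000f9, so the faulting address should be EDX + 4 * EAX. A small C sketch of that arithmetic follows; the helper names fault_address() and looks_poisoned() are hypothetical, and the 4-byte element size assumes i386 pointers.

```c
#include <stdint.h>

/* A 32-bit word made of the slab free-poison byte 0x6b. */
#define SLAB_POISON_WORD 0x6b6b6b6bu

/* Effective address of mov (%base,%index,4): base + index * 4. */
static uint32_t fault_address(uint32_t base, uint32_t index)
{
    return base + index * (uint32_t)sizeof(uint32_t);
}

/* Nonzero if a register value is exactly the free-poison pattern. */
static int looks_poisoned(uint32_t value)
{
    return value == SLAB_POISON_WORD;
}
```

With the values from the dump, fault_address(0x6b6b6b6b, 0xf9) gives 0x6b6b6f4f, which matches CR2, and ESI (000002f9) masked with 0x1ff gives the 0xf9 seen in EAX.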

Attached are the stack traces for 23 bad boots and one full dmesg.

>Of course, if it's bisectable, that would be great too.

Yes, the bisect is under way. So far it is working through the 4.12 commits.

Regards,
Fengguang
View attachment "vlan_device_event-traces" of type "text/plain" (62805 bytes)
View attachment "dmesg-vm-lkp-wsx03-openwrt-i386-12:20171108162400:i386-randconfig-b0-11061302+CONFIG_DEBUG_INFO_REDUCED:4.14.0-rc8:1" of type "text/plain" (83665 bytes)