Message-ID: <4D2634DE.2060907@gmail.com>
Date: Thu, 06 Jan 2011 21:32:14 +0000
From: Iain Paton <selsinork@...il.com>
To: netdev@...r.kernel.org, linux-kernel@...r.kernel.org
Subject: 2.6.37 vlans on bnx2 not functional, panic with tcpdump
Hi,
VLANs don't appear to be functional on my HP DL380 G6 with the onboard bnx2 adapter using a vanilla 2.6.37 kernel. No tagged VLAN
traffic is arriving at the VLAN interface.
To reproduce, use vanilla 2.6.37 built with the attached config, then run:
ip link add link eth0 name v406 type vlan id 406
ip link set up dev eth0
ip link set up dev v406
ip addr add 10.251.0.3/16 dev v406
From another machine on the same VLAN, run a ping to 10.251.0.3. The ping returns "destination host unreachable".
tcpdump -n -e -i v406 shows no traffic.
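For readers unfamiliar with what "tagged" traffic means here, the following is a minimal sketch of the 4-byte 802.1Q tag that frames for VLAN 406 carry on the parent interface (eth0). The function names are illustrative only and are not taken from the kernel or from the bnx2 driver.

```python
import struct

TPID_8021Q = 0x8100  # EtherType value that marks an 802.1Q tag


def build_vlan_tag(vid, pcp=0, dei=0):
    """Return the 4-byte 802.1Q tag: TPID followed by the TCI field."""
    if not 0 <= vid <= 0xFFF:
        raise ValueError("VLAN ID must fit in 12 bits")
    tci = (pcp << 13) | (dei << 12) | vid
    return struct.pack("!HH", TPID_8021Q, tci)


def parse_vlan_id(frame):
    """Return the VLAN ID of an Ethernet frame, or None if it is untagged."""
    # Bytes 0-11 hold the destination and source MAC addresses;
    # the tag, if present, occupies bytes 12-15.
    tpid, tci = struct.unpack("!HH", frame[12:16])
    return (tci & 0x0FFF) if tpid == TPID_8021Q else None


tag = build_vlan_tag(406)
print(tag.hex())  # 81000196
```

A frame carrying this tag is what the v406 interface should receive after the tag has been stripped, either by the NIC or by the VLAN code.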
If I then run
tcpdump -n -e -i eth0
while the ping is still running, I get:
[ 112.190114] BUG: unable to handle kernel NULL pointer dereference at 0000000000000008
[ 112.198912] IP: [<ffffffff813f5a34>] __skb_recv_datagram+0x124/0x2a0
[ 112.214203] PGD 31fa05067 PUD 31fb51067 PMD 0
[ 112.220207] Oops: 0002 [#1] SMP
[ 112.228949] last sysfs file: /sys/devices/pci0000:00/0000:00:1e.0/0000:01:04.6/class
[ 112.248201] CPU 0
[ 112.251342] Modules linked in: 8021q garp stp llc
[ 112.269199]
[ 112.269692] Pid: 1370, comm: rpc.statd Not tainted 2.6.37-64 #1 /ProLiant DL380 G6
[ 112.275164] RIP: 0010:[<ffffffff813f5a34>] [<ffffffff813f5a34>] __skb_recv_datagram+0x124/0x2a0
[ 112.293143] RSP: 0018:ffff88031fbd5a88 EFLAGS: 00010046
[ 112.300238] RAX: 0000000000000246 RBX: 0000000000000000 RCX: ffff88019f91a8c0
[ 112.319307] RDX: ffff88019f94b500 RSI: ffff88031fbd5b44 RDI: ffff88019f91a8d4
[ 112.329271] RBP: ffff88031fbd5b28 R08: 0000000000000000 R09: 0000000000001000
[ 112.339123] R10: 0000000000000000 R11: 0000000000000246 R12: ffff88031fbd5ac8
[ 112.360207] R13: ffff88019f91a8c0 R14: ffff88031fbd5ae0 R15: ffff88019f91a8d4
[ 112.363278] FS: 00007ff90cd77700(0000) GS:ffff8800d7200000(0000) knlGS:0000000000000000
[ 112.366216] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[ 112.368311] CR2: 0000000000000008 CR3: 000000031fb1a000 CR4: 00000000000006f0
[ 112.375050] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
[ 112.379300] DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400
[ 112.381913] Process rpc.statd (pid: 1370, threadinfo ffff88031fbd4000, task ffff88031fe00000)
[ 112.387746] Stack:
[ 112.388474] ffff88031fbd5ae8 ffffffff8141eb50 7fffffffffffffff ffff88031fe00000
[ 112.404676] ffff88031fbd5bc4 ffff88031fbd5b44 000000001fbd5b88 ffff88031fe00000
[ 112.411531] ffff88019f91a800 ffff88019f8be600 000000000000055a ffffffff81b8d780
[ 112.415367] Call Trace:
[ 112.416223] [<ffffffff8141eb50>] ? netlink_dump+0x1a0/0x200
[ 112.418289] [<ffffffff8141fa4d>] ? netlink_dump_start+0x18d/0x1b0
[ 112.420439] [<ffffffff813f5bcf>] skb_recv_datagram+0x1f/0x30
[ 112.422686] [<ffffffff8141eeec>] netlink_recvmsg+0x7c/0x440
[ 112.424846] [<ffffffff813f1ea2>] ? __kfree_skb+0x42/0xa0
[ 112.444125] [<ffffffff813e9cf8>] sock_recvmsg+0xf8/0x130
[ 112.449889] [<ffffffff814600bf>] ? inet_sendmsg+0x5f/0xb0
[ 112.451876] [<ffffffff813e9e7e>] ? sock_sendmsg+0xee/0x130
[ 112.454213] [<ffffffff810b6869>] ? __do_fault+0x3b9/0x4a0
[ 112.456961] [<ffffffff810a7b48>] ? lru_cache_add_lru+0x28/0x50
[ 112.481554] [<ffffffff813e8769>] ? might_fault+0x9/0x10
[ 112.483393] [<ffffffff813e9964>] ? move_addr_to_user+0x84/0xa0
[ 112.485667] [<ffffffff813ea04d>] __sys_recvmsg+0x13d/0x2b0
[ 112.492094] [<ffffffff8141ff2e>] ? netlink_table_ungrab+0x2e/0x30
[ 112.512132] [<ffffffff8141ffb9>] ? netlink_insert+0x89/0x160
[ 112.514165] [<ffffffff813eae40>] ? move_addr_to_kernel+0x50/0x60
[ 112.531071] [<ffffffff813eb6b4>] ? sys_sendto+0x104/0x140
[ 112.541470] [<ffffffff813e9964>] ? move_addr_to_user+0x84/0xa0
[ 112.549085] [<ffffffff813eb4c2>] ? sys_getsockname+0xa2/0xc0
[ 112.569417] [<ffffffff813ebe14>] sys_recvmsg+0x44/0x90
[ 112.571711] [<ffffffff81002552>] system_call_fastpath+0x16/0x1b
[ 112.573894] Code: 00 00 00 e9 4f ff ff ff 0f 1f 80 00 00 00 00 ff 8b d0 00 00 00 48 8b 1a 48 8b 4a 08 48 c7 02 00 00 00 00 48 c7 42 08 00 00 00 00 <48> 89 4b 08 48 89 19 e9 7c ff ff ff 31 c0 87 87 64 01 00 00 f7
[ 112.592492] RIP [<ffffffff813f5a34>] __skb_recv_datagram+0x124/0x2a0
[ 112.603649] RSP <ffff88031fbd5a88>
[ 112.607123] CR2: 0000000000000008
[ 112.609064] ---[ end trace f6cbe3b43db03698 ]---
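As an aid to reading the oops above, here is a minimal sketch that decodes two of its fields: the page-fault error code ("Oops: 0002") and the faulting instruction bytes marked `<48> 89 4b 08` in the Code: line. The helper names are hypothetical, and the instruction decoder deliberately handles only this one encoding (a 64-bit register store with an 8-bit displacement). It is an interpretation of the log, not a diagnosis of the root cause.

```python
REGS = ["rax", "rcx", "rdx", "rbx", "rsp", "rbp", "rsi", "rdi",
        "r8", "r9", "r10", "r11", "r12", "r13", "r14", "r15"]


def decode_page_fault_error(code):
    """Decode the low three bits of an x86 page-fault error code."""
    return {
        "page_present": bool(code & 0x1),  # False: page was not present
        "write": bool(code & 0x2),         # True: faulting access was a write
        "user_mode": bool(code & 0x4),     # False: fault happened in kernel mode
    }


def decode_store(insn):
    """Decode 'REX.W 89 /r disp8', i.e. mov %src, disp8(%base)."""
    rex, opcode, modrm, disp = insn
    if (rex & 0xF8) != 0x48 or opcode != 0x89:
        raise ValueError("not a REX.W MOV r/m64, r64 instruction")
    if (modrm >> 6) != 1 or (modrm & 7) == 4:
        raise ValueError("only the disp8 form without a SIB byte is handled")
    src = REGS[(((rex >> 2) & 1) << 3) | ((modrm >> 3) & 7)]
    base = REGS[((rex & 1) << 3) | (modrm & 7)]
    disp8 = disp - 256 if disp >= 128 else disp  # sign-extend the displacement
    return src, base, disp8


src, base, disp8 = decode_store(bytes.fromhex("48894b08"))
rbx = 0x0000000000000000  # RBX value copied from the register dump
fault_address = rbx + disp8

print(decode_page_fault_error(0x0002))
print(f"mov %{src}, {disp8:#x}(%{base}) -> fault address {fault_address:#x}")
```

Under these assumptions the faulting instruction is a kernel-mode write through RBX, which is 0 in the register dump, at offset 8. That matches the reported CR2 of 0000000000000008 and would be consistent with a store through a NULL pointer in a linked-list update.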
The stack dump isn't always the same. Sometimes I'll see
[ 236.078335] general protection fault: 0000 [#1] SMP
and the dump shows scsi/blk, xfs, cpu_idle, etc., so I don't know how relevant this particular dump is.
What's consistent is that running tcpdump against eth0 while there's tagged traffic arriving on eth0 will kill the kernel.
If I don't run tcpdump, the machine will stay up, but it's not much use if it can't use the network.
On 2.6.36 I needed the patch from http://patchwork.ozlabs.org/patch/69516/ to prevent a similar-looking immediate crash on boot. I
don't have logs from that to compare with, and I know the vlan code has changed quite a bit since then.
The same issue has been reproduced on two physically different servers, so it is hopefully not hardware-related. The full boot log
from this latest attempt is attached.
Iain
Download attachment "2.6.37-64-config.gz" of type "application/gzip" (20981 bytes)
View attachment "2.6.37-vlan.txt" of type "text/plain" (42728 bytes)