netdev - 3.5 bridging regression

Message-ID: <20121021112727.GF21937@1wt.eu>
Date:	Sun, 21 Oct 2012 13:27:28 +0200
From:	Willy Tarreau <w@....eu>
To:	Eric Dumazet <edumazet@...gle.com>
Cc:	netdev@...r.kernel.org
Subject: 3.5 bridging regression

Hi Eric,

Since 3.5, I have been getting a very quick panic when setting up a
bridge on my guruplug (a dual-gigabit ARM system). This weekend I was
able to bisect the issue and found that it was introduced by this
patch :

  a1c7fff7e18f59e684e07b0f9a770561cd39f395  net: netdev_alloc_skb() use build_skb()

I can reliably reproduce the issue by putting my laptop behind this
bridge and running "find" on an NFS mount. I don't understand why this
patch would cause this. I was thinking that maybe we free the same page
twice or something like that, but I don't see any such thing there.

I set up the bridge using this script :

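	# create the bridge and enslave both gigabit ports
	brctl addbr br0
	brctl addif br0 eth0
	brctl addif br0 eth1
	# flush any addresses from the enslaved ports
	ip addr flush dev eth0
	ip addr flush dev eth1
	# bring the ports and the bridge up
	ip link set eth0 up
	ip link set eth1 up
	ip link set br0 up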

The network driver is mv643xx_eth. I don't know if this is important,
but since this issue is still present in 3.6.2 and nobody has yet
reported a panic on a bridge, I suspect that the driver may contribute
to the issue.

I'm also attaching the config (only non-disabled options). I was able
to strip it down enough to reduce the possibilities. This config causes
the panic with the patch and does not without it. I don't know what
else to look for, so I'm open to any idea you might have.

The panic looks like this :

------------[ cut here ]------------
kernel BUG at mm/slab.c:505!
Internal error: Oops - BUG: 0 [#1] ARM
Modules linked in:
CPU: 0    Not tainted  (3.5.0-fail #17)
PC is at kfree+0x8c/0xa8
LR is at __kfree_skb+0x14/0xc8
pc : [<8008bee8>]    lr : [<801bad84>]    psr: 40000093
sp : 8035fe08  ip : 00000000  fp : ffdf8480
r10: 803691a8  r9 : 00f82802  r8 : 80403900
r7 : 00000001  r6 : a0000013  r5 : 9fb776e0  r4 : 8040b000
r3 : 80801ee0  r2 : 00000000  r1 : 00000000  r0 : 00000000
Flags: nZcv  IRQs off  FIQs on  Mode SVC_32  ISA ARM  Segment kernel
Control: 0005397f  Table: 1f074000  DAC: 00000017
Process swapper (pid: 0, stack limit = 0x8035e270)
Stack: (0x8035fe08 to 0x80360000)
fe00:                   9fb749e0 9fb749e0 9f9b0400 801bad84 9f9b1918 801aa740
fe20: 807ed620 00000010 00000000 9f96e400 00000002 9f9b0400 00000000 9f9b1918
fe40: 00000038 803804e0 00000000 00000000 00000001 801ab1c8 622af000 ffffffff
fe60: 0000001d 00000080 00000010 00001514 9f9b04d4 001312cf af6a5400 ffffffff
fe80: 00000000 00989680 001e8480 801aaef8 9f9b04d4 00000080 0000012c 803804e0
fea0: 803804e8 8036f1f0 ffff91f1 801c61bc 00000000 803804e0 00000000 00000001
fec0: 8040166c 8035e000 00000100 80401660 80370138 80401640 0000000a 80024cbc
fee0: 00000000 80370e88 0000000f 00000003 8035ff64 8035e000 0000000f 00000000
ff00: 8035ff64 00000000 56251311 8036b1f8 00000000 800250e8 8037608c 8000fd90
ff20: 801b1150 20000013 fed20200 8000eaf4 8035ff78 20000013 0002dffa 00000004
ff40: 3e957f9a 00000004 8036b470 00000000 00000000 56251311 8036b1f8 00000000
ff60: 9fffffae 8035ff78 80044de0 801b1150 20000013 ffffffff 3e985f94 00000004
ff80: 00000000 8035e000 00000000 00000000 00000000 8036b470 804070e8 8036b1f8
ffa0: 003592b0 801b0f74 8035e000 803806c8 80369e54 80369e4c 00004000 800100c8
ffc0: 80366108 8035a678 8080c0c0 80340738 00000000 00000000 803401d4 00000000
ffe0: 00000000 8035a678 00053975 80366044 8035a674 00008040 00000000 00000000
[<8008bee8>] (kfree+0x8c/0xa8) from [<801bad84>] (__kfree_skb+0x14/0xc8)
[<801bad84>] (__kfree_skb+0x14/0xc8) from [<801aa740>] (txq_reclaim+0x198/0x244)
[<801aa740>] (txq_reclaim+0x198/0x244) from [<801ab1c8>] (mv643xx_eth_poll+0x2d0/0x71c)
[<801ab1c8>] (mv643xx_eth_poll+0x2d0/0x71c) from [<801c61bc>] (net_rx_action+0xb0/0x188)
[<801c61bc>] (net_rx_action+0xb0/0x188) from [<80024cbc>] (__do_softirq+0x90/0x120)
[<80024cbc>] (__do_softirq+0x90/0x120) from [<800250e8>] (irq_exit+0x7c/0x84)
[<800250e8>] (irq_exit+0x7c/0x84) from [<8000fd90>] (handle_IRQ+0x34/0x84)
[<8000fd90>] (handle_IRQ+0x34/0x84) from [<8000eaf4>] (__irq_svc+0x34/0x80)
[<8000eaf4>] (__irq_svc+0x34/0x80) from [<801b1150>] (cpuidle_wrap_enter+0x54/0x9c)
[<801b1150>] (cpuidle_wrap_enter+0x54/0x9c) from [<801b0f74>] (cpuidle_idle_call+0x9c/0x130)
[<801b0f74>] (cpuidle_idle_call+0x9c/0x130) from [<800100c8>] (cpu_idle+0x88/0xd4)
[<800100c8>] (cpu_idle+0x88/0xd4) from [<80340738>] (start_kernel+0x298/0x2ec)
Code: e7845101 e5840000 e121f006 e8bd8070 (e7f001f2)
---[ end trace 879b0e636889a6d4 ]---
Kernel panic - not syncing: Fatal exception in interrupt
Rebooting in 1 seconds..

Thanks,
Willy


View attachment "config-arm-crash.txt" of type "text/plain" (8313 bytes)