Open Source and information security mailing list archives
 
Message-ID: <5697D833.5010506@hpe.com>
Date:	Thu, 14 Jan 2016 10:17:39 -0700
From:	John <john.phillips5@....com>
To:	netdev@...r.kernel.org
Cc:	tom@...bertland.com, david.roth@....com
Subject: Kernel memory leak in bnx2x driver with vxlan tunnel

I'm getting what seems to be a kernel memory leak while running a TCP
throughput test between two VMs on identical systems, in order to test a
Broadcom NIC's performance with kernel 4.4.0-rc8 and Open vSwitch version
2.4.90. The host system of the receiving (server) VM leaks memory during the
throughput test, fast enough to make the system completely unusable within
five minutes. Once I stop the throughput test, the memory stops leaking. A
couple of times the host kernel has actually killed the qemu process, but this
doesn't happen reliably. The leaked memory doesn't become available again even
after the VM is killed.

To investigate, I compiled a 4.4.0-rc8 kernel with kmemleak enabled. Scanning
the leaking system both during the throughput test and after killing it, I get
the following backtrace over and over again:

unreferenced object 0xffff880464f11488 (size 256):
   comm "softirq", pid 0, jiffies 4312675043 (age 379.184s)
   hex dump (first 32 bytes):
     6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b kkkkkkkkkkkkkkkk
     00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 ................
   backtrace:
     [<ffffffff81550e28>] kmemleak_alloc+0x28/0x50
     [<ffffffff811a8bac>] __kmalloc+0x11c/0x2a0
     [<ffffffff8146972e>] metadata_dst_alloc+0x1e/0x40
     [<ffffffff814ed476>] udp_tun_rx_dst+0x126/0x1c0
     [<ffffffff8140be68>] vxlan_udp_encap_recv+0x148/0xb10
     [<ffffffff814cabe9>] udp_queue_rcv_skb+0x1e9/0x480
     [<ffffffff814cb77c>] __udp4_lib_rcv+0x45c/0x700
     [<ffffffff814cbf0a>] udp_rcv+0x1a/0x20
     [<ffffffff8149ae54>] ip_local_deliver_finish+0x94/0x1e0
     [<ffffffff8149b150>] ip_local_deliver+0x60/0xd0
     [<ffffffff8149ab39>] ip_rcv_finish+0x99/0x320
     [<ffffffff8149b41e>] ip_rcv+0x25e/0x380
     [<ffffffff814602db>] __netif_receive_skb_core+0x2cb/0xa00
     [<ffffffff81460a26>] __netif_receive_skb+0x16/0x70
     [<ffffffff81460aa3>] netif_receive_skb_internal+0x23/0x80
     [<ffffffff814615f5>] napi_gro_receive+0xa5/0xd0
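kmemleak reports through debugfs: writing "scan" to /sys/kernel/debug/kmemleak
triggers an immediate scan, and reading the file lists the suspected leaks.
When the same trace repeats thousands of times, tallying the reports by
allocation site is quicker than reading them. Below is a minimal sketch in
Python. It assumes a kernel built with CONFIG_DEBUG_KMEMLEAK, debugfs mounted
at /sys/kernel/debug, and root privileges. The helper names and the set of
allocator frames it skips are illustrative assumptions, not part of any kernel
tool.

```python
import re
from collections import Counter

# Assumption: CONFIG_DEBUG_KMEMLEAK=y, debugfs mounted at /sys/kernel/debug, run as root.
KMEMLEAK = "/sys/kernel/debug/kmemleak"

# Hypothetical choice of allocator frames to skip so the tally lands on the real caller.
ALLOCATOR_FRAMES = {"kmemleak_alloc", "__kmalloc", "kmalloc", "kmem_cache_alloc"}

# Matches a backtrace line such as "[<ffffffff8146972e>] metadata_dst_alloc+0x1e/0x40".
FRAME_RE = re.compile(r"\[<[0-9a-f]+>\]\s+([^\s+]+)\+0x")


def trigger_scan(path=KMEMLEAK):
    """Request an immediate kmemleak scan (equivalent to `echo scan > path`)."""
    with open(path, "w") as f:
        f.write("scan")


def read_report(path=KMEMLEAK):
    """Return the text of the current suspected-leak report."""
    with open(path) as f:
        return f.read()


def leak_sites(report):
    """Count reported objects by their first backtrace frame outside the allocator."""
    sites = Counter()
    # Each suspected leak begins with "unreferenced object 0x...".
    for entry in report.split("unreferenced object")[1:]:
        for func in FRAME_RE.findall(entry):
            if func not in ALLOCATOR_FRAMES:
                sites[func] += 1
                break
    return sites
```

As root, call `trigger_scan()` and then `leak_sites(read_report()).most_common(5)`.
For the report above, every entry would be counted under metadata_dst_alloc.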

I cloned the kernel tree from
http://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git and ran
git bisect, which identified the following first bad commit:

58ce31cca1ffe057f4744c3f671e3e84606d3d4a is the first bad commit
commit 58ce31cca1ffe057f4744c3f671e3e84606d3d4a
Author: Tom Herbert <tom@...bertland.com>
Date:   Wed Aug 19 17:07:33 2015 -0700

     vxlan: GRO support at tunnel layer

     Add calls to gro_cells infrastructure to do GRO when receiving on a tunnel.

     Testing:

     Ran 200 netperf TCP_STREAM instance

       - With fix (GRO enabled on VXLAN interface)

         Verify GRO is happening.

         9084 MBps tput
         3.44% CPU utilization

       - Without fix (GRO disabled on VXLAN interface)

         Verified no GRO is happening.

         9084 MBps tput
         5.54% CPU utilization

     Signed-off-by: Tom Herbert <tom@...bertland.com>
     Signed-off-by: David S. Miller <davem@...emloft.net>

:040000 040000 a7d49cb2e24ebddf620c01e27515cc756b32e46f c3951c16da75ff3e0db1322b8ccb3e61975b1242 M      drivers
:040000 040000 f36442958138eafdd472c58d06ea35be66990aa1 0e29d513e575dd11f459c59df71e05db074363de M      include
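git bisect finds a regression by binary search. Each step tests the commit
halfway between the last known-good and first known-bad points and then
discards half of the remaining range, so N candidate commits take roughly
log2(N) build-and-test cycles. `git bisect run` can automate each step with a
script. Below is a minimal sketch of the search itself. It assumes a linear
history ordered oldest to newest (real git bisect also handles merges in the
commit graph) and a hypothetical is_bad() callback that stands in for building,
booting, running iperf3 and checking kmemleak.

```python
from typing import Callable, Optional, Sequence, TypeVar

T = TypeVar("T")


def first_bad(commits: Sequence[T], is_bad: Callable[[T], bool]) -> Optional[T]:
    """Return the first commit for which is_bad() is true, or None if none is.

    Assumes commits are ordered oldest to newest and that every commit after
    the first bad one is also bad, which is the property bisection relies on.
    """
    lo, hi = 0, len(commits)  # invariant: commits[:lo] are good, commits[hi:] are bad
    while lo < hi:
        mid = (lo + hi) // 2
        if is_bad(commits[mid]):  # hypothetical: build, boot, run the test
            hi = mid              # first bad commit is mid or earlier
        else:
            lo = mid + 1          # first bad commit is after mid
    return commits[lo] if lo < len(commits) else None
```

With 100 candidate commits the loop calls is_bad() at most 7 times.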

For the test I'm using two HP ProLiant DL360 Gen9 servers. I put two matching
Broadcom PCIe cards in each machine and ran iperf3 throughput tests between two
VMs, one on each machine. On each host a qemu VM is attached to an OVS bridge,
and the two bridges are connected over a VXLAN tunnel as described here:
https://community.mellanox.com/docs/DOC-1446.

With an Intel Niantic NIC the test went well: I saw high throughput (8.04 Gb/s)
over an eighteen-hour run, with no memory leak. However, when I put Broadcom
NICs in both systems, I get the memory leak above on the host whose VM is on
the receiving end of the test. lspci -v output for the Broadcom NIC:

04:00.0 Ethernet controller: Broadcom Corporation NetXtreme II BCM57810 10 Gigabit Ethernet (rev 10)
         Subsystem: Hewlett-Packard Company HP FlexFabric 10Gb 2-port 534FLR-SFP+ Adapter
         Flags: bus master, fast devsel, latency 0
         Memory at 97000000 (64-bit, prefetchable) [size=8M]
         Memory at 96800000 (64-bit, prefetchable) [size=8M]
         Memory at 98800000 (64-bit, prefetchable) [size=64K]
         [virtual] Expansion ROM at 98880000 [disabled] [size=512K]
         Capabilities: [48] Power Management version 3
         Capabilities: [50] Vital Product Data
         Capabilities: [a0] MSI-X: Enable+ Count=32 Masked-
         Capabilities: [ac] Express Endpoint, MSI 00
         Capabilities: [100] Advanced Error Reporting
         Capabilities: [13c] Device Serial Number 14-58-d0-ff-fe-52-5b-d8
         Capabilities: [150] Power Budgeting <?>
         Capabilities: [160] Virtual Channel
         Capabilities: [1b8] Alternative Routing-ID Interpretation (ARI)
         Capabilities: [1c0] Single Root I/O Virtualization (SR-IOV)
         Capabilities: [220] #15
         Capabilities: [300] #19
         Kernel driver in use: bnx2x

ethtool -i output:
driver: bnx2x
version: 1.712.30-0
firmware-version: bc 7.8.24
bus-info: 0000:04:00.0
supports-statistics: yes
supports-test: yes
supports-eeprom-access: yes
supports-register-dump: yes
supports-priv-flags: yes

dmesg bnx2x output:
[    1.506071] bnx2x: QLogic 5771x/578xx 10/20-Gigabit Ethernet Driver bnx2x 1.712.30-0 (2014/02/10)
[    1.506205] bnx2x 0000:04:00.0: msix capability found
[    1.506297] bnx2x 0000:04:00.0: part number 394D4342-31383735-31543030-47303030
[    1.555970] bnx2x 0000:04:00.1: msix capability found
[    1.556061] bnx2x 0000:04:00.1: part number 394D4342-31383735-31543030-47303030
[   10.360477] bnx2x 0000:04:00.1 eth9: renamed from eth3
[   10.584371] bnx2x 0000:04:00.0 rename3: renamed from eth1
[  588.956002] bnx2x 0000:04:00.0 rename3: using MSI-X  IRQs: sp 70 fp[0] 74 ... fp[7] 81
[  589.208675] bnx2x 0000:04:00.0 rename3: Added vxlan dest port 4789
[  640.159842] bnx2x 0000:04:00.1 eth10: renamed from eth9
[  642.432216] bnx2x 0000:04:00.1 eth10: using MSI-X  IRQs: sp 82 fp[0] 84 ... fp[7] 91
[  642.700576] bnx2x 0000:04:00.1 eth10: Added vxlan dest port 4789
[ 1098.368845] bnx2x 0000:04:00.1 eth10: using MSI-X  IRQs: sp 82 fp[0] 84 ... fp[7] 91
[ 1109.277182] bnx2x 0000:04:00.1 eth10_nolink: renamed from eth10
[ 1115.368873] bnx2x 0000:04:00.0 eth10: renamed from rename3
[ 1117.928156] bnx2x 0000:04:00.0 eth10: using MSI-X  IRQs: sp 70 fp[0] 74 ... fp[7] 81
[ 1118.214861] bnx2x 0000:04:00.0 eth10: NIC Link is Up, 10000 Mbps full duplex, Flow control: ON - receive & transmit

I've tried disabling all offloads (GRO included), but the leak still happens.
