Open Source and information security mailing list archives
 
Message-ID: <1453247464.1223.297.camel@edumazet-glaptop2.roam.corp.google.com>
Date:	Tue, 19 Jan 2016 15:51:04 -0800
From:	Eric Dumazet <eric.dumazet@...il.com>
To:	Jesse Gross <jesse@...nel.org>
Cc:	John <john.phillips5@....com>,
	Linux Kernel Network Developers <netdev@...r.kernel.org>,
	Tom Herbert <tom@...bertland.com>, david.roth@....com,
	Pravin B Shelar <pshelar@...ira.com>,
	Thomas Graf <tgraf@...g.ch>
Subject: Re: Kernel memory leak in bnx2x driver with vxlan tunnel

On Tue, 2016-01-19 at 15:34 -0800, Jesse Gross wrote:
> On Tue, Jan 19, 2016 at 2:47 PM, Eric Dumazet <eric.dumazet@...il.com> wrote:
> > On Tue, 2016-01-19 at 13:07 -0800, Jesse Gross wrote:
> >> On Thu, Jan 14, 2016 at 9:17 AM, John <john.phillips5@....com> wrote:
> >> > I'm getting what seems to be a kernel memory leak while doing a TCP
> >> > throughput test between two VMs on identical systems, in order to test a
> >> > Broadcom NIC's performance with kernel 4.4.0-rc8 and Open vSwitch version
> >> > 2.4.90. The host system of the receiving (server) VM leaks memory during the
> >> > throughput test. The memory leaks fast enough to make the system completely
> >> > unusable within five minutes. Once I stop the throughput test, the memory
> >> > stops leaking. A couple of times, the kernel on the host system has
> >> > actually killed the qemu process for me, but this doesn't happen
> >> > reliably. The leaked memory doesn't become available again even after
> >> > the VM is killed.
> >>
> >> It looks like the problem is in napi_skb_finish(). If, when we do
> >> GRO_MERGED_FREE, we have NAPI_GRO_CB(skb)->free ==
> >> NAPI_GRO_FREE_STOLEN_HEAD, then we will just free the skb memory itself
> >> but not any of the associated elements. Historically, this would have
> >> been OK but these days we will have allocated a dst entry already for
> >> tunnel metadata, which will get leaked.
> >>
> >> If we don't have NAPI_GRO_FREE_STOLEN_HEAD then we'll do a
> >> __kfree_skb(), which will release the dst entry. That would explain
> >> why some drivers have the problem but not others since the memory is
> >> laid out differently.
> >
> >
> > Wow... What exactly is the purpose of using skb_dst_set() on the skb
> > before calling gro_cells_receive()?
> >
> > The changelog of commit 2e15ea390e6f4466655066d97e22ec66870a042c is not
> > helpful:
> >
> >     Following patch create new tunnel flag which enable
> >     tunnel metadata collection on given device.
> 
> Note that this isn't really the problem commit. The general issue is
> lightweight tunnels - the above commit is just adding support for GRE
> in a way that I think follows the existing model.

I believe this commit added the first skb_dst_set() before
gro_cells_receive(), in ip_tunnel_rcv().

This is already buggy.

Then Tom, in commit 58ce31cca1ffe057f4744c3f671e3e84606d3d4a, added a
gro_cells_receive() call in vxlan_rcv(), which introduced another bug
because of the existing skb_dst_set() call.
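
To make the leak described above concrete, here is a minimal user-space
sketch of the two free paths. It is not kernel code: the struct layouts,
the model_* helpers, and the drop_dst_first switch are all hypothetical
and exist only for illustration. The only thing taken from the thread is
the behaviour Jesse describes: with NAPI_GRO_FREE_STOLEN_HEAD only the
skb memory is freed, while the other path does a full free that releases
the dst entry. Dropping the dst before freeing the head is shown as one
possible remedy, an assumption here rather than a fix proposed in this
message.

```c
#include <assert.h>
#include <stdlib.h>

/* User-space model only; names echo the kernel's but this is not kernel code. */
enum gro_free { NAPI_GRO_FREE = 1, NAPI_GRO_FREE_STOLEN_HEAD = 2 };

struct dst_entry {
	int refcnt;                 /* references currently held on this dst */
};

struct sk_buff {
	struct dst_entry *dst;      /* tunnel metadata dst set before GRO */
	enum gro_free free;         /* stands in for NAPI_GRO_CB(skb)->free */
};

static void dst_release(struct dst_entry *dst)
{
	if (dst)
		dst->refcnt--;
}

/* Hypothetical helper: allocate an skb that holds one reference on dst. */
static struct sk_buff *model_alloc_skb(struct dst_entry *dst, enum gro_free how)
{
	struct sk_buff *skb = malloc(sizeof(*skb));

	if (!skb)
		abort();
	dst->refcnt++;
	skb->dst = dst;
	skb->free = how;
	return skb;
}

/* Models the full free path: releases the dst, then the skb. */
static void model_kfree_skb(struct sk_buff *skb)
{
	dst_release(skb->dst);
	free(skb);
}

/*
 * Models the GRO_MERGED_FREE branch as described in the thread.
 * drop_dst_first is an assumed remedy, not taken from the thread.
 */
static void model_finish_merged_free(struct sk_buff *skb, int drop_dst_first)
{
	if (skb->free == NAPI_GRO_FREE_STOLEN_HEAD) {
		if (drop_dst_first) {
			dst_release(skb->dst);
			skb->dst = NULL;
		}
		free(skb);          /* only the skb itself; dst ref is untouched */
	} else {
		model_kfree_skb(skb);
	}
}

/* Returns how many dst references are still held after the free. */
static int model_leaked_refs(enum gro_free how, int drop_dst_first)
{
	struct dst_entry dst = { 0 };
	struct sk_buff *skb = model_alloc_skb(&dst, how);

	model_finish_merged_free(skb, drop_dst_first);
	return dst.refcnt;
}
```

Calling model_leaked_refs() with NAPI_GRO_FREE_STOLEN_HEAD and
drop_dst_first == 0 leaves one reference outstanding, which corresponds
to the leaked dst entry per merged packet; the other two combinations
leave none.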



