Message-ID: <CAEh+42iikonSmsLu5f3UyQCPge0rmUmNu15ggvpkPdv9gqKQ=g@mail.gmail.com>
Date: Tue, 19 Jan 2016 15:34:27 -0800
From: Jesse Gross <jesse@...nel.org>
To: Eric Dumazet <eric.dumazet@...il.com>
Cc: John <john.phillips5@....com>,
Linux Kernel Network Developers <netdev@...r.kernel.org>,
Tom Herbert <tom@...bertland.com>, david.roth@....com,
Pravin B Shelar <pshelar@...ira.com>,
Thomas Graf <tgraf@...g.ch>
Subject: Re: Kernel memory leak in bnx2x driver with vxlan tunnel
On Tue, Jan 19, 2016 at 2:47 PM, Eric Dumazet <eric.dumazet@...il.com> wrote:
> On Tue, 2016-01-19 at 13:07 -0800, Jesse Gross wrote:
>> On Thu, Jan 14, 2016 at 9:17 AM, John <john.phillips5@....com> wrote:
>> > I'm getting what seems to be a kernel memory leak while doing a TCP
>> > throughput test between two VMs on identical systems, in order to test a
>> > broadcom NIC's performance with a kernel 4.4.0-rc8 and OpenVSwitch version
>> > 2.4.90. The host system of the receiving (server) VM leaks memory during the
>> > throughput test. The memory leaks fast enough to make the system completely
>> > unusable within five minutes. Once I stop the throughput test, the memory
>> > stops leaking. A couple of times, the kernel on the host system has actually
>> > killed the qemu process for me, but this doesn't happen reliably. The leaked memory
>> > doesn't become available again even after the VM is killed.
>>
>> It looks like the problem is in napi_skb_finish(). If, when we do
>> GRO_MERGED_FREE, we have NAPI_GRO_CB(skb)->free ==
>> NAPI_GRO_FREE_STOLEN_HEAD then we will just free the skb memory itself
>> but not any of the associated elements. Historically, this would have
>> been OK but these days we will have allocated a dst entry already for
>> tunnel metadata, which will get leaked.
>>
>> If we don't have NAPI_GRO_FREE_STOLEN_HEAD then we'll do a
>> __kfree_skb(), which will release the dst entry. That would explain
>> why some drivers have the problem but not others since the memory is
>> laid out differently.
>
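To make the two free paths described above concrete, here is a rough userspace model. It is not kernel code: toy_skb, toy_dst, live_dsts and every function below are hypothetical stand-ins, and toy_free_head_fixed() only illustrates one possible remedy (drop the dst before freeing the head), not necessarily the fix that should go upstream.

```c
#include <assert.h>
#include <stdlib.h>

/* Toy stand-ins for struct dst_entry and struct sk_buff (hypothetical). */
struct toy_dst { int refcnt; };
struct toy_skb { struct toy_dst *dst; };

int live_dsts; /* dst entries allocated and not yet released */

struct toy_dst *toy_dst_alloc(void)
{
	struct toy_dst *d = malloc(sizeof(*d));

	assert(d != NULL);
	d->refcnt = 1;
	live_dsts++;
	return d;
}

void toy_dst_release(struct toy_dst *d)
{
	if (d && --d->refcnt == 0) {
		free(d);
		live_dsts--;
	}
}

/* Models the __kfree_skb() path: release attached state, then the skb. */
void toy_free_full(struct toy_skb *skb)
{
	toy_dst_release(skb->dst);
	free(skb);
}

/* Models the stolen-head path: only the skb memory itself is freed,
 * so the reference held on the metadata dst is never dropped. */
void toy_free_head_only(struct toy_skb *skb)
{
	free(skb);
}

/* One possible remedy (an assumption): drop the dst before freeing. */
void toy_free_head_fixed(struct toy_skb *skb)
{
	toy_dst_release(skb->dst);
	skb->dst = NULL;
	free(skb);
}

/* Attach a fresh metadata dst to a fresh skb, free the skb through
 * free_fn, and report how many dst entries were left behind. */
int dsts_leaked_by(void (*free_fn)(struct toy_skb *))
{
	int before = live_dsts;
	struct toy_skb *skb = malloc(sizeof(*skb));
	struct toy_dst *dst;
	int leaked;

	assert(skb != NULL);
	dst = toy_dst_alloc();
	skb->dst = dst;
	free_fn(skb);
	leaked = live_dsts - before;
	if (leaked > 0)
		toy_dst_release(dst); /* tidy up so the model itself stays clean */
	return leaked;
}
```

Calling dsts_leaked_by() with each of the three free functions shows that only the head-only path leaves a dst entry behind, once per merged packet, which fits a leak that grows with traffic.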
>
> Wow.... What is the purpose of using skb_dst_set() on skb before calling
> gro_cells_receive() exactly ?
>
> Commit 2e15ea390e6f4466655066d97e22ec66870a042c changelog is not
> helpful :
>
>     Following patch create new tunnel flag which enable
>     tunnel metadata collection on given device.
Note that this isn't really the problem commit. The general issue is
lightweight tunnels - the above commit is just adding support for GRE
in a way that I think follows the existing model.
> This is rather strange since later the dst is thrown away with the
> skb_valid_dst() test.
This isn't really an IP routing dst - it's the tunnel encapsulation
information - so there isn't much to do with it in ip_rcv_finish().
However, the information can be consumed in other places, such as
through eBPF.
> Note also that IP early demux is broken as well, since it does not use
> skb_valid_dst() but a simple :
>
> if (sysctl_ip_early_demux && !skb_dst(skb) && !skb->sk) {
> ...
> }
I agree that this should use the helper function.
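
As a rough illustration of why the helper matters for the early demux check, here is a small userspace model. The md_skb and md_dst types, the MD_DST_METADATA value and both check functions are hypothetical stand-ins; the helper is only assumed to behave like skb_valid_dst(), i.e. to treat a metadata dst as "no valid dst".

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define MD_DST_METADATA 0x1u /* hypothetical flag value for this model */

/* Toy stand-ins for struct dst_entry and struct sk_buff (hypothetical). */
struct md_dst { unsigned int flags; };
struct md_skb { struct md_dst *dst; void *sk; };

/* Assumed semantics of skb_valid_dst(): a dst is attached and it is
 * not a tunnel metadata dst. */
bool md_skb_valid_dst(const struct md_skb *skb)
{
	assert(skb != NULL);
	return skb->dst && !(skb->dst->flags & MD_DST_METADATA);
}

/* The check as quoted: any attached dst, metadata or not, skips demux. */
bool early_demux_raw(const struct md_skb *skb, bool enabled)
{
	assert(skb != NULL);
	return enabled && !skb->dst && !skb->sk;
}

/* The check using the helper: a metadata dst no longer blocks demux. */
bool early_demux_helper(const struct md_skb *skb, bool enabled)
{
	assert(skb != NULL);
	return enabled && !md_skb_valid_dst(skb) && !skb->sk;
}
```

For an skb carrying only tunnel metadata, early_demux_raw() returns false while early_demux_helper() returns true; for an skb with no dst or with a real routing dst, the two agree.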