netdev - Re: Linux 4.12+ memory leak on router with i40e NICs

lists.openwall.net		lists / announce owl-users owl-dev john-users john-dev passwdqc-users yescrypt popa3d-users / oss-security kernel-hardening musl sabotage tlsify passwords / crypt-dev xvendor / Bugtraq Full-Disclosure linux-kernel linux-netdev linux-ext4 linux-hardening linux-cve-announce PHC
Open Source and information security mailing list archives
Hash Suite: Windows password security audit tool. GUI, reports in PDF.
[<prev] [next>] [thread-next>] [day] [month] [year] [list]
Date:   Sat, 14 Oct 2017 17:58:01 -0700
From:   Alexander Duyck <alexander.duyck@...il.com>
To:     Paweł Staszewski <pstaszewski@...are.pl>
Cc:     "Anders K. Pedersen | Cohaesio" <akp@...aesio.com>,
        "netdev@...r.kernel.org" <netdev@...r.kernel.org>,
        "intel-wired-lan@...ts.osuosl.org" <intel-wired-lan@...ts.osuosl.org>,
        "alexander.h.duyck@...el.com" <alexander.h.duyck@...el.com>
Subject: Re: Linux 4.12+ memory leak on router with i40e NICs

Hi Pawel,

To clarify is that Dave Miller's tree or Linus's that you are talking
about? If it is Dave's tree how long ago was it you pulled it since I
think the fix was just pushed by Jeff Kirsher a few days ago.

The issue should be fixed in the following commit:
https://git.kernel.org/pub/scm/linux/kernel/git/davem/net.git/commit/drivers/net/ethernet/intel/i40e/i40e_txrx.c?id=2b9478ffc550f17c6cd8c69057234e91150f5972

Thanks.

- Alex

On Sat, Oct 14, 2017 at 3:03 PM, Paweł Staszewski <pstaszewski@...are.pl> wrote:
> Forgot to add - this graphs are tested with Kernel 4.14-rc4-next
>
>
> W dniu 2017-10-15 o 00:00, Paweł Staszewski pisze:
>
> Same problem here
>
> Also only difference is change 82599 intel to x710 and have memleak
>
> mem with ixgbe driver over time - same config saame kernel
>
>
>
> changed NIC's to x710 i40e driver (this is the only change)
>
> And mem over time:
>
>
>
> There is no process that is eating memory - looks like there is some problem
> with i40e driver - but it not a surprise :) this driver is really buggy -
> with many things - most tickets on e1000e sourceforge that i openned have no
> reply for year or more - or if somebody reply after year they are closing
> ticket after 1 day with info about no activity :)
>
>
>
> W dniu 2017-10-05 o 07:19, Anders K. Pedersen | Cohaesio pisze:
>
> On ons, 2017-10-04 at 08:32 -0700, Alexander Duyck wrote:
>
> On Wed, Oct 4, 2017 at 5:56 AM, Anders K. Pedersen | Cohaesio
> <akp@...aesio.com> wrote:
>
> Hello,
>
> After updating one of our Linux based routers to kernel 4.13 it
> began
> leaking memory quite fast (about 1 GB every half hour). To narrow
> we
> tried various kernel versions and found that 4.11.12 is okay, while
> 4.12 also leaks, so we did a bisection between 4.11 and 4.12.
>
> The first bisection ended at
> "[6964e53f55837b0c49ed60d36656d2e0ee4fc27b] i40e: fix handling of
> HW
> ATR eviction", which fixes some flag handling that was broken by
> 47994c119a36 "i40e: remove hw_disabled_flags in favor of using
> separate
> flag bits", so I did a second bisection, where I added 6964e53f5583
> "i40e: fix handling of HW ATR eviction" to the steps that had
> 47994c119a36 "i40e: remove hw_disabled_flags in favor of using
> separate
> flag bits" in them.
>
> The second bisection ended at
> "[0e626ff7ccbfc43c6cc4aeea611c40b899682382] i40e: Fix support for
> flow
> director programming status", where I don't see any obvious
> problems,
> so I'm hoping for some assistance.
>
> The router is a PowerEdge R730 server (Haswell based) with three
> Intel
> NICs (all using the i40e driver):
>
> X710 quad port 10 GbE SFP+: eth0 eth1 eth2 eth3
> X710 quad port 10 GbE SFP+: eth4 eth5 eth6 eth7
> XL710 dual port 40 GbE QSFP+: eth8 eth9
>
> The NICs are aggregated with LACP with the team driver:
>
> team0: eth9 (40 GbE selected primary), and eth3, eth7 (10 GbE non-
> selected backups)
> team1: eth0, eth1, eth4, eth5 (all 10 GbE selected)
>
> team0 is used for internal networks and has one untagged and four
> tagged VLAN interfaces, while team1 has an external uplink
> connection
> without any VLANs.
>
> The router runs an eBGP session on team1 to one of our uplinks, and
> iBGP via team0 to our other border routers. It also runs OSPF on
> the
> internal VLANs on team0. One thing I've noticed is that when OSPF
> is
> not announcing a default gateway to the internal networks, so there
> is
> almost no traffic coming in on team0 and out on team1, but still
> plenty
> of traffic coming in on team1 and out via team0, there's no memory
> leak
> (or at least it is so small that we haven't detected it). But as
> soon
> as we configure OSPF to announce a default gateway to the internal
> VLANs, so we get traffic from team0 to team1 the leaking begins.
> Stopping the OSPF default gateway announcement again also stops the
> leaking, but does not release already leaked memory.
>
> So this leads to me suspect that the leaking is related to RX on
> team0
> (where XL710 eth9 is normally the only active interface) or TX on
> team1
> (X710 eth0, eth1, eth4, eth5). The first bad commit is related to
> RX
> cleaning, which suggests RX on team0. Since we're only seeing the
> leak
> for our outbound traffic, I suspect either a difference between the
> X710 vs. XL710 NICs, or that the inbound traffic is for relatively
> few
> destination addresses (only our own systems) while the outbound
> traffic
> is for many different addresses on the internet. But I'm just
> guessing
> here.
>
> I've tried kmemleak, but it only found a few kB of suspected memory
> leaks (several of which disappeared again after a while).
>
> Below I've included more details - git bisect logs, ethtool -i,
> dmesg,
> Kernel .config, and various memory related /proc files. Any help or
> suggestions would be much appreciated, and please let me know if
> more
> information is needed or there's something I should try.
>
> Regards,
> Anders K. Pedersen
>
> Hi Anders,
>
> I think I see the problem and should have a patch submitted shortly
> to
> address it. From what I can tell it looks like the issue is that we
> weren't properly recycling the pages associated with descriptors that
> contained an Rx programming status. For now the workaround would be
> to
> try disabling ATR via the "ethtool --set-priv-flags" command. I
> should
> have a patch out in the next hour or so that you can try testing to
> verify if it addresses the issue.
>
> Thanks.
>
> - Alex
>
> Thanks Alex,
>
> I will test the patch in our next service window on Tuesday morning.
>
> Regards,
> Anders
>
>
>