lists.openwall.net
Open Source and information security mailing list archives
Date: Sun, 22 Oct 2017 13:56:40 +0000
From: "Anders K. Pedersen | Cohaesio" <akp@...aesio.com>
To: "alexander.duyck@...il.com" <alexander.duyck@...il.com>
CC: "pstaszewski@...are.pl" <pstaszewski@...are.pl>, "netdev@...r.kernel.org" <netdev@...r.kernel.org>, "pavlos.parissis@...il.com" <pavlos.parissis@...il.com>, "intel-wired-lan@...ts.osuosl.org" <intel-wired-lan@...ts.osuosl.org>, "alexander.h.duyck@...el.com" <alexander.h.duyck@...el.com>
Subject: Re: Linux 4.12+ memory leak on router with i40e NICs

On Thu, 2017-10-19 at 08:40 -0700, Alexander Duyck wrote:
> On Thu, Oct 19, 2017 at 5:19 AM, Anders K. Pedersen | Cohaesio
> <akp@...aesio.com> wrote:
> > Hi Alex,
> >
> > On Wed, 2017-10-18 at 16:37 -0700, Alexander Duyck wrote:
> > > When we last talked I had asked if you could do a git bisect to
> > > find the memory leak and you said you would look into it. The most
> > > useful way to solve this would be to do a git bisect between your
> > > current kernel and the 4.11 kernel to find the point at which this
> > > started. If we can do that then fixing this becomes much simpler as
> > > we just have to fix the patch that introduced the issue.
> >
> > We're also seeing a smaller memory leak (about 1 GB per day) than the
> > original one even with the "Fix memory leak related filter programming
> > status" fix applied. So far I've determined that the leak is present on
> > 4.13.7 and was introduced between 4.11 and 4.12, so I'll do another
> > round of bisection to identify the patch that introduced this.
> >
> > Since the router must run for a couple of hours before I can be sure
> > whether a kernel is good or bad, and I can't reboot it during working
> > hours, it'll probably be about a week before I have a result.
> >
> > --
> > Venlig hilsen / Best Regards
> >
> > Anders K. Pedersen
> > Senior Technical Manager
>
> Anders,
>
> I'll do some digging on my side to see if I can find any other memory
> leaks that might be floating around in the driver that could have been
> introduced during that time-frame.
>
> One thing you might try that would help with your testing would be to
> just disable the ATR functionality in i40e. You can do that with the
> ethtool command "ethtool --set-priv-flags <iface> flow-director-atr
> off". That should allow you to bisect this without needing to deal
> with the "programming status" patches since you won't be programming
> ATR filters which is what caused that leak.
>
> Thanks for looking into this.
>
> - Alex

Hi Alex,

I began bisecting, applying the known fix patches at each step where they were applicable (i.e. without changing the flow-director-atr flag), but some of the steps had a high rate of packet drops, which caused problems for our network, so I couldn't leave them running for the several hours needed to determine whether the leak was present. The part of the bisection I got through had the same outcome as the last bisection, which led to "i40e: Fix support for flow director programming status".

After that I experimented a bit with the flow-director-atr flag, and it turns out that if I disable this flag on all the NICs, the memory leak is gone, so I suspected that the smaller memory leak was also caused by "i40e: Fix support for flow director programming status". I tried to revert this patch from 4.13 (with a manual fixup for the trace point that had been added later), but that brought back the packet drops, so I couldn't let it run.

This morning I saw your "i40e: Add programming descriptors to cleaned_count" patch, so I tried 4.13.9 with that patch and the previous "i40e: Fix memory leak related filter programming status" patch, without turning off the flow-director-atr flag. So far this combination has been running stably without any memory leaks.

Thanks for fixing this.

Regards,
Anders
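The thread notes that each bisection step needs a couple of hours of runtime before a kernel can be called good or bad, with the leak being roughly 1 GB per day. One way to make that call less subjective is to fit a least-squares line to periodic memory-usage samples and compare the slope against a threshold. This is only an illustrative sketch, not something from the thread: the sampling source (for example MemTotal minus MemAvailable from /proc/meminfo), the names `Sample`, `leak_rate_gib_per_day` and `classify`, and the 0.5 GiB/day threshold are all hypothetical assumptions.

```python
from dataclasses import dataclass


@dataclass
class Sample:
    hours: float     # time since the kernel under test booted, in hours
    used_mib: float  # memory in use in MiB (assumed: MemTotal - MemAvailable)


def leak_rate_gib_per_day(samples):
    """Least-squares slope of memory use over time, converted to GiB/day."""
    n = len(samples)
    if n < 2:
        raise ValueError("need at least two samples")
    mean_t = sum(s.hours for s in samples) / n
    mean_m = sum(s.used_mib for s in samples) / n
    sxx = sum((s.hours - mean_t) ** 2 for s in samples)
    if sxx == 0:
        raise ValueError("samples must be taken at different times")
    sxy = sum((s.hours - mean_t) * (s.used_mib - mean_m) for s in samples)
    mib_per_hour = sxy / sxx
    return mib_per_hour * 24 / 1024  # MiB/hour -> GiB/day


def classify(samples, threshold_gib_per_day=0.5):
    """Label a bisect step 'bad' if fitted growth exceeds a hypothetical threshold."""
    return "bad" if leak_rate_gib_per_day(samples) > threshold_gib_per_day else "good"


if __name__ == "__main__":
    # Hypothetical hourly samples growing by 64 MiB/hour (1.5 GiB/day).
    leaking = [Sample(h, 1000 + 64 * h) for h in range(5)]
    print(leak_rate_gib_per_day(leaking), classify(leaking))
```

A regression over several samples is used instead of comparing only the first and last readings, so that short-term fluctuations from traffic bursts have less influence on the verdict; the threshold would still need tuning against a known-good kernel.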