Message-ID: <CAKgT0UeHBXw+d-y=4_Ady-T8Uiv0e7igWzWrMQHj_OrYr=XzgQ@mail.gmail.com>
Date:   Thu, 19 Oct 2017 08:53:24 -0700
From:   Alexander Duyck <alexander.duyck@...il.com>
To:     Pavlos Parissis <pavlos.parissis@...il.com>
Cc:     Paweł Staszewski <pstaszewski@...are.pl>,
        "Anders K. Pedersen | Cohaesio" <akp@...aesio.com>,
        "netdev@...r.kernel.org" <netdev@...r.kernel.org>,
        "intel-wired-lan@...ts.osuosl.org" <intel-wired-lan@...ts.osuosl.org>,
        "alexander.h.duyck@...el.com" <alexander.h.duyck@...el.com>
Subject: Re: Linux 4.12+ memory leak on router with i40e NICs

On Thu, Oct 19, 2017 at 4:41 AM, Pavlos Parissis
<pavlos.parissis@...il.com> wrote:
> On 19 October 2017 at 01:40, Paweł Staszewski <pstaszewski@...are.pl> wrote:
>>
>>
>> On 2017-10-19 at 01:29, Alexander Duyck wrote:
>>
>>> On Mon, Oct 16, 2017 at 10:51 PM, Vitezslav Samel <vitezslav@...el.cz>
>>> wrote:
>>>>
>>>> On Tue, Oct 17, 2017 at 01:34:29AM +0200, Paweł Staszewski wrote:
>>>>>
>>>>> On 2017-10-16 at 18:26, Paweł Staszewski wrote:
>>>>>>
>>>>>> On 2017-10-16 at 13:20, Pavlos Parissis wrote:
>>>>>>>
>>>>>>> On 15/10/2017 02:58 AM, Alexander Duyck wrote:
>>>>>>>>
>>>>>>>> Hi Pawel,
>>>>>>>>
>>>>>>>> To clarify, is that Dave Miller's tree or Linus's that you are talking
>>>>>>>> about? If it is Dave's tree, how long ago did you pull it? I think
>>>>>>>> the fix was only pushed by Jeff Kirsher a few days ago.
>>>>>>>>
>>>>>>>> The issue should be fixed in the following commit:
>>>>>>>>
>>>>>>>> https://git.kernel.org/pub/scm/linux/kernel/git/davem/net.git/commit/drivers/net/ethernet/intel/i40e/i40e_txrx.c?id=2b9478ffc550f17c6cd8c69057234e91150f5972
>>>>>>>
>>>>>>> Do you know when it is going to be available on net-next and
>>>>>>> linux-stable repos?
>>>>>>>
>>>>>>> Cheers,
>>>>>>> Pavlos
>>>>>>>
>>>>>>>
>>>>>> I will run some tests tonight with the "net" git tree, which includes
>>>>>> this patch.
>>>>>> Starting from 0:00 CET
>>>>>> :)
>>>>>>
>>>>>>
>>>>> Upgraded, and it looks like the problem is not solved by that patch.
>>>>> Currently running the system with the
>>>>> https://git.kernel.org/pub/scm/linux/kernel/git/davem/net.git/
>>>>> kernel.
>>>>>
>>>>> About 0.5GB of memory is still leaking somewhere.
>>>>>
>>>>> I can also confirm that the latest kernel where memory is not leaking
>>>>> (using the i40e driver with Intel 710 cards) is 4.11.12.
>>>>> With kernel 4.11.12, after an hour there is no change in memory usage.
>>>>>
>>>>> I also checked that with ixgbe instead of i40e, on the same net.git
>>>>> kernel, there is no memleak: after an hour memory usage is the same.
>>>>> So this is 100% an i40e driver problem.
>>>>
>>>>    I have (probably) the same problem here but with X520 cards: booting
>>>> 4.12.x gives me an oops after about 20 minutes of our workload. Booting
>>>> 4.9.y is OK. This machine is in production, so any testing is very
>>>> limited.
>>>>
>>>>    The machine was stable for >2 months (on the desk, before it went into
>>>> production) with 4.12.8 but with no traffic on the X520 cards.
>>>>
>>>>          Cheers,
>>>>
>>>>                  Vita
>>>
>>> Sorry but it can't be the same issue since we are discussing a
>>> different driver (i40e) running different hardware (X710 or XL710).
>>> You might want to start a new thread for your issue, and/or if
>>> possible file a bug on e1000.sf.net.
>>>
>>> Thanks.
>>>
>>> - Alex
>>>
>> Sorry, but bugs reported on e1000.sf.net get delayed replies, some after 6
>> months or more. When I reported my first bug there, I got a reply after a
>> year about no activity :):) haha, and the bug I reported there is still
>> active :)
>> It is better for me now to change NICs (certainly cheaper from the clients'
>> perspective :) ) to Mellanox, or just to swap in cards that use ixgbe, which
>> does not have this bug. (Neither Mellanox nor ixgbe has this bug: I have many
>> servers with them with the same config, and only the one with i40e, with the
>> same config, has the memleak.)
>>
>> If nobody from Intel wants to reproduce this, cool, this is not my problem
>> but Intel's :) There are now many good NICs to use, like Mellanox, or I can
>> just stick with the many 10G cards based on ixgbe, which is a really good
>> driver. But really? Intel guys have no XL710 cards? I don't want to buy more
>> buggy cards only to do kernel bisects... sorry...
>> To bisect this bug properly you need maybe 200-300 bisect steps, and to
>> confirm each one you need maybe 30 minutes, so count how much time you
>> need. That probably costs more than 100 cards from Mellanox :)
>>
>
> I have issues similar to yours with the stability of the i40e
> driver. I will need to open another thread about them, but I would
> like to mention that you are not the only one who suffers from
> problems related to the i40e driver. In my case I can't simply change
> NICs, so it is even worse.
>
> Cheers,
> Pavlos

Hi Pavlos,

If you want, feel free to Cc either my gmail or my intel.com email
address when you start the new thread, and I can work with you to try
to resolve the issues you are experiencing.

I just want to split the unrelated issues into separate threads, since
they are easier to track that way. It is much easier to tell when an
actual issue, such as the original memory leak, was resolved than when
several issues are worked on the same thread. Mixing them makes it easy
to lose track of which issue is actually being resolved, and it confuses
people who are reviewing the mailing list for issues similar to what
they are experiencing.

Thanks for your input, and I look forward to working with you to
resolve the issue you are experiencing.

- Alex
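
On the bisect cost quoted earlier in the thread (maybe 200 to 300 steps
at about 30 minutes each): git bisect halves the candidate range at
every step, so the number of steps grows with roughly the base-2
logarithm of the number of commits rather than linearly. Below is a
minimal sketch of that arithmetic. The figure of about 15,000 commits
between the last good and first bad kernel is an assumption for
illustration, not a number from this thread, and the function names are
hypothetical. It also assumes every step gives a clean good/bad verdict.

```python
def bisect_steps(num_commits: int) -> int:
    """Approximate number of kernels to build and test when bisecting.

    Each step halves the remaining range, so this is ceil(log2(n)).
    (n - 1).bit_length() computes that exactly for integers n >= 1.
    """
    if num_commits < 1:
        raise ValueError("num_commits must be at least 1")
    return (num_commits - 1).bit_length()


def bisect_hours(num_commits: int, minutes_per_test: float) -> float:
    """Total test time in hours, ignoring build time and skipped steps."""
    return bisect_steps(num_commits) * minutes_per_test / 60


if __name__ == "__main__":
    commits = 15_000   # assumed size of the good..bad range (hypothetical)
    minutes = 30       # per-step confirmation time mentioned in the thread
    print(bisect_steps(commits), "steps,",
          bisect_hours(commits, minutes), "hours of testing")
```

If the leak needs longer than 30 minutes to show up reliably, or some
steps have to be skipped because a kernel does not boot, the total grows
accordingly.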
