Message-ID: <bd59ccc3-2af4-6842-8d2f-169229a61470@itcare.pl>
Date:   Thu, 19 Oct 2017 00:58:56 +0200
From:   Paweł Staszewski <pstaszewski@...are.pl>
To:     Alexander Duyck <alexander.duyck@...il.com>
Cc:     Pavlos Parissis <pavlos.parissis@...il.com>,
        "Anders K. Pedersen | Cohaesio" <akp@...aesio.com>,
        "netdev@...r.kernel.org" <netdev@...r.kernel.org>,
        "intel-wired-lan@...ts.osuosl.org" <intel-wired-lan@...ts.osuosl.org>,
        "alexander.h.duyck@...el.com" <alexander.h.duyck@...el.com>
Subject: Re: Linux 4.12+ memory leak on router with i40e NICs



On 2017-10-19 at 00:50, Paweł Staszewski wrote:
>
>
> On 2017-10-19 at 00:20, Paweł Staszewski wrote:
>>
>>
>> On 2017-10-18 at 17:44, Paweł Staszewski wrote:
>>>
>>>
>>> On 2017-10-17 at 16:08, Paweł Staszewski wrote:
>>>>
>>>>
>>>> On 2017-10-17 at 13:52, Paweł Staszewski wrote:
>>>>>
>>>>>
>>>>> On 2017-10-17 at 13:05, Paweł Staszewski wrote:
>>>>>>
>>>>>>
>>>>>> On 2017-10-17 at 12:59, Paweł Staszewski wrote:
>>>>>>>
>>>>>>>
>>>>>>> On 2017-10-17 at 12:51, Paweł Staszewski wrote:
>>>>>>>>
>>>>>>>>
>>>>>>>> On 2017-10-17 at 12:20, Paweł Staszewski wrote:
>>>>>>>>>
>>>>>>>>>
>>>>>>>>> On 2017-10-17 at 11:48, Paweł Staszewski wrote:
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>> On 2017-10-17 at 02:44, Paweł Staszewski wrote:
>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>> On 2017-10-17 at 01:56, Alexander Duyck wrote:
>>>>>>>>>>>> On Mon, Oct 16, 2017 at 4:34 PM, Paweł Staszewski 
>>>>>>>>>>>> <pstaszewski@...are.pl> wrote:
>>>>>>>>>>>>>
>>>>>>>>>>>>> On 2017-10-16 at 18:26, Paweł Staszewski wrote:
>>>>>>>>>>>>>
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> On 2017-10-16 at 13:20, Pavlos Parissis wrote:
>>>>>>>>>>>>>>> On 15/10/2017 02:58 AM, Alexander Duyck wrote:
>>>>>>>>>>>>>>>> Hi Pawel,
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> To clarify, is that Dave Miller's tree or Linus's that you are
>>>>>>>>>>>>>>>> talking about? If it is Dave's tree, how long ago did you pull
>>>>>>>>>>>>>>>> it? I think the fix was just pushed by Jeff Kirsher a few days
>>>>>>>>>>>>>>>> ago.
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> The issue should be fixed in the following commit:
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> https://git.kernel.org/pub/scm/linux/kernel/git/davem/net.git/commit/drivers/net/ethernet/intel/i40e/i40e_txrx.c?id=2b9478ffc550f17c6cd8c69057234e91150f5972 
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> Do you know when it is going to be available on net-next 
>>>>>>>>>>>>>>> and linux-stable
>>>>>>>>>>>>>>> repos?
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> Cheers,
>>>>>>>>>>>>>>> Pavlos
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>> I will run some tests tonight with the "net" git tree, which
>>>>>>>>>>>>>> includes this patch.
>>>>>>>>>>>>>> Starting from 0:00 CET
>>>>>>>>>>>>>> :)
>>>>>>>>>>>>>>
>>>>>>>>>>>>>>
>>>>>>>>>>>>> Upgraded, and it looks like the problem is not solved by that patch.
>>>>>>>>>>>>> Currently running the system with the
>>>>>>>>>>>>> https://git.kernel.org/pub/scm/linux/kernel/git/davem/net.git/
>>>>>>>>>>>>> kernel.
>>>>>>>>>>>>>
>>>>>>>>>>>>> About 0.5GB of memory is still leaking somewhere.
>>>>>>>>>>>>>
>>>>>>>>>>>>> I can also confirm that the latest kernel where memory is not
>>>>>>>>>>>>> leaking (using the i40e driver with Intel 710 cards) is 4.11.12.
>>>>>>>>>>>>> With kernel 4.11.12 there is no change in memory usage after an hour.
>>>>>>>>>>>>>
>>>>>>>>>>>>> I also checked ixgbe instead of i40e with the same net.git kernel:
>>>>>>>>>>>>> there is no memleak, and memory usage is the same after an hour,
>>>>>>>>>>>>> so this is 100% an i40e driver problem.
>>>>>>>>>>>> So how long was the run to get the .5GB of memory leaking?
>>>>>>>>>>> 1 hour
>>>>>>>>>>>
>>>>>>>>>>>>
>>>>>>>>>>>> Also is there any chance of you being able to bisect to determine
>>>>>>>>>>>> where the memory leak was introduced, since as you pointed out it
>>>>>>>>>>>> didn't exist in 4.11.12, so odds are it was introduced somewhere
>>>>>>>>>>>> between 4.11 and the latest kernel release.
>>>>>>>>>>> That can be hard, because I currently need to go back to 4.11.12;
>>>>>>>>>>> this is a production host/router.
>>>>>>>>>>> I will try to find a free/test router for tests/bisects with the
>>>>>>>>>>> i40e driver (Intel 710 cards).
>>>>>>>>>>>
>>>>>>>>>>>>
>>>>>>>>>>>> Thanks.
>>>>>>>>>>>>
>>>>>>>>>>>> - Alex
>>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>> I also forgot to add the errors logged for i40e when the driver initializes:
>>>>>>>>>> [   15.760569] i40e 0000:02:00.1: Error I40E_AQ_RC_ENOSPC adding RX filters on PF, promiscuous mode forced on
>>>>>>>>>> [   16.365587] i40e 0000:03:00.3: Error I40E_AQ_RC_ENOSPC adding RX filters on PF, promiscuous mode forced on
>>>>>>>>>> [   16.367686] i40e 0000:02:00.2: Error I40E_AQ_RC_ENOSPC adding RX filters on PF, promiscuous mode forced on
>>>>>>>>>> [   16.368816] i40e 0000:03:00.0: Error I40E_AQ_RC_ENOSPC adding RX filters on PF, promiscuous mode forced on
>>>>>>>>>> [   16.369877] i40e 0000:03:00.2: Error I40E_AQ_RC_ENOSPC adding RX filters on PF, promiscuous mode forced on
>>>>>>>>>> [   16.370941] i40e 0000:02:00.3: Error I40E_AQ_RC_ENOSPC adding RX filters on PF, promiscuous mode forced on
>>>>>>>>>> [   16.372005] i40e 0000:02:00.0: Error I40E_AQ_RC_ENOSPC adding RX filters on PF, promiscuous mode forced on
>>>>>>>>>> [   16.373029] i40e 0000:03:00.1: Error I40E_AQ_RC_ENOSPC adding RX filters on PF, promiscuous mode forced on
>>>>>>>>>>
>>>>>>>>>> Some parameters that are set for these NICs:
>>>>>>>>>>         ip link set up dev $i
>>>>>>>>>>         ethtool -A $i autoneg off rx off tx off
>>>>>>>>>>         ethtool -G $i rx 1024 tx 2048
>>>>>>>>>>         ip link set $i txqueuelen 1000
>>>>>>>>>>         ethtool -C $i adaptive-rx off adaptive-tx off rx-usecs 512 tx-usecs 128
>>>>>>>>>>         ethtool -L $i combined 6
>>>>>>>>>>         #ethtool -N $i rx-flow-hash udp4 sdfn
>>>>>>>>>>         ethtool -K $i ntuple on
>>>>>>>>>>         ethtool -K $i gro off
>>>>>>>>>>         ethtool -K $i tso off
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>> Also, after turning TSO/GRO on, memory usage changes and the leak
>>>>>>>>> gets faster.
>>>>>>>>> The image below shows memory usage before the change (TSO/GRO off)
>>>>>>>>> and after enabling TSO/GRO:
>>>>>>>>>
>>>>>>>>> https://ibb.co/dTqBY6
>>>>>>>>>
>>>>>>>>>
>>>>>>>>> Thanks
>>>>>>>>> Pawel
>>>>>>>>>
>>>>>>>>>
>>>>>>>>>
>>>>>>>> With settings like this:
>>>>>>>> ifc='enp2s0f0 enp2s0f1 enp2s0f2 enp2s0f3 enp3s0f0 enp3s0f1 enp3s0f2 enp3s0f3'
>>>>>>>> for i in $ifc
>>>>>>>>         do
>>>>>>>>         ethtool -C $i adaptive-rx off adaptive-tx off rx-usecs 512 tx-usecs 128
>>>>>>>>         ethtool -K $i gro on
>>>>>>>>         ethtool -K $i tso on
>>>>>>>>
>>>>>>>>         done
>>>>>>>>
>>>>>>>> The server is leaking about 4-6MB every 10 seconds
>>>>>>>> MEMLEAK:
>>>>>>>> 5  MB/10sec
>>>>>>>> 6  MB/10sec
>>>>>>>> 4  MB/10sec
>>>>>>>> 4  MB/10sec
>>>>>>>>
>>>>>>>>
>>>>>>>> Other settings, TSO/GRO off:
>>>>>>>> ifc='enp2s0f0 enp2s0f1 enp2s0f2 enp2s0f3 enp3s0f0 enp3s0f1 enp3s0f2 enp3s0f3'
>>>>>>>> for i in $ifc
>>>>>>>>         do
>>>>>>>>         ethtool -C $i adaptive-rx off adaptive-tx off rx-usecs 512 tx-usecs 128
>>>>>>>>         ethtool -K $i gro off
>>>>>>>>         ethtool -K $i tso off
>>>>>>>>
>>>>>>>>         done
>>>>>>>>
>>>>>>>> Same leak, about 5MB per 10 seconds
>>>>>>>> MEMLEAK:
>>>>>>>> 5  MB/10sec
>>>>>>>> 5  MB/10sec
>>>>>>>> 5  MB/10sec
>>>>>>>>
>>>>>>>>
>>>>>>>> Other settings, rx-usecs changed from 512 to 1024:
>>>>>>>> ifc='enp2s0f0 enp2s0f1 enp2s0f2 enp2s0f3 enp3s0f0 enp3s0f1 enp3s0f2 enp3s0f3'
>>>>>>>> for i in $ifc
>>>>>>>>         do
>>>>>>>>         ethtool -C $i adaptive-rx off adaptive-tx off rx-usecs 1024 tx-usecs 128
>>>>>>>>         ethtool -K $i gro off
>>>>>>>>         ethtool -K $i tso off
>>>>>>>>
>>>>>>>>         done
>>>>>>>>
>>>>>>>> MEMLEAK:
>>>>>>>> 4  MB/10sec
>>>>>>>> 3  MB/10sec
>>>>>>>> 4  MB/10sec
>>>>>>>> 4  MB/10sec
>>>>>>>>
>>>>>>>>
>>>>>>>> So the memleak has something to do with rx-usecs (fewer interrupts
>>>>>>>> but higher latency for traffic)
>>>>>>>>
>>>>>>>>
>>>>>>>> But enabling TSO/GRO also makes the leak about 1MB bigger per
>>>>>>>> 10 seconds
>>>>>>>>
>>>>>>>>
>>>>>>>>
>>>>>>> So far the best config is:
>>>>>>> ifc='enp2s0f0 enp2s0f1 enp2s0f2 enp2s0f3 enp3s0f0 enp3s0f1 enp3s0f2 enp3s0f3'
>>>>>>> for i in $ifc
>>>>>>>         do
>>>>>>>         ethtool -C $i adaptive-rx off adaptive-tx off rx-usecs 64 tx-usecs 512
>>>>>>>         ethtool -K $i gro off
>>>>>>>         ethtool -K $i tso on
>>>>>>>
>>>>>>>         done
>>>>>>>
>>>>>>> MEMLEAK - about 2MB/10secs
>>>>>>> 2  MB/10sec
>>>>>>> 2  MB/10sec
>>>>>>> 2  MB/10sec
>>>>>>>
>>>>>>>
>>>>>>> With rx-usecs set to 256 (about 7-9MB/10secs memleak):
>>>>>>> ifc='enp2s0f0 enp2s0f1 enp2s0f2 enp2s0f3 enp3s0f0 enp3s0f1 enp3s0f2 enp3s0f3'
>>>>>>> for i in $ifc
>>>>>>>         do
>>>>>>>         ethtool -C $i adaptive-rx off adaptive-tx off rx-usecs 256 tx-usecs 512
>>>>>>>         ethtool -K $i gro off
>>>>>>>         ethtool -K $i tso on
>>>>>>>
>>>>>>>         done
>>>>>>>
>>>>>>> MEMLEAK:
>>>>>>> 7  MB/10sec
>>>>>>> 7  MB/10sec
>>>>>>> 8  MB/10sec
>>>>>>> 9  MB/10sec
>>>>>>>
>>>>>>>
>>>>>>
>>>>>> And even less memleak with rx-usecs set to 32:
>>>>>> ifc='enp2s0f0 enp2s0f1 enp2s0f2 enp2s0f3 enp3s0f0 enp3s0f1 enp3s0f2 enp3s0f3'
>>>>>> for i in $ifc
>>>>>>         do
>>>>>>         ethtool -C $i adaptive-rx off adaptive-tx off rx-usecs 32 tx-usecs 512
>>>>>>         ethtool -K $i gro off
>>>>>>         ethtool -K $i tso on
>>>>>>
>>>>>>         done
>>>>>>
>>>>>>
>>>>>> MEMLEAK - about 0-2MB for each 10 seconds
>>>>>> 0  MB/10sec
>>>>>> 1  MB/10sec
>>>>>> 0  MB/10sec
>>>>>> 2  MB/10sec
>>>>>> 1  MB/10sec
>>>>>>
>>>>>>
>>>>>>
>>>>>
>>>>>
>>>>> So the best settings for now, to have as little leak as possible
>>>>> (rx-usecs set to 16):
>>>>> ifc='enp2s0f0 enp2s0f1 enp2s0f2 enp2s0f3 enp3s0f0 enp3s0f1 enp3s0f2 enp3s0f3'
>>>>> for i in $ifc
>>>>>         do
>>>>>         ethtool -C $i adaptive-rx off adaptive-tx off rx-usecs 16 tx-usecs 768
>>>>>         ethtool -K $i gro on
>>>>>         ethtool -K $i tso on
>>>>>
>>>>>         done
>>>>>
>>>>>
>>>>> MEMLEAK: (0-1MB/10seconds)
>>>>> 0  MB/10sec
>>>>> 0  MB/10sec
>>>>> 0  MB/10sec
>>>>> 1  MB/10sec
>>>>> 1  MB/10sec
>>>>> -1  MB/10sec
>>>>> 1  MB/10sec
>>>>> 1  MB/10sec
>>>>> 0  MB/10sec
>>>>>
>>>>> (some memory is being recycled, so this is good :) )
>>>>>
>>>>>
>>>>>
>>>>> Compared to (rx-usecs 512):
>>>>>
>>>>> ifc='enp2s0f0 enp2s0f1 enp2s0f2 enp2s0f3 enp3s0f0 enp3s0f1 enp3s0f2 enp3s0f3'
>>>>> for i in $ifc
>>>>>         do
>>>>>         ethtool -C $i adaptive-rx off adaptive-tx off rx-usecs 512 tx-usecs 128
>>>>>         ethtool -K $i gro on
>>>>>         ethtool -K $i tso on
>>>>>
>>>>>         done
>>>>>
>>>>> The server is leaking about 4-6MB every 10 seconds
>>>>> MEMLEAK:
>>>>> 5  MB/10sec
>>>>> 6  MB/10sec
>>>>> 4  MB/10sec
>>>>> 4  MB/10sec
>>>>>
>>>>>
>>>>
>>>> And a graph covering the period in which all the rx-usecs changes were made:
>>>> https://ibb.co/nrRfbR
>>>>
>>>>
>>>>
>>>>
>>>>
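The quoted messages repeat the same loop with different rx-usecs, tx-usecs, GRO and TSO values. A minimal sketch of that loop as one reusable function is below. The interface names and ethtool options are the ones used in this thread; the names `RUN` and `apply_settings` are made up for illustration. By default the sketch only prints the commands; actually applying them needs root and i40e ports.

```shell
#!/bin/sh
# Sketch only: one helper for the ethtool loop repeated in this thread.
# RUN and apply_settings are hypothetical names, not from the original mails.
# RUN defaults to "echo" (dry run); run with RUN= (empty) as root to apply.
RUN=${RUN-echo}

ifc='enp2s0f0 enp2s0f1 enp2s0f2 enp2s0f3 enp3s0f0 enp3s0f1 enp3s0f2 enp3s0f3'

# apply_settings RX_USECS TX_USECS GRO TSO
# Sets fixed interrupt coalescing and the GRO/TSO offloads on every port in $ifc.
apply_settings() {
    rx_usecs=$1
    tx_usecs=$2
    gro=$3
    tso=$4
    for i in $ifc; do
        $RUN ethtool -C "$i" adaptive-rx off adaptive-tx off \
            rx-usecs "$rx_usecs" tx-usecs "$tx_usecs"
        $RUN ethtool -K "$i" gro "$gro" tso "$tso"
    done
}
```

For example, `apply_settings 16 768 on on` prints the commands for the rx-usecs 16 configuration quoted above, and `RUN= apply_settings 512 128 off off` would apply the rx-usecs 512 one.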
>>> I can't eliminate the problem with settings. The memleak is bigger,
>>> or less visible with rx-usecs set to low values, but then I have
>>> 100% CPU load, so I can't keep rx-usecs set to 16.
>>>
>>> I also can't find another host with the same cards, or one using the
>>> i40e driver, for bisecting tests.
>>> So I will just replace them with Mellanox :)
>>>
>>>
>> Also, after a fresh reboot with i40e,
>> startup settings:
>> ifc='enp2s0f0 enp2s0f1 enp2s0f2 enp2s0f3 enp3s0f0 enp3s0f1 enp3s0f2 enp3s0f3'
>> for i in $ifc
>>         do
>>         ip link set up dev $i
>>         ethtool -A $i autoneg off rx off tx off
>>         ethtool -G $i rx 2048 tx 2048
>>         ip link set $i txqueuelen 1000
>>         #ethtool -C $i rx-usecs 256
>>         ethtool -C $i adaptive-rx off adaptive-tx off rx-usecs 17 tx-usecs 125
>>         ethtool -L $i combined 6
>>         #ethtool -N $i rx-flow-hash udp4 sdfn
>>         #ethtool -K $i ntuple on
>>         #ethtool -K $i gro off
>>         #ethtool -K $i tso off
>>         done
>>
>>
>> After issuing:
>>
>>  ethtool -K enp2s0f0 gro on tso on
>>
>> dmesg shows
>> [35764.338259] i40e 0000:02:00.0: PF reset failed, -15
>>
>>
>> and no traffic on the card :)
>>
>>
> I have now also checked a bigger rx ring:
>         ethtool -G $i rx 2048 tx 2048
>
>
> Bigger memleak :)
>
>
>
OK, I need to change the cards to ixgbe now... no reply, no help for
i40e, so...

Maybe someone else with i40e will gather more data. I have only this
host so far. I will try to install these cards in other hosts after the
change, but all this moving will take about 2, maybe 3 months. Nobody
from my team wants to buy cards that use i40e now because of this bug,
so this is hard to debug now. I also need to change all cards >10G to
Mellanox, which has no such bug... sorry :)
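
For anyone with i40e cards who wants to collect comparable numbers, here is a minimal sketch of a sampler that prints lines in the same "N  MB/10sec" form as the MEMLEAK figures quoted above. It assumes "used" memory means MemTotal minus MemAvailable from /proc/meminfo, which may not be how the quoted figures were taken. The function names `used_kb`, `delta_mb` and `monitor` are made up for illustration.

```shell
#!/bin/sh
# Sketch only: report growth of used memory per sampling interval.
# Assumption: used = MemTotal - MemAvailable (kB) from /proc/meminfo.

# used_kb [FILE]: print used memory in kB from a meminfo-format file.
used_kb() {
    awk '/^MemTotal:/ {t = $2}
         /^MemAvailable:/ {a = $2}
         END {printf "%d\n", t - a}' "${1:-/proc/meminfo}"
}

# delta_mb PREV_KB CUR_KB: print the change in whole MB (integer division).
delta_mb() {
    echo $(( ($2 - $1) / 1024 ))
}

# monitor [INTERVAL_SECONDS] [SAMPLES]: print one delta line per interval.
monitor() {
    interval=${1:-10}
    samples=${2:-6}
    prev=$(used_kb)
    n=0
    while [ "$n" -lt "$samples" ]; do
        sleep "$interval"
        cur=$(used_kb)
        printf '%d  MB/%ssec\n' "$(delta_mb "$prev" "$cur")" "$interval"
        prev=$cur
        n=$((n + 1))
    done
}
```

Running `monitor 10 6` takes six samples ten seconds apart. A negative value means used memory went down during that interval.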
