Message-ID: <bd82f85a-f9b8-a09a-beab-67667b58f36a@itcare.pl>
Date: Tue, 17 Oct 2017 12:51:42 +0200
From: Paweł Staszewski <pstaszewski@...are.pl>
To: Alexander Duyck <alexander.duyck@...il.com>
Cc: Pavlos Parissis <pavlos.parissis@...il.com>,
"Anders K. Pedersen | Cohaesio" <akp@...aesio.com>,
"netdev@...r.kernel.org" <netdev@...r.kernel.org>,
"intel-wired-lan@...ts.osuosl.org" <intel-wired-lan@...ts.osuosl.org>,
"alexander.h.duyck@...el.com" <alexander.h.duyck@...el.com>
Subject: Re: Linux 4.12+ memory leak on router with i40e NICs
On 2017-10-17 at 12:20, Paweł Staszewski wrote:
>
>
> On 2017-10-17 at 11:48, Paweł Staszewski wrote:
>>
>>
>> On 2017-10-17 at 02:44, Paweł Staszewski wrote:
>>>
>>>
>>> On 2017-10-17 at 01:56, Alexander Duyck wrote:
>>>> On Mon, Oct 16, 2017 at 4:34 PM, Paweł Staszewski
>>>> <pstaszewski@...are.pl> wrote:
>>>>>
>>>>> On 2017-10-16 at 18:26, Paweł Staszewski wrote:
>>>>>
>>>>>>
>>>>>> On 2017-10-16 at 13:20, Pavlos Parissis wrote:
>>>>>>> On 15/10/2017 02:58 AM, Alexander Duyck wrote:
>>>>>>>> Hi Pawel,
>>>>>>>>
>>>>>>>> To clarify is that Dave Miller's tree or Linus's that you are
>>>>>>>> talking
>>>>>>>> about? If it is Dave's tree how long ago was it you pulled it
>>>>>>>> since I
>>>>>>>> think the fix was just pushed by Jeff Kirsher a few days ago.
>>>>>>>>
>>>>>>>> The issue should be fixed in the following commit:
>>>>>>>>
>>>>>>>> https://git.kernel.org/pub/scm/linux/kernel/git/davem/net.git/commit/drivers/net/ethernet/intel/i40e/i40e_txrx.c?id=2b9478ffc550f17c6cd8c69057234e91150f5972
>>>>>>>>
>>>>>>>>
>>>>>>> Do you know when it is going to be available on net-next and
>>>>>>> linux-stable
>>>>>>> repos?
>>>>>>>
>>>>>>> Cheers,
>>>>>>> Pavlos
>>>>>>>
>>>>>>>
>>>>>> I will run some tests tonight with the "net" git tree, which
>>>>>> includes this patch.
>>>>>> Starting from 0:00 CET
>>>>>> :)
>>>>>>
>>>>>>
>>>>> Upgraded, and it looks like the problem is not solved by that patch.
>>>>> Currently running the system with a kernel built from
>>>>> https://git.kernel.org/pub/scm/linux/kernel/git/davem/net.git/
>>>>>
>>>>> About 0.5 GB of memory is still leaking somewhere.
>>>>>
>>>>> I can also confirm that the latest kernel where memory is not
>>>>> leaking (using the i40e driver with Intel 710 cards) is 4.11.12.
>>>>> With kernel 4.11.12 there is no change in memory usage after an hour.
>>>>>
>>>>> I also checked that with ixgbe instead of i40e, on the same net.git
>>>>> kernel, there is no memleak: memory usage is the same after an hour.
>>>>> So this is 100% an i40e driver problem.
>>>> So how long was the run to get the .5GB of memory leaking?
>>> 1 hour
>>>
>>>>
>>>> Also is there any chance of you being able to bisect to determine
>>>> where the memory leak was introduced since as you pointed out it
>>>> didn't exist in 4.11.12 so odds are it was introduced somewhere
>>>> between 4.11 and the latest kernel release.
>>> That may be hard, because I currently need to go back to 4.11.12:
>>> this is a production host/router.
>>> I will try to find a free/test router with the i40e driver (Intel 710
>>> cards) for tests/bisects.
>>>
>>>>
>>>> Thanks.
>>>>
>>>> - Alex
>>>>
>>>
>>>
>> I also forgot to add the errors logged by i40e when the driver initializes:
>> [ 15.760569] i40e 0000:02:00.1: Error I40E_AQ_RC_ENOSPC adding RX
>> filters on PF, promiscuous mode forced on
>> [ 16.365587] i40e 0000:03:00.3: Error I40E_AQ_RC_ENOSPC adding RX
>> filters on PF, promiscuous mode forced on
>> [ 16.367686] i40e 0000:02:00.2: Error I40E_AQ_RC_ENOSPC adding RX
>> filters on PF, promiscuous mode forced on
>> [ 16.368816] i40e 0000:03:00.0: Error I40E_AQ_RC_ENOSPC adding RX
>> filters on PF, promiscuous mode forced on
>> [ 16.369877] i40e 0000:03:00.2: Error I40E_AQ_RC_ENOSPC adding RX
>> filters on PF, promiscuous mode forced on
>> [ 16.370941] i40e 0000:02:00.3: Error I40E_AQ_RC_ENOSPC adding RX
>> filters on PF, promiscuous mode forced on
>> [ 16.372005] i40e 0000:02:00.0: Error I40E_AQ_RC_ENOSPC adding RX
>> filters on PF, promiscuous mode forced on
>> [ 16.373029] i40e 0000:03:00.1: Error I40E_AQ_RC_ENOSPC adding RX
>> filters on PF, promiscuous mode forced on
>>
>> Some parameters that are set for these NICs:
>> ip link set up dev $i
>> ethtool -A $i autoneg off rx off tx off
>> ethtool -G $i rx 1024 tx 2048
>> ip link set $i txqueuelen 1000
>> ethtool -C $i adaptive-rx off adaptive-tx off rx-usecs 512 tx-usecs 128
>> ethtool -L $i combined 6
>> #ethtool -N $i rx-flow-hash udp4 sdfn
>> ethtool -K $i ntuple on
>> ethtool -K $i gro off
>> ethtool -K $i tso off
>>
>>
>>
>>
> Also, after turning TSO/GRO on, memory usage changes and the leak gets faster.
> The image below shows memory usage before the change (TSO/GRO off) and
> after enabling TSO/GRO:
>
> https://ibb.co/dTqBY6
>
>
> Thanks
> Pawel
>
>
>
With settings like this:
ifc='enp2s0f0 enp2s0f1 enp2s0f2 enp2s0f3 enp3s0f0 enp3s0f1 enp3s0f2 enp3s0f3'
for i in $ifc
do
    ethtool -C $i adaptive-rx off adaptive-tx off rx-usecs 512 tx-usecs 128
    ethtool -K $i gro on
    ethtool -K $i tso on
done
The server leaks about 4-6 MB every 10 seconds:
MEMLEAK:
5 MB/10sec
6 MB/10sec
4 MB/10sec
4 MB/10sec
Other settings, with TSO/GRO off:
ifc='enp2s0f0 enp2s0f1 enp2s0f2 enp2s0f3 enp3s0f0 enp3s0f1 enp3s0f2 enp3s0f3'
for i in $ifc
do
    ethtool -C $i adaptive-rx off adaptive-tx off rx-usecs 512 tx-usecs 128
    ethtool -K $i gro off
    ethtool -K $i tso off
done
Same leak, about 5 MB per 10 seconds:
MEMLEAK:
5 MB/10sec
5 MB/10sec
5 MB/10sec
Other settings, with rx-usecs changed from 512 to 1024:
ifc='enp2s0f0 enp2s0f1 enp2s0f2 enp2s0f3 enp3s0f0 enp3s0f1 enp3s0f2 enp3s0f3'
for i in $ifc
do
    ethtool -C $i adaptive-rx off adaptive-tx off rx-usecs 1024 tx-usecs 128
    ethtool -K $i gro off
    ethtool -K $i tso off
done
MEMLEAK:
4 MB/10sec
3 MB/10sec
4 MB/10sec
4 MB/10sec
So the memleak seems to have something to do with rx-usecs (a higher
value means fewer interrupts but higher latency for traffic).
Enabling TSO/GRO also makes the leak about 1 MB bigger per 10 seconds.
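
The MEMLEAK readings above do not say how they were collected. As a minimal sketch of one way to produce readings in that format, the script below samples MemAvailable from /proc/meminfo at a fixed interval and prints the drop in MB. The helper names (meminfo_kb, leak_mb, monitor_leak) and the choice of MemAvailable are assumptions made for illustration, not the method used for the numbers in this message.

```shell
#!/bin/sh
# Hypothetical helper: print the value (in kB) of one field from a
# meminfo-style file. $1 = field name, $2 = file (default /proc/meminfo).
meminfo_kb() {
    awk -v key="$1:" '$1 == key { print $2 }' "${2:-/proc/meminfo}"
}

# Hypothetical helper: MB lost between two kB readings (before, after),
# truncated toward zero by integer division.
leak_mb() {
    echo $(( ($1 - $2) / 1024 ))
}

# Sample every $1 seconds (default 10) and report how much available
# memory dropped since the previous sample. Runs until interrupted.
monitor_leak() {
    interval="${1:-10}"
    prev=$(meminfo_kb MemAvailable)
    echo "MEMLEAK:"
    while sleep "$interval"; do
        cur=$(meminfo_kb MemAvailable)
        echo "$(leak_mb "$prev" "$cur") MB/${interval}sec"
        prev=$cur
    done
}

# Usage (not run automatically): monitor_leak 10
```

MemAvailable also moves with ordinary page-cache and userspace activity, so a comparison between settings is only meaningful under steady traffic and with several samples per setting.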