Message-ID: <888ec430-1d70-68e6-2ee8-bab53d21bb0f@itcare.pl>
Date: Thu, 19 Oct 2017 01:22:42 +0200
From: Paweł Staszewski <pstaszewski@...are.pl>
To: Alexander Duyck <alexander.duyck@...il.com>
Cc: Pavlos Parissis <pavlos.parissis@...il.com>,
"Anders K. Pedersen | Cohaesio" <akp@...aesio.com>,
"netdev@...r.kernel.org" <netdev@...r.kernel.org>,
"intel-wired-lan@...ts.osuosl.org" <intel-wired-lan@...ts.osuosl.org>,
"alexander.h.duyck@...el.com" <alexander.h.duyck@...el.com>
Subject: Re: Linux 4.12+ memory leak on router with i40e NICs
On 2017-10-19 at 00:58, Paweł Staszewski wrote:
>
>
> On 2017-10-19 at 00:50, Paweł Staszewski wrote:
>>
>>
>> On 2017-10-19 at 00:20, Paweł Staszewski wrote:
>>>
>>>
>>> On 2017-10-18 at 17:44, Paweł Staszewski wrote:
>>>>
>>>>
>>>> On 2017-10-17 at 16:08, Paweł Staszewski wrote:
>>>>>
>>>>>
>>>>> On 2017-10-17 at 13:52, Paweł Staszewski wrote:
>>>>>>
>>>>>>
>>>>>> On 2017-10-17 at 13:05, Paweł Staszewski wrote:
>>>>>>>
>>>>>>>
>>>>>>> On 2017-10-17 at 12:59, Paweł Staszewski wrote:
>>>>>>>>
>>>>>>>>
>>>>>>>> On 2017-10-17 at 12:51, Paweł Staszewski wrote:
>>>>>>>>>
>>>>>>>>>
>>>>>>>>> On 2017-10-17 at 12:20, Paweł Staszewski wrote:
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>> On 2017-10-17 at 11:48, Paweł Staszewski wrote:
>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>> On 2017-10-17 at 02:44, Paweł Staszewski wrote:
>>>>>>>>>>>>
>>>>>>>>>>>>
>>>>>>>>>>>> On 2017-10-17 at 01:56, Alexander Duyck wrote:
>>>>>>>>>>>>> On Mon, Oct 16, 2017 at 4:34 PM, Paweł Staszewski
>>>>>>>>>>>>> <pstaszewski@...are.pl> wrote:
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> On 2017-10-16 at 18:26, Paweł Staszewski wrote:
>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> On 2017-10-16 at 13:20, Pavlos Parissis wrote:
>>>>>>>>>>>>>>>> On 15/10/2017 02:58 AM, Alexander Duyck wrote:
>>>>>>>>>>>>>>>>> Hi Pawel,
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>> To clarify is that Dave Miller's tree or Linus's that
>>>>>>>>>>>>>>>>> you are talking
>>>>>>>>>>>>>>>>> about? If it is Dave's tree how long ago was it you
>>>>>>>>>>>>>>>>> pulled it since I
>>>>>>>>>>>>>>>>> think the fix was just pushed by Jeff Kirsher a few
>>>>>>>>>>>>>>>>> days ago.
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>> The issue should be fixed in the following commit:
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>> https://git.kernel.org/pub/scm/linux/kernel/git/davem/net.git/commit/drivers/net/ethernet/intel/i40e/i40e_txrx.c?id=2b9478ffc550f17c6cd8c69057234e91150f5972
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> Do you know when it is going to be available on
>>>>>>>>>>>>>>>> net-next and linux-stable
>>>>>>>>>>>>>>>> repos?
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> Cheers,
>>>>>>>>>>>>>>>> Pavlos
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> I will run some tests tonight with the "net" git tree,
>>>>>>>>>>>>>>> where this patch is included.
>>>>>>>>>>>>>>> Starting from 0:00 CET
>>>>>>>>>>>>>>> :)
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>> Upgraded, and it looks like the problem is not solved by that
>>>>>>>>>>>>>> patch.
>>>>>>>>>>>>>> Currently running the system with the
>>>>>>>>>>>>>> https://git.kernel.org/pub/scm/linux/kernel/git/davem/net.git/
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> kernel
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> Still about 0.5GB of memory is leaking somewhere
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> I can also confirm that the latest kernel where memory is
>>>>>>>>>>>>>> not leaking (using the i40e driver with Intel 710 cards)
>>>>>>>>>>>>>> is 4.11.12.
>>>>>>>>>>>>>> With kernel 4.11.12 there is no change in memory usage
>>>>>>>>>>>>>> after an hour.
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> I also checked that with ixgbe instead of i40e, on the same
>>>>>>>>>>>>>> net.git kernel, there is no memleak (same memory usage
>>>>>>>>>>>>>> after an hour), so this is 100% an i40e driver problem.
>>>>>>>>>>>>> So how long was the run to get the .5GB of memory leaking?
>>>>>>>>>>>> 1 hour
>>>>>>>>>>>>
>>>>>>>>>>>>>
>>>>>>>>>>>>> Also is there any chance of you being able to bisect to
>>>>>>>>>>>>> determine
>>>>>>>>>>>>> where the memory leak was introduced since as you pointed
>>>>>>>>>>>>> out it
>>>>>>>>>>>>> didn't exist in 4.11.12 so odds are it was introduced
>>>>>>>>>>>>> somewhere
>>>>>>>>>>>>> between 4.11 and the latest kernel release.
>>>>>>>>>>>> That may be hard, because I currently need to go back to
>>>>>>>>>>>> 4.11.12 - this is a production host/router.
>>>>>>>>>>>> I will try to find a free test router for tests/bisects
>>>>>>>>>>>> with the i40e driver (Intel 710 cards).
>>>>>>>>>>>>
>>>>>>>>>>>>>
>>>>>>>>>>>>> Thanks.
>>>>>>>>>>>>>
>>>>>>>>>>>>> - Alex
>>>>>>>>>>>>>
>>>>>>>>>>>>
>>>>>>>>>>>>
>>>>>>>>>>> Also forgot to add the errors from i40e when the driver initializes:
>>>>>>>>>>> [ 15.760569] i40e 0000:02:00.1: Error I40E_AQ_RC_ENOSPC
>>>>>>>>>>> adding RX filters on PF, promiscuous mode forced on
>>>>>>>>>>> [ 16.365587] i40e 0000:03:00.3: Error I40E_AQ_RC_ENOSPC
>>>>>>>>>>> adding RX filters on PF, promiscuous mode forced on
>>>>>>>>>>> [ 16.367686] i40e 0000:02:00.2: Error I40E_AQ_RC_ENOSPC
>>>>>>>>>>> adding RX filters on PF, promiscuous mode forced on
>>>>>>>>>>> [ 16.368816] i40e 0000:03:00.0: Error I40E_AQ_RC_ENOSPC
>>>>>>>>>>> adding RX filters on PF, promiscuous mode forced on
>>>>>>>>>>> [ 16.369877] i40e 0000:03:00.2: Error I40E_AQ_RC_ENOSPC
>>>>>>>>>>> adding RX filters on PF, promiscuous mode forced on
>>>>>>>>>>> [ 16.370941] i40e 0000:02:00.3: Error I40E_AQ_RC_ENOSPC
>>>>>>>>>>> adding RX filters on PF, promiscuous mode forced on
>>>>>>>>>>> [ 16.372005] i40e 0000:02:00.0: Error I40E_AQ_RC_ENOSPC
>>>>>>>>>>> adding RX filters on PF, promiscuous mode forced on
>>>>>>>>>>> [ 16.373029] i40e 0000:03:00.1: Error I40E_AQ_RC_ENOSPC
>>>>>>>>>>> adding RX filters on PF, promiscuous mode forced on
>>>>>>>>>>>
>>>>>>>>>>> Some params that are set for these NICs:
>>>>>>>>>>> ip link set up dev $i
>>>>>>>>>>> ethtool -A $i autoneg off rx off tx off
>>>>>>>>>>> ethtool -G $i rx 1024 tx 2048
>>>>>>>>>>> ip link set $i txqueuelen 1000
>>>>>>>>>>> ethtool -C $i adaptive-rx off adaptive-tx off rx-usecs 512 tx-usecs 128
>>>>>>>>>>> ethtool -L $i combined 6
>>>>>>>>>>> #ethtool -N $i rx-flow-hash udp4 sdfn
>>>>>>>>>>> ethtool -K $i ntuple on
>>>>>>>>>>> ethtool -K $i gro off
>>>>>>>>>>> ethtool -K $i tso off
>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>> Also, after turning TSO/GRO on, memory usage changes and the
>>>>>>>>>> leak gets faster.
>>>>>>>>>> The image below shows memory usage before the change, with
>>>>>>>>>> TSO/GRO off, and after enabling TSO/GRO:
>>>>>>>>>>
>>>>>>>>>> https://ibb.co/dTqBY6
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>> Thanks
>>>>>>>>>> Pawel
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>> With settings like this:
>>>>>>>>> ifc='enp2s0f0 enp2s0f1 enp2s0f2 enp2s0f3 enp3s0f0 enp3s0f1
>>>>>>>>> enp3s0f2 enp3s0f3'
>>>>>>>>> for i in $ifc
>>>>>>>>> do
>>>>>>>>> ethtool -C $i adaptive-rx off adaptive-tx off rx-usecs 512 tx-usecs 128
>>>>>>>>> ethtool -K $i gro on
>>>>>>>>> ethtool -K $i tso on
>>>>>>>>>
>>>>>>>>> done
>>>>>>>>>
>>>>>>>>> Server is leaking about 4-6MB every 10 seconds
>>>>>>>>> MEMLEAK:
>>>>>>>>> 5 MB/10sec
>>>>>>>>> 6 MB/10sec
>>>>>>>>> 4 MB/10sec
>>>>>>>>> 4 MB/10sec
>>>>>>>>>
>>>>>>>>>
>>>>>>>>> Other settings TSO/GRO off
>>>>>>>>> ifc='enp2s0f0 enp2s0f1 enp2s0f2 enp2s0f3 enp3s0f0 enp3s0f1
>>>>>>>>> enp3s0f2 enp3s0f3'
>>>>>>>>> for i in $ifc
>>>>>>>>> do
>>>>>>>>> ethtool -C $i adaptive-rx off adaptive-tx off rx-usecs 512 tx-usecs 128
>>>>>>>>> ethtool -K $i gro off
>>>>>>>>> ethtool -K $i tso off
>>>>>>>>>
>>>>>>>>> done
>>>>>>>>>
>>>>>>>>> Same leak about 5MB per 10 seconds
>>>>>>>>> MEMLEAK:
>>>>>>>>> 5 MB/10sec
>>>>>>>>> 5 MB/10sec
>>>>>>>>> 5 MB/10sec
>>>>>>>>>
>>>>>>>>>
>>>>>>>>> Other settings rx-usecs change from 512 to 1024:
>>>>>>>>> ifc='enp2s0f0 enp2s0f1 enp2s0f2 enp2s0f3 enp3s0f0 enp3s0f1
>>>>>>>>> enp3s0f2 enp3s0f3'
>>>>>>>>> for i in $ifc
>>>>>>>>> do
>>>>>>>>> ethtool -C $i adaptive-rx off adaptive-tx off rx-usecs 1024 tx-usecs 128
>>>>>>>>> ethtool -K $i gro off
>>>>>>>>> ethtool -K $i tso off
>>>>>>>>>
>>>>>>>>> done
>>>>>>>>>
>>>>>>>>> MEMLEAK:
>>>>>>>>> 4 MB/10sec
>>>>>>>>> 3 MB/10sec
>>>>>>>>> 4 MB/10sec
>>>>>>>>> 4 MB/10sec
>>>>>>>>>
>>>>>>>>>
>>>>>>>>> So the memleak has something to do with rx-usecs (fewer
>>>>>>>>> interrupts but higher latency for traffic).
>>>>>>>>>
>>>>>>>>>
>>>>>>>>> But enabling TSO/GRO also makes the leak about 1MB bigger
>>>>>>>>> per 10 seconds.
>>>>>>>>>
>>>>>>>>>
>>>>>>>>>
>>>>>>>> So far best config is:
>>>>>>>> ifc='enp2s0f0 enp2s0f1 enp2s0f2 enp2s0f3 enp3s0f0 enp3s0f1
>>>>>>>> enp3s0f2 enp3s0f3'
>>>>>>>> for i in $ifc
>>>>>>>> do
>>>>>>>> ethtool -C $i adaptive-rx off adaptive-tx off rx-usecs 64 tx-usecs 512
>>>>>>>> ethtool -K $i gro off
>>>>>>>> ethtool -K $i tso on
>>>>>>>>
>>>>>>>> done
>>>>>>>>
>>>>>>>> MEMLEAK - about 2MB/10secs
>>>>>>>> 2 MB/10sec
>>>>>>>> 2 MB/10sec
>>>>>>>> 2 MB/10sec
>>>>>>>>
>>>>>>>>
>>>>>>>> With - rx-usecs set to 256 (about 7-9MB/10secs memleak)
>>>>>>>> ifc='enp2s0f0 enp2s0f1 enp2s0f2 enp2s0f3 enp3s0f0 enp3s0f1
>>>>>>>> enp3s0f2 enp3s0f3'
>>>>>>>> for i in $ifc
>>>>>>>> do
>>>>>>>> ethtool -C $i adaptive-rx off adaptive-tx off rx-usecs 256 tx-usecs 512
>>>>>>>> ethtool -K $i gro off
>>>>>>>> ethtool -K $i tso on
>>>>>>>>
>>>>>>>> done
>>>>>>>>
>>>>>>>> MEMLEAK:
>>>>>>>> 7 MB/10sec
>>>>>>>> 7 MB/10sec
>>>>>>>> 8 MB/10sec
>>>>>>>> 9 MB/10sec
>>>>>>>>
>>>>>>>>
>>>>>>>
>>>>>>> And even less memleak with rx-usecs set to 32
>>>>>>> ifc='enp2s0f0 enp2s0f1 enp2s0f2 enp2s0f3 enp3s0f0 enp3s0f1
>>>>>>> enp3s0f2 enp3s0f3'
>>>>>>> for i in $ifc
>>>>>>> do
>>>>>>> ethtool -C $i adaptive-rx off adaptive-tx off rx-usecs 32 tx-usecs 512
>>>>>>> ethtool -K $i gro off
>>>>>>> ethtool -K $i tso on
>>>>>>>
>>>>>>> done
>>>>>>>
>>>>>>>
>>>>>>> MEMLEAK - about 0-2MB for each 10 seconds
>>>>>>> 0 MB/10sec
>>>>>>> 1 MB/10sec
>>>>>>> 0 MB/10sec
>>>>>>> 2 MB/10sec
>>>>>>> 1 MB/10sec
>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>
>>>>>>
>>>>>> So the best settings for now, to leak as little as possible
>>>>>> (rx-usecs set to 16):
>>>>>> ifc='enp2s0f0 enp2s0f1 enp2s0f2 enp2s0f3 enp3s0f0 enp3s0f1
>>>>>> enp3s0f2 enp3s0f3'
>>>>>> for i in $ifc
>>>>>> do
>>>>>> ethtool -C $i adaptive-rx off adaptive-tx off rx-usecs 16 tx-usecs 768
>>>>>> ethtool -K $i gro on
>>>>>> ethtool -K $i tso on
>>>>>>
>>>>>> done
>>>>>>
>>>>>>
>>>>>> MEMLEAK: (0-1MB/10seconds)
>>>>>> 0 MB/10sec
>>>>>> 0 MB/10sec
>>>>>> 0 MB/10sec
>>>>>> 1 MB/10sec
>>>>>> 1 MB/10sec
>>>>>> -1 MB/10sec
>>>>>> 1 MB/10sec
>>>>>> 1 MB/10sec
>>>>>> 0 MB/10sec
>>>>>>
>>>>>> (there are some memory recycles - so this is good :) )
>>>>>>
>>>>>>
>>>>>>
>>>>>> Compared to(rx-usecs 512):
>>>>>>
>>>>>> ifc='enp2s0f0 enp2s0f1 enp2s0f2 enp2s0f3 enp3s0f0 enp3s0f1
>>>>>> enp3s0f2 enp3s0f3'
>>>>>> for i in $ifc
>>>>>> do
>>>>>> ethtool -C $i adaptive-rx off adaptive-tx off rx-usecs 512 tx-usecs 128
>>>>>> ethtool -K $i gro on
>>>>>> ethtool -K $i tso on
>>>>>>
>>>>>> done
>>>>>>
>>>>>> Server is leaking about 4-6MB every 10 seconds
>>>>>> MEMLEAK:
>>>>>> 5 MB/10sec
>>>>>> 6 MB/10sec
>>>>>> 4 MB/10sec
>>>>>> 4 MB/10sec
>>>>>>
>>>>>>
>>>>>
>>>>> And a graph covering the period over which all the rx-usecs changes were made:
>>>>> https://ibb.co/nrRfbR
>>>>>
>>>>>
>>>>>
>>>>>
>>>>>
>>>> Can't eliminate the problem with settings - the memleak is bigger, or
>>>> less visible with rx-usecs set to low values - but then I have 100%
>>>> CPU load, so I can't keep rx-usecs set to 16.
>>>>
>>>> I also can't find another host with the same cards, or one that uses
>>>> the i40e driver, for bisecting tests.
>>>> So I will just replace them with Mellanox :)
>>>>
>>>>
>>> Also after fresh reboot with i40e
>>> startup settings:
>>> ifc='enp2s0f0 enp2s0f1 enp2s0f2 enp2s0f3 enp3s0f0 enp3s0f1 enp3s0f2
>>> enp3s0f3'
>>> for i in $ifc
>>> do
>>> ip link set up dev $i
>>> ethtool -A $i autoneg off rx off tx off
>>> ethtool -G $i rx 2048 tx 2048
>>> ip link set $i txqueuelen 1000
>>> #ethtool -C $i rx-usecs 256
>>> ethtool -C $i adaptive-rx off adaptive-tx off rx-usecs 17 tx-usecs 125
>>> ethtool -L $i combined 6
>>> #ethtool -N $i rx-flow-hash udp4 sdfn
>>> #ethtool -K $i ntuple on
>>> #ethtool -K $i gro off
>>> #ethtool -K $i tso off
>>> done
>>>
>>>
>>> After issuing:
>>>
>>> ethtool -K enp2s0f0 gro on tso on
>>>
>>> dmesg shows
>>> [35764.338259] i40e 0000:02:00.0: PF reset failed, -15
>>>
>>>
>>> and no traffic on the card :)
>>>
>>>
>> Also checked now with a bigger rx ring:
>> ethtool -G $i rx 2048 tx 2048
>>
>>
>> Bigger memleak :)
>>
>>
>>
> OK, I need to change the cards to ixgbe now .... no reply, no help for
> i40e, so ....
>
> Maybe someone else with i40e will gather more data. I have only this
> host so far. I will try to install these cards in other hosts after the
> change, but all this movement will take about 2, maybe 3 months.
> Nobody from my team wants to buy cards that use i40e now because of
> this bug, so this is hard to debug now. I also need to change all
> cards >10G to Mellanox, which have no such bug ... sorry :)
>
>
Last tests from my side :)
Settings:
ifc='enp2s0f0 enp2s0f1 enp2s0f2 enp2s0f3 enp3s0f0 enp3s0f1 enp3s0f2 enp3s0f3'
for i in $ifc
do
ip link set up dev $i
ethtool -A $i autoneg off rx off tx off
ethtool -G $i rx 2048 tx 2048
ip link set $i txqueuelen 1000
ethtool -C $i adaptive-rx off adaptive-tx off rx-usecs 17 tx-usecs 125
ethtool -L $i combined 6
ethtool -K $i ntuple on
ethtool -K $i gro on
ethtool -K $i tso on
done
MEMLEAK: about 1-2MB/10sec
1 MB/10sec
2 MB/10sec
1 MB/10sec
2 MB/10sec
2 MB/10sec
2 MB/10sec
1 MB/10sec
2 MB/10sec
2 MB/10sec
2 MB/10sec
1 MB/10sec
2 MB/10sec
1 MB/10sec
1 MB/10sec
0 MB/10sec
2 MB/10sec
2 MB/10sec
0 MB/10sec
2 MB/10sec
5 MB/10sec
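
For anyone who wants to reproduce figures in this "N MB/10sec" form: this
mail does not show how they were sampled, so the following is only a
minimal sketch under assumptions. It assumes each figure is the drop in
MemAvailable from /proc/meminfo over one interval; the function names
(mem_available_kb, delta_mb, sample_leak) are hypothetical and not taken
from this thread.

```shell
#!/bin/sh
# Hypothetical helpers (not from the thread): report how much MemAvailable
# dropped, in whole MB, over each sampling interval.

# mem_available_kb [FILE]: print the MemAvailable value in kB from a
# meminfo-style file (defaults to /proc/meminfo).
mem_available_kb() {
    awk '$1 == "MemAvailable:" { print $2 }' "${1:-/proc/meminfo}"
}

# delta_mb BEFORE_KB AFTER_KB: memory lost between two samples, in whole MB.
# A negative result means memory was given back during the interval.
delta_mb() {
    echo $(( ($1 - $2) / 1024 ))
}

# sample_leak [INTERVAL_SEC] [COUNT]: print COUNT lines such as "5 MB/10sec".
sample_leak() {
    interval=${1:-10}
    count=${2:-6}
    prev=$(mem_available_kb)
    n=0
    while [ "$n" -lt "$count" ]; do
        sleep "$interval"
        cur=$(mem_available_kb)
        echo "$(delta_mb "$prev" "$cur") MB/${interval}sec"
        prev=$cur
        n=$((n + 1))
    done
}
```

Running "sample_leak 10 6" would print six lines, one per 10-second
interval. MemAvailable is only an estimate and also moves with page cache
and normal allocations, so short runs on a busy router will be noisy.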
Changed to rx-usecs 16, tx-usecs 16:
ifc='enp2s0f0 enp2s0f1 enp2s0f2 enp2s0f3 enp3s0f0 enp3s0f1 enp3s0f2 enp3s0f3'
for i in $ifc
do
ip link set up dev $i
ethtool -A $i autoneg off rx off tx off
ethtool -G $i rx 2048 tx 2048
ip link set $i txqueuelen 1000
ethtool -C $i adaptive-rx off adaptive-tx off rx-usecs 16 tx-usecs 16
ethtool -L $i combined 6
ethtool -K $i ntuple on
ethtool -K $i gro on
ethtool -K $i tso on
done
MEMLEAK: 0-2MB/10sec, with some recycles
0 MB/10sec
0 MB/10sec
0 MB/10sec
0 MB/10sec
0 MB/10sec
0 MB/10sec
1 MB/10sec
0 MB/10sec
2 MB/10sec
0 MB/10sec
2 MB/10sec
-1 MB/10sec
0 MB/10sec
2 MB/10sec
0 MB/10sec
2 MB/10sec
-1 MB/10sec
1 MB/10sec
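
To compare sample lists like the ones in this mail on more than eyeballed
ranges, a small summary helper may be useful. This is only a sketch: the
function name summarize_leak is hypothetical, and it assumes the input is
one "N MB/10sec" line per sample, exactly as pasted above.

```shell
#!/bin/sh
# Hypothetical helper (not from the thread): read "N MB/10sec" lines on
# stdin and print the number of samples and their mean.
summarize_leak() {
    awk '$2 == "MB/10sec" { sum += $1; n++ }
         END { if (n) printf "%d samples, mean %.2f MB/10sec\n", n, sum / n }'
}
```

Pasting a list into "summarize_leak" on stdin prints one summary line;
lines that are not samples are ignored. Fed the two lists in this mail, it
should report a mean of 1.65 MB/10sec for rx-usecs 17 and 0.44 MB/10sec
for rx-usecs 16.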