Message-ID: <ee05284e-3ab1-482f-a727-981b9fd4e9ee@intel.com>
Date: Mon, 7 Jul 2025 15:03:33 -0700
From: Jacob Keller <jacob.e.keller@...el.com>
To: Jaroslav Pulchart <jaroslav.pulchart@...ddata.com>
CC: Maciej Fijalkowski <maciej.fijalkowski@...el.com>, Jakub Kicinski
<kuba@...nel.org>, Przemek Kitszel <przemyslaw.kitszel@...el.com>,
"intel-wired-lan@...ts.osuosl.org" <intel-wired-lan@...ts.osuosl.org>,
"Damato, Joe" <jdamato@...tly.com>, "netdev@...r.kernel.org"
<netdev@...r.kernel.org>, "Nguyen, Anthony L" <anthony.l.nguyen@...el.com>,
Michal Swiatkowski <michal.swiatkowski@...ux.intel.com>, "Czapnik, Lukasz"
<lukasz.czapnik@...el.com>, "Dumazet, Eric" <edumazet@...gle.com>, "Zaki,
Ahmed" <ahmed.zaki@...el.com>, Martin Karsten <mkarsten@...terloo.ca>, "Igor
Raits" <igor@...ddata.com>, Daniel Secik <daniel.secik@...ddata.com>, "Zdenek
Pesek" <zdenek.pesek@...ddata.com>
Subject: Re: [Intel-wired-lan] Increased memory usage on NUMA nodes with ICE
driver after upgrade to 6.13.y (regression in commit 492a044508ad)
On 7/7/2025 11:32 AM, Jacob Keller wrote:
>
>
> On 7/3/2025 9:16 AM, Jacob Keller wrote:
>> On 7/2/2025 11:46 PM, Jaroslav Pulchart wrote:
>>>> think iperf doesn't do that, which might be part of what's causing this
>>>> issue. I'm going to try to see if I can generate such fragmentation to
>>>> confirm. Is your MTU kept at the default ethernet size?
>>>
>>> Our MTU size is set to 9000 everywhere.
>>>
>>
>> Ok. I am re-trying with MTU 9000 and using some traffic generated by wrk
>> now. I do see much larger memory use (~2 GB) when using MTU 9000, so that
>> tracks with what your system shows. Currently it's fluctuating between
>> 1.9 and 2 GB. I'll leave this running for a couple of days while I'm on
>> vacation and see if anything pops up.
>>
>> Thanks,
>> Jake
>
> Good news! After several days of running a wrk and iperf3 workload with
> 9k MTU, I see a significant increase in the memory usage from the page
> allocations:
>
> 7.3G 953314 drivers/net/ethernet/intel/ice/ice_txrx.c:682 [ice]
> func:ice_alloc_mapped_page
>
> ~5GB extra.
>
> At least I can reproduce this now. It's unclear how long it took, since I
> was out on vacation from Wednesday until now.
>
> I do have one hypothesis about the way we're currently tracking the page
> count (just based on differences between ice and i40e). I'm going to
> attempt to align with i40e and re-run the test. Hopefully I'll have some
> more information in a day or two.
Bad news: my hypothesis was incorrect.
Good news: I can immediately see the problem if I set the MTU to 9K,
start an iperf3 session, and just watch the count of allocations from
ice_alloc_mapped_page(). It goes up consistently, so I can quickly tell
whether a change is helping.
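
For anyone who wants to watch the same number, here is a rough userspace
sketch, assuming the counts quoted above come from /proc/allocinfo (i.e.
CONFIG_MEM_ALLOC_PROFILING=y); grepping the file by hand works just as
well, and the match string and poll interval are arbitrary:

    /* Rough sketch: poll /proc/allocinfo once a second and print the
     * ice_alloc_mapped_page line with a timestamp. Assumes
     * CONFIG_MEM_ALLOC_PROFILING=y so the file exists.
     */
    #include <stdio.h>
    #include <string.h>
    #include <time.h>
    #include <unistd.h>

    int main(void)
    {
            char line[512];

            for (;;) {
                    FILE *f = fopen("/proc/allocinfo", "r");

                    if (!f) {
                            perror("fopen /proc/allocinfo");
                            return 1;
                    }

                    while (fgets(line, sizeof(line), f)) {
                            if (strstr(line, "ice_alloc_mapped_page"))
                                    printf("%ld %s", (long)time(NULL), line);
                    }

                    fclose(f);
                    sleep(1);
            }

            return 0;
    }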
I ported the page-allocation tracking stats over from i40e, and I can
see that we're allocating new pages without actually releasing the old
ones.
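
Roughly the kind of counters I mean (the struct, field, and helper names
below are illustrative placeholders, not the actual i40e or ice ones):

    #include <linux/types.h>

    /* Illustrative per-ring counters for the alloc/release balance; if the
     * driver is recycling correctly, page_alloc_count should level off and
     * page_release_count should stay small relative to it.
     */
    struct rx_page_stats {
            u64 page_alloc_count;    /* bumped in the mapped-page alloc path */
            u64 page_release_count;  /* bumped when a page is unmapped/freed */
            u64 page_reuse_count;    /* bumped when a buffer is recycled */
    };

    /* Called from the allocation path, e.g. where new pages are mapped */
    static inline void rx_stat_page_alloc(struct rx_page_stats *s)
    {
            s->page_alloc_count++;
    }

    /* Called when the driver gives a page back to the system */
    static inline void rx_stat_page_release(struct rx_page_stats *s)
    {
            s->page_release_count++;
    }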
I don't yet have a good understanding of what causes this, and the logic
in ice is pretty hard to track...
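
For anyone following along, the scheme in question is the biased
page-refcount recycling that ice shares with i40e. Paraphrased from
memory (so treat the names and details as approximate, not the literal
ice code), the reuse decision looks roughly like this:

    #include <linux/mm.h>

    /* Paraphrase of the reuse check, not the literal ice code. Each Rx
     * buffer takes a large up-front page reference (the "bias") at
     * allocation time; the stack drops references as it frees the skbs
     * that consumed frags from the page. The buffer may only be recycled
     * if nobody outside the ring still holds the page.
     */
    static bool can_reuse_rx_page(struct page *page, unsigned int pagecnt_bias,
                                  int rx_buf_pgcnt)
    {
            /* pfmemalloc or remote-node pages are not worth keeping */
            if (!dev_page_is_reusable(page))
                    return false;

            /* If the gap between the page refcount and our bias is more
             * than one, someone beyond this ring still references the
             * page, so allocate a fresh one instead of flipping the
             * half-page offset.
             */
            if (rx_buf_pgcnt - pagecnt_bias > 1)
                    return false;

            return true;
    }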
I'm going to try the page pool patches myself to see if this test bed
triggers the same problems. Unfortunately, I think I need someone else
with more experience with the hotpath code to help figure out what's
going wrong here...
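
For context, those patches would replace the manual bias/reuse
accounting above with the generic page_pool API, roughly along these
lines (a sketch of the API usage only, not the actual conversion
patches; the function names here are made up):

    #include <net/page_pool/helpers.h>

    /* Sketch of page_pool usage: the pool handles DMA mapping and page
     * recycling, so the driver-side bias/reuse bookkeeping goes away.
     */
    static struct page_pool *rx_pool_create_sketch(struct device *dev,
                                                   unsigned int ring_size)
    {
            struct page_pool_params pp = {
                    .flags     = PP_FLAG_DMA_MAP | PP_FLAG_DMA_SYNC_DEV,
                    .order     = 0,
                    .pool_size = ring_size,
                    .nid       = NUMA_NO_NODE,
                    .dev       = dev,
                    .dma_dir   = DMA_FROM_DEVICE,
                    .max_len   = PAGE_SIZE,
                    .offset    = 0,
            };

            return page_pool_create(&pp);
    }

    /* Refill: replaces the manual dev_alloc_pages() + DMA map step */
    static struct page *rx_refill_sketch(struct page_pool *pool)
    {
            return page_pool_dev_alloc_pages(pool);
    }

    /* Release: replaces the manual unmap/free when a page can't be reused */
    static void rx_release_sketch(struct page_pool *pool, struct page *page)
    {
            page_pool_put_full_page(pool, page, /* allow_direct */ false);
    }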