Message-ID: <08fae312-2e3e-4622-94ab-7960accc8008@intel.com>
Date: Mon, 30 Jun 2025 14:56:28 -0700
From: Jacob Keller <jacob.e.keller@...el.com>
To: Jaroslav Pulchart <jaroslav.pulchart@...ddata.com>
CC: Maciej Fijalkowski <maciej.fijalkowski@...el.com>, Jakub Kicinski
<kuba@...nel.org>, Przemek Kitszel <przemyslaw.kitszel@...el.com>,
"intel-wired-lan@...ts.osuosl.org" <intel-wired-lan@...ts.osuosl.org>,
"Damato, Joe" <jdamato@...tly.com>, "netdev@...r.kernel.org"
<netdev@...r.kernel.org>, "Nguyen, Anthony L" <anthony.l.nguyen@...el.com>,
Michal Swiatkowski <michal.swiatkowski@...ux.intel.com>, "Czapnik, Lukasz"
<lukasz.czapnik@...el.com>, "Dumazet, Eric" <edumazet@...gle.com>, "Zaki,
Ahmed" <ahmed.zaki@...el.com>, Martin Karsten <mkarsten@...terloo.ca>, "Igor
Raits" <igor@...ddata.com>, Daniel Secik <daniel.secik@...ddata.com>, "Zdenek
Pesek" <zdenek.pesek@...ddata.com>
Subject: Re: [Intel-wired-lan] Increased memory usage on NUMA nodes with ICE
driver after upgrade to 6.13.y (regression in commit 492a044508ad)
On 6/30/2025 1:01 PM, Jaroslav Pulchart wrote:
>>
>>
>>
>> On 6/30/2025 10:24 AM, Jaroslav Pulchart wrote:
>>>>
>>>>
>>>>
>>>> On 6/30/2025 12:35 AM, Jaroslav Pulchart wrote:
>>>>>>
>>>>>>>
>>>>>>> On Wed, 25 Jun 2025 19:51:08 +0200 Jaroslav Pulchart wrote:
>>>>>>>> Great, please send me a link to the related patch set. I can apply them in
>>>>>>>> our kernel build and try them ASAP!
>>>>>>>
>>>>>>> Sorry if I'm repeating the question - have you tried
>>>>>>> CONFIG_MEM_ALLOC_PROFILING? Reportedly the overhead in recent kernels
>>>>>>> is low enough to use it for production workloads.
>>>>>>
>>>>>> I'm trying it now; here is the freshly booted server:
>>>>>>
>>>>>> # sort -g /proc/allocinfo| tail -n 15
>>>>>> 45409728 236509 fs/dcache.c:1681 func:__d_alloc
>>>>>> 71041024 17344 mm/percpu-vm.c:95 func:pcpu_alloc_pages
>>>>>> 71524352 11140 kernel/dma/direct.c:141 func:__dma_direct_alloc_pages
>>>>>> 85098496 4486 mm/slub.c:2452 func:alloc_slab_page
>>>>>> 115470992 101647 fs/ext4/super.c:1388 [ext4] func:ext4_alloc_inode
>>>>>> 134479872 32832 kernel/events/ring_buffer.c:811 func:perf_mmap_alloc_page
>>>>>> 141426688 34528 mm/filemap.c:1978 func:__filemap_get_folio
>>>>>> 191594496 46776 mm/memory.c:1056 func:folio_prealloc
>>>>>> 360710144 172 mm/khugepaged.c:1084 func:alloc_charge_folio
>>>>>> 444076032 33790 mm/slub.c:2450 func:alloc_slab_page
>>>>>> 530579456 129536 mm/page_ext.c:271 func:alloc_page_ext
>>>>>> 975175680 465 mm/huge_memory.c:1165 func:vma_alloc_anon_folio_pmd
>>>>>> 1022427136 249616 mm/memory.c:1054 func:folio_prealloc
>>>>>> 1105125376 139252 drivers/net/ethernet/intel/ice/ice_txrx.c:681
>>>>>> [ice] func:ice_alloc_mapped_page
>>>>>> 1621598208 395848 mm/readahead.c:186 func:ractl_alloc_folio
>>>>>>
>>>>>
>>>>> The "drivers/net/ethernet/intel/ice/ice_txrx.c:681 [ice]
>>>>> func:ice_alloc_mapped_page" is just growing...
>>>>>
>>>>> # uptime ; sort -g /proc/allocinfo| tail -n 15
>>>>> 09:33:58 up 4 days, 6 min, 1 user, load average: 6.65, 8.18, 9.81
>>>>>
>>>>> # sort -g /proc/allocinfo| tail -n 15
>>>>> 85216896 443838 fs/dcache.c:1681 func:__d_alloc
>>>>> 106156032 25917 mm/shmem.c:1854 func:shmem_alloc_folio
>>>>> 116850096 102861 fs/ext4/super.c:1388 [ext4] func:ext4_alloc_inode
>>>>> 134479872 32832 kernel/events/ring_buffer.c:811 func:perf_mmap_alloc_page
>>>>> 143556608 6894 mm/slub.c:2452 func:alloc_slab_page
>>>>> 186793984 45604 mm/memory.c:1056 func:folio_prealloc
>>>>> 362807296 88576 mm/percpu-vm.c:95 func:pcpu_alloc_pages
>>>>> 530579456 129536 mm/page_ext.c:271 func:alloc_page_ext
>>>>> 598237184 51309 mm/slub.c:2450 func:alloc_slab_page
>>>>> 838860800 400 mm/huge_memory.c:1165 func:vma_alloc_anon_folio_pmd
>>>>> 929083392 226827 mm/filemap.c:1978 func:__filemap_get_folio
>>>>> 1034657792 252602 mm/memory.c:1054 func:folio_prealloc
>>>>> 1262485504 602 mm/khugepaged.c:1084 func:alloc_charge_folio
>>>>> 1335377920 325970 mm/readahead.c:186 func:ractl_alloc_folio
>>>>> 2544877568 315003 drivers/net/ethernet/intel/ice/ice_txrx.c:681
>>>>> [ice] func:ice_alloc_mapped_page
>>>>>
>>>> ice_alloc_mapped_page is the function used to allocate the pages for the
>>>> Rx ring buffers.
>>>>
>>>> There were a number of fixes for the hot path from Maciej which might
>>>> be related. Although those fixes were primarily for XDP, they do
>>>> impact the regular hot path as well.
>>>>
>>>> These were fixes on top of work he did which landed in v6.13, so it
>>>> seems plausible they might be related. In particular, one of them
>>>> mentions a missing buffer put:
>>>>
>>>> 743bbd93cf29 ("ice: put Rx buffers after being done with current frame")
>>>>
>>>> It says the following:
>>>>> While at it, address an error path of ice_add_xdp_frag() - we were
>>>>> missing buffer putting from day 1 there.
>>>>>
>>>>
>>>> It seems to me the issue must be somehow related to the buffer cleanup
>>>> logic for the Rx ring, since that's the only thing allocated by
>>>> ice_alloc_mapped_page.
>>>>
>>>> It might be something fixed by the work Maciej did, but it seems very
>>>> weird that 492a044508ad ("ice: Add support for persistent NAPI config")
>>>> would affect that logic at all.
>>>
>>> I believe there were/are at least two separate issues. Regarding
>>> commit 492a044508ad (“ice: Add support for persistent NAPI config”):
>>> * On 6.13.y and 6.14.y kernels, this change prevented us from lowering
>>> the driver's initial, large memory allocation immediately after server
>>> power-up. A few hours (at most a few days) later, this inevitably led
>>> to an out-of-memory condition.
>>> * Reverting the commit in those series only delayed the OOM: it
>>> allowed the queue size (and thus the memory footprint) to shrink at
>>> boot just as it did in 6.12.y, but it didn't eliminate the underlying
>>> 'leak'.
>>> * In 6.15.y, however, that revert isn't required (and isn't even
>>> applicable). The after-boot allocation can once again be tuned down
>>> without patching. Still, we observe the same increase in memory use
>>> over time, as shown in the allocinfo output above.
>>> Thus, commit 492a044508ad led us down a false trail, or at the very
>>> least hastened the inevitable OOM.
>>
>> That seems reasonable. I'm still surprised the specific commit leads to
>> any large increase in memory, since it should only be a few bytes per
>> NAPI. But there may be some related driver-specific issues.
>
> Actually, the large base allocation has existed for quite some time;
> the mentioned commit didn't suddenly grow our memory usage, it only
> prevented us from shrinking it via "ethtool -L <iface> combined
> <small-number>" after boot. In other words, we're still stuck with the
> same big allocation, we just can't tune it down (until we revert the
> commit).
>
>>
>> Either way, we clearly need to isolate how we're leaking memory in the
>> hot path. I think it might be related to the fixes from Maciej, which
>> are pretty recent and so might not be in 6.13 or 6.14.
>
> I'm fine with a fix for mainline (now 6.15.y); 6.13.y and 6.14.y are
> already EOL. Could you please tell me which 6.15.y stable release first
> incorporates that patch? Is it included in the current 6.15.5, or will
> it arrive in a later point release?
Unfortunately, it looks like the fix I mentioned already landed in 6.14,
so it's not a fix for your issue (since you mentioned that 6.14 failed
testing on your systems).
$ git describe --first-parent --contains --match=v* --exclude=*rc*
743bbd93cf29f653fae0e1416a31f03231689911
v6.14~251^2~15^2~2
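If you want to double-check which stable tags already contain that
commit, listing the tags that contain it from a stable tree checkout
should work; the version patterns here are just examples:
$ git tag -l 'v6.14*' 'v6.15*' --contains 743bbd93cf29f653fae0e1416a31f03231689911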
I don't see any other relevant changes since v6.14. I can check whether
I see similar growth with CONFIG_MEM_ALLOC_PROFILING on some test
systems here.
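Something like this (untested, and assuming CONFIG_MEM_ALLOC_PROFILING
is enabled as in your runs) is what I had in mind for tracking just the
ice_alloc_mapped_page line over time:
$ while true; do date '+%F %T'; grep ice_alloc_mapped_page /proc/allocinfo; sleep 600; done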