Message-ID: <CAK8fFZ6FU1+1__FndEoFQgHqSXN+330qvNTWMvMfiXc2DpN8NQ@mail.gmail.com>
Date: Mon, 30 Jun 2025 22:01:06 +0200
From: Jaroslav Pulchart <jaroslav.pulchart@...ddata.com>
To: Jacob Keller <jacob.e.keller@...el.com>
Cc: Maciej Fijalkowski <maciej.fijalkowski@...el.com>, Jakub Kicinski <kuba@...nel.org>,
Przemek Kitszel <przemyslaw.kitszel@...el.com>,
"intel-wired-lan@...ts.osuosl.org" <intel-wired-lan@...ts.osuosl.org>, "Damato, Joe" <jdamato@...tly.com>,
"netdev@...r.kernel.org" <netdev@...r.kernel.org>, "Nguyen, Anthony L" <anthony.l.nguyen@...el.com>,
Michal Swiatkowski <michal.swiatkowski@...ux.intel.com>,
"Czapnik, Lukasz" <lukasz.czapnik@...el.com>, "Dumazet, Eric" <edumazet@...gle.com>,
"Zaki, Ahmed" <ahmed.zaki@...el.com>, Martin Karsten <mkarsten@...terloo.ca>,
Igor Raits <igor@...ddata.com>, Daniel Secik <daniel.secik@...ddata.com>,
Zdenek Pesek <zdenek.pesek@...ddata.com>
Subject: Re: [Intel-wired-lan] Increased memory usage on NUMA nodes with ICE
driver after upgrade to 6.13.y (regression in commit 492a044508ad)
>
>
>
> On 6/30/2025 10:24 AM, Jaroslav Pulchart wrote:
> >>
> >>
> >>
> >> On 6/30/2025 12:35 AM, Jaroslav Pulchart wrote:
> >>>>
> >>>>>
> >>>>> On Wed, 25 Jun 2025 19:51:08 +0200 Jaroslav Pulchart wrote:
> >>>>>> Great, please send me a link to the related patch set. I can apply them in
> >>>>>> our kernel build and try them ASAP!
> >>>>>
> >>>>> Sorry if I'm repeating the question - have you tried
> >>>>> CONFIG_MEM_ALLOC_PROFILING? Reportedly the overhead in recent kernels
> >>>>> is low enough to use it for production workloads.
> >>>>
> >>>> I try it now, the fresh booted server:
> >>>>
> >>>> # sort -g /proc/allocinfo| tail -n 15
> >>>> 45409728 236509 fs/dcache.c:1681 func:__d_alloc
> >>>> 71041024 17344 mm/percpu-vm.c:95 func:pcpu_alloc_pages
> >>>> 71524352 11140 kernel/dma/direct.c:141 func:__dma_direct_alloc_pages
> >>>> 85098496 4486 mm/slub.c:2452 func:alloc_slab_page
> >>>> 115470992 101647 fs/ext4/super.c:1388 [ext4] func:ext4_alloc_inode
> >>>> 134479872 32832 kernel/events/ring_buffer.c:811 func:perf_mmap_alloc_page
> >>>> 141426688 34528 mm/filemap.c:1978 func:__filemap_get_folio
> >>>> 191594496 46776 mm/memory.c:1056 func:folio_prealloc
> >>>> 360710144 172 mm/khugepaged.c:1084 func:alloc_charge_folio
> >>>> 444076032 33790 mm/slub.c:2450 func:alloc_slab_page
> >>>> 530579456 129536 mm/page_ext.c:271 func:alloc_page_ext
> >>>> 975175680 465 mm/huge_memory.c:1165 func:vma_alloc_anon_folio_pmd
> >>>> 1022427136 249616 mm/memory.c:1054 func:folio_prealloc
> >>>> 1105125376 139252 drivers/net/ethernet/intel/ice/ice_txrx.c:681 [ice] func:ice_alloc_mapped_page
> >>>> 1621598208 395848 mm/readahead.c:186 func:ractl_alloc_folio
> >>>>
> >>>
> >>> The "drivers/net/ethernet/intel/ice/ice_txrx.c:681 [ice]
> >>> func:ice_alloc_mapped_page" is just growing...
> >>>
> >>> # uptime ; sort -g /proc/allocinfo| tail -n 15
> >>> 09:33:58 up 4 days, 6 min, 1 user, load average: 6.65, 8.18, 9.81
> >>>
> >>> # sort -g /proc/allocinfo| tail -n 15
> >>> 85216896 443838 fs/dcache.c:1681 func:__d_alloc
> >>> 106156032 25917 mm/shmem.c:1854 func:shmem_alloc_folio
> >>> 116850096 102861 fs/ext4/super.c:1388 [ext4] func:ext4_alloc_inode
> >>> 134479872 32832 kernel/events/ring_buffer.c:811 func:perf_mmap_alloc_page
> >>> 143556608 6894 mm/slub.c:2452 func:alloc_slab_page
> >>> 186793984 45604 mm/memory.c:1056 func:folio_prealloc
> >>> 362807296 88576 mm/percpu-vm.c:95 func:pcpu_alloc_pages
> >>> 530579456 129536 mm/page_ext.c:271 func:alloc_page_ext
> >>> 598237184 51309 mm/slub.c:2450 func:alloc_slab_page
> >>> 838860800 400 mm/huge_memory.c:1165 func:vma_alloc_anon_folio_pmd
> >>> 929083392 226827 mm/filemap.c:1978 func:__filemap_get_folio
> >>> 1034657792 252602 mm/memory.c:1054 func:folio_prealloc
> >>> 1262485504 602 mm/khugepaged.c:1084 func:alloc_charge_folio
> >>> 1335377920 325970 mm/readahead.c:186 func:ractl_alloc_folio
> >>> 2544877568 315003 drivers/net/ethernet/intel/ice/ice_txrx.c:681 [ice] func:ice_alloc_mapped_page
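To make the growth at a single call site easier to watch over time, a small helper can sample /proc/allocinfo periodically. This is just a sketch: the helper name is my own, and it assumes the usual "<bytes> <calls> <file:line> ..." line layout shown above.

```shell
# Hypothetical helper: sum the bytes currently attributed to call sites
# matching a pattern in an allocinfo-format file (default /proc/allocinfo).
# Assumes the "<bytes> <calls> <file:line> ..." layout seen in the output above.
allocinfo_bytes() {
    site="$1"; file="${2:-/proc/allocinfo}"
    awk -v site="$site" '$0 ~ site { sum += $1 } END { print sum + 0 }' "$file"
}

# e.g. sample the ice Rx page allocations once a minute:
#   while sleep 60; do date +%T; allocinfo_bytes ice_alloc_mapped_page; done
```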
> >>>
> >> ice_alloc_mapped_page is the function used to allocate the pages for the
> >> Rx ring buffers.
> >>
> >> There were a number of fixes for the hot path from Maciej which might be
> >> related. Although those fixes were primarily for XDP they do impact the
> >> regular hot path as well.
> >>
> >> These were fixes on top of work he did which landed in v6.13, so it
> >> seems plausible they might be related. In particular one which mentions
> >> a missing buffer put:
> >>
> >> 743bbd93cf29 ("ice: put Rx buffers after being done with current frame")
> >>
> >> It says the following:
> >>> While at it, address an error path of ice_add_xdp_frag() - we were
> >>> missing buffer putting from day 1 there.
> >>>
> >>
> >> It seems to me the issue must be somehow related to the buffer cleanup
> >> logic for the Rx ring, since that's the only thing allocated by
> >> ice_alloc_mapped_page.
> >>
> >> It might be something fixed with the work Maciej did... but it seems very
> >> weird that 492a044508ad ("ice: Add support for persistent NAPI config")
> >> would affect that logic at all...
> >
> > I believe there were/are at least two separate issues. Regarding
> > commit 492a044508ad ("ice: Add support for persistent NAPI config"):
> > * On 6.13.y and 6.14.y kernels, this change prevented us from lowering
> > the driver's initial, large memory allocation immediately after server
> > power-up. A few hours (at most a few days) later, this inevitably led
> > to an out-of-memory condition.
> > * Reverting the commit in those series only delayed the OOM: it
> > allowed the queue count (and thus the memory footprint) to shrink
> > after boot just as it did in 6.12.y, but it didn't eliminate the
> > underlying 'leak'.
> > * In 6.15.y, however, that revert isn't required (and isn't even
> > applicable). The after-boot allocation can once again be tuned down
> > without patching. Still, we observe the same increase in memory use
> > over time, as shown in the /proc/allocinfo output above.
> > Thus, commit 492a044508ad led us down a false trail, or at the very
> > least hastened the inevitable OOM.
>
> That seems reasonable. I'm still surprised the specific commit leads to
> any large increase in memory, since it should only be a few bytes per
> NAPI. But there may be some related driver-specific issues.
Actually, the large base allocation has existed for quite some time. The
commit mentioned above didn't suddenly grow our memory usage; it only
prevented us from shrinking it via "ethtool -L <iface> combined
<small-number>" after boot. In other words, we're still stuck with the
same big allocation, we just can't tune it down (until we revert the
commit).
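For context, here is a back-of-envelope sketch of why the combined queue count dominates the base footprint. The numbers are hypothetical, not our actual configuration, and it assumes one mapped page per Rx descriptor (the real ratio depends on buffer size and page-split mode):

```shell
# Rough estimate: queues * ring entries * page size.
# 64 combined queues, 2048-entry Rx rings, 4 KiB pages (all hypothetical):
echo $(( 64 * 2048 * 4096 ))   # 536870912 bytes, i.e. 512 MiB per port
```

This is why lowering the combined queue count after boot was our main lever for reclaiming memory on the small NUMA nodes.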
>
> Either way, we clearly need to isolate how we're leaking memory in the
> hot path. I think it might be related to the fixes from Maciej which are
> pretty recent so might not be in 6.13 or 6.14
I'm fine with the fix landing in mainline (now 6.15.y); 6.13.y and
6.14.y are already EOL. Could you please tell me which 6.15.y stable
release first incorporates that patch? Is it included in the current
6.15.5, or will it arrive in a later point release?
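Once the first fixed point release is known, a simple version gate can tell us which hosts still need the update. A sketch; the 6.15.5 value below is a placeholder, not a confirmation that it actually carries the fix:

```shell
# Returns success if version $2 is at least version $1 (sort -V comparison).
kernel_at_least() {
    [ "$(printf '%s\n' "$1" "$2" | sort -V | head -n1)" = "$1" ]
}

# Placeholder threshold -- substitute the release that actually carries the fix:
if kernel_at_least 6.15.5 "$(uname -r | cut -d- -f1)"; then
    echo "running kernel is new enough"
fi
```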