Message-ID: <CAK8fFZ5rS8Xg11LvyQHzFh3aVHbKdRHpuhrpV_Wc7oYRcMZFRA@mail.gmail.com>
Date: Mon, 30 Jun 2025 09:35:09 +0200
From: Jaroslav Pulchart <jaroslav.pulchart@...ddata.com>
To: Jakub Kicinski <kuba@...nel.org>
Cc: Przemek Kitszel <przemyslaw.kitszel@...el.com>,
"intel-wired-lan@...ts.osuosl.org" <intel-wired-lan@...ts.osuosl.org>,
"Keller, Jacob E" <jacob.e.keller@...el.com>, "Damato, Joe" <jdamato@...tly.com>,
"netdev@...r.kernel.org" <netdev@...r.kernel.org>, "Nguyen, Anthony L" <anthony.l.nguyen@...el.com>,
Michal Swiatkowski <michal.swiatkowski@...ux.intel.com>,
"Czapnik, Lukasz" <lukasz.czapnik@...el.com>, "Dumazet, Eric" <edumazet@...gle.com>,
"Zaki, Ahmed" <ahmed.zaki@...el.com>, Martin Karsten <mkarsten@...terloo.ca>,
Igor Raits <igor@...ddata.com>, Daniel Secik <daniel.secik@...ddata.com>,
Zdenek Pesek <zdenek.pesek@...ddata.com>
Subject: Re: [Intel-wired-lan] Increased memory usage on NUMA nodes with ICE
driver after upgrade to 6.13.y (regression in commit 492a044508ad)
>
> >
> > On Wed, 25 Jun 2025 19:51:08 +0200 Jaroslav Pulchart wrote:
> > > Great, please send me a link to the related patch set. I can apply them in
> > > our kernel build and try them ASAP!
> >
> > Sorry if I'm repeating the question - have you tried
> > CONFIG_MEM_ALLOC_PROFILING? Reportedly the overhead in recent kernels
> > is low enough to use it for production workloads.
>
> I tried it now on a freshly booted server:
>
> # sort -g /proc/allocinfo| tail -n 15
> 45409728 236509 fs/dcache.c:1681 func:__d_alloc
> 71041024 17344 mm/percpu-vm.c:95 func:pcpu_alloc_pages
> 71524352 11140 kernel/dma/direct.c:141 func:__dma_direct_alloc_pages
> 85098496 4486 mm/slub.c:2452 func:alloc_slab_page
> 115470992 101647 fs/ext4/super.c:1388 [ext4] func:ext4_alloc_inode
> 134479872 32832 kernel/events/ring_buffer.c:811 func:perf_mmap_alloc_page
> 141426688 34528 mm/filemap.c:1978 func:__filemap_get_folio
> 191594496 46776 mm/memory.c:1056 func:folio_prealloc
> 360710144 172 mm/khugepaged.c:1084 func:alloc_charge_folio
> 444076032 33790 mm/slub.c:2450 func:alloc_slab_page
> 530579456 129536 mm/page_ext.c:271 func:alloc_page_ext
> 975175680 465 mm/huge_memory.c:1165 func:vma_alloc_anon_folio_pmd
> 1022427136 249616 mm/memory.c:1054 func:folio_prealloc
> 1105125376 139252 drivers/net/ethernet/intel/ice/ice_txrx.c:681 [ice] func:ice_alloc_mapped_page
> 1621598208 395848 mm/readahead.c:186 func:ractl_alloc_folio
>
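The "drivers/net/ethernet/intel/ice/ice_txrx.c:681 [ice]
func:ice_alloc_mapped_page" entry just keeps growing: from 1105125376 bytes
on the freshly booted server to 2544877568 bytes after 4 days of uptime.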
# uptime ; sort -g /proc/allocinfo| tail -n 15
09:33:58 up 4 days, 6 min, 1 user, load average: 6.65, 8.18, 9.81
# sort -g /proc/allocinfo| tail -n 15
85216896 443838 fs/dcache.c:1681 func:__d_alloc
106156032 25917 mm/shmem.c:1854 func:shmem_alloc_folio
116850096 102861 fs/ext4/super.c:1388 [ext4] func:ext4_alloc_inode
134479872 32832 kernel/events/ring_buffer.c:811 func:perf_mmap_alloc_page
143556608 6894 mm/slub.c:2452 func:alloc_slab_page
186793984 45604 mm/memory.c:1056 func:folio_prealloc
362807296 88576 mm/percpu-vm.c:95 func:pcpu_alloc_pages
530579456 129536 mm/page_ext.c:271 func:alloc_page_ext
598237184 51309 mm/slub.c:2450 func:alloc_slab_page
838860800 400 mm/huge_memory.c:1165 func:vma_alloc_anon_folio_pmd
929083392 226827 mm/filemap.c:1978 func:__filemap_get_folio
1034657792 252602 mm/memory.c:1054 func:folio_prealloc
1262485504 602 mm/khugepaged.c:1084 func:alloc_charge_folio
1335377920 325970 mm/readahead.c:186 func:ractl_alloc_folio
2544877568 315003 drivers/net/ethernet/intel/ice/ice_txrx.c:681 [ice] func:ice_alloc_mapped_page
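
To make the comparison between the two snapshots easier to read, here is a minimal sketch that diffs two /proc/allocinfo dumps and ranks the tags by byte growth. It assumes each data line has the form "<bytes> <calls> <tag>", as in the output above. The helper names (parse_allocinfo, growth) and the BOOT/LATER samples are only illustrative, and the samples hold just three entries copied from the two listings.

```python
def parse_allocinfo(text):
    """Map each allocation tag to a (bytes, calls) tuple."""
    entries = {}
    for line in text.splitlines():
        fields = line.split(None, 2)
        # Skip header lines and anything that is not "<bytes> <calls> <tag>".
        if len(fields) < 3 or not (fields[0].isdigit() and fields[1].isdigit()):
            continue
        entries[fields[2].strip()] = (int(fields[0]), int(fields[1]))
    return entries


def growth(before, after):
    """Return (byte_delta, call_delta, tag) rows, largest byte growth first."""
    rows = []
    for tag, (size, calls) in after.items():
        old_size, old_calls = before.get(tag, (0, 0))
        rows.append((size - old_size, calls - old_calls, tag))
    return sorted(rows, reverse=True)


# Three entries copied from the two listings above (illustrative subset).
BOOT = """\
141426688 34528 mm/filemap.c:1978 func:__filemap_get_folio
1105125376 139252 drivers/net/ethernet/intel/ice/ice_txrx.c:681 [ice] func:ice_alloc_mapped_page
1621598208 395848 mm/readahead.c:186 func:ractl_alloc_folio
"""

LATER = """\
929083392 226827 mm/filemap.c:1978 func:__filemap_get_folio
1335377920 325970 mm/readahead.c:186 func:ractl_alloc_folio
2544877568 315003 drivers/net/ethernet/intel/ice/ice_txrx.c:681 [ice] func:ice_alloc_mapped_page
"""

if __name__ == "__main__":
    for byte_delta, call_delta, tag in growth(parse_allocinfo(BOOT),
                                              parse_allocinfo(LATER)):
        print(f"{byte_delta:+14d} bytes {call_delta:+8d} calls  {tag}")
```

On a real system the two inputs could be whole snapshots saved with "cat /proc/allocinfo > file" at different uptimes. Page cache entries such as __filemap_get_folio are expected to grow as files are read, so the ranking is only a hint about where to look, not proof of a leak.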
>
> >
> > > On Wed, 25 Jun 2025 at 16:03, Przemek Kitszel <
> > > przemyslaw.kitszel@...el.com> wrote:
> > >
> > > > On 6/25/25 14:17, Jaroslav Pulchart wrote:
> > > > > Hello
> > > > >
> > > > > We are still facing the memory issue with Intel 810 NICs (even on latest
> > > > > 6.15.y).
> > > > >
> > > > > Our current stabilization and solution is to move everything to a new
> > > > > INTEL-FREE server and get rid of last Intel sights there (after Intel's
> > > > > CPU vulnerabilities fuckups NICs are next step).
> > > > >
> > > > > Any help welcomed,
> > > > > Jaroslav P.
> > > > >
> > > > >
> > > >
> > > > Thank you for urging us, I can understand the frustration.
> > > >
> > > > We have identified some (unrelated) memory leaks and will soon ship fixes.
> > > > And, as no commit/version you have posted was clearly the culprit,
> > > > there is a chance that our random findings could help. Anyway, going
> > > > to zero kmemleak reports is good in itself; that is a good start.
> > > >
> > > > I will also ask my VAL to increase efforts in this area.
> >