netdev - Re: [Intel-wired-lan] Increased memory usage on NUMA nodes with ICE driver after upgrade to 6.13.y (regression in commit 492a044508ad)


Message-ID: <CAK8fFZ5rS8Xg11LvyQHzFh3aVHbKdRHpuhrpV_Wc7oYRcMZFRA@mail.gmail.com>
Date: Mon, 30 Jun 2025 09:35:09 +0200
From: Jaroslav Pulchart <jaroslav.pulchart@...ddata.com>
To: Jakub Kicinski <kuba@...nel.org>
Cc: Przemek Kitszel <przemyslaw.kitszel@...el.com>, 
	"intel-wired-lan@...ts.osuosl.org" <intel-wired-lan@...ts.osuosl.org>, 
	"Keller, Jacob E" <jacob.e.keller@...el.com>, "Damato, Joe" <jdamato@...tly.com>, 
	"netdev@...r.kernel.org" <netdev@...r.kernel.org>, "Nguyen, Anthony L" <anthony.l.nguyen@...el.com>, 
	Michal Swiatkowski <michal.swiatkowski@...ux.intel.com>, 
	"Czapnik, Lukasz" <lukasz.czapnik@...el.com>, "Dumazet, Eric" <edumazet@...gle.com>, 
	"Zaki, Ahmed" <ahmed.zaki@...el.com>, Martin Karsten <mkarsten@...terloo.ca>, 
	Igor Raits <igor@...ddata.com>, Daniel Secik <daniel.secik@...ddata.com>, 
	Zdenek Pesek <zdenek.pesek@...ddata.com>
Subject: Re: [Intel-wired-lan] Increased memory usage on NUMA nodes with ICE
 driver after upgrade to 6.13.y (regression in commit 492a044508ad)

>
> >
> > On Wed, 25 Jun 2025 19:51:08 +0200 Jaroslav Pulchart wrote:
> > > Great, please send me a link to the related patch set. I can apply them in
> > > our kernel build and try them ASAP!
> >
> > Sorry if I'm repeating the question - have you tried
> > CONFIG_MEM_ALLOC_PROFILING? Reportedly the overhead in recent kernels
> > is low enough to use it for production workloads.
>
> I tried it now, on a freshly booted server:
>
> # sort -g /proc/allocinfo| tail -n 15
>     45409728   236509 fs/dcache.c:1681 func:__d_alloc
>     71041024    17344 mm/percpu-vm.c:95 func:pcpu_alloc_pages
>     71524352    11140 kernel/dma/direct.c:141 func:__dma_direct_alloc_pages
>     85098496     4486 mm/slub.c:2452 func:alloc_slab_page
>    115470992   101647 fs/ext4/super.c:1388 [ext4] func:ext4_alloc_inode
>    134479872    32832 kernel/events/ring_buffer.c:811 func:perf_mmap_alloc_page
>    141426688    34528 mm/filemap.c:1978 func:__filemap_get_folio
>    191594496    46776 mm/memory.c:1056 func:folio_prealloc
>    360710144      172 mm/khugepaged.c:1084 func:alloc_charge_folio
>    444076032    33790 mm/slub.c:2450 func:alloc_slab_page
>    530579456   129536 mm/page_ext.c:271 func:alloc_page_ext
>    975175680      465 mm/huge_memory.c:1165 func:vma_alloc_anon_folio_pmd
>   1022427136   249616 mm/memory.c:1054 func:folio_prealloc
>   1105125376   139252 drivers/net/ethernet/intel/ice/ice_txrx.c:681 [ice] func:ice_alloc_mapped_page
>   1621598208   395848 mm/readahead.c:186 func:ractl_alloc_folio
>

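The "drivers/net/ethernet/intel/ice/ice_txrx.c:681 [ice]
func:ice_alloc_mapped_page" allocation just keeps growing. After 4 days
of uptime it has gone from 1105125376 bytes (139252 calls) to
2544877568 bytes (315003 calls):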

# uptime ; sort -g /proc/allocinfo| tail -n 15
 09:33:58 up 4 days, 6 min,  1 user,  load average: 6.65, 8.18, 9.81
    85216896   443838 fs/dcache.c:1681 func:__d_alloc
   106156032    25917 mm/shmem.c:1854 func:shmem_alloc_folio
   116850096   102861 fs/ext4/super.c:1388 [ext4] func:ext4_alloc_inode
   134479872    32832 kernel/events/ring_buffer.c:811 func:perf_mmap_alloc_page
   143556608     6894 mm/slub.c:2452 func:alloc_slab_page
   186793984    45604 mm/memory.c:1056 func:folio_prealloc
   362807296    88576 mm/percpu-vm.c:95 func:pcpu_alloc_pages
   530579456   129536 mm/page_ext.c:271 func:alloc_page_ext
   598237184    51309 mm/slub.c:2450 func:alloc_slab_page
   838860800      400 mm/huge_memory.c:1165 func:vma_alloc_anon_folio_pmd
   929083392   226827 mm/filemap.c:1978 func:__filemap_get_folio
  1034657792   252602 mm/memory.c:1054 func:folio_prealloc
  1262485504      602 mm/khugepaged.c:1084 func:alloc_charge_folio
  1335377920   325970 mm/readahead.c:186 func:ractl_alloc_folio
  2544877568   315003 drivers/net/ethernet/intel/ice/ice_txrx.c:681 [ice] func:ice_alloc_mapped_page
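
For anyone who wants to compare two /proc/allocinfo snapshots without
reading the columns by eye, here is a minimal Python sketch. It assumes
the three-column layout shown above (bytes, call count, call-site tag);
the helper names parse_allocinfo and growth are made up for this
example, and only three lines from each snapshot are embedded as sample
data. Tags that appear in only one snapshot are skipped, because a
"tail -n 15" cut-off does not mean the earlier value was zero.

```python
# Sketch only: diff two /proc/allocinfo snapshots by call-site tag.
# parse_allocinfo() and growth() are hypothetical helper names.

def parse_allocinfo(text):
    """Return {tag: (bytes, calls)} for every data line in a snapshot."""
    entries = {}
    for line in text.splitlines():
        parts = line.split(None, 2)
        # Skip blank lines and header lines such as "allocinfo - version: 1.0".
        if len(parts) < 3 or not (parts[0].isdigit() and parts[1].isdigit()):
            continue
        entries[parts[2].strip()] = (int(parts[0]), int(parts[1]))
    return entries


def growth(before, after):
    """Return (delta_bytes, delta_calls, tag) rows, largest growth first.

    Only tags present in both snapshots are compared.
    """
    rows = []
    for tag, (size, calls) in after.items():
        if tag in before:
            old_size, old_calls = before[tag]
            rows.append((size - old_size, calls - old_calls, tag))
    return sorted(rows, reverse=True)


# Sample data: three lines copied from each snapshot in this message.
BEFORE = """\
  1022427136   249616 mm/memory.c:1054 func:folio_prealloc
  1105125376   139252 drivers/net/ethernet/intel/ice/ice_txrx.c:681 [ice] func:ice_alloc_mapped_page
  1621598208   395848 mm/readahead.c:186 func:ractl_alloc_folio
"""

AFTER = """\
  1034657792   252602 mm/memory.c:1054 func:folio_prealloc
  1335377920   325970 mm/readahead.c:186 func:ractl_alloc_folio
  2544877568   315003 drivers/net/ethernet/intel/ice/ice_txrx.c:681 [ice] func:ice_alloc_mapped_page
"""

rows = growth(parse_allocinfo(BEFORE), parse_allocinfo(AFTER))
for delta_bytes, delta_calls, tag in rows:
    print(f"{delta_bytes:>14} {delta_calls:>8} {tag}")
```

On a live system the two inputs would be saved copies of the full
/proc/allocinfo file rather than the truncated "tail" output, so that
every tag can be matched.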

>
> >
> > > On Wed, 25 Jun 2025 at 16:03, Przemek Kitszel <przemyslaw.kitszel@...el.com> wrote:
> > >
> > > > On 6/25/25 14:17, Jaroslav Pulchart wrote:
> > > > > Hello
> > > > >
> > > > > We are still facing the memory issue with Intel 810 NICs (even on latest
> > > > > 6.15.y).
> > > > >
> > > > > Our current stabilization and solution is to move everything to a new
> > > > > INTEL-FREE server and get rid of last Intel sights there (after Intel's
> > > > > CPU vulnerabilities fuckups NICs are next step).
> > > > >
> > > > > Any help welcomed,
> > > > > Jaroslav P.
> > > > >
> > > > >
> > > >
> > > > Thank you for urging us, I can understand the frustration.
> > > >
> > > > We have identified some (unrelated) memory leaks, will soon ship fixes.
> > > > And, as there was no clear issue with any commit/version you have
> > > > posted to be a culprit, there is a chance that our random findings could
> > > > help. Anyway going to zero kmemleak reports is good in itself, that is
> > > > a good start.
> > > >
> > > > Will also ask my VAL to increase efforts in this area.
> >