Message-ID: <20231109081412.161ce68f@kernel.org>
Date: Thu, 9 Nov 2023 08:14:12 -0800
From: Jakub Kicinski <kuba@...nel.org>
To: Ilias Apalodimas <ilias.apalodimas@...aro.org>
Cc: davem@...emloft.net, netdev@...r.kernel.org, edumazet@...gle.com,
pabeni@...hat.com, almasrymina@...gle.com, hawk@...nel.org
Subject: Re: [PATCH net-next 00/15] net: page_pool: add netlink-based
introspection
On Thu, 9 Nov 2023 10:11:47 +0200 Ilias Apalodimas wrote:
> > We immediately run into page pool leak warnings, both real and
> > false positive. As Eric pointed out / predicted, there's no
> > guarantee that applications will read / close their sockets, so a
> > page pool page may be stuck in a socket (but not leaked) forever.
> > This happens a lot in our fleet. Most of these are obviously due
> > to application bugs, but we should not be printing kernel warnings
> > over minor application resource leaks.
>
> Fair enough, I guess you mean 'continuous warnings'?
Yes, in this case, but I'm making a general statement.
Or do you mean there's a typo / grammar issue?
> > Conversely, page pool memory may get leaked at runtime, and we
> > have no way to detect / track that, unless someone reconfigures
> > the NIC and destroys the page pools which leaked the pages.
> >
> > The solution presented here is to expose the memory use of page
> > pools via netlink. This allows continuous monitoring of the memory
> > used by page pools, regardless of whether they have been destroyed
> > or not. The sample in patch 15 can print the memory use and
> > recycling efficiency:
> >
> > $ ./page-pool
> > eth0[2] page pools: 10 (zombies: 0)
> > refs: 41984 bytes: 171966464 (refs: 0 bytes: 0)
> > recycling: 90.3% (alloc: 656:397681 recycle: 89652:270201)
>
> That's reasonable, and the recycling rate is pretty impressive.
This is just from a test machine: fresh boot, maybe a short iperf run,
I don't remember now :) In any case, not a real workload.
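
For reference, the recycling figure above is consistent with total
recycled pages divided by total allocated pages, assuming the paired
numbers after "alloc:" and "recycle:" are sub-counters that simply sum
to those totals (which sub-counter is which isn't spelled out here).
A minimal sketch of that arithmetic:

#include <stdio.h>

/* Re-derive the recycling percentage from the counters in the sample
 * output above.  Assumption: the paired numbers sum to total
 * allocations and total recycled pages respectively.
 */
int main(void)
{
	unsigned long alloc = 656 + 397681;	/* alloc: 656:397681 */
	unsigned long recycle = 89652 + 270201;	/* recycle: 89652:270201 */

	printf("recycling: %.1f%%\n", 100.0 * recycle / alloc);
	return 0;
}

With those numbers this prints "recycling: 90.3%", matching the sample
output.
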
> Any idea how that translated into improvements overall? mem/CPU pressure, etc.
I haven't collected much prod data at this stage; I'm hoping to add
this to our internal kernel and then do a more thorough investigation.