Message-ID: <20250728182020.GA29111@linuxonhyperv3.guj3yctzbm1etfxqx2vob5hsef.xx.internal.cloudapp.net>
Date: Mon, 28 Jul 2025 11:20:20 -0700
From: Dipayaan Roy <dipayanroy@...ux.microsoft.com>
To: Jakub Kicinski <kuba@...nel.org>
Cc: horms@...nel.org, kys@...rosoft.com, haiyangz@...rosoft.com,
	wei.liu@...nel.org, decui@...rosoft.com, andrew+netdev@...n.ch,
	davem@...emloft.net, edumazet@...gle.com, pabeni@...hat.com,
	longli@...rosoft.com, kotaranov@...rosoft.com, ast@...nel.org,
	daniel@...earbox.net, hawk@...nel.org, john.fastabend@...il.com,
	sdf@...ichev.me, lorenzo@...nel.org, michal.kubiak@...el.com,
	ernis@...ux.microsoft.com, shradhagupta@...ux.microsoft.com,
	shirazsaleem@...rosoft.com, rosenp@...il.com,
	netdev@...r.kernel.org, linux-hyperv@...r.kernel.org,
	linux-rdma@...r.kernel.org, bpf@...r.kernel.org,
	linux-kernel@...r.kernel.org, ssengar@...ux.microsoft.com,
	dipayanroy@...rosoft.com
Subject: Re: [PATCH v2] net: mana: Use page pool fragments for RX buffers
 instead of full pages to improve memory efficiency.

On Fri, Jul 25, 2025 at 05:54:18PM -0700, Jakub Kicinski wrote:
> On Wed, 23 Jul 2025 12:07:06 -0700 Dipayaan Roy wrote:
> > This patch enhances RX buffer handling in the mana driver by allocating
> > pages from a page pool and slicing them into MTU-sized fragments, rather
> > than dedicating a full page per packet. This approach is especially
> > beneficial on systems with large page sizes like 64KB.
> > 
> > Key improvements:
> > 
> > - Proper integration of page pool for RX buffer allocations.
> > - MTU-sized buffer slicing to improve memory utilization.
> > - Reduce overall per Rx queue memory footprint.
> > - Automatic fallback to full-page buffers when:
> >    * Jumbo frames are enabled (MTU > PAGE_SIZE / 2).
> >    * The XDP path is active, to avoid complexities with fragment reuse.
> > - Removal of redundant pre-allocated RX buffers used in scenarios like MTU
> >   changes, ensuring consistency in RX buffer allocation.
> > 
> > Testing on VMs with 64KB pages shows around 200% throughput improvement.
> > Memory efficiency is significantly improved due to reduced wastage in page
> > allocations. Example: We are now able to fit 35 rx buffers in a single 64kb
> > page for MTU size of 1500, instead of 1 rx buffer per page previously.
> 
> The diff is pretty large and messy, please try to extract some
> refactoring patches that make the final transition easier to review.
> 
> > - iperf3, iperf2, and nttcp benchmarks.
> > - Jumbo frames with MTU 9000.
> > - Native XDP programs (XDP_PASS, XDP_DROP, XDP_TX, XDP_REDIRECT) for
> >   testing the XDP path in driver.
> > - Page leak detection (kmemleak).
> 
> kmemleak doesn't detect page leaks AFAIU, just slab objects
> 
> > - Driver load/unload, reboot, and stress scenarios.
> > 
> > Signed-off-by: Dipayaan Roy <dipayanroy@...ux.microsoft.com>
> > 
> > Reviewed-by: Jacob Keller <jacob.e.keller@...el.com>
> > Reviewed-by: Saurabh Sengar <ssengar@...ux.microsoft.com>
> 
> > -	if (apc->port_is_up)
> > +	if (apc->port_is_up) {
> > +		/* Re-create rxq's after xdp prog was loaded or unloaded.
> > +		 * Ex: re create rxq's to switch from full pages to smaller
> > +		 * size page fragments when xdp prog is unloaded and vice-versa.
> > +		 */
> > +
> > +		err = mana_detach(ndev, false);
> > +		if (err) {
> > +			netdev_err(ndev, "mana_detach failed at xdp set: %d\n", err);
> > +			goto out;
> > +		}
> > +
> > +		err = mana_attach(ndev);
> > +		if (err) {
> > +			netdev_err(ndev, "mana_attach failed at xdp set: %d\n", err);
> > +			goto out;
> > +		}
> 
> If the system is low on memory you will make it unreachable.
> It's a very poor design.
> 
> > -/* Release pre-allocated RX buffers */
> > -void mana_pre_dealloc_rxbufs(struct mana_port_context *mpc)
> > -{
> > -	struct device *dev;
> > -	int i;
> > -
> 
> Looks like you're deleting the infrastructure the driver had for
> pre-allocating memory. Not even mentioning it in the commit message.
> This ability needs to be maintained. Please test with memory allocation
> injections and make sure the driver survives failed reconfig requests.
> The reconfiguration should be cleanly rejected if mem alloc fails,
> and the driver should continue to work with old settings in place.
> -- 
> pw-bot: cr

Hi Jakub,

Thanks for the review. I agree with your point about low memory during
reconfiguration of the mana driver. I will send a v3 that leaves the
driver's infrastructure for pre-allocating RX buffer pages untouched, so
that a reconfig does not fail under low-memory conditions. v3 will also
make sure that all other mana reconfig paths use the pre-allocated RX
buffers as a safeguard against low memory.
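
As a rough illustration of the behaviour requested in the review (reject
the reconfig cleanly if allocation fails and keep the old settings), here
is a minimal userspace sketch of the allocate-first, swap-later pattern.
It is not mana driver code: struct rx_ring, rx_ring_reconfig(),
inject_malloc() and fail_countdown are hypothetical names invented for
this example, and malloc() stands in for page allocation.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical stand-in for an RX queue; not a mana driver structure. */
struct rx_ring {
	void **bufs;
	size_t count;
	size_t buf_size;
};

/* Crude fault injection (assumption for this sketch): while the value is
 * positive, each allocation decrements it; once it is 0, allocations fail.
 * A negative value disables injection.
 */
static long fail_countdown = -1;

static void *inject_malloc(size_t size)
{
	if (fail_countdown == 0)
		return NULL;
	if (fail_countdown > 0)
		fail_countdown--;
	return malloc(size);
}

/* Free the first 'count' buffers and the array that holds them. */
static void free_bufs(void **bufs, size_t count)
{
	if (!bufs)
		return;
	for (size_t i = 0; i < count; i++)
		free(bufs[i]);
	free(bufs);
}

/* Allocate every buffer for the new configuration up front.
 * Returns NULL, with nothing leaked, if any allocation fails.
 */
static void **prealloc_bufs(size_t count, size_t buf_size)
{
	void **bufs;

	assert(count > 0);
	bufs = calloc(count, sizeof(*bufs));
	if (!bufs)
		return NULL;
	for (size_t i = 0; i < count; i++) {
		bufs[i] = inject_malloc(buf_size);
		if (!bufs[i]) {
			free_bufs(bufs, i);
			return NULL;
		}
	}
	return bufs;
}

/* Switch the ring to a new geometry. The old buffers are released only
 * after the new set is fully allocated, so a failure returns -1 and
 * leaves the ring exactly as it was.
 */
static int rx_ring_reconfig(struct rx_ring *ring, size_t count,
			    size_t buf_size)
{
	void **new_bufs = prealloc_bufs(count, buf_size);

	if (!new_bufs)
		return -1;

	free_bufs(ring->bufs, ring->count);
	ring->bufs = new_bufs;
	ring->count = count;
	ring->buf_size = buf_size;
	return 0;
}
```

Setting fail_countdown to a small non-negative value before calling
rx_ring_reconfig() mimics an allocation-failure injection test: the call
returns -1 and the ring keeps its previous buffers, count and size.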

v3 will be a single patch focused only on the improvements in memory
utilization and throughput.
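
For reference, the memory-utilization figure quoted from the commit
message (35 RX buffers per 64KB page at MTU 1500, instead of 1) can be
reproduced with simple arithmetic. The sketch below is illustrative
only: rx_bufs_per_page() and align_up() are hypothetical helpers, and
the per-buffer overhead (334 bytes) and alignment (64 bytes) used in the
usage note are assumptions chosen for the example, not values stated in
this thread. The fallback to one buffer per page when MTU > PAGE_SIZE / 2
follows the commit message.

```c
#include <assert.h>
#include <stddef.h>

/* Round len up to a multiple of align, which must be a power of two. */
static size_t align_up(size_t len, size_t align)
{
	assert(align != 0 && (align & (align - 1)) == 0);
	return (len + align - 1) & ~(align - 1);
}

/* Number of RX buffers carved out of one page.
 * 'overhead' covers everything in a buffer besides the MTU payload
 * (headers, padding, trailing metadata); its value is an assumption
 * supplied by the caller.
 * Returns 1 (full-page buffers, no slicing) when mtu > page_size / 2.
 */
static size_t rx_bufs_per_page(size_t page_size, size_t mtu,
			       size_t overhead, size_t align)
{
	size_t buf_size;

	if (mtu > page_size / 2)
		return 1;

	buf_size = align_up(mtu + overhead, align);
	if (buf_size > page_size)
		return 1;

	return page_size / buf_size;
}
```

With the assumed overhead and alignment, rx_bufs_per_page(65536, 1500,
334, 64) rounds each buffer up to 1856 bytes and yields 35 buffers per
page, which matches the number in the commit message. Different overhead
or alignment values would give a different count.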


Thanks
Dipayaan Roy
