Message-ID: <20250723183517.GB25631@linuxonhyperv3.guj3yctzbm1etfxqx2vob5hsef.xx.internal.cloudapp.net>
Date: Wed, 23 Jul 2025 11:35:17 -0700
From: Dipayaan Roy <dipayanroy@...ux.microsoft.com>
To: Simon Horman <horms@...nel.org>
Cc: kuba@...nel.org, kys@...rosoft.com, haiyangz@...rosoft.com,
	wei.liu@...nel.org, decui@...rosoft.com, andrew+netdev@...n.ch,
	davem@...emloft.net, edumazet@...gle.com, pabeni@...hat.com,
	longli@...rosoft.com, kotaranov@...rosoft.com, horms@...nel.org,
	ast@...nel.org, daniel@...earbox.net, hawk@...nel.org,
	john.fastabend@...il.com, sdf@...ichev.me, lorenzo@...nel.org,
	michal.kubiak@...el.com, ernis@...ux.microsoft.com,
	shradhagupta@...ux.microsoft.com, shirazsaleem@...rosoft.com,
	rosenp@...il.com, netdev@...r.kernel.org,
	linux-hyperv@...r.kernel.org, linux-rdma@...r.kernel.org,
	bpf@...r.kernel.org, dipayanroy@...ux.microsoft.com,
	ssengar@...ux.microsoft.com
Subject: Re: [PATCH] net: mana: Use page pool fragments for RX buffers
 instead of full pages to improve memory efficiency and throughput.

Hi Simon,

Thank you for the review. I am sending out v2 to address these comments.


Regards
Dipayaan Roy


On Tue, Jul 22, 2025 at 12:19:29PM +0100, Simon Horman wrote:
> On Mon, Jul 21, 2025 at 03:14:17AM -0700, Dipayaan Roy wrote:
> > This patch enhances RX buffer handling in the mana driver by allocating
> > pages from a page pool and slicing them into MTU-sized fragments, rather
> > than dedicating a full page per packet. This approach is especially
> > beneficial on systems with 64KB page sizes.
> > 
> > Key improvements:
> > 
> > - Proper integration of page pool for RX buffer allocations.
> > - MTU-sized buffer slicing to improve memory utilization.
> > - Reduced overall per-Rx-queue memory footprint.
> > - Automatic fallback to full-page buffers when:
> >    * Jumbo frames are enabled (MTU > PAGE_SIZE / 2).
> >    * The XDP path is active, to avoid complexities with fragment reuse.
> > - Removal of redundant pre-allocated RX buffers used in scenarios like MTU
> >   changes, ensuring consistency in RX buffer allocation.
> > 
> > Testing on VMs with 64KB pages shows around 200% throughput improvement.
> > Memory efficiency is significantly improved due to reduced wastage in page
> > allocations. Example: We are now able to fit 35 Rx buffers in a single 64KB
> > page for MTU size of 1500, instead of 1 Rx buffer per page previously.
> > 
> > Tested:
> > 
> > - iperf3, iperf2, and nttcp benchmarks.
> > - Jumbo frames with MTU 9000.
> > - Native XDP programs (XDP_PASS, XDP_DROP, XDP_TX, XDP_REDIRECT) for
> >   testing the driver's XDP path.
> > - Page leak detection (kmemleak).
> > - Driver load/unload, reboot, and stress scenarios.
> > 
> > Signed-off-by: Dipayaan Roy <dipayanroy@...ux.microsoft.com>
> 
> Hi,
> 
> Some minor feedback from my side.
> 
> > diff --git a/drivers/net/ethernet/microsoft/mana/mana_en.c b/drivers/net/ethernet/microsoft/mana/mana_en.c
> 
> ...
> 
> > -int mana_pre_alloc_rxbufs(struct mana_port_context *mpc, int new_mtu, int num_queues)
> > -{
> > -	struct device *dev;
> > -	struct page *page;
> > -	dma_addr_t da;
> > -	int num_rxb;
> > -	void *va;
> > -	int i;
> > -
> > -	mana_get_rxbuf_cfg(new_mtu, &mpc->rxbpre_datasize,
> > -			   &mpc->rxbpre_alloc_size, &mpc->rxbpre_headroom);
> > -
> > -	dev = mpc->ac->gdma_dev->gdma_context->dev;
> > -
> > -	num_rxb = num_queues * mpc->rx_queue_size;
> > -
> > -	WARN(mpc->rxbufs_pre, "mana rxbufs_pre exists\n");
> > -	mpc->rxbufs_pre = kmalloc_array(num_rxb, sizeof(void *), GFP_KERNEL);
> > -	if (!mpc->rxbufs_pre)
> > -		goto error;
> >  
> > -	mpc->das_pre = kmalloc_array(num_rxb, sizeof(dma_addr_t), GFP_KERNEL);
> > -	if (!mpc->das_pre)
> > -		goto error;
> > -
> > -	mpc->rxbpre_total = 0;
> > -
> > -	for (i = 0; i < num_rxb; i++) {
> > -		page = dev_alloc_pages(get_order(mpc->rxbpre_alloc_size));
> > -		if (!page)
> > -			goto error;
> > -
> > -		va = page_to_virt(page);
> > -
> > -		da = dma_map_single(dev, va + mpc->rxbpre_headroom,
> > -				    mpc->rxbpre_datasize, DMA_FROM_DEVICE);
> > -		if (dma_mapping_error(dev, da)) {
> > -			put_page(page);
> > -			goto error;
> > +	/* For xdp and jumbo frames make sure only one packet fits per page */
> > +	if (((mtu + MANA_RXBUF_PAD) > PAGE_SIZE / 2) || rcu_access_pointer(apc->bpf_prog)) {
> 
> The line above seems to have unnecessary parentheses.
> And should be line wrapped to be 80 columns wide or less,
> as is still preferred by Networking code.
> The latter condition is flagged by checkpatch.pl --max-line-length=80
> 
> 	if (mtu + MANA_RXBUF_PAD > PAGE_SIZE / 2 ||
> 	    rcu_access_pointer(apc->bpf_prog)) {
> 
> (The above is completely untested)
> 
> Also, I am a little confused by the use of rcu_access_pointer()
> above and below, as bpf_prog does not seem to be managed by RCU.
> 
> Flagged by Sparse.
> 
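One possible shape for this in v2, shown here only as an untested
sketch (it assumes apc->bpf_prog is a plain pointer with no __rcu
annotation, which would explain the Sparse warning), is to drop
rcu_access_pointer() for a READ_ONCE():

	if (mtu + MANA_RXBUF_PAD > PAGE_SIZE / 2 ||
	    READ_ONCE(apc->bpf_prog)) {

Whether READ_ONCE() or a plain read is sufficient depends on how
bpf_prog is synchronized against the XDP attach path, so treat the
above as an assumption rather than the final fix.
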
> > +		if (rcu_access_pointer(apc->bpf_prog)) {
> > +			*headroom = XDP_PACKET_HEADROOM;
> > +			*alloc_size = PAGE_SIZE;
> > +		} else {
> > +			*headroom = 0; /* no support for XDP */
> > +			*alloc_size = SKB_DATA_ALIGN(mtu + MANA_RXBUF_PAD + *headroom);
> >  		}
> >  
> > -		mpc->rxbufs_pre[i] = va;
> > -		mpc->das_pre[i] = da;
> > -		mpc->rxbpre_total = i + 1;
> > +		*frag_count = 1;
> > +		return;
> >  	}
> >  
> > -	return 0;
> > +	/* Standard MTU case - optimize for multiple packets per page */
> > +	*headroom = 0;
> >  
> > -error:
> > -	netdev_err(mpc->ndev, "Failed to pre-allocate RX buffers for %d queues\n", num_queues);
> > -	mana_pre_dealloc_rxbufs(mpc);
> > -	return -ENOMEM;
> > +	/* Calculate base buffer size needed */
> > +	u32 len = SKB_DATA_ALIGN(mtu + MANA_RXBUF_PAD + *headroom);
> > +	u32 buf_size = ALIGN(len, MANA_RX_FRAG_ALIGNMENT);
> > +
> > +	/* Calculate how many packets can fit in a page */
> > +	*frag_count = PAGE_SIZE / buf_size;
> > +	*alloc_size = buf_size;
> >  }
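
As a sanity check of the fragment math above against the "35 Rx
buffers in a single 64KB page for MTU size of 1500" figure from the
commit message, here is a small userspace sketch of the same
calculation. The constants are assumptions for a typical 64-bit build
(SMP_CACHE_BYTES of 64, sizeof(struct skb_shared_info) of 320,
ETH_HLEN of 14, and a MANA_RX_FRAG_ALIGNMENT of 64); none of them
come from the quoted diff itself:

	#include <stdio.h>

	#define ALIGN_UP(x, a)    (((x) + (a) - 1) & ~((a) - 1))
	#define SKB_DATA_ALIGN(x) ALIGN_UP((x), 64)          /* SMP_CACHE_BYTES */
	#define MANA_RXBUF_PAD    (SKB_DATA_ALIGN(320) + 14) /* shinfo + ETH_HLEN */

	int main(void)
	{
		unsigned int mtu = 1500, page_size = 65536;
		unsigned int len = SKB_DATA_ALIGN(mtu + MANA_RXBUF_PAD);
		unsigned int buf_size = ALIGN_UP(len, 64); /* MANA_RX_FRAG_ALIGNMENT */

		/* 65536 / 1856 = 35, matching the commit message */
		printf("frag_count = %u\n", page_size / buf_size);
		return 0;
	}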
> 
> ...
> 
> > -static void *mana_get_rxfrag(struct mana_rxq *rxq, struct device *dev,
> > -			     dma_addr_t *da, bool *from_pool)
> > +static void *mana_get_rxfrag(struct mana_rxq *rxq,
> > +			     struct device *dev, dma_addr_t *da, bool *from_pool)
> >  {
> >  	struct page *page;
> > +	u32 offset;
> >  	void *va;
> > -
> >  	*from_pool = false;
> >  
> > -	/* Reuse XDP dropped page if available */
> > -	if (rxq->xdp_save_va) {
> > -		va = rxq->xdp_save_va;
> > -		rxq->xdp_save_va = NULL;
> > -	} else {
> > -		page = page_pool_dev_alloc_pages(rxq->page_pool);
> > -		if (!page)
> > +	/* Don't use fragments for jumbo frames or XDP (i.e. when frag_count == 1) */
> > +	if (rxq->frag_count == 1) {
> > +		/* Reuse XDP dropped page if available */
> > +		if (rxq->xdp_save_va) {
> > +			va = rxq->xdp_save_va;
> > +			rxq->xdp_save_va = NULL;
> > +		} else {
> > +			page = page_pool_dev_alloc_pages(rxq->page_pool);
> > +			if (!page)
> > +				return NULL;
> > +
> > +			*from_pool = true;
> > +			va = page_to_virt(page);
> > +		}
> > +
> > +		*da = dma_map_single(dev, va + rxq->headroom, rxq->datasize,
> > +				     DMA_FROM_DEVICE);
> > +		if (dma_mapping_error(dev, *da)) {
> > +			if (*from_pool)
> > +				page_pool_put_full_page(rxq->page_pool, page, false);
> > +			else
> > +				put_page(virt_to_head_page(va));
> 
> The put logic above seems to appear in this patch
> more than once. IMHO a helper would be nice.
> 
> > +
> >  			return NULL;
> > +		}
> >  
> > -		*from_pool = true;
> > -		va = page_to_virt(page);
> > +		return va;
> >  	}
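
On the repeated put logic, a helper along these lines might do
(untested sketch; the name mana_put_rxbuf is made up, and it leans on
virt_to_head_page(va) resolving to the pool page in the from_pool
case, since va was obtained from page_to_virt(page)):

	static void mana_put_rxbuf(struct mana_rxq *rxq, void *va,
				   bool from_pool)
	{
		struct page *page = virt_to_head_page(va);

		if (from_pool)
			page_pool_put_full_page(rxq->page_pool, page, false);
		else
			put_page(page);
	}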
> 
> ...
> 
> -- 
> pw-bot: changes-requested
