lists.openwall.net   lists  /  announce  owl-users  owl-dev  john-users  john-dev  passwdqc-users  yescrypt  popa3d-users  /  oss-security  kernel-hardening  musl  sabotage  tlsify  passwords  /  crypt-dev  xvendor  /  Bugtraq  Full-Disclosure  linux-kernel  linux-netdev  linux-ext4  linux-hardening  linux-cve-announce  PHC 
Open Source and information security mailing list archives
 
Hash Suite: Windows password security audit tool. GUI, reports in PDF.
[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <87ecpczt8q.fsf@redhat.com>
Date: Tue, 02 Dec 2025 18:30:29 +0100
From: Paolo Valerio <pvalerio@...hat.com>
To: Théo Lebrun <theo.lebrun@...tlin.com>,
 netdev@...r.kernel.org
Cc: Nicolas Ferre <nicolas.ferre@...rochip.com>, Claudiu Beznea
 <claudiu.beznea@...on.dev>, Andrew Lunn <andrew+netdev@...n.ch>, "David
 S. Miller" <davem@...emloft.net>, Eric Dumazet <edumazet@...gle.com>, Jakub
 Kicinski <kuba@...nel.org>, Paolo Abeni <pabeni@...hat.com>, Lorenzo
 Bianconi <lorenzo@...nel.org>, Grégory Clement
 <gregory.clement@...tlin.com>, Benoît Monin
 <benoit.monin@...tlin.com>, Thomas Petazzoni
 <thomas.petazzoni@...tlin.com>
Subject: Re: [PATCH RFC net-next 1/6] cadence: macb/gem: Add page pool support

On 27 Nov 2025 at 02:21:52 PM, Théo Lebrun <theo.lebrun@...tlin.com> wrote:

> Hello Paolo,
>
> On Wed Nov 19, 2025 at 2:53 PM CET, Paolo Valerio wrote:
>> Subject: [PATCH RFC net-next 1/6] cadence: macb/gem: Add page pool support
>
> `git log --oneline drivers/net/ethernet/cadence/` tells us all commits
> in this series should be prefixed with "net: macb: ".
>

ack

>> diff --git a/drivers/net/ethernet/cadence/macb.h b/drivers/net/ethernet/cadence/macb.h
>> index 87414a2ddf6e..dcf768bd1bc1 100644
>> --- a/drivers/net/ethernet/cadence/macb.h
>> +++ b/drivers/net/ethernet/cadence/macb.h
>> @@ -14,6 +14,8 @@
>>  #include <linux/interrupt.h>
>>  #include <linux/phy/phy.h>
>>  #include <linux/workqueue.h>
>> +#include <net/page_pool/helpers.h>
>> +#include <net/xdp.h>
>>  
>>  #define MACB_GREGS_NBR 16
>>  #define MACB_GREGS_VERSION 2
>> @@ -957,6 +959,10 @@ struct macb_dma_desc_ptp {
>>  /* Scaled PPM fraction */
>>  #define PPM_FRACTION	16
>>  
>> +/* The buf includes headroom compatible with both skb and xdpf */
>> +#define MACB_PP_HEADROOM		XDP_PACKET_HEADROOM
>> +#define MACB_PP_MAX_BUF_SIZE(num)	(((num) * PAGE_SIZE) - MACB_PP_HEADROOM)
>
> From my previous review, you know I think MACB_PP_MAX_BUF_SIZE() should
> disappear.
>
> I also don't see the point of MACB_PP_HEADROOM. Maybe if it was
> max(XDP_PACKET_HEADROOM, NET_SKB_PAD) it would be more useful, but that
> isn't useful anyway. Can we drop it and use XDP_PACKET_HEADROOM directly
> in the code?
>

sure

>>  /* struct macb_tx_skb - data about an skb which is being transmitted
>>   * @skb: skb currently being transmitted, only set for the last buffer
>>   *       of the frame
>> @@ -1262,10 +1268,11 @@ struct macb_queue {
>>  	unsigned int		rx_tail;
>>  	unsigned int		rx_prepared_head;
>>  	struct macb_dma_desc	*rx_ring;
>> -	struct sk_buff		**rx_skbuff;
>> +	void			**rx_buff;
>
> It would help review if the s/rx_skbuff/rx_buff/ renaming was done in a
> separate commit with a commit message being "this only renames X and
> implies no functional changes".
>

ack, will do

>>  	void			*rx_buffers;
>>  	struct napi_struct	napi_rx;
>>  	struct queue_stats stats;
>> +	struct page_pool	*page_pool;
>>  };
>>  
>>  struct ethtool_rx_fs_item {
>> @@ -1289,6 +1296,7 @@ struct macb {
>>  	struct macb_dma_desc	*rx_ring_tieoff;
>>  	dma_addr_t		rx_ring_tieoff_dma;
>>  	size_t			rx_buffer_size;
>> +	u16			rx_offset;
>
> u16 makes me worried that we might do mistakes. For example the
> following propagates the u16 type.
>
>         bp->rx_offset + data - page_address(page)
>
> We can spare the additional 6 bytes and turn it into a size_t. It'll
> fall in holes anyway, at least it does for my target according to pahole.
>

will do

>> diff --git a/drivers/net/ethernet/cadence/macb_main.c b/drivers/net/ethernet/cadence/macb_main.c
>> index e461f5072884..985c81913ba6 100644
>> --- a/drivers/net/ethernet/cadence/macb_main.c
>> +++ b/drivers/net/ethernet/cadence/macb_main.c
>> @@ -1250,11 +1250,28 @@ static int macb_tx_complete(struct macb_queue *queue, int budget)
>>  	return packets;
>>  }
>>  
>> -static void gem_rx_refill(struct macb_queue *queue)
>> +static void *gem_page_pool_get_buff(struct page_pool *pool,
>> +				    dma_addr_t *dma_addr, gfp_t gfp_mask)
>> +{
>> +	struct page *page;
>> +
>> +	if (!pool)
>> +		return NULL;
>> +
>> +	page = page_pool_alloc_pages(pool, gfp_mask | __GFP_NOWARN);
>> +	if (!page)
>> +		return NULL;
>> +
>> +	*dma_addr = page_pool_get_dma_addr(page) + MACB_PP_HEADROOM;
>> +
>> +	return page_address(page);
>> +}
>
> Do we need a separate function called from only one location? Or we
> could change its name to highlight it allocates and does not just "get"
> a buffer.
>

I guess it's fine to open-code this.

>> @@ -1267,25 +1284,17 @@ static void gem_rx_refill(struct macb_queue *queue)
>>  
>>  		desc = macb_rx_desc(queue, entry);
>>  
>> -		if (!queue->rx_skbuff[entry]) {
>> +		if (!queue->rx_buff[entry]) {
>>  			/* allocate sk_buff for this free entry in ring */
>> -			skb = netdev_alloc_skb(bp->dev, bp->rx_buffer_size);
>> -			if (unlikely(!skb)) {
>> +			data = gem_page_pool_get_buff(queue->page_pool, &paddr,
>> +						      napi ? GFP_ATOMIC : GFP_KERNEL);
>
> I don't get why the gfp flags computation is spread across
> gem_page_pool_get_buff() and gem_rx_refill().
>

Not sure I get the point here.
This will be open-coded, so atomic vs non-atomic flag will not be passed
to gem_page_pool_get_buff() anymore after the change.
Not sure this answer your question.

>> @@ -1349,12 +1344,16 @@ static int gem_rx(struct macb_queue *queue, struct napi_struct *napi,
>>  		  int budget)
>>  {
>>  	struct macb *bp = queue->bp;
>> +	int			buffer_size;
>>  	unsigned int		len;
>>  	unsigned int		entry;
>> +	void			*data;
>>  	struct sk_buff		*skb;
>>  	struct macb_dma_desc	*desc;
>>  	int			count = 0;
>>  
>> +	buffer_size = DIV_ROUND_UP(bp->rx_buffer_size, PAGE_SIZE) * PAGE_SIZE;
>
> This looks like
>
>         buffer_size = ALIGN(bp->rx_buffer_size, PAGE_SIZE);
>
> no? Anyway I think it should be dropped. It does get dropped next patch
> in this RFC.
>

will proceed squashing the two patches

>> @@ -1387,24 +1386,49 @@ static int gem_rx(struct macb_queue *queue, struct napi_struct *napi,
>>  			queue->stats.rx_dropped++;
>>  			break;
>>  		}
>> -		skb = queue->rx_skbuff[entry];
>> -		if (unlikely(!skb)) {
>> +		data = queue->rx_buff[entry];
>> +		if (unlikely(!data)) {
>>  			netdev_err(bp->dev,
>>  				   "inconsistent Rx descriptor chain\n");
>>  			bp->dev->stats.rx_dropped++;
>>  			queue->stats.rx_dropped++;
>>  			break;
>>  		}
>> +
>> +		skb = napi_build_skb(data, buffer_size);
>> +		if (unlikely(!skb)) {
>> +			netdev_err(bp->dev,
>> +				   "Unable to allocate sk_buff\n");
>> +			page_pool_put_full_page(queue->page_pool,
>> +						virt_to_head_page(data),
>> +						false);
>> +			break;
>
> We should `queue->rx_skbuff[entry] = NULL` here no?
> We free a page and keep a pointer to it.
>

This will be squashed, but the point is still valid as it's the same as
the other patch

>> +		}
>> +
>> +		/* Properly align Ethernet header.
>> +		 *
>> +		 * Hardware can add dummy bytes if asked using the RBOF
>> +		 * field inside the NCFGR register. That feature isn't
>> +		 * available if hardware is RSC capable.
>> +		 *
>> +		 * We cannot fallback to doing the 2-byte shift before
>> +		 * DMA mapping because the address field does not allow
>> +		 * setting the low 2/3 bits.
>> +		 * It is 3 bits if HW_DMA_CAP_PTP, else 2 bits.
>> +		 */
>> +		skb_reserve(skb, bp->rx_offset);
>> +		skb_mark_for_recycle(skb);
>
> I have a platform with RSC support and NET_IP_ALIGN=2. What is yours
> like? It'd be nice if we can test different cases of this RBOF topic.
>

my caps:
macb 1f00100000.ethernet: Cadence caps 0x00548061

Bit RSC looks unset and NET_IP_ALIGN is 0 in my case.

>>  		/* now everything is ready for receiving packet */
>> -		queue->rx_skbuff[entry] = NULL;
>> +		queue->rx_buff[entry] = NULL;
>>  		len = ctrl & bp->rx_frm_len_mask;
>>  
>>  		netdev_vdbg(bp->dev, "gem_rx %u (len %u)\n", entry, len);
>>  
>> +		dma_sync_single_for_cpu(&bp->pdev->dev,
>> +					addr, len,
>> +					page_pool_get_dma_dir(queue->page_pool));
>
> Any reason for the call to dma_sync_single_for_cpu(), we could hardcode
> it to DMA_FROM_DEVICE no?
>

I saw in the other reply you found the answer

>> @@ -2477,13 +2497,13 @@ static int gem_alloc_rx_buffers(struct macb *bp)
>>  
>>  	for (q = 0, queue = bp->queues; q < bp->num_queues; ++q, ++queue) {
>>  		size = bp->rx_ring_size * sizeof(struct sk_buff *);
>
> sizeof is called with the wrong type. Not that it matters because
> pointers are pointers, but we can take this opportunity to move to
>
>         sizeof(*queue->rx_buff)
>

definitely

>> -		queue->rx_skbuff = kzalloc(size, GFP_KERNEL);
>> -		if (!queue->rx_skbuff)
>> +		queue->rx_buff = kzalloc(size, GFP_KERNEL);
>> +		if (!queue->rx_buff)
>>  			return -ENOMEM;
>>  		else
>>  			netdev_dbg(bp->dev,
>> -				   "Allocated %d RX struct sk_buff entries at %p\n",
>> -				   bp->rx_ring_size, queue->rx_skbuff);
>> +				   "Allocated %d RX buff entries at %p\n",
>> +				   bp->rx_ring_size, queue->rx_buff);
>
> Opportunity to deindent this block?
>

ack

> Thanks,
>
> --
> Théo Lebrun, Bootlin
> Embedded Linux and Kernel engineering
> https://bootlin.com


Powered by blists - more mailing lists

Powered by Openwall GNU/*/Linux Powered by OpenVZ