[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <87fr9szti5.fsf@redhat.com>
Date: Tue, 02 Dec 2025 18:24:50 +0100
From: Paolo Valerio <pvalerio@...hat.com>
To: Théo Lebrun <theo.lebrun@...tlin.com>,
netdev@...r.kernel.org
Cc: Nicolas Ferre <nicolas.ferre@...rochip.com>, Claudiu Beznea
<claudiu.beznea@...on.dev>, Andrew Lunn <andrew+netdev@...n.ch>, "David
S. Miller" <davem@...emloft.net>, Eric Dumazet <edumazet@...gle.com>, Jakub
Kicinski <kuba@...nel.org>, Paolo Abeni <pabeni@...hat.com>, Lorenzo
Bianconi <lorenzo@...nel.org>
Subject: Re: [PATCH RFC net-next 0/6] net: macb: Add XDP support and page
pool integration
Hello Théo,
thank you for the feedback
On 26 Nov 2025 at 07:08:14 PM, Théo Lebrun <theo.lebrun@...tlin.com> wrote:
> Hello Paolo,
>
> So this is an initial review, I'll start here with five series-wide
> topics and send small per-line comments (ie nitpicks) in a second stage.
>
>
>
> ### Rx buffer size computation
>
> The buffer size computation should be reworked. At the end of the series
> it looks like:
>
> static int macb_open(struct net_device *dev)
> {
> size_t bufsz = dev->mtu + ETH_HLEN + ETH_FCS_LEN + NET_IP_ALIGN;
>
> // ...
>
> macb_init_rx_buffer_size(bp, bufsz);
>
> // ...
> }
>
> static void macb_init_rx_buffer_size(struct macb *bp, size_t size)
> {
> if (!macb_is_gem(bp)) {
> bp->rx_buffer_size = MACB_RX_BUFFER_SIZE;
> } else {
> bp->rx_buffer_size = size
> + SKB_DATA_ALIGN(sizeof(struct skb_shared_info))
> + MACB_PP_HEADROOM;
>
> if (bp->rx_buffer_size > PAGE_SIZE)
> bp->rx_buffer_size = PAGE_SIZE;
>
> if (bp->rx_buffer_size % RX_BUFFER_MULTIPLE)
> bp->rx_buffer_size = roundup(bp->rx_buffer_size, RX_BUFFER_MULTIPLE);
> }
> }
>
> Most of the issues with this code is not stemming from your series, but
> this big rework is the right moment to fix it all.
>
> - NET_IP_ALIGN is accounted for in the headroom even though it isn't
> present if !RSC.
>
that's something I noticed and I was a unsure about the reason.
> - When skbuff.c/h allocates an SKB buffer, it SKB_DATA_ALIGN()s
> headroom+data. We should probably do the same. In our case it would
> be:
>
> bp->rx_buffer_size = SKB_DATA_ALIGN(MACB_PP_HEADROOM + size) +
> SKB_DATA_ALIGN(sizeof(struct skb_shared_info));
> // or
> bp->rx_buffer_size = SKB_HEAD_ALIGN(MACB_PP_HEADROOM + size);
>
> I've not computed if it can ever give a different value to your
> series in terms of total size or shinfo alignment. I'd guess it is
> unlikely.
>
I'll take a look at this
> - If the size clamping to PAGE_SIZE comes into play, we are probably
> doomed. It means we cannot deal with the MTU and we'll probably get
> corruption. If we do put a check in place, it should loudly fail
> rather than silently clamp.
>
That should not happen, unless I'm missing something.
E.g., 9000B mtu on a 4K PAGE_SIZE kernel should be handled with multiple
descriptors. The clamping is there because according with how the series
creates the pool, the maximum buffer size is page order 0.
Hardware-wise bp->rx_buffer_size should also be taken into account for
the receive buffer size.
> TLDR: I think macb_init_rx_buffer_size() should encapsulate the whole
> rx buffer size computation. It can use bp->rx_offset and add on top
> MTU & co. It might start failing if >PAGE_SIZE (?).
>
ack, I'll move that part
>
>
> ### Buffer variable names
>
> Related: so many variables, fields or constants have ambiguous names,
> can we do something about it?
>
> - bp->rx_offset is named oddly to my ears. Offset to what?
> Maybe bp->rx_head or bp->rx_headroom?
>
bp->rx_headroom sounds a good choice to me, but if you have a stronger
preference for bp->rx_head just let me know.
> - bp->rx_buffer_size: it used to be approximately the payload size
> (plus NET_IP_ALIGN). Now it contains the XDP headroom and shinfo.
> That's on GEM, because on MACB it means something different.
>
> This line is a bit ironic and illustrates the trouble:
> buffer_size = SKB_WITH_OVERHEAD(bp->rx_buffer_size) - bp->rx_offset
>
> - data in gem_rx|gem_rx_refill|gem_free_rx_buffers() points to what we
> could call skb->head, ie start of buffer & start of XDP headroom.
> Its name implied skb->data to me, ie after the headroom.
>
> It also made `data - page_address(page) + bp->rx_offset` hard to
> understand. It is easier for me to understand that the following is
> the page fragment offset till skb->data:
>
> buff_head + bp->rx_headroom - page_address(page)
>
ack, will change this
> - MACB_MAX_PAD is ambiguous. It rhymes with NET_SKB_PAD which is the
> default SKB headroom but here it contains part of the headroom
> (XDP_PACKET_HEADROOM but not NET_IP_ALIGN) and the tailroom (shinfo).
>
uhm, looking at this, I think NET_IP_ALIGN should be part of it (for the
!rsc case).
I'll revisit this part, though.
> - Field `data` in `struct macb_tx_buff` points to skb/xdp_frame but my
> initial thought was skb->data pointer (ie after headroom).
> What about va or ptr or buff or frame or ...?
>
I see. At some point I considered buff, but then I realized
tx_buff->buff was not perfect, hence data :)
I guess one between frame or va works, thanks.
> TLDR: I had a hard time understanding size/offset expressions (both from
> existing code and the series) because of variable names.
>
ack. Will revisit this aspect based on your suggestions.
>
>
> ### Duplicated buffer size computations
>
> Last point related to buffer size computation:
>
> - it is duplicated in macb_set_mtu() but forgets NET_IP_ALIGN & proper
> SKB_DATA_ALIGN() and,
>
> - it is duplicated in gem_xdp_setup() but I don't see why because the
> buffer size is computed to work fine with/without XDP enabled. Also
> this check means we cannot load an XDP program before macb_open()
> because bp->rx_buffer_size == 0.
>
> TLDR: Let's deduplicate size computations to minimise chances of getting
> it wrong.
>
ack
>
>
> ### Allocation changes
>
> I am not convinced by patches 1/6 and 2/6 that change the alloc strategy
> in two steps, from netdev_alloc_skb() to page_pool_alloc_pages() to
> page_pool_alloc_frag().
>
> - The transient solution isn't acceptable when PAGE_SIZE is large.
> We have 16K and would never want one buffer per page.
>
> - It forces you to introduce temporary code & constants which is added
> noise IMO. MACB_PP_MAX_BUF_SIZE() is odd as is the alignment of
> buffer sizes to page sizes. It forces you to deal with
> `bp->rx_buffer_size > PAGE_SIZE` which we could ignore. Right?
>
> TLDR: do alloc changes in one step.
>
yes, makes sense I'll squash them.
>
>
> ### XDP_SETUP_PROG if netif_running()
>
> I'd like to start a discussion on the expected behavior on XDP program
> change if netif_running(). Summarised:
>
> static int gem_xdp_setup(struct net_device *dev, struct bpf_prog *prog,
> struct netlink_ext_ack *extack)
> {
> bool running = netif_running(dev);
> bool need_update = !!bp->prog != !!prog;
>
> if (running && need_update)
> macb_close(dev);
> old_prog = rcu_replace_pointer(bp->prog, prog, lockdep_rtnl_is_held());
> if (running && need_update)
> return macb_open(dev);
> }
>
> Have you experimented with that? I don't see anything graceful in our
> close operation, it looks like we'll get corruption or dropped packets
> or both. We shouldn't impose that on the user who just wanted to swap
> the program.
>
> I cannot find any good reason that implies we wouldn't be able to swap
> our XDP program on the fly. If we think it is unsafe, I'd vote for
> starting with a -EBUSY return code and iterating on that.
>
I didn't experiment much with this, other than simply adding and
removing programs as needed during my tests. Didn't experience
particular issues.
The reason a close/open sequence was added here was mostly because I was
considering to account XDP_PACKET_HEADROOM only when a program was
present. I later decided to not proceed with that (mostly to avoid
changing too many things at once).
Given the geometry of the buffer remains untouched in either case, I
see no particular reasons we can't swap on the fly as you suggest.
I'll try this and change it, thanks!
> TLDR: macb_close() on XDP program change is probably a bad idea.
>
> Thanks,
>
> --
> Théo Lebrun, Bootlin
> Embedded Linux and Kernel engineering
> https://bootlin.com
Powered by blists - more mailing lists