netdev - Re: [PATCH RFC net-next 0/6] net: macb: Add XDP support and page pool integration

Message-ID: <87fr9szti5.fsf@redhat.com>
Date: Tue, 02 Dec 2025 18:24:50 +0100
From: Paolo Valerio <pvalerio@...hat.com>
To: Théo Lebrun <theo.lebrun@...tlin.com>,
 netdev@...r.kernel.org
Cc: Nicolas Ferre <nicolas.ferre@...rochip.com>, Claudiu Beznea
 <claudiu.beznea@...on.dev>, Andrew Lunn <andrew+netdev@...n.ch>, "David
 S. Miller" <davem@...emloft.net>, Eric Dumazet <edumazet@...gle.com>, Jakub
 Kicinski <kuba@...nel.org>, Paolo Abeni <pabeni@...hat.com>, Lorenzo
 Bianconi <lorenzo@...nel.org>
Subject: Re: [PATCH RFC net-next 0/6] net: macb: Add XDP support and page
 pool integration

Hello Théo,

Thank you for the feedback.

On 26 Nov 2025 at 07:08:14 PM, Théo Lebrun <theo.lebrun@...tlin.com> wrote:

> Hello Paolo,
>
> So this is an initial review, I'll start here with five series-wide
> topics and send small per-line comments (ie nitpicks) in a second stage.
>
>
>
> ### Rx buffer size computation
>
> The buffer size computation should be reworked. At the end of the series
> it looks like:
>
> static int macb_open(struct net_device *dev)
> {
>     size_t bufsz = dev->mtu + ETH_HLEN + ETH_FCS_LEN + NET_IP_ALIGN;
>
>     // ...
>
>     macb_init_rx_buffer_size(bp, bufsz);
>
>     // ...
> }
>
> static void macb_init_rx_buffer_size(struct macb *bp, size_t size)
> {
>     if (!macb_is_gem(bp)) {
>         bp->rx_buffer_size = MACB_RX_BUFFER_SIZE;
>     } else {
>         bp->rx_buffer_size = size
>             + SKB_DATA_ALIGN(sizeof(struct skb_shared_info))
>             + MACB_PP_HEADROOM;
>
>         if (bp->rx_buffer_size > PAGE_SIZE)
>             bp->rx_buffer_size = PAGE_SIZE;
>
>         if (bp->rx_buffer_size % RX_BUFFER_MULTIPLE)
>             bp->rx_buffer_size = roundup(bp->rx_buffer_size, RX_BUFFER_MULTIPLE);
>     }
> }
>
> Most of the issues with this code do not stem from your series, but
> this big rework is the right moment to fix it all.
>
>  - NET_IP_ALIGN is accounted for in the headroom even though it isn't
>    present if !RSC.
>

That's something I noticed, and I was unsure about the reason.

>  - When skbuff.c/h allocates an SKB buffer, it SKB_DATA_ALIGN()s
>    headroom+data. We should probably do the same. In our case it would
>    be:
>
>    bp->rx_buffer_size = SKB_DATA_ALIGN(MACB_PP_HEADROOM + size) +
>                         SKB_DATA_ALIGN(sizeof(struct skb_shared_info));
>    // or
>    bp->rx_buffer_size = SKB_HEAD_ALIGN(MACB_PP_HEADROOM + size);
>
>    I've not computed if it can ever give a different value to your
>    series in terms of total size or shinfo alignment. I'd guess it is
>    unlikely.
>

I'll take a look at this
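
For reference, a quick user-space sketch to compare the two computations.
The constants are assumptions for a typical 64-bit build (64-byte cache
lines, 256 bytes of headroom, 320 bytes of shinfo, a hardware multiple of
64), the helper names are made up for illustration, and the PAGE_SIZE
clamp is left out:

```c
#include <stddef.h>

/* All values below are assumptions for illustration, not taken from the series. */
#define CACHE_BYTES   64u   /* stand-in for SMP_CACHE_BYTES */
#define RX_MULTIPLE   64u   /* stand-in for RX_BUFFER_MULTIPLE */
#define PP_HEADROOM   256u  /* stand-in for MACB_PP_HEADROOM */
#define SHINFO_SIZE   320u  /* stand-in for sizeof(struct skb_shared_info) */

/* Round x up to the next multiple of a (a must be a power of two). */
static size_t align_up(size_t x, size_t a)
{
    return (x + a - 1) & ~(a - 1);
}

/* Current series: sum everything, then round up to the hardware multiple. */
static size_t rx_size_series(size_t size)
{
    size_t total = size + align_up(SHINFO_SIZE, CACHE_BYTES) + PP_HEADROOM;

    return align_up(total, RX_MULTIPLE);
}

/* Suggested form: align headroom + data first, then append the aligned shinfo. */
static size_t rx_size_aligned(size_t size)
{
    return align_up(PP_HEADROOM + size, CACHE_BYTES) +
           align_up(SHINFO_SIZE, CACHE_BYTES);
}
```

With these particular constants every term is already a multiple of 64, so
both forms agree. They could differ on a configuration where the cache line
size and the hardware multiple are not the same.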

>  - If the size clamping to PAGE_SIZE comes into play, we are probably
>    doomed. It means we cannot deal with the MTU and we'll probably get
>    corruption. If we do put a check in place, it should loudly fail
>    rather than silently clamp.
>

That should not happen, unless I'm missing something.
E.g., 9000B mtu on a 4K PAGE_SIZE kernel should be handled with multiple
descriptors. The clamping is there because, given how the series creates
the pool, the maximum buffer size is a single order-0 page.

Hardware-wise bp->rx_buffer_size should also be taken into account for
the receive buffer size.
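
As a rough sketch of the multiple-descriptor case, assuming a 4K page, 256
bytes of headroom and 320 bytes of shinfo (illustrative values, and the
helper names are made up), the number of descriptors a frame spans would be:

```c
#include <stddef.h>

/* Illustrative values only; the real ones depend on arch and config. */
#define SKETCH_PAGE_SIZE 4096u
#define SKETCH_HEADROOM  256u
#define SKETCH_SHINFO    320u

/* Bytes of frame data that fit in one order-0 buffer. */
static size_t rx_payload_per_buffer(void)
{
    return SKETCH_PAGE_SIZE - SKETCH_HEADROOM - SKETCH_SHINFO;
}

/* Number of descriptors needed to hold a frame of frame_len bytes. */
static size_t rx_descs_for_frame(size_t frame_len)
{
    size_t per_buf = rx_payload_per_buffer();

    return (frame_len + per_buf - 1) / per_buf;
}
```

Under these assumptions a 9000B MTU frame (9018 bytes with the Ethernet
header and FCS) would span three buffers rather than being truncated.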

> TLDR: I think macb_init_rx_buffer_size() should encapsulate the whole
> rx buffer size computation. It can use bp->rx_offset and add on top
> MTU & co. It might start failing if >PAGE_SIZE (?).
>

ack, I'll move that part

>
>
> ### Buffer variable names
>
> Related: so many variables, fields or constants have ambiguous names,
> can we do something about it?
>
>  - bp->rx_offset is named oddly to my ears. Offset to what?
>    Maybe bp->rx_head or bp->rx_headroom?
>

bp->rx_headroom sounds like a good choice to me, but if you have a stronger
preference for bp->rx_head just let me know.

>  - bp->rx_buffer_size: it used to be approximately the payload size
>    (plus NET_IP_ALIGN). Now it contains the XDP headroom and shinfo.
>    That's on GEM, because on MACB it means something different.
>
>    This line is a bit ironic and illustrates the trouble:
>       buffer_size = SKB_WITH_OVERHEAD(bp->rx_buffer_size) - bp->rx_offset
>
>  - data in gem_rx|gem_rx_refill|gem_free_rx_buffers() points to what we
>    could call skb->head, ie start of buffer & start of XDP headroom.
>    Its name implied skb->data to me, ie after the headroom.
>
>    It also made `data - page_address(page) + bp->rx_offset` hard to
>    understand. It is easier for me to understand that the following is
>    the page fragment offset till skb->data:
>
>       buff_head + bp->rx_headroom - page_address(page)
>

ack, will change this
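
Something along these lines, as a user-space sketch (the struct and helper
names are hypothetical; page_base stands in for page_address(page)):

```c
#include <stddef.h>

/* Hypothetical stand-in for the relevant part of the driver state. */
struct rx_buf_sketch {
    const unsigned char *page_base; /* stand-in for page_address(page) */
    const unsigned char *buff_head; /* start of buffer, i.e. start of headroom */
    size_t rx_headroom;             /* bytes reserved before the packet data */
};

/* Offset within the page at which the packet data (skb->data) starts. */
static size_t rx_data_offset(const struct rx_buf_sketch *b)
{
    return (size_t)(b->buff_head + b->rx_headroom - b->page_base);
}
```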

>  - MACB_MAX_PAD is ambiguous. It rhymes with NET_SKB_PAD which is the
>    default SKB headroom but here it contains part of the headroom
>    (XDP_PACKET_HEADROOM but not NET_IP_ALIGN) and the tailroom (shinfo).
>

uhm, looking at this, I think NET_IP_ALIGN should be part of it (for the
!rsc case).
I'll revisit this part, though.

>  - Field `data` in `struct macb_tx_buff` points to skb/xdp_frame but my
>    initial thought was skb->data pointer (ie after headroom).
>    What about va or ptr or buff or frame or ...?
>

I see. At some point I considered buff, but then I realized
tx_buff->buff was not perfect, hence data :)

I guess either frame or va works, thanks.

> TLDR: I had a hard time understanding size/offset expressions (both from
> existing code and the series) because of variable names.
>

ack. Will revisit this aspect based on your suggestions.

>
>
> ### Duplicated buffer size computations
>
> Last point related to buffer size computation:
>
>  - it is duplicated in macb_set_mtu() but forgets NET_IP_ALIGN & proper
>    SKB_DATA_ALIGN() and,
>
>  - it is duplicated in gem_xdp_setup() but I don't see why because the
>    buffer size is computed to work fine with/without XDP enabled. Also
>    this check means we cannot load an XDP program before macb_open()
>    because bp->rx_buffer_size == 0.
>
> TLDR: Let's deduplicate size computations to minimise chances of getting
> it wrong.
>

ack

>
>
> ### Allocation changes
>
> I am not convinced by patches 1/6 and 2/6 that change the alloc strategy
> in two steps, from netdev_alloc_skb() to page_pool_alloc_pages() to
> page_pool_alloc_frag().
>
>  - The transient solution isn't acceptable when PAGE_SIZE is large.
>    We have 16K and would never want one buffer per page.
>
>  - It forces you to introduce temporary code & constants which is added
>    noise IMO. MACB_PP_MAX_BUF_SIZE() is odd as is the alignment of
>    buffer sizes to page sizes. It forces you to deal with
>    `bp->rx_buffer_size > PAGE_SIZE` which we could ignore. Right?
>
> TLDR: do alloc changes in one step.
>

Yes, makes sense, I'll squash them.

>
>
> ### XDP_SETUP_PROG if netif_running()
>
> I'd like to start a discussion on the expected behavior on XDP program
> change if netif_running(). Summarised:
>
> static int gem_xdp_setup(struct net_device *dev, struct bpf_prog *prog,
>              struct netlink_ext_ack *extack)
> {
>     bool running = netif_running(dev);
>     bool need_update = !!bp->prog != !!prog;
>
>     if (running && need_update)
>         macb_close(dev);
>     old_prog = rcu_replace_pointer(bp->prog, prog, lockdep_rtnl_is_held());
>     if (running && need_update)
>         return macb_open(dev);
> }
>
> Have you experimented with that? I don't see anything graceful in our
> close operation, it looks like we'll get corruption or dropped packets
> or both. We shouldn't impose that on the user who just wanted to swap
> the program.
>
> I cannot find any good reason that implies we wouldn't be able to swap
> our XDP program on the fly. If we think it is unsafe, I'd vote for
> starting with a -EBUSY return code and iterating on that.
>

I didn't experiment much with this, other than simply adding and
removing programs as needed during my tests. Didn't experience
particular issues.

The reason a close/open sequence was added here was mostly that I was
considering accounting for XDP_PACKET_HEADROOM only when a program was
present. I later decided not to proceed with that (mostly to avoid
changing too many things at once).

Given the geometry of the buffer remains untouched in either case, I
see no particular reason we can't swap on the fly as you suggest.

I'll try this and change it, thanks!
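
Roughly, the setup path would reduce to a pointer swap. Below is a
user-space analogue using C11 atomics; the kernel code would keep
rcu_replace_pointer() as in your snippet, and the struct and function names
here are made up for illustration:

```c
#include <stdatomic.h>
#include <stddef.h>

struct prog_sketch { int id; };           /* stand-in for struct bpf_prog */

struct dev_sketch {
    _Atomic(struct prog_sketch *) prog;   /* stand-in for bp->prog */
};

/*
 * Install new_prog and return the previous one so the caller can release it.
 * No close/open: the buffer layout is the same with or without a program.
 */
static struct prog_sketch *xdp_swap_sketch(struct dev_sketch *dev,
                                           struct prog_sketch *new_prog)
{
    return atomic_exchange(&dev->prog, new_prog);
}
```

The Rx path would only ever see either the old or the new program, and the
old one can be released once the swap has returned it.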

> TLDR: macb_close() on XDP program change is probably a bad idea.
>
> Thanks,
>
> --
> Théo Lebrun, Bootlin
> Embedded Linux and Kernel engineering
> https://bootlin.com