netdev - Re: [PATCH RFC net-next 0/6] net: macb: Add XDP support and page pool integration

lists.openwall.net		lists / announce owl-users owl-dev john-users john-dev passwdqc-users yescrypt popa3d-users / oss-security kernel-hardening musl sabotage tlsify passwords / crypt-dev xvendor / Bugtraq Full-Disclosure linux-kernel linux-netdev linux-ext4 linux-hardening linux-cve-announce PHC
Open Source and information security mailing list archives
Hash Suite: Windows password security audit tool. GUI, reports in PDF.
[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-Id: <DEITSIO441QL.X81MVLL3EIV4@bootlin.com>
Date: Wed, 26 Nov 2025 19:08:14 +0100
From: Théo Lebrun <theo.lebrun@...tlin.com>
To: "Paolo Valerio" <pvalerio@...hat.com>, <netdev@...r.kernel.org>
Cc: "Nicolas Ferre" <nicolas.ferre@...rochip.com>, "Claudiu Beznea"
 <claudiu.beznea@...on.dev>, "Andrew Lunn" <andrew+netdev@...n.ch>, "David
 S. Miller" <davem@...emloft.net>, "Eric Dumazet" <edumazet@...gle.com>,
 "Jakub Kicinski" <kuba@...nel.org>, "Paolo Abeni" <pabeni@...hat.com>,
 "Lorenzo Bianconi" <lorenzo@...nel.org>
Subject: Re: [PATCH RFC net-next 0/6] net: macb: Add XDP support and page
 pool integration

Hello Paolo,

So this is an initial review, I'll start here with five series-wide
topics and send small per-line comments (ie nitpicks) in a second stage.



### Rx buffer size computation

The buffer size computation should be reworked. At the end of the series
it looks like:

static int macb_open(struct net_device *dev)
{
    size_t bufsz = dev->mtu + ETH_HLEN + ETH_FCS_LEN + NET_IP_ALIGN;

    // ...

    macb_init_rx_buffer_size(bp, bufsz);

    // ...
}

static void macb_init_rx_buffer_size(struct macb *bp, size_t size)
{
    if (!macb_is_gem(bp)) {
        bp->rx_buffer_size = MACB_RX_BUFFER_SIZE;
    } else {
        bp->rx_buffer_size = size
            + SKB_DATA_ALIGN(sizeof(struct skb_shared_info))
            + MACB_PP_HEADROOM;

        if (bp->rx_buffer_size > PAGE_SIZE)
            bp->rx_buffer_size = PAGE_SIZE;

        if (bp->rx_buffer_size % RX_BUFFER_MULTIPLE)
            bp->rx_buffer_size = roundup(bp->rx_buffer_size, RX_BUFFER_MULTIPLE);
    }
}

Most of the issues with this code is not stemming from your series, but
this big rework is the right moment to fix it all.

 - NET_IP_ALIGN is accounted for in the headroom even though it isn't
   present if !RSC.

 - When skbuff.c/h allocates an SKB buffer, it SKB_DATA_ALIGN()s
   headroom+data. We should probably do the same. In our case it would
   be:

   bp->rx_buffer_size = SKB_DATA_ALIGN(MACB_PP_HEADROOM + size) +
                        SKB_DATA_ALIGN(sizeof(struct skb_shared_info));
   // or
   bp->rx_buffer_size = SKB_HEAD_ALIGN(MACB_PP_HEADROOM + size);

   I've not computed if it can ever give a different value to your
   series in terms of total size or shinfo alignment. I'd guess it is
   unlikely.

 - If the size clamping to PAGE_SIZE comes into play, we are probably
   doomed. It means we cannot deal with the MTU and we'll probably get
   corruption. If we do put a check in place, it should loudly fail
   rather than silently clamp.

TLDR: I think macb_init_rx_buffer_size() should encapsulate the whole
rx buffer size computation. It can use bp->rx_offset and add on top
MTU & co. It might start failing if >PAGE_SIZE (?).



### Buffer variable names

Related: so many variables, fields or constants have ambiguous names,
can we do something about it?

 - bp->rx_offset is named oddly to my ears. Offset to what?
   Maybe bp->rx_head or bp->rx_headroom?

 - bp->rx_buffer_size: it used to be approximately the payload size
   (plus NET_IP_ALIGN). Now it contains the XDP headroom and shinfo.
   That's on GEM, because on MACB it means something different.

   This line is a bit ironic and illustrates the trouble:
      buffer_size = SKB_WITH_OVERHEAD(bp->rx_buffer_size) - bp->rx_offset

 - data in gem_rx|gem_rx_refill|gem_free_rx_buffers() points to what we
   could call skb->head, ie start of buffer & start of XDP headroom.
   Its name implied skb->data to me, ie after the headroom.

   It also made `data - page_address(page) + bp->rx_offset` hard to
   understand. It is easier for me to understand that the following is
   the page fragment offset till skb->data:

      buff_head + bp->rx_headroom - page_address(page)

 - MACB_MAX_PAD is ambiguous. It rhymes with NET_SKB_PAD which is the
   default SKB headroom but here it contains part of the headroom
   (XDP_PACKET_HEADROOM but not NET_IP_ALIGN) and the tailroom (shinfo).

 - Field `data` in `struct macb_tx_buff` points to skb/xdp_frame but my
   initial thought was skb->data pointer (ie after headroom).
   What about va or ptr or buff or frame or ...?

TLDR: I had a hard time understanding size/offset expressions (both from
existing code and the series) because of variable names.



### Duplicated buffer size computations

Last point related to buffer size computation:

 - it is duplicated in macb_set_mtu() but forgets NET_IP_ALIGN & proper
   SKB_DATA_ALIGN() and,

 - it is duplicated in gem_xdp_setup() but I don't see why because the
   buffer size is computed to work fine with/without XDP enabled. Also
   this check means we cannot load an XDP program before macb_open()
   because bp->rx_buffer_size == 0.

TLDR: Let's deduplicate size computations to minimise chances of getting
it wrong.



### Allocation changes

I am not convinced by patches 1/6 and 2/6 that change the alloc strategy
in two steps, from netdev_alloc_skb() to page_pool_alloc_pages() to
page_pool_alloc_frag().

 - The transient solution isn't acceptable when PAGE_SIZE is large.
   We have 16K and would never want one buffer per page.

 - It forces you to introduce temporary code & constants which is added
   noise IMO. MACB_PP_MAX_BUF_SIZE() is odd as is the alignment of
   buffer sizes to page sizes. It forces you to deal with
   `bp->rx_buffer_size > PAGE_SIZE` which we could ignore. Right?

TLDR: do alloc changes in one step.



### XDP_SETUP_PROG if netif_running()

I'd like to start a discussion on the expected behavior on XDP program
change if netif_running(). Summarised:

static int gem_xdp_setup(struct net_device *dev, struct bpf_prog *prog,
             struct netlink_ext_ack *extack)
{
    bool running = netif_running(dev);
    bool need_update = !!bp->prog != !!prog;

    if (running && need_update)
        macb_close(dev);
    old_prog = rcu_replace_pointer(bp->prog, prog, lockdep_rtnl_is_held());
    if (running && need_update)
        return macb_open(dev);
}

Have you experimented with that? I don't see anything graceful in our
close operation, it looks like we'll get corruption or dropped packets
or both. We shouldn't impose that on the user who just wanted to swap
the program.

I cannot find any good reason that implies we wouldn't be able to swap
our XDP program on the fly. If we think it is unsafe, I'd vote for
starting with a -EBUSY return code and iterating on that.

TLDR: macb_close() on XDP program change is probably a bad idea.

Thanks,

--
Théo Lebrun, Bootlin
Embedded Linux and Kernel engineering
https://bootlin.com