[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <45522396-0fad-406e-ba53-0bb4aee53e67@kernel.org>
Date: Thu, 6 Mar 2025 12:12:51 +0100
From: Jesper Dangaard Brouer <hawk@...nel.org>
To: Arthur Fabre <arthur@...hurfabre.com>,
Alexander Lobakin <aleksander.lobakin@...el.com>
Cc: netdev@...r.kernel.org, bpf@...r.kernel.org, jakub@...udflare.com,
yan@...udflare.com, jbrandeburg@...udflare.com, thoiland@...hat.com,
lbiancon@...hat.com, Arthur Fabre <afabre@...udflare.com>
Subject: Re: [PATCH RFC bpf-next 07/20] xdp: Track if metadata is supported in
xdp_frame <> xdp_buff conversions
On 05/03/2025 18.02, Arthur Fabre wrote:
> On Wed Mar 5, 2025 at 4:24 PM CET, Alexander Lobakin wrote:
>> From: Arthur <arthur@...hurfabre.com>
>> Date: Wed, 05 Mar 2025 15:32:04 +0100
>>
>>> From: Arthur Fabre <afabre@...udflare.com>
>>>
>>> xdp_buff stores whether metadata is supported by a NIC by setting
>>> data_meta to be greater than data.
>>>
>>> But xdp_frame only stores the metadata size (as metasize), so converting
>>> between xdp_frame and xdp_buff is lossy.
>>>
>>> Steal an unused bit in xdp_frame to track whether metadata is supported
>>> or not.
>>>
>>> This will lets us have "generic" functions for setting skb fields from
>>> either xdp_frame or xdp_buff from drivers.
>>>
>>> Signed-off-by: Arthur Fabre <afabre@...udflare.com>
>>> ---
>>> include/net/xdp.h | 10 +++++++++-
>>> 1 file changed, 9 insertions(+), 1 deletion(-)
>>>
>>> diff --git a/include/net/xdp.h b/include/net/xdp.h
>>> index 58019fa299b56dbd45c104fdfa807f73af6e4fa4..84afe07d09efdb2ab0cb78b904f02cb74f9a56b6 100644
>>> --- a/include/net/xdp.h
>>> +++ b/include/net/xdp.h
>>> @@ -116,6 +116,9 @@ static __always_inline void xdp_buff_set_frag_pfmemalloc(struct xdp_buff *xdp)
>>> xdp->flags |= XDP_FLAGS_FRAGS_PF_MEMALLOC;
>>> }
>>>
>>> +static bool xdp_data_meta_unsupported(const struct xdp_buff *xdp);
>>> +static void xdp_set_data_meta_invalid(struct xdp_buff *xdp);
>>> +
>>> static __always_inline void *xdp_buff_traits(const struct xdp_buff *xdp)
>>> {
>>> return xdp->data_hard_start + _XDP_FRAME_SIZE;
>>> @@ -270,7 +273,9 @@ struct xdp_frame {
>>> void *data;
>>> u32 len;
>>> u32 headroom;
>>> - u32 metasize; /* uses lower 8-bits */
>>> + u32 :23, /* unused */
>>> + meta_unsupported:1,
>>> + metasize:8;
>>
>> See the history of this structure how we got rid of using bitfields here
>> and why.
>>
>> ...because of performance.
>>
>> Even though metasize uses only 8 bits, 1-byte access is slower than
>> 32-byte access.
>
> Interesting, thanks!
>
I agree with Olek, we should not use bitfields. Thanks for catching this.
(The xdp_frame have a flags member...)
Why don't we use the flags member for storing this information?
>> I was going to write "you can use the fact that metasize is always a
>> multiple of 4 or that it's never > 252, for example, you could reuse LSB
>> as a flag indicating that meta is not supported", but first of all
>>
>> Do we still have drivers which don't support metadata?
>> Why don't they do that? It's not HW-specific or even driver-specific.
>> They don't reserve headroom? Then they're invalid, at least XDP_REDIRECT
>> won't work.
>>
I'm fairly sure that all drivers support XDP_REDIRECT.
Except didn't Lorenzo add a feature bit for this?
(so, some drivers might explicitly not-support this)
>> So maybe we need to fix those drivers first, if there are any.
>
> Most drivers don't support metadata unfortunately:
>
>> rg -U "xdp_prepare_buff\([^)]*false\);" drivers/net/
> drivers/net/tun.c
> 1712: xdp_prepare_buff(&xdp, buf, pad, len, false);
>
> drivers/net/ethernet/microsoft/mana/mana_bpf.c
> 94: xdp_prepare_buff(xdp, buf_va, XDP_PACKET_HEADROOM, pkt_len, false);
>
> drivers/net/ethernet/marvell/mvneta.c
> 2344: xdp_prepare_buff(xdp, data, pp->rx_offset_correction + MVNETA_MH_SIZE,
> 2345: data_len, false);
>
> drivers/net/ethernet/marvell/octeontx2/nic/otx2_txrx.c
> 1436: xdp_prepare_buff(&xdp, hard_start, OTX2_HEAD_ROOM,
> 1437: cqe->sg.seg_size, false);
>
> drivers/net/ethernet/socionext/netsec.c
> 1021: xdp_prepare_buff(&xdp, desc->addr, NETSEC_RXBUF_HEADROOM,
> 1022: pkt_len, false);
>
> drivers/net/ethernet/google/gve/gve_rx.c
> 740: xdp_prepare_buff(&new, frame, headroom, len, false);
> 859: xdp_prepare_buff(&xdp, page_info->page_address +
> 860: page_info->page_offset, GVE_RX_PAD,
> 861: len, false);
>
> drivers/net/ethernet/marvell/mvpp2/mvpp2_main.c
> 3984: xdp_prepare_buff(&xdp, data,
> 3985: MVPP2_MH_SIZE + MVPP2_SKB_HEADROOM,
> 3986: rx_bytes, false);
>
> drivers/net/ethernet/aquantia/atlantic/aq_ring.c
> 794: xdp_prepare_buff(&xdp, hard_start, rx_ring->page_offset,
> 795: buff->len, false);
>
> drivers/net/ethernet/cavium/thunder/nicvf_main.c
> 554: xdp_prepare_buff(&xdp, hard_start, data - hard_start, len, false);
>
> drivers/net/ethernet/ti/cpsw_new.c
> 348: xdp_prepare_buff(&xdp, pa, headroom, size, false);
>
> drivers/net/ethernet/freescale/enetc/enetc.c
> 1710: xdp_prepare_buff(xdp_buff, hard_start - rx_ring->buffer_offset,
> 1711: rx_ring->buffer_offset, size, false);
>
> drivers/net/ethernet/ti/am65-cpsw-nuss.c
> 1335: xdp_prepare_buff(&xdp, page_addr, AM65_CPSW_HEADROOM,
> 1336: pkt_len, false);
>
> drivers/net/ethernet/ti/cpsw.c
> 403: xdp_prepare_buff(&xdp, pa, headroom, size, false);
>
> drivers/net/ethernet/sfc/rx.c
> 289: xdp_prepare_buff(&xdp, *ehp - EFX_XDP_HEADROOM, EFX_XDP_HEADROOM,
> 290: rx_buf->len, false);
>
> drivers/net/ethernet/mediatek/mtk_eth_soc.c
> 2097: xdp_prepare_buff(&xdp, data, MTK_PP_HEADROOM, pktlen,
> 2098: false);
>
> drivers/net/ethernet/sfc/siena/rx.c
> 291: xdp_prepare_buff(&xdp, *ehp - EFX_XDP_HEADROOM, EFX_XDP_HEADROOM,
> 292: rx_buf->len, false)
>
> I don't know if it's just because no one has added calls to
> skb_metadata_set() in yet, or if there's a more fundamental reason.
>
I simply think driver developers have been lazy.
If someone want some easy kernel commits, these drivers should be fairly
easy to fix...
> I think they all reserve some amount of headroom, but not always the
> full XDP_PACKET_HEADROOM. Eg sfc:
>
The Intel drivers use 192 (AFAIK if that is still true). The API ended
up supporting non-standard XDP_PACKET_HEADROOM, due to the Intel
drivers, when XDP support was added to those (which is a long time ago now).
> drivers/net/ethernet/sfc/net_driver.h:
> /* Non-standard XDP_PACKET_HEADROOM and tailroom to satisfy XDP_REDIRECT and
> * still fit two standard MTU size packets into a single 4K page.
> */
> #define EFX_XDP_HEADROOM 128
>
This is smaller than most drivers, but still have enough headroom for
xdp_frame + traits.
> If it's just because skb_metadata_set() is missing, I can take the
> patches from this series that adds a "generic" XDP -> skb hook ("trait:
> Propagate presence of traits to sk_buff"), have it call
> skb_metadata_set(), and try to add it to all the drivers in a separate
> series.
>
I think someone should cleanup those drivers and add support.
--Jesper
>>> /* Lifetime of xdp_rxq_info is limited to NAPI/enqueue time,
>>> * while mem_type is valid on remote CPU.
>>> */
>>> @@ -369,6 +374,8 @@ void xdp_convert_frame_to_buff(const struct xdp_frame *frame,
>>> xdp->data = frame->data;
>>> xdp->data_end = frame->data + frame->len;
>>> xdp->data_meta = frame->data - frame->metasize;
>>> + if (frame->meta_unsupported)
>>> + xdp_set_data_meta_invalid(xdp);
>>> xdp->frame_sz = frame->frame_sz;
>>> xdp->flags = frame->flags;
>>> }
>>> @@ -396,6 +403,7 @@ int xdp_update_frame_from_buff(const struct xdp_buff *xdp,
>>> xdp_frame->len = xdp->data_end - xdp->data;
>>> xdp_frame->headroom = headroom - sizeof(*xdp_frame);
>>> xdp_frame->metasize = metasize;
>>> + xdp_frame->meta_unsupported = xdp_data_meta_unsupported(xdp);
>>> xdp_frame->frame_sz = xdp->frame_sz;
>>> xdp_frame->flags = xdp->flags;
>>
>> Thanks,
>> Olek
Powered by blists - more mailing lists