[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <3D448771-8354-46D0-BDFF-449FE1BD1B59@nutanix.com>
Date: Thu, 1 May 2025 01:42:06 +0000
From: Jon Kohler <jon@...anix.com>
To: Stanislav Fomichev <stfomichev@...il.com>
CC: Willem de Bruijn <willemdebruijn.kernel@...il.com>,
Jason Wang
<jasowang@...hat.com>, Andrew Lunn <andrew+netdev@...n.ch>,
"David S. Miller"
<davem@...emloft.net>,
Eric Dumazet <edumazet@...gle.com>, Jakub Kicinski
<kuba@...nel.org>,
Paolo Abeni <pabeni@...hat.com>, Alexei Starovoitov
<ast@...nel.org>,
Daniel Borkmann <daniel@...earbox.net>,
Jesper Dangaard
Brouer <hawk@...nel.org>,
John Fastabend <john.fastabend@...il.com>,
Simon
Horman <horms@...nel.org>,
"netdev@...r.kernel.org" <netdev@...r.kernel.org>,
"linux-kernel@...r.kernel.org" <linux-kernel@...r.kernel.org>,
"bpf@...r.kernel.org" <bpf@...r.kernel.org>
Subject: Re: [PATCH net-next v2] xdp: Add helpers for head length, headroom,
and metadata length
> On Apr 30, 2025, at 9:34 PM, Stanislav Fomichev <stfomichev@...il.com> wrote:
>
> !-------------------------------------------------------------------|
> CAUTION: External Email
>
> |-------------------------------------------------------------------!
>
> On 05/01, Jon Kohler wrote:
>>
>>
>>> On Apr 30, 2025, at 5:09 PM, Stanislav Fomichev <stfomichev@...il.com> wrote:
>>>
>>> !-------------------------------------------------------------------|
>>> CAUTION: External Email
>>>
>>> |-------------------------------------------------------------------!
>>>
>>> On 04/30, Jon Kohler wrote:
>>>> Introduce new XDP helpers:
>>>> - xdp_headlen: Similar to skb_headlen
>>>> - xdp_headroom: Similar to skb_headroom
>>>> - xdp_metadata_len: Similar to skb_metadata_len
>>>>
>>>> Integrate these helpers into tap, tun, and XDP implementation to start.
>>>>
>>>> No functional changes introduced.
>>>>
>>>> Signed-off-by: Jon Kohler <jon@...anix.com>
>>>> ---
>>>> v1->v2: Integrate feedback from Willem
>>>> https://urldefense.proofpoint.com/v2/url?u=https-3A__patchwork.kernel.org_project_netdevbpf_patch_20250430182921.1704021-2D1-2Djon-40nutanix.com_&d=DwIBaQ&c=s883GpUCOChKOHiocYtGcg&r=NGPRGGo37mQiSXgHKm5rCQ&m=9pdxzQszX_M0K3gEPeYOyMZZYSkRR8IMvxslS8320Eoctk58y-ELCdZ5iaryF2GH&s=J-ILB7E9VQ_plo0hyjEtzGzjy6G0_o4MMMmmE_z8vvc&e=
>>>>
>>>> drivers/net/tap.c | 6 +++---
>>>> drivers/net/tun.c | 12 +++++------
>>>> include/net/xdp.h | 54 +++++++++++++++++++++++++++++++++++++++++++----
>>>> net/core/xdp.c | 12 +++++------
>>>> 4 files changed, 65 insertions(+), 19 deletions(-)
>>>>
>>>> diff --git a/drivers/net/tap.c b/drivers/net/tap.c
>>>> index d4ece538f1b2..a62fbca4b08f 100644
>>>> --- a/drivers/net/tap.c
>>>> +++ b/drivers/net/tap.c
>>>> @@ -1048,7 +1048,7 @@ static int tap_get_user_xdp(struct tap_queue *q, struct xdp_buff *xdp)
>>>> struct sk_buff *skb;
>>>> int err, depth;
>>>>
>>>> - if (unlikely(xdp->data_end - xdp->data < ETH_HLEN)) {
>>>> + if (unlikely(xdp_headlen(xdp) < ETH_HLEN)) {
>>>> err = -EINVAL;
>>>> goto err;
>>>> }
>>>> @@ -1062,8 +1062,8 @@ static int tap_get_user_xdp(struct tap_queue *q, struct xdp_buff *xdp)
>>>> goto err;
>>>> }
>>>>
>>>> - skb_reserve(skb, xdp->data - xdp->data_hard_start);
>>>> - skb_put(skb, xdp->data_end - xdp->data);
>>>> + skb_reserve(skb, xdp_headroom(xdp));
>>>> + skb_put(skb, xdp_headlen(xdp));
>>>>
>>>> skb_set_network_header(skb, ETH_HLEN);
>>>> skb_reset_mac_header(skb);
>>>> diff --git a/drivers/net/tun.c b/drivers/net/tun.c
>>>> index 7babd1e9a378..4c47eed71986 100644
>>>> --- a/drivers/net/tun.c
>>>> +++ b/drivers/net/tun.c
>>>> @@ -1567,7 +1567,7 @@ static int tun_xdp_act(struct tun_struct *tun, struct bpf_prog *xdp_prog,
>>>> dev_core_stats_rx_dropped_inc(tun->dev);
>>>> return err;
>>>> }
>>>> - dev_sw_netstats_rx_add(tun->dev, xdp->data_end - xdp->data);
>>>> + dev_sw_netstats_rx_add(tun->dev, xdp_headlen(xdp));
>>>> break;
>>>> case XDP_TX:
>>>> err = tun_xdp_tx(tun->dev, xdp);
>>>> @@ -1575,7 +1575,7 @@ static int tun_xdp_act(struct tun_struct *tun, struct bpf_prog *xdp_prog,
>>>> dev_core_stats_rx_dropped_inc(tun->dev);
>>>> return err;
>>>> }
>>>> - dev_sw_netstats_rx_add(tun->dev, xdp->data_end - xdp->data);
>>>> + dev_sw_netstats_rx_add(tun->dev, xdp_headlen(xdp));
>>>> break;
>>>> case XDP_PASS:
>>>> break;
>>>> @@ -2355,7 +2355,7 @@ static int tun_xdp_one(struct tun_struct *tun,
>>>> struct xdp_buff *xdp, int *flush,
>>>> struct tun_page *tpage)
>>>> {
>>>> - unsigned int datasize = xdp->data_end - xdp->data;
>>>> + unsigned int datasize = xdp_headlen(xdp);
>>>> struct tun_xdp_hdr *hdr = xdp->data_hard_start;
>>>> struct virtio_net_hdr *gso = &hdr->gso;
>>>> struct bpf_prog *xdp_prog;
>>>> @@ -2415,14 +2415,14 @@ static int tun_xdp_one(struct tun_struct *tun,
>>>> goto out;
>>>> }
>>>>
>>>> - skb_reserve(skb, xdp->data - xdp->data_hard_start);
>>>> - skb_put(skb, xdp->data_end - xdp->data);
>>>> + skb_reserve(skb, xdp_headroom(xdp));
>>>> + skb_put(skb, xdp_headlen(xdp));
>>>>
>>>> /* The externally provided xdp_buff may have no metadata support, which
>>>> * is marked by xdp->data_meta being xdp->data + 1. This will lead to a
>>>> * metasize of -1 and is the reason why the condition checks for > 0.
>>>> */
>>>> - metasize = xdp->data - xdp->data_meta;
>>>> + metasize = xdp_metadata_len(xdp);
>>>> if (metasize > 0)
>>>> skb_metadata_set(skb, metasize);
>>>>
>>>> diff --git a/include/net/xdp.h b/include/net/xdp.h
>>>> index 48efacbaa35d..044345b18305 100644
>>>> --- a/include/net/xdp.h
>>>> +++ b/include/net/xdp.h
>>>> @@ -151,10 +151,56 @@ xdp_get_shared_info_from_buff(const struct xdp_buff *xdp)
>>>> return (struct skb_shared_info *)xdp_data_hard_end(xdp);
>>>> }
>>>>
>>>> +/**
>>>> + * xdp_headlen - Calculate the length of the data in an XDP buffer
>>>> + * @xdp: Pointer to the XDP buffer structure
>>>> + *
>>>> + * Compute the length of the data contained in the XDP buffer. Does not
>>>> + * include frags, use xdp_get_buff_len() for that instead.
>>>> + *
>>>> + * Analogous to skb_headlen().
>>>> + *
>>>> + * Return: The length of the data in the XDP buffer in bytes.
>>>> + */
>>>> +static inline unsigned int xdp_headlen(const struct xdp_buff *xdp)
>>>> +{
>>>> + return xdp->data_end - xdp->data;
>>>> +}
>>>> +
>>>> +/**
>>>> + * xdp_headroom - Calculate the headroom available in an XDP buffer
>>>> + * @xdp: Pointer to the XDP buffer structure
>>>> + *
>>>> + * Compute the headroom in an XDP buffer.
>>>> + *
>>>> + * Analogous to the skb_headroom().
>>>> + *
>>>> + * Return: The size of the headroom in bytes.
>>>> + */
>>>> +static inline unsigned int xdp_headroom(const struct xdp_buff *xdp)
>>>> +{
>>>> + return xdp->data - xdp->data_hard_start;
>>>> +}
>>>> +
>>>> +/**
>>>> + * xdp_metadata_len - Calculate the length of metadata in an XDP buffer
>>>> + * @xdp: Pointer to the XDP buffer structure
>>>> + *
>>>> + * Compute the length of the metadata region in an XDP buffer.
>>>> + *
>>>> + * Analogous to skb_metadata_len().
>>>> + *
>>>> + * Return: The length of the metadata in bytes.
>>>> + */
>>>> +static inline unsigned int xdp_metadata_len(const struct xdp_buff *xdp)
>>>
>>> I believe this has to return int, not unsigned int. There are places
>>> where we do data_meta = data + 1, and the callers check whether
>>> the result is signed or sunsigned.
>>
>> The existing SKB APIs are the exact same return type (I copied them 1:1),
>> but I have a feeling that we’re never intending these values to either A) be
>> negative and/or B) wrap in strange ways?
>>
>>>
>>>> +{
>>>> + return xdp->data - xdp->data_meta;
>>>> +}
>>>> +
>>>> static __always_inline unsigned int
>>>> xdp_get_buff_len(const struct xdp_buff *xdp)
>>>> {
>>>> - unsigned int len = xdp->data_end - xdp->data;
>>>> + unsigned int len = xdp_headlen(xdp);
>>>> const struct skb_shared_info *sinfo;
>>>>
>>>> if (likely(!xdp_buff_has_frags(xdp)))
>>>> @@ -364,8 +410,8 @@ int xdp_update_frame_from_buff(const struct xdp_buff *xdp,
>>>> int metasize, headroom;
>>
>> Said another way, perhaps this should be unsigned?
>>
>>>>
>>>> /* Assure headroom is available for storing info */
>>>> - headroom = xdp->data - xdp->data_hard_start;
>>>> - metasize = xdp->data - xdp->data_meta;
>>>> + headroom = xdp_headroom(xdp);
>>>> + metasize = xdp_metadata_len(xdp);
>>>> metasize = metasize > 0 ? metasize : 0;
>>>
>>> ^^ like here
>>
>> Look across the tree, seems like more are unsigned than signed
>
> The ones that are unsigned are either calling xdp_data_meta_unsupported
> explicitly (and it does > to check for this condition, not signed math)
> or are running in the drivers that are guaranteed to have metadata
> support (and, hence, always have data_meta <= data).
>
>> These ones use unsigned:
>> xdp_convert_zc_to_xdp_frame
>
> This uses xdp_data_meta_unsupported
>
>> veth_xdp_rcv_skb
>> xsk_construct_skb
>> bnxt_copy_xdp
>> i40e_build_skb
>> i40e_construct_skb_zc
>> ice_build_skb (this is u8)
>> ice_construct_skb_zc
>> igb_build_skb
>> igb_construct_skb_zc
>> igc_build_skb
>> igc_construct_skb
>> igc_construct_skb_zc
>> ixgbe_build_skb
>> ixgbe_construct_skb_zc
>> ixgbevf_build_skb
>> mvneta_swbm_build_skb
>> mlx5e_xsk_construct_skb
>> mana_build_skb
>> stmmac_construct_skb_zc
>> bpf_prog_run_generic_xdp
>
> These run in the drivers that support metadata (data_meta <= data)
>
>> xdp_get_metalen
>
> This uses xdp_data_meta_unsupported
>
>> These ones are regular int:
>> xdp_build_skb_from_buff
>> xdp_build_skb_from_zc
>> xdp_update_frame_from_buff
>> tun_xdp_one
>> build_skb_from_xdp_buff
>
> These can be called from the drivers that support and don't support
> the metadata, so have to (correctly) use int.
>
>> Perhaps a separate patch to convert the regulars to unsigned,
>> thoughts?
>
> Take a look at xdp_set_data_meta_invalid and xdp_data_meta_unsupported.
> There are cases where xdp->data - xdp->data_meta is -1 (and the callers
> check for this condition), we can't use unsigned unconditionally
> (unless we use xdp_data_meta_unsupported).
Ah! Good catch, and thank you for helping me to understand that,
I appreciate it. About to turn in for the evening, will wait for any more
comments and I’m happy to send out a v3.
One thought is that I stumbled upon xdp_get_metalen in filter.c. I wonder it
would make sense to pirate that logic and move it into xdp.h? That might be
a simply solution here that would allow us to keep unsigned like SKB API?
Happy to take feedback either way.
Cheers,
Jon
Powered by blists - more mailing lists