[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <9d017748-3a4e-367f-94f4-2dcc6bb3e50e@nvidia.com>
Date: Tue, 29 Jun 2021 12:50:53 +0300
From: Maxim Mikityanskiy <maximmi@...dia.com>
To: Maciej Fijalkowski <maciej.fijalkowski@...el.com>,
Xuan Zhuo <xuanzhuo@...ux.alibaba.com>
CC: <bpf@...r.kernel.org>, <netdev@...r.kernel.org>,
Jesse Brandeburg <jesse.brandeburg@...el.com>,
Tony Nguyen <anthony.l.nguyen@...el.com>,
"David S. Miller" <davem@...emloft.net>,
Jakub Kicinski <kuba@...nel.org>,
Giuseppe Cavallaro <peppe.cavallaro@...com>,
Alexandre Torgue <alexandre.torgue@...s.st.com>,
Jose Abreu <joabreu@...opsys.com>,
"Maxime Coquelin" <mcoquelin.stm32@...il.com>,
Alexei Starovoitov <ast@...nel.org>,
Daniel Borkmann <daniel@...earbox.net>,
Jesper Dangaard Brouer <hawk@...nel.org>,
John Fastabend <john.fastabend@...il.com>,
Björn Töpel <bjorn@...nel.org>,
Jeff Kirsher <jeffrey.t.kirsher@...el.com>,
Krzysztof Kazimierczak <krzysztof.kazimierczak@...el.com>,
Ong Boon Leong <boon.leong.ong@...el.com>,
<intel-wired-lan@...ts.osuosl.org>,
<linux-stm32@...md-mailman.stormreply.com>,
<linux-arm-kernel@...ts.infradead.org>,
Tariq Toukan <tariqt@...dia.com>,
Magnus Karlsson <magnus.karlsson@...el.com>,
Saeed Mahameed <saeedm@...dia.com>
Subject: Re: [PATCH net v2] xdp, net: fix for construct skb by xdp inside xsk
zc rx
On 2021-06-28 13:47, Maciej Fijalkowski wrote:
> On Thu, Jun 17, 2021 at 10:55:34PM +0800, Xuan Zhuo wrote:
>> When each driver supports xsk rx, if the received buff returns XDP_PASS
>> after run xdp prog, it must construct skb based on xdp. This patch
>> extracts this logic into a public function xdp_construct_skb().
>>
>> There is a bug in the original logic. When constructing skb, we should
>> copy the meta information to skb and then use __skb_pull() to correct
>> the data.
>
> Thanks for fixing the bug on Intel drivers, Xuan. However, together with
> Magnus we feel that include/net/xdp.h is not a correct place for
> introducing xdp_construct_skb. If mlx side could use it, then probably
> include/net/xdp_sock_drv.h is a better fit for that.
>
> Once again, CCing Maxim.
> Maxim, any chances that mlx driver could be aligned in a way that we could
> have a common function for creating skb on ZC path?
I'm sorry I missed the v1.
I have reviewed the differences between mlx5e_xsk_construct_skb and
xdp_construct_skb. I would say it is possible for mlx5 to adapt and use
the new API, but it may also require changes to xdp_construct_skb.
Please see the list of differences below.
>
> Otherwise, maybe we should think about introducing the Intel-specific
> common header in tree?
Sure, you can do it to share Intel-specific stuff between Intel drivers.
However, in this particular case I think all XSK-enabled drivers would
benefit from this function, especially after previous efforts that aimed
to minimize the differences between drivers, the amount of code in the
drivers and to share as much as possible. So, in my opinion, this stuff
belongs to xdp_sock_drv.h. (Moreover, I see this patch also changes
stmmac, so it's no longer Intel-specific.)
Differences between mlx5e_xsk_construct_skb and xdp_construct_skb:
1. __napi_alloc_skb is called with __GFP_NOWARN in xdp_construct_skb,
but without this flag in mlx5. Why do we need to use non-default flags?
Why can't we stick with napi_alloc_skb? I see only Intel drivers and XSK
in stmmac use __napi_alloc_skb instead of napi_alloc_skb, and it looks
to me as it was just copied from the regular (non-XSK) datapath of i40e
(i40e_construct_skb) to i40e's XSK, then to stmmac, then to
xdp_construct_skb, and I don't truly understand the reason of having
__GFP_NOWARN where it first appeared (i40e_construct_skb). Could someone
explain the reason for __GFP_NOWARN, so that we could decide whether we
want it in a generic XSK helper?
2. skb_put in mlx5 vs __skb_put in xdp_construct_skb - shouldn't be a
big deal, the difference is just an extra overflow check in skb_put.
3. XDP metadata. mlx5 calls xdp_set_data_meta_invalid, while other
XSK-enabled drivers set metadata size to 0 and allow the XDP program to
push some metadata. xdp_construct_skb only supports xdp_buffs with
metadata, it will break if xdp_data_meta_unsupported. There are a few
possible ways to address it:
3.1. Add a check for xdp_data_meta_unsupported to xdp_construct_skb. It
will lift the undocumented limitation of this function and allow it to
handle all valid kinds of xdp_buff.
3.2. Have two versions of xdp_construct_skb: one for xdp_buffs with
metadata, the other for ones without metadata. Sounds ugly and not
robust, but could spare a few CPU cycles for drivers that can't have
metadata.
3.3. Remove xdp_set_data_meta_invalid from mlx5. I think the reason for
this call was some design decision, rather than a technical limitation.
On the other hand, even though it will allow mlx5 to work with
xdp_construct_skb in its current implementation, it would still be nice
to combine it with 3.1 to avoid having issues with future drivers (if
not, at least document in a clear way that the xdp_buff parameter must
have metadata). Tariq/Saeed, could you comment on this point?
Thanks,
Max
>>
>> Fixes: 0a714186d3c0f ("i40e: add AF_XDP zero-copy Rx support")
>> Fixes: 2d4238f556972 ("ice: Add support for AF_XDP")
>> Fixes: bba2556efad66 ("net: stmmac: Enable RX via AF_XDP zero-copy")
>> Fixes: d0bcacd0a1309 ("ixgbe: add AF_XDP zero-copy Rx support")
>> Signed-off-by: Xuan Zhuo <xuanzhuo@...ux.alibaba.com>
>> ---
>> drivers/net/ethernet/intel/i40e/i40e_xsk.c | 16 +---------
>> drivers/net/ethernet/intel/ice/ice_xsk.c | 12 +-------
>> drivers/net/ethernet/intel/ixgbe/ixgbe_xsk.c | 12 +-------
>> .../net/ethernet/stmicro/stmmac/stmmac_main.c | 23 +-------------
>> include/net/xdp.h | 30 +++++++++++++++++++
>> 5 files changed, 34 insertions(+), 59 deletions(-)
>>
>> diff --git a/drivers/net/ethernet/intel/i40e/i40e_xsk.c b/drivers/net/ethernet/intel/i40e/i40e_xsk.c
>> index 68f177a86403..81b0f44eedda 100644
>> --- a/drivers/net/ethernet/intel/i40e/i40e_xsk.c
>> +++ b/drivers/net/ethernet/intel/i40e/i40e_xsk.c
>> @@ -246,23 +246,9 @@ bool i40e_alloc_rx_buffers_zc(struct i40e_ring *rx_ring, u16 count)
>> static struct sk_buff *i40e_construct_skb_zc(struct i40e_ring *rx_ring,
>> struct xdp_buff *xdp)
>> {
>> - unsigned int metasize = xdp->data - xdp->data_meta;
>> - unsigned int datasize = xdp->data_end - xdp->data;
>> struct sk_buff *skb;
>>
>> - /* allocate a skb to store the frags */
>> - skb = __napi_alloc_skb(&rx_ring->q_vector->napi,
>> - xdp->data_end - xdp->data_hard_start,
>> - GFP_ATOMIC | __GFP_NOWARN);
>> - if (unlikely(!skb))
>> - goto out;
>> -
>> - skb_reserve(skb, xdp->data - xdp->data_hard_start);
>> - memcpy(__skb_put(skb, datasize), xdp->data, datasize);
>> - if (metasize)
>> - skb_metadata_set(skb, metasize);
>> -
>> -out:
>> + skb = xdp_construct_skb(xdp, &rx_ring->q_vector->napi);
>> xsk_buff_free(xdp);
>> return skb;
>> }
>> diff --git a/drivers/net/ethernet/intel/ice/ice_xsk.c b/drivers/net/ethernet/intel/ice/ice_xsk.c
>> index a1f89ea3c2bd..f95e1adcebda 100644
>> --- a/drivers/net/ethernet/intel/ice/ice_xsk.c
>> +++ b/drivers/net/ethernet/intel/ice/ice_xsk.c
>> @@ -430,22 +430,12 @@ static void ice_bump_ntc(struct ice_ring *rx_ring)
>> static struct sk_buff *
>> ice_construct_skb_zc(struct ice_ring *rx_ring, struct ice_rx_buf *rx_buf)
>> {
>> - unsigned int metasize = rx_buf->xdp->data - rx_buf->xdp->data_meta;
>> - unsigned int datasize = rx_buf->xdp->data_end - rx_buf->xdp->data;
>> - unsigned int datasize_hard = rx_buf->xdp->data_end -
>> - rx_buf->xdp->data_hard_start;
>> struct sk_buff *skb;
>>
>> - skb = __napi_alloc_skb(&rx_ring->q_vector->napi, datasize_hard,
>> - GFP_ATOMIC | __GFP_NOWARN);
>> + skb = xdp_construct_skb(rx_buf->xdp, &rx_ring->q_vector->napi);
>> if (unlikely(!skb))
>> return NULL;
>>
>> - skb_reserve(skb, rx_buf->xdp->data - rx_buf->xdp->data_hard_start);
>> - memcpy(__skb_put(skb, datasize), rx_buf->xdp->data, datasize);
>> - if (metasize)
>> - skb_metadata_set(skb, metasize);
>> -
>> xsk_buff_free(rx_buf->xdp);
>> rx_buf->xdp = NULL;
>> return skb;
>> diff --git a/drivers/net/ethernet/intel/ixgbe/ixgbe_xsk.c b/drivers/net/ethernet/intel/ixgbe/ixgbe_xsk.c
>> index f72d2978263b..123945832c96 100644
>> --- a/drivers/net/ethernet/intel/ixgbe/ixgbe_xsk.c
>> +++ b/drivers/net/ethernet/intel/ixgbe/ixgbe_xsk.c
>> @@ -203,22 +203,12 @@ bool ixgbe_alloc_rx_buffers_zc(struct ixgbe_ring *rx_ring, u16 count)
>> static struct sk_buff *ixgbe_construct_skb_zc(struct ixgbe_ring *rx_ring,
>> struct ixgbe_rx_buffer *bi)
>> {
>> - unsigned int metasize = bi->xdp->data - bi->xdp->data_meta;
>> - unsigned int datasize = bi->xdp->data_end - bi->xdp->data;
>> struct sk_buff *skb;
>>
>> - /* allocate a skb to store the frags */
>> - skb = __napi_alloc_skb(&rx_ring->q_vector->napi,
>> - bi->xdp->data_end - bi->xdp->data_hard_start,
>> - GFP_ATOMIC | __GFP_NOWARN);
>> + skb = xdp_construct_skb(bi->xdp, &rx_ring->q_vector->napi);
>> if (unlikely(!skb))
>> return NULL;
>>
>> - skb_reserve(skb, bi->xdp->data - bi->xdp->data_hard_start);
>> - memcpy(__skb_put(skb, datasize), bi->xdp->data, datasize);
>> - if (metasize)
>> - skb_metadata_set(skb, metasize);
>> -
>> xsk_buff_free(bi->xdp);
>> bi->xdp = NULL;
>> return skb;
>> diff --git a/drivers/net/ethernet/stmicro/stmmac/stmmac_main.c b/drivers/net/ethernet/stmicro/stmmac/stmmac_main.c
>> index c87202cbd3d6..143ac1edb876 100644
>> --- a/drivers/net/ethernet/stmicro/stmmac/stmmac_main.c
>> +++ b/drivers/net/ethernet/stmicro/stmmac/stmmac_main.c
>> @@ -4729,27 +4729,6 @@ static void stmmac_finalize_xdp_rx(struct stmmac_priv *priv,
>> xdp_do_flush();
>> }
>>
>> -static struct sk_buff *stmmac_construct_skb_zc(struct stmmac_channel *ch,
>> - struct xdp_buff *xdp)
>> -{
>> - unsigned int metasize = xdp->data - xdp->data_meta;
>> - unsigned int datasize = xdp->data_end - xdp->data;
>> - struct sk_buff *skb;
>> -
>> - skb = __napi_alloc_skb(&ch->rxtx_napi,
>> - xdp->data_end - xdp->data_hard_start,
>> - GFP_ATOMIC | __GFP_NOWARN);
>> - if (unlikely(!skb))
>> - return NULL;
>> -
>> - skb_reserve(skb, xdp->data - xdp->data_hard_start);
>> - memcpy(__skb_put(skb, datasize), xdp->data, datasize);
>> - if (metasize)
>> - skb_metadata_set(skb, metasize);
>> -
>> - return skb;
>> -}
>> -
>> static void stmmac_dispatch_skb_zc(struct stmmac_priv *priv, u32 queue,
>> struct dma_desc *p, struct dma_desc *np,
>> struct xdp_buff *xdp)
>> @@ -4761,7 +4740,7 @@ static void stmmac_dispatch_skb_zc(struct stmmac_priv *priv, u32 queue,
>> struct sk_buff *skb;
>> u32 hash;
>>
>> - skb = stmmac_construct_skb_zc(ch, xdp);
>> + skb = xdp_construct_skb(xdp, &ch->rxtx_napi);
>> if (!skb) {
>> priv->dev->stats.rx_dropped++;
>> return;
>> diff --git a/include/net/xdp.h b/include/net/xdp.h
>> index a5bc214a49d9..561e21eaf718 100644
>> --- a/include/net/xdp.h
>> +++ b/include/net/xdp.h
>> @@ -95,6 +95,36 @@ xdp_prepare_buff(struct xdp_buff *xdp, unsigned char *hard_start,
>> xdp->data_meta = meta_valid ? data : data + 1;
>> }
>>
>> +static __always_inline struct sk_buff *
>> +xdp_construct_skb(struct xdp_buff *xdp, struct napi_struct *napi)
>> +{
>> + unsigned int metasize;
>> + unsigned int datasize;
>> + unsigned int headroom;
>> + struct sk_buff *skb;
>> + unsigned int len;
>> +
>> + /* this include metasize */
>> + datasize = xdp->data_end - xdp->data_meta;
>> + metasize = xdp->data - xdp->data_meta;
>> + headroom = xdp->data_meta - xdp->data_hard_start;
>> + len = xdp->data_end - xdp->data_hard_start;
>> +
>> + /* allocate a skb to store the frags */
>> + skb = __napi_alloc_skb(napi, len, GFP_ATOMIC | __GFP_NOWARN);
>> + if (unlikely(!skb))
>> + return NULL;
>> +
>> + skb_reserve(skb, headroom);
>> + memcpy(__skb_put(skb, datasize), xdp->data_meta, datasize);
>> + if (metasize) {
>> + __skb_pull(skb, metasize);
>> + skb_metadata_set(skb, metasize);
>> + }
>> +
>> + return skb;
>> +}
>> +
>> /* Reserve memory area at end-of data area.
>> *
>> * This macro reserves tailroom in the XDP buffer by limiting the
>> --
>> 2.31.0
>>
Powered by blists - more mailing lists