Message-ID: <CAKH8qBszqz7Qi=E0=gsF0KDHqw4+QEWYyQvqRyS2_E_UsjNKvw@mail.gmail.com>
Date: Tue, 17 Jan 2023 12:33:02 -0800
From: Stanislav Fomichev <sdf@...gle.com>
To: Jesper Dangaard Brouer <jbrouer@...hat.com>
Cc: bpf@...r.kernel.org, brouer@...hat.com, ast@...nel.org,
daniel@...earbox.net, andrii@...nel.org, martin.lau@...ux.dev,
song@...nel.org, yhs@...com, john.fastabend@...il.com,
kpsingh@...nel.org, haoluo@...gle.com, jolsa@...nel.org,
David Ahern <dsahern@...il.com>,
Jakub Kicinski <kuba@...nel.org>,
Willem de Bruijn <willemb@...gle.com>,
Anatoly Burakov <anatoly.burakov@...el.com>,
Alexander Lobakin <alexandr.lobakin@...el.com>,
Magnus Karlsson <magnus.karlsson@...il.com>,
Maryam Tahhan <mtahhan@...hat.com>, xdp-hints@...-project.net,
netdev@...r.kernel.org
Subject: Re: [PATCH bpf-next v7 10/17] veth: Support RX XDP metadata
On Mon, Jan 16, 2023 at 8:21 AM Jesper Dangaard Brouer
<jbrouer@...hat.com> wrote:
>
>
>
> On 12/01/2023 01.32, Stanislav Fomichev wrote:
> > The goal is to enable end-to-end testing of the metadata for AF_XDP.
>
> For me the goal with veth goes beyond *testing*.
>
> This patch ignores the xdp_frame case. I'm not blocking this patch, but
> I'm saying we need to make sure there is a way forward for accessing
> XDP-hints when handling redirected xdp_frames.
Sure, let's work towards getting that other part addressed!
> I have two use-cases we should cover (as future work).
>
> (#1) We have customers that want to redirect from physical NIC hardware
> into containers, and then have the veth XDP-prog (selectively) redirect
> into an AF_XDP socket (when matching fastpath packets). Here they
> (at minimum) want access to the XDP-hint info on the HW checksum.
>
> (#2) Both veth and cpumap can create SKBs based on xdp_frames. Here it
> is essential to get the HW checksum and HW hash when creating these SKBs
> (else the netstack has to do an expensive csum calc and parsing in the
> flow-dissector).
From my PoV, I'd probably have to look into the TX side first (tx
timestamp) before looking into the xdp->skb path. So if somebody on your
side has cycles, feel free to drive this effort. I'm happy to provide
reviews/comments/etc. I think we've discussed in the past that this
will most likely look like another set of "export" kfuncs?
We can start by extending the new
Documentation/networking/xdp-rx-metadata.rst with a high-level design.
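To make the "export" kfunc idea a bit more concrete, here is a rough
sketch from the BPF program side. The export kfunc name, its signature
and where the value ends up being stored are all made up for the sake
of discussion; only the bpf_xdp_metadata_rx_* read kfuncs exist (added
by this series):

#include <linux/types.h>
#include <linux/bpf.h>
#include <bpf/bpf_helpers.h>

/* Existing read kfunc from this series. */
extern int bpf_xdp_metadata_rx_hash(const struct xdp_md *ctx,
				    __u32 *hash) __ksym;
/* Hypothetical "export" kfunc -- does not exist yet. */
extern int bpf_xdp_metadata_export_rx_hash(struct xdp_md *ctx,
					   __u32 hash) __ksym;

const volatile int target_ifindex; /* set by the loader; illustrative */

SEC("xdp")
int export_before_redirect(struct xdp_md *ctx)
{
	__u32 hash;

	/* Read the HW hint on the physical NIC and explicitly store it
	 * into the frame, so the veth/cpumap side can pick it up after
	 * the redirect.
	 */
	if (!bpf_xdp_metadata_rx_hash(ctx, &hash))
		bpf_xdp_metadata_export_rx_hash(ctx, hash);

	return bpf_redirect(target_ifindex, 0);
}

char _license[] SEC("license") = "GPL";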
> > Cc: John Fastabend <john.fastabend@...il.com>
> > Cc: David Ahern <dsahern@...il.com>
> > Cc: Martin KaFai Lau <martin.lau@...ux.dev>
> > Cc: Jakub Kicinski <kuba@...nel.org>
> > Cc: Willem de Bruijn <willemb@...gle.com>
> > Cc: Jesper Dangaard Brouer <brouer@...hat.com>
> > Cc: Anatoly Burakov <anatoly.burakov@...el.com>
> > Cc: Alexander Lobakin <alexandr.lobakin@...el.com>
> > Cc: Magnus Karlsson <magnus.karlsson@...il.com>
> > Cc: Maryam Tahhan <mtahhan@...hat.com>
> > Cc: xdp-hints@...-project.net
> > Cc: netdev@...r.kernel.org
> > Signed-off-by: Stanislav Fomichev <sdf@...gle.com>
> > ---
> > drivers/net/veth.c | 31 +++++++++++++++++++++++++++++++
> > 1 file changed, 31 insertions(+)
> >
> > diff --git a/drivers/net/veth.c b/drivers/net/veth.c
> > index 70f50602287a..ba3e05832843 100644
> > --- a/drivers/net/veth.c
> > +++ b/drivers/net/veth.c
> > @@ -118,6 +118,7 @@ static struct {
> >
> > struct veth_xdp_buff {
> > struct xdp_buff xdp;
> > + struct sk_buff *skb;
> > };
> >
> > static int veth_get_link_ksettings(struct net_device *dev,
> > @@ -602,6 +603,7 @@ static struct xdp_frame *veth_xdp_rcv_one(struct veth_rq *rq,
> >
> > xdp_convert_frame_to_buff(frame, xdp);
> > xdp->rxq = &rq->xdp_rxq;
> > + vxbuf.skb = NULL;
> >
> > act = bpf_prog_run_xdp(xdp_prog, xdp);
> >
> > @@ -823,6 +825,7 @@ static struct sk_buff *veth_xdp_rcv_skb(struct veth_rq *rq,
> > __skb_push(skb, skb->data - skb_mac_header(skb));
> > if (veth_convert_skb_to_xdp_buff(rq, xdp, &skb))
> > goto drop;
> > + vxbuf.skb = skb;
> >
> > orig_data = xdp->data;
> > orig_data_end = xdp->data_end;
> > @@ -1602,6 +1605,28 @@ static int veth_xdp(struct net_device *dev, struct netdev_bpf *xdp)
> > }
> > }
> >
> > +static int veth_xdp_rx_timestamp(const struct xdp_md *ctx, u64 *timestamp)
> > +{
> > + struct veth_xdp_buff *_ctx = (void *)ctx;
> > +
> > + if (!_ctx->skb)
> > + return -EOPNOTSUPP;
> > +
> > + *timestamp = skb_hwtstamps(_ctx->skb)->hwtstamp;
>
> The SKB stores this skb_hwtstamps() in the skb_shared_info memory area.
> This memory area is actually also available to xdp_frames. Thus, we
> could store the HW rx_timestamp in the same location for redirected
> xdp_frames. This could make code-path sharing possible between the SKB
> and xdp_frame cases in veth.
>
> This would also make it fast to "transfer" the HW rx_timestamp when
> creating an SKB from an xdp_frame, as the data is already written in
> the correct place.
>
> Performance-wise the downside is that the skb_shared_info memory area
> is in a separate cacheline. Thus, when no HW rx_timestamp is available,
> it is very expensive for a veth XDP bpf-prog to access this, just to
> get a zero back. Having an xdp_frame->flags bit that indicates whether
> a HW rx_timestamp has been stored can mitigate this.
That's one way to do it, although I'm not sure about the cases that
don't use an xdp_frame and instead use a stack-allocated xdp_buff.
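For the xdp_frame case specifically, here is roughly how I read your
suggestion on the veth side, if it helps the discussion. The frame
pointer in veth_xdp_buff and the XDP_FRAME_HAS_RX_TIMESTAMP flag are
invented for illustration; they are not part of this patch:

/* Sketch only: assumes veth_xdp_rcv_one() also stores the original
 * xdp_frame pointer in veth_xdp_buff (like we do with the skb), and
 * that a hypothetical XDP_FRAME_HAS_RX_TIMESTAMP bit in
 * xdp_frame->flags tells us whether a timestamp was stored.
 */
static int veth_xdp_rx_timestamp(const struct xdp_md *ctx, u64 *timestamp)
{
	struct veth_xdp_buff *_ctx = (void *)ctx;

	if (_ctx->skb) {
		/* skb path: the hint already lives in skb_shared_info */
		*timestamp = skb_hwtstamps(_ctx->skb)->hwtstamp;
		return 0;
	}

	if (_ctx->frame && (_ctx->frame->flags & XDP_FRAME_HAS_RX_TIMESTAMP)) {
		/* redirected xdp_frame path: same memory area, no skb
		 * yet; only touch the cold shared-info cacheline when
		 * the flag says a timestamp was actually stored
		 */
		struct skb_shared_info *sinfo;

		sinfo = xdp_get_shared_info_from_frame(_ctx->frame);
		*timestamp = sinfo->hwtstamps.hwtstamp;
		return 0;
	}

	return -EOPNOTSUPP;
}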
> > + return 0;
> > +}
> > +
> > +static int veth_xdp_rx_hash(const struct xdp_md *ctx, u32 *hash)
> > +{
> > + struct veth_xdp_buff *_ctx = (void *)ctx;
> > +
> > + if (!_ctx->skb)
> > + return -EOPNOTSUPP;
>
> For the xdp_frame case, I'm considering simply storing the u32 RX-hash
> in struct xdp_frame. This makes it easy to extract for the
> xdp_frame-to-SKB creation use-case.
>
> As has been mentioned before, the SKB also requires knowing the RSS
> hash-type. This HW hash-type actually contains a lot of information
> that today is lost when reduced to the SKB hash-type. Due to
> standardization from Microsoft, most HW provides info on (L3) IPv4 or
> IPv6, and on (L4) TCP or UDP (and often SCTP). Often the hardware
> descriptor also provides info on the header length. Future work in this
> area is exciting, as we can speed up parsing of packets in XDP if we
> can get more detailed HW info on the hash "packet-type".
Something like the version we discussed a while back [0]?
Seems workable overall if we remove it from the UAPI? (Not everyone
was happy about the UAPI parts, IIRC.)
0: https://lore.kernel.org/bpf/20221115030210.3159213-7-sdf@google.com/
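For reference, roughly the shape I'd imagine for veth if we go that
route. The xdp_hash_type enum below is purely a placeholder
(kernel-internal, not UAPI), and the extra argument is not what this
patch implements:

/* Sketch only: placeholder hash-type values for the sake of discussion. */
enum xdp_hash_type {			/* hypothetical, not UAPI */
	XDP_HASH_TYPE_NONE	= 0,
	XDP_HASH_TYPE_L3	= 1,	/* IPv4/IPv6 only */
	XDP_HASH_TYPE_L4	= 2,	/* TCP/UDP/SCTP included */
};

/* A possible extension of the rx_hash kfunc that also reports the HW
 * hash type, so the xdp_frame->skb path can set skb->l4_hash correctly
 * without re-parsing the packet.
 */
static int veth_xdp_rx_hash(const struct xdp_md *ctx, u32 *hash,
			    enum xdp_hash_type *type)
{
	struct veth_xdp_buff *_ctx = (void *)ctx;

	if (!_ctx->skb)
		return -EOPNOTSUPP;

	*hash = skb_get_hash(_ctx->skb);
	*type = _ctx->skb->l4_hash ? XDP_HASH_TYPE_L4 : XDP_HASH_TYPE_L3;
	return 0;
}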
> > +
> > + *hash = skb_get_hash(_ctx->skb);
> > + return 0;
> > +}
> > +
>
> --Jesper
>