[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <6004b254ce7_2664208d0@john-XPS-13-9370.notmuch>
Date: Sun, 17 Jan 2021 13:55:32 -0800
From: John Fastabend <john.fastabend@...il.com>
To: Xuan Zhuo <xuanzhuo@...ux.alibaba.com>, netdev@...r.kernel.org
Cc: "Michael S. Tsirkin" <mst@...hat.com>,
Jason Wang <jasowang@...hat.com>,
"David S. Miller" <davem@...emloft.net>,
Jakub Kicinski <kuba@...nel.org>,
Björn Töpel <bjorn.topel@...el.com>,
Magnus Karlsson <magnus.karlsson@...el.com>,
Jonathan Lemon <jonathan.lemon@...il.com>,
Alexei Starovoitov <ast@...nel.org>,
Daniel Borkmann <daniel@...earbox.net>,
Jesper Dangaard Brouer <hawk@...nel.org>,
John Fastabend <john.fastabend@...il.com>,
Andrii Nakryiko <andrii@...nel.org>,
Martin KaFai Lau <kafai@...com>,
Song Liu <songliubraving@...com>, Yonghong Song <yhs@...com>,
KP Singh <kpsingh@...nel.org>,
Willem de Bruijn <willemb@...gle.com>,
Steffen Klassert <steffen.klassert@...unet.com>,
Alexander Lobakin <alobakin@...me>,
Miaohe Lin <linmiaohe@...wei.com>,
Xuan Zhuo <xuanzhuo@...ux.alibaba.com>,
Mauro Carvalho Chehab <mchehab+huawei@...nel.org>,
Antoine Tenart <atenart@...nel.org>,
Michal Kubecek <mkubecek@...e.cz>,
Andrew Lunn <andrew@...n.ch>,
Florian Fainelli <f.fainelli@...il.com>,
Meir Lichtinger <meirl@...lanox.com>,
virtualization@...ts.linux-foundation.org, bpf@...r.kernel.org
Subject: RE: [PATCH bpf-next] xsk: build skb by page
Xuan Zhuo wrote:
> This patch is used to construct skb based on page to save memory copy
> overhead.
>
> This has one problem:
>
> We construct the skb by fill the data page as a frag into the skb. In
> this way, the linear space is empty, and the header information is also
> in the frag, not in the linear space, which is not allowed for some
> network cards. For example, Mellanox Technologies MT27710 Family
> [ConnectX-4 Lx] will get the following error message:
>
> mlx5_core 0000:3b:00.1 eth1: Error cqe on cqn 0x817, ci 0x8, qn 0x1dbb, opcode 0xd, syndrome 0x1, vendor syndrome 0x68
> 00000000: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
> 00000010: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
> 00000020: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
> 00000030: 00 00 00 00 60 10 68 01 0a 00 1d bb 00 0f 9f d2
> WQE DUMP: WQ size 1024 WQ cur size 0, WQE index 0xf, len: 64
> 00000000: 00 00 0f 0a 00 1d bb 03 00 00 00 08 00 00 00 00
> 00000010: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
> 00000020: 00 00 00 2b 00 08 00 00 00 00 00 05 9e e3 08 00
> 00000030: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
> mlx5_core 0000:3b:00.1 eth1: ERR CQE on SQ: 0x1dbb
>
> I also tried to use build_skb to construct skb, but because of the
> existence of skb_shinfo, it must be behind the linear space, so this
> method is not working. We can't put skb_shinfo on desc->addr, it will be
> exposed to users, this is not safe.
>
> Finally, I added a feature NETIF_F_SKB_NO_LINEAR to identify whether the
> network card supports the header information of the packet in the frag
> and not in the linear space.
>
> ---------------- Performance Testing ------------
>
> The test environment is Aliyun ECS server.
> Test cmd:
> ```
> xdpsock -i eth0 -t -S -s <msg size>
> ```
>
> Test result data:
>
> size 64 512 1024 1500
> copy 1916747 1775988 1600203 1440054
> page 1974058 1953655 1945463 1904478
> percent 3.0% 10.0% 21.58% 32.3%
Looks like a good perf bump. Some easy suggestions below
> +static struct sk_buff *xsk_build_skb(struct xdp_sock *xs,
> + struct xdp_desc *desc, int *err)
> +{
Passing a 'int *err' here is ugly IMO use the ERR_PTR/PTR_ERR macros
and roll it into the return value.
or maybe use the out: pattern used in the kernel, but just doing direct
returns like now but with ERR_PTR() would also be fine.
> + struct sk_buff *skb ;
struct sk_buff *skb = NULL;
err = -ENOMEM;
> +
> + if (xs->dev->features & NETIF_F_SKB_NO_LINEAR) {
> + skb = xsk_build_skb_zerocopy(xs, desc);
> + if (unlikely(!skb)) {
goto out
> + *err = -ENOMEM;
> + return NULL;
> + }
> + } else {
> + char *buffer;
> + u64 addr;
> + u32 len;
> + int err;
> +
> + len = desc->len;
> + skb = sock_alloc_send_skb(&xs->sk, len, 1, &err);
> + if (unlikely(!skb)) {
goto out;
> + *err = -ENOMEM;
> + return NULL;
> + }
> +
> + skb_put(skb, len);
> + addr = desc->addr;
> + buffer = xsk_buff_raw_get_data(xs->pool, desc->addr);
> + err = skb_store_bits(skb, 0, buffer, len);
> +
> + if (unlikely(err)) {
> + kfree_skb(skb);
err = -EINVAL;
goto out
> + *err = -EINVAL;
> + return NULL;
> + }
> + }
> +
> + skb->dev = xs->dev;
> + skb->priority = xs->sk.sk_priority;
> + skb->mark = xs->sk.sk_mark;
> + skb_shinfo(skb)->destructor_arg = (void *)(long)desc->addr;
> + skb->destructor = xsk_destruct_skb;
> +
> + return skb;
out:
kfree_skb(skb)
return ERR_PTR(err);
> +}
> +
Otherwise looks good thanks.
Powered by blists - more mailing lists