[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <c687e1d8-e36a-8f23-342a-22b2a1efb372@gmail.com>
Date: Sun, 27 Feb 2022 20:06:03 -0800
From: Eric Dumazet <eric.dumazet@...il.com>
To: Harold Huang <baymaxhuang@...il.com>, netdev@...r.kernel.org
Cc: jasowang@...hat.com, pabeni@...hat.com,
"David S. Miller" <davem@...emloft.net>,
Jakub Kicinski <kuba@...nel.org>,
Alexei Starovoitov <ast@...nel.org>,
Daniel Borkmann <daniel@...earbox.net>,
Jesper Dangaard Brouer <hawk@...nel.org>,
John Fastabend <john.fastabend@...il.com>,
open list <linux-kernel@...r.kernel.org>,
"open list:XDP (eXpress Data Path)" <bpf@...r.kernel.org>,
edumazet@...gle.com
Subject: Re: [PATCH net-next v2] tun: support NAPI for packets received from
batched XDP buffs
On 2/25/22 01:02, Harold Huang wrote:
> In tun, NAPI is supported and we can also use NAPI in the path of
> batched XDP buffs to accelerate packet processing. What is more, after
> we use NAPI, GRO is also supported. The iperf shows that the throughput of
> single stream could be improved from 4.5Gbps to 9.2Gbps. Additionally, 9.2
> Gbps nearly reachs the line speed of the phy nic and there is still about
> 15% idle cpu core remaining on the vhost thread.
>
> Test topology:
>
> [iperf server]<--->tap<--->dpdk testpmd<--->phy nic<--->[iperf client]
>
> Iperf stream:
>
> Before:
> ...
> [ 5] 5.00-6.00 sec 558 MBytes 4.68 Gbits/sec 0 1.50 MBytes
> [ 5] 6.00-7.00 sec 556 MBytes 4.67 Gbits/sec 1 1.35 MBytes
> [ 5] 7.00-8.00 sec 556 MBytes 4.67 Gbits/sec 2 1.18 MBytes
> [ 5] 8.00-9.00 sec 559 MBytes 4.69 Gbits/sec 0 1.48 MBytes
> [ 5] 9.00-10.00 sec 556 MBytes 4.67 Gbits/sec 1 1.33 MBytes
> - - - - - - - - - - - - - - - - - - - - - - - - -
> [ ID] Interval Transfer Bitrate Retr
> [ 5] 0.00-10.00 sec 5.39 GBytes 4.63 Gbits/sec 72 sender
> [ 5] 0.00-10.04 sec 5.39 GBytes 4.61 Gbits/sec receiver
>
> After:
> ...
> [ 5] 5.00-6.00 sec 1.07 GBytes 9.19 Gbits/sec 0 1.55 MBytes
> [ 5] 6.00-7.00 sec 1.08 GBytes 9.30 Gbits/sec 0 1.63 MBytes
> [ 5] 7.00-8.00 sec 1.08 GBytes 9.25 Gbits/sec 0 1.72 MBytes
> [ 5] 8.00-9.00 sec 1.08 GBytes 9.25 Gbits/sec 77 1.31 MBytes
> [ 5] 9.00-10.00 sec 1.08 GBytes 9.24 Gbits/sec 0 1.48 MBytes
> - - - - - - - - - - - - - - - - - - - - - - - - -
> [ ID] Interval Transfer Bitrate Retr
> [ 5] 0.00-10.00 sec 10.8 GBytes 9.28 Gbits/sec 166 sender
> [ 5] 0.00-10.04 sec 10.8 GBytes 9.24 Gbits/sec receiver
> ....
>
> Reported-at: https://lore.kernel.org/all/CACGkMEvTLG0Ayg+TtbN4q4pPW-ycgCCs3sC3-TF8cuRTf7Pp1A@mail.gmail.com
> Signed-off-by: Harold Huang <baymaxhuang@...il.com>
> ---
> v1 -> v2
> - fix commit messages
> - add queued flag to avoid void unnecessary napi suggested by Jason
>
> drivers/net/tun.c | 20 ++++++++++++++++----
> 1 file changed, 16 insertions(+), 4 deletions(-)
>
> diff --git a/drivers/net/tun.c b/drivers/net/tun.c
> index fed85447701a..c7d8b7c821d8 100644
> --- a/drivers/net/tun.c
> +++ b/drivers/net/tun.c
> @@ -2379,7 +2379,7 @@ static void tun_put_page(struct tun_page *tpage)
> }
>
> static int tun_xdp_one(struct tun_struct *tun,
> - struct tun_file *tfile,
> + struct tun_file *tfile, int *queued,
> struct xdp_buff *xdp, int *flush,
> struct tun_page *tpage)
> {
> @@ -2388,6 +2388,7 @@ static int tun_xdp_one(struct tun_struct *tun,
> struct virtio_net_hdr *gso = &hdr->gso;
> struct bpf_prog *xdp_prog;
> struct sk_buff *skb = NULL;
> + struct sk_buff_head *queue;
> u32 rxhash = 0, act;
> int buflen = hdr->buflen;
> int err = 0;
> @@ -2464,7 +2465,15 @@ static int tun_xdp_one(struct tun_struct *tun,
> !tfile->detached)
> rxhash = __skb_get_hash_symmetric(skb);
>
> - netif_receive_skb(skb);
> + if (tfile->napi_enabled) {
> + queue = &tfile->sk.sk_write_queue;
> + spin_lock(&queue->lock);
> + __skb_queue_tail(queue, skb);
> + spin_unlock(&queue->lock);
> + (*queued)++;
> + } else {
> + netif_receive_skb(skb);
> + }
>
> /* No need to disable preemption here since this function is
> * always called with bh disabled
> @@ -2492,7 +2501,7 @@ static int tun_sendmsg(struct socket *sock, struct msghdr *m, size_t total_len)
> if (ctl && (ctl->type == TUN_MSG_PTR)) {
> struct tun_page tpage;
> int n = ctl->num;
> - int flush = 0;
> + int flush = 0, queued = 0;
>
> memset(&tpage, 0, sizeof(tpage));
>
> @@ -2501,12 +2510,15 @@ static int tun_sendmsg(struct socket *sock, struct msghdr *m, size_t total_len)
>
> for (i = 0; i < n; i++) {
> xdp = &((struct xdp_buff *)ctl->ptr)[i];
> - tun_xdp_one(tun, tfile, xdp, &flush, &tpage);
> + tun_xdp_one(tun, tfile, &queued, xdp, &flush, &tpage);
How big n can be ?
BTW I could not find where m->msg_controllen was checked in tun_sendmsg().
struct tun_msg_ctl *ctl = m->msg_control;
if (ctl && (ctl->type == TUN_MSG_PTR)) {
int n = ctl->num; // can be set to values in [0..65535]
for (i = 0; i < n; i++) {
xdp = &((struct xdp_buff *)ctl->ptr)[i];
I really do not understand how we prevent malicious user space from
crashing the kernel.
> }
>
> if (flush)
> xdp_do_flush();
>
> + if (tfile->napi_enabled && queued > 0)
> + napi_schedule(&tfile->napi);
> +
> rcu_read_unlock();
> local_bh_enable();
>
Powered by blists - more mailing lists