[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <CACGkMEtTdvbc1rk6sk=KE7J2L0=R2M-FMxK+DfJDUYMTPbPJGA@mail.gmail.com>
Date: Mon, 28 Feb 2022 12:20:06 +0800
From: Jason Wang <jasowang@...hat.com>
To: Eric Dumazet <eric.dumazet@...il.com>
Cc: Harold Huang <baymaxhuang@...il.com>,
netdev <netdev@...r.kernel.org>, Paolo Abeni <pabeni@...hat.com>,
"David S. Miller" <davem@...emloft.net>,
Jakub Kicinski <kuba@...nel.org>,
Alexei Starovoitov <ast@...nel.org>,
Daniel Borkmann <daniel@...earbox.net>,
Jesper Dangaard Brouer <hawk@...nel.org>,
John Fastabend <john.fastabend@...il.com>,
open list <linux-kernel@...r.kernel.org>,
"open list:XDP (eXpress Data Path)" <bpf@...r.kernel.org>,
Eric Dumazet <edumazet@...gle.com>
Subject: Re: [PATCH net-next v2] tun: support NAPI for packets received from
batched XDP buffs
On Mon, Feb 28, 2022 at 12:06 PM Eric Dumazet <eric.dumazet@...il.com> wrote:
>
>
> On 2/25/22 01:02, Harold Huang wrote:
> > In tun, NAPI is supported and we can also use NAPI in the path of
> > batched XDP buffs to accelerate packet processing. What is more, after
> > we use NAPI, GRO is also supported. The iperf shows that the throughput of
> > single stream could be improved from 4.5Gbps to 9.2Gbps. Additionally, 9.2
> > Gbps nearly reachs the line speed of the phy nic and there is still about
> > 15% idle cpu core remaining on the vhost thread.
> >
> > Test topology:
> >
> > [iperf server]<--->tap<--->dpdk testpmd<--->phy nic<--->[iperf client]
> >
> > Iperf stream:
> >
> > Before:
> > ...
> > [ 5] 5.00-6.00 sec 558 MBytes 4.68 Gbits/sec 0 1.50 MBytes
> > [ 5] 6.00-7.00 sec 556 MBytes 4.67 Gbits/sec 1 1.35 MBytes
> > [ 5] 7.00-8.00 sec 556 MBytes 4.67 Gbits/sec 2 1.18 MBytes
> > [ 5] 8.00-9.00 sec 559 MBytes 4.69 Gbits/sec 0 1.48 MBytes
> > [ 5] 9.00-10.00 sec 556 MBytes 4.67 Gbits/sec 1 1.33 MBytes
> > - - - - - - - - - - - - - - - - - - - - - - - - -
> > [ ID] Interval Transfer Bitrate Retr
> > [ 5] 0.00-10.00 sec 5.39 GBytes 4.63 Gbits/sec 72 sender
> > [ 5] 0.00-10.04 sec 5.39 GBytes 4.61 Gbits/sec receiver
> >
> > After:
> > ...
> > [ 5] 5.00-6.00 sec 1.07 GBytes 9.19 Gbits/sec 0 1.55 MBytes
> > [ 5] 6.00-7.00 sec 1.08 GBytes 9.30 Gbits/sec 0 1.63 MBytes
> > [ 5] 7.00-8.00 sec 1.08 GBytes 9.25 Gbits/sec 0 1.72 MBytes
> > [ 5] 8.00-9.00 sec 1.08 GBytes 9.25 Gbits/sec 77 1.31 MBytes
> > [ 5] 9.00-10.00 sec 1.08 GBytes 9.24 Gbits/sec 0 1.48 MBytes
> > - - - - - - - - - - - - - - - - - - - - - - - - -
> > [ ID] Interval Transfer Bitrate Retr
> > [ 5] 0.00-10.00 sec 10.8 GBytes 9.28 Gbits/sec 166 sender
> > [ 5] 0.00-10.04 sec 10.8 GBytes 9.24 Gbits/sec receiver
> > ....
> >
> > Reported-at: https://lore.kernel.org/all/CACGkMEvTLG0Ayg+TtbN4q4pPW-ycgCCs3sC3-TF8cuRTf7Pp1A@mail.gmail.com
> > Signed-off-by: Harold Huang <baymaxhuang@...il.com>
> > ---
> > v1 -> v2
> > - fix commit messages
> > - add queued flag to avoid void unnecessary napi suggested by Jason
> >
> > drivers/net/tun.c | 20 ++++++++++++++++----
> > 1 file changed, 16 insertions(+), 4 deletions(-)
> >
> > diff --git a/drivers/net/tun.c b/drivers/net/tun.c
> > index fed85447701a..c7d8b7c821d8 100644
> > --- a/drivers/net/tun.c
> > +++ b/drivers/net/tun.c
> > @@ -2379,7 +2379,7 @@ static void tun_put_page(struct tun_page *tpage)
> > }
> >
> > static int tun_xdp_one(struct tun_struct *tun,
> > - struct tun_file *tfile,
> > + struct tun_file *tfile, int *queued,
> > struct xdp_buff *xdp, int *flush,
> > struct tun_page *tpage)
> > {
> > @@ -2388,6 +2388,7 @@ static int tun_xdp_one(struct tun_struct *tun,
> > struct virtio_net_hdr *gso = &hdr->gso;
> > struct bpf_prog *xdp_prog;
> > struct sk_buff *skb = NULL;
> > + struct sk_buff_head *queue;
> > u32 rxhash = 0, act;
> > int buflen = hdr->buflen;
> > int err = 0;
> > @@ -2464,7 +2465,15 @@ static int tun_xdp_one(struct tun_struct *tun,
> > !tfile->detached)
> > rxhash = __skb_get_hash_symmetric(skb);
> >
> > - netif_receive_skb(skb);
> > + if (tfile->napi_enabled) {
> > + queue = &tfile->sk.sk_write_queue;
> > + spin_lock(&queue->lock);
> > + __skb_queue_tail(queue, skb);
> > + spin_unlock(&queue->lock);
> > + (*queued)++;
> > + } else {
> > + netif_receive_skb(skb);
> > + }
> >
> > /* No need to disable preemption here since this function is
> > * always called with bh disabled
> > @@ -2492,7 +2501,7 @@ static int tun_sendmsg(struct socket *sock, struct msghdr *m, size_t total_len)
> > if (ctl && (ctl->type == TUN_MSG_PTR)) {
> > struct tun_page tpage;
> > int n = ctl->num;
> > - int flush = 0;
> > + int flush = 0, queued = 0;
> >
> > memset(&tpage, 0, sizeof(tpage));
> >
> > @@ -2501,12 +2510,15 @@ static int tun_sendmsg(struct socket *sock, struct msghdr *m, size_t total_len)
> >
> > for (i = 0; i < n; i++) {
> > xdp = &((struct xdp_buff *)ctl->ptr)[i];
> > - tun_xdp_one(tun, tfile, xdp, &flush, &tpage);
> > + tun_xdp_one(tun, tfile, &queued, xdp, &flush, &tpage);
>
>
> How big n can be ?
>
> BTW I could not find where m->msg_controllen was checked in tun_sendmsg().
>
> struct tun_msg_ctl *ctl = m->msg_control;
>
> if (ctl && (ctl->type == TUN_MSG_PTR)) {
>
> int n = ctl->num; // can be set to values in [0..65535]
>
> for (i = 0; i < n; i++) {
>
> xdp = &((struct xdp_buff *)ctl->ptr)[i];
>
>
> I really do not understand how we prevent malicious user space from
> crashing the kernel.
It looks to me the only user for this is vhost-net which limits it to
64, userspace can't use sendmsg() directly on tap.
Thanks
>
>
>
> > }
> >
> > if (flush)
> > xdp_do_flush();
> >
> > + if (tfile->napi_enabled && queued > 0)
> > + napi_schedule(&tfile->napi);
> > +
> > rcu_read_unlock();
> > local_bh_enable();
> >
>
Powered by blists - more mailing lists