Date:   Mon, 21 Dec 2020 18:07:54 -0500
From:   Willem de Bruijn <willemdebruijn.kernel@...il.com>
To:     wangyunjian <wangyunjian@...wei.com>
Cc:     Network Development <netdev@...r.kernel.org>,
        "Michael S. Tsirkin" <mst@...hat.com>,
        Jason Wang <jasowang@...hat.com>,
        virtualization@...ts.linux-foundation.org,
        "Lilijun (Jerry)" <jerry.lilijun@...wei.com>,
        chenchanghu <chenchanghu@...wei.com>,
        xudingke <xudingke@...wei.com>,
        "huangbin (J)" <brian.huangbin@...wei.com>
Subject: Re: [PATCH net v2 2/2] vhost_net: fix high cpu load when sendmsg fails

On Wed, Dec 16, 2020 at 3:20 AM wangyunjian <wangyunjian@...wei.com> wrote:
>
> From: Yunjian Wang <wangyunjian@...wei.com>
>
> Currently we break the loop and wake up the vhost_worker when
> sendmsg fails. When the worker wakes up again, we'll meet the
> same error.

The patch is based on the assumption that such error cases always
return EAGAIN. Can it not also be ENOMEM, such as from tun_build_skb?

> This will cause high CPU load. To fix this issue,
> we can skip this description by ignoring the error. When we
> exceeds sndbuf, the return value of sendmsg is -EAGAIN. In
> the case we don't skip the description and don't drop packet.

the -> that

here and above: description -> descriptor

Perhaps slightly revise to more explicitly state that

1. in the case of persistent failure (i.e., bad packet), the driver
drops the packet
2. in the case of transient failure (e.g., memory pressure) the driver
schedules the worker to try again later
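
To make the two cases concrete, here is a rough userspace sketch of the
classification, not kernel code. The names tx_action and
classify_tx_result are hypothetical, and treating ENOMEM and ENOBUFS as
transient alongside EAGAIN is an assumption for illustration; which
errnos really are transient for each backend would need checking.

```c
#include <assert.h>
#include <errno.h>

/* Hypothetical outcome of one sendmsg() attempt on the TX path. */
enum tx_action {
	TX_DONE,        /* sendmsg returned a byte count; mark descriptor used */
	TX_RETRY_LATER, /* transient: discard the desc, re-enable vq, break */
	TX_DROP,        /* persistent: mark descriptor used, packet is dropped */
};

/*
 * Classify the return value of sendmsg(): a negative errno or a byte count.
 * Assumption: EAGAIN, ENOMEM and ENOBUFS are the transient errors.
 */
enum tx_action classify_tx_result(long err)
{
	if (err >= 0)
		return TX_DONE;

	switch (err) {
	case -EAGAIN:
	case -ENOMEM:
	case -ENOBUFS:
		return TX_RETRY_LATER;
	default:
		return TX_DROP;
	}
}

/* Small usage example: a bad packet is dropped, memory pressure retries. */
void classify_tx_result_example(void)
{
	assert(classify_tx_result(-EINVAL) == TX_DROP);
	assert(classify_tx_result(-ENOMEM) == TX_RETRY_LATER);
	assert(classify_tx_result(1500) == TX_DONE);
}
```

In the patch, TX_RETRY_LATER would correspond to the existing
vhost_discard_vq_desc() / vhost_net_enable_vq() / break sequence, and
TX_DROP to falling through to the code that marks the descriptor used.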


> Signed-off-by: Yunjian Wang <wangyunjian@...wei.com>
> ---
>  drivers/vhost/net.c | 21 +++++++++------------
>  1 file changed, 9 insertions(+), 12 deletions(-)
>
> diff --git a/drivers/vhost/net.c b/drivers/vhost/net.c
> index c8784dfafdd7..3d33f3183abe 100644
> --- a/drivers/vhost/net.c
> +++ b/drivers/vhost/net.c
> @@ -827,16 +827,13 @@ static void handle_tx_copy(struct vhost_net *net, struct socket *sock)
>                                 msg.msg_flags &= ~MSG_MORE;
>                 }
>
> -               /* TODO: Check specific error and bomb out unless ENOBUFS? */
>                 err = sock->ops->sendmsg(sock, &msg, len);
> -               if (unlikely(err < 0)) {
> +               if (unlikely(err == -EAGAIN)) {
>                         vhost_discard_vq_desc(vq, 1);
>                         vhost_net_enable_vq(net, vq);
>                         break;
> -               }
> -               if (err != len)
> -                       pr_debug("Truncated TX packet: len %d != %zd\n",
> -                                err, len);
> +               } else if (unlikely(err != len))
> +                       vq_err(vq, "Fail to sending packets err : %d, len : %zd\n", err, len);

sending -> send

Even though vq_err is a wrapper around pr_debug, I agree with Michael
that such a change should be a separate patch to net-next; it does not
belong in a fix.

More importantly, the error message is now the same for persistent
errors and for truncated packets. But on truncation the packet was
sent, so the message is not entirely correct in that case.
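
As a rough illustration of keeping the two cases apart, here is a
userspace sketch, not kernel code. The names tx_outcome,
tx_outcome_of and tx_outcome_msg are hypothetical, and the message
strings are only examples.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical result categories for one sendmsg() call. */
enum tx_outcome {
	TX_SENT,      /* all len bytes were sent */
	TX_TRUNCATED, /* packet was sent, but fewer than len bytes */
	TX_FAILED,    /* negative errno: nothing was sent */
};

/* err is the sendmsg() return value, len the requested length. */
enum tx_outcome tx_outcome_of(long err, size_t len)
{
	if (err < 0)
		return TX_FAILED;
	if ((size_t)err != len)
		return TX_TRUNCATED;
	return TX_SENT;
}

/* Distinct diagnostic text per outcome; NULL means nothing to log. */
const char *tx_outcome_msg(enum tx_outcome outcome)
{
	switch (outcome) {
	case TX_TRUNCATED:
		return "Truncated TX packet";
	case TX_FAILED:
		return "Failed to send packet";
	default:
		return NULL;
	}
}

/* Small usage example: a short send is reported as truncation. */
void tx_outcome_example(void)
{
	assert(tx_outcome_of(60, 100) == TX_TRUNCATED);
	assert(tx_outcome_msg(TX_SENT) == NULL);
}
```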

>  done:
>                 vq->heads[nvq->done_idx].id = cpu_to_vhost32(vq, head);
>                 vq->heads[nvq->done_idx].len = 0;
> @@ -922,7 +919,6 @@ static void handle_tx_zerocopy(struct vhost_net *net, struct socket *sock)
>                         msg.msg_flags &= ~MSG_MORE;
>                 }
>
> -               /* TODO: Check specific error and bomb out unless ENOBUFS? */
>                 err = sock->ops->sendmsg(sock, &msg, len);
>                 if (unlikely(err < 0)) {
>                         if (zcopy_used) {
> @@ -931,13 +927,14 @@ static void handle_tx_zerocopy(struct vhost_net *net, struct socket *sock)
>                                 nvq->upend_idx = ((unsigned)nvq->upend_idx - 1)
>                                         % UIO_MAXIOV;
>                         }
> -                       vhost_discard_vq_desc(vq, 1);
> -                       vhost_net_enable_vq(net, vq);
> -                       break;
> +                       if (err == -EAGAIN) {
> +                               vhost_discard_vq_desc(vq, 1);
> +                               vhost_net_enable_vq(net, vq);
> +                               break;
> +                       }
>                 }
>                 if (err != len)
> -                       pr_debug("Truncated TX packet: "
> -                                " len %d != %zd\n", err, len);
> +                       vq_err(vq, "Fail to sending packets err : %d, len : %zd\n", err, len);
>                 if (!zcopy_used)
>                         vhost_add_used_and_signal(&net->dev, vq, head, 0);
>                 else
> --
> 2.23.0
>
