lists.openwall.net   lists  /  announce  owl-users  owl-dev  john-users  john-dev  passwdqc-users  yescrypt  popa3d-users  /  oss-security  kernel-hardening  musl  sabotage  tlsify  passwords  /  crypt-dev  xvendor  /  Bugtraq  Full-Disclosure  linux-kernel  linux-netdev  linux-ext4  linux-hardening  PHC 
Open Source and information security mailing list archives
 
Hash Suite: Windows password security audit tool. GUI, reports in PDF.
[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Date:   Tue, 22 Dec 2020 09:24:53 -0500
From:   Willem de Bruijn <willemdebruijn.kernel@...il.com>
To:     Jason Wang <jasowang@...hat.com>
Cc:     wangyunjian <wangyunjian@...wei.com>,
        Network Development <netdev@...r.kernel.org>,
        "Michael S. Tsirkin" <mst@...hat.com>,
        virtualization@...ts.linux-foundation.org,
        "Lilijun (Jerry)" <jerry.lilijun@...wei.com>,
        chenchanghu <chenchanghu@...wei.com>,
        xudingke <xudingke@...wei.com>,
        "huangbin (J)" <brian.huangbin@...wei.com>
Subject: Re: [PATCH net v2 2/2] vhost_net: fix high cpu load when sendmsg fails

On Mon, Dec 21, 2020 at 11:41 PM Jason Wang <jasowang@...hat.com> wrote:
>
>
> On 2020/12/22 上午7:07, Willem de Bruijn wrote:
> > On Wed, Dec 16, 2020 at 3:20 AM wangyunjian<wangyunjian@...wei.com>  wrote:
> >> From: Yunjian Wang<wangyunjian@...wei.com>
> >>
> >> Currently we break the loop and wake up the vhost_worker when
> >> sendmsg fails. When the worker wakes up again, we'll meet the
> >> same error.
> > The patch is based on the assumption that such error cases always
> > return EAGAIN. Can it not also be ENOMEM, such as from tun_build_skb?
> >
> >> This will cause high CPU load. To fix this issue,
> >> we can skip this description by ignoring the error. When we
> >> exceeds sndbuf, the return value of sendmsg is -EAGAIN. In
> >> the case we don't skip the description and don't drop packet.
> > the -> that
> >
> > here and above: description -> descriptor
> >
> > Perhaps slightly revise to more explicitly state that
> >
> > 1. in the case of persistent failure (i.e., bad packet), the driver
> > drops the packet
> > 2. in the case of transient failure (e.g,. memory pressure) the driver
> > schedules the worker to try again later
>
>
> If we want to go with this way, we need a better time to wakeup the
> worker. Otherwise it just produces more stress on the cpu that is what
> this patch tries to avoid.

Perhaps I misunderstood the purpose of the patch: is it to drop
everything, regardless of transient or persistent failure, until the
ring runs out of descriptors?

I can understand both a blocking and drop strategy during memory
pressure. But partial drop strategy until exceeding ring capacity
seems like a peculiar hybrid?

Powered by blists - more mailing lists