netdev - Re: [PATCH net v2 2/2] vhost_net: fix high cpu load when sendmsg fails

Message-ID: <acebdc23-7627-e170-cdfb-b7656c05e5c5@redhat.com>
Date:   Tue, 22 Dec 2020 12:41:14 +0800
From:   Jason Wang <jasowang@...hat.com>
To:     Willem de Bruijn <willemdebruijn.kernel@...il.com>,
        wangyunjian <wangyunjian@...wei.com>
Cc:     Network Development <netdev@...r.kernel.org>,
        "Michael S. Tsirkin" <mst@...hat.com>,
        virtualization@...ts.linux-foundation.org,
        "Lilijun (Jerry)" <jerry.lilijun@...wei.com>,
        chenchanghu <chenchanghu@...wei.com>,
        xudingke <xudingke@...wei.com>,
        "huangbin (J)" <brian.huangbin@...wei.com>
Subject: Re: [PATCH net v2 2/2] vhost_net: fix high cpu load when sendmsg
 fails


On 2020/12/22 7:07 AM, Willem de Bruijn wrote:
> On Wed, Dec 16, 2020 at 3:20 AM wangyunjian <wangyunjian@...wei.com> wrote:
>> From: Yunjian Wang <wangyunjian@...wei.com>
>>
>> Currently we break the loop and wake up the vhost_worker when
>> sendmsg fails. When the worker wakes up again, we'll meet the
>> same error.
> The patch is based on the assumption that such error cases always
> return EAGAIN. Can it not also be ENOMEM, such as from tun_build_skb?
>
>> This will cause high CPU load. To fix this issue,
>> we can skip this description by ignoring the error. When we
>> exceeds sndbuf, the return value of sendmsg is -EAGAIN. In
>> the case we don't skip the description and don't drop packet.
> the -> that
>
> here and above: description -> descriptor
>
> Perhaps slightly revise to more explicitly state that
>
> 1. in the case of persistent failure (i.e., bad packet), the driver
> drops the packet
> 2. in the case of transient failure (e.g., memory pressure) the driver
> schedules the worker to try again later


If we want to go this way, we need a better time to wake up the
worker. Otherwise it just puts more stress on the CPU, which is
exactly what this patch tries to avoid.
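
To make the distinction above concrete, here is a minimal userspace
sketch, not the vhost_net code. The names tx_action, classify_tx_error
and retry_delay_ms are hypothetical, treating -ENOBUFS as transient is
an assumption, and the capped backoff is only a stand-in for "a better
time to wake up the worker"; an event-driven wakeup would be another
option.

```c
#include <assert.h>
#include <errno.h>

/* Hypothetical outcome of one transmit attempt; not a vhost_net type. */
enum tx_action {
    TX_DONE,        /* send succeeded, descriptor consumed */
    TX_RETRY_LATER, /* transient failure: keep the descriptor, retry later */
    TX_DROP         /* persistent failure: consume the descriptor, drop packet */
};

/*
 * Classify a kernel-style return value: >= 0 on success, -errno on failure.
 * -EAGAIN and -ENOMEM are the transient cases named in the thread;
 * -ENOBUFS is an assumption added for illustration.
 */
static enum tx_action classify_tx_error(int err)
{
    if (err >= 0)
        return TX_DONE;

    switch (err) {
    case -EAGAIN:   /* e.g. sndbuf exceeded */
    case -ENOMEM:   /* e.g. allocation failure under memory pressure */
    case -ENOBUFS:  /* assumed transient */
        return TX_RETRY_LATER;
    default:        /* e.g. a bad packet: retrying would fail again */
        return TX_DROP;
    }
}

/*
 * Hypothetical capped exponential backoff, in milliseconds, so that a
 * repeated transient failure does not cause an immediate re-poll.
 */
static unsigned int retry_delay_ms(unsigned int attempt)
{
    const unsigned int base_ms = 1;
    const unsigned int cap_ms = 64;
    unsigned int delay = (attempt >= 6) ? cap_ms : (base_ms << attempt);

    assert(delay <= cap_ms);
    return delay;
}
```

With this split, only TX_RETRY_LATER would reschedule the worker, and
the delay grows with each consecutive failure instead of waking it
right away.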

Thanks

