netdev - Re: [PATCH vhost] vhost_net: Fix too many vring kick on busypoll

lists.openwall.net		lists / announce owl-users owl-dev john-users john-dev passwdqc-users yescrypt popa3d-users / oss-security kernel-hardening musl sabotage tlsify passwords / crypt-dev xvendor / Bugtraq Full-Disclosure linux-kernel linux-netdev linux-ext4 linux-hardening linux-cve-announce PHC
Open Source and information security mailing list archives

Hash Suite: Windows password security audit tool. GUI, reports in PDF.

[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]

Message-ID: <b0670083-d938-8b96-56d2-65d41d6b7c8c@redhat.com>
Date:   Mon, 2 Jul 2018 14:17:07 +0800
From:   Jason Wang <jasowang@...hat.com>
To:     Toshiaki Makita <makita.toshiaki@....ntt.co.jp>,
        "Michael S. Tsirkin" <mst@...hat.com>
Cc:     kvm@...r.kernel.org, virtualization@...ts.linux-foundation.org,
        netdev@...r.kernel.org,
        Tonghao Zhang <zhangtonghao@...ichuxing.com>
Subject: Re: [PATCH vhost] vhost_net: Fix too many vring kick on busypoll



On 2018年07月02日 12:37, Toshiaki Makita wrote:
> On 2018/07/02 11:54, Jason Wang wrote:
>> On 2018年07月02日 10:45, Toshiaki Makita wrote:
>>> Hi Jason,
>>>
>>> On 2018/06/29 18:30, Jason Wang wrote:
>>>> On 2018年06月29日 16:09, Toshiaki Makita wrote:
>>> ...
>>>>> To fix this, poll the work instead of enabling notification when
>>>>> busypoll is interrupted by something. IMHO signal_pending() and
>>>>> vhost_has_work() are kind of interruptions rather than signals to
>>>>> completely cancel the busypoll, so let's run busypoll after the
>>>>> necessary work is done. In order to avoid too long busyloop due to
>>>>> interruption, save the endtime in vq field and use it when reentering
>>>>> the work function.
>>>> I think we don't care long busyloop unless e.g tx can starve rx?
>>> I just want to keep it user-controllable. Unless memorizing it busypoll
>>> can run unexpectedly long.
>> I think the total amount of time for busy polling is bounded. If I was
>> wrong, it should be a bug somewhere.
> Consider this kind of scenario:
> 0. Set 100us busypoll for example.
> 1. handle_tx() runs busypoll.
> 2. Something like zerocopy queues tx_work within 100us.
> 3. busypoll exits and call handle_tx() again.
> 4. Repeat 1-3.
>
> In this case handle_tx() does not process packets but busypoll
> essentially runs beyond 100us without endtime memorized. This may be
> just a theoretical problem, but I was worried that more code to poll tx
> queue can be added in the future and it becomes realistic.

Yes, but consider zerocopy tends to batch 16 used packets and we will 
finally finish all processing of packets. The above won't be endless, so 
it was probably tolerable.

>
>>>>> Performance numbers:
>>>>>
>>>>> - Bulk transfer from guest to external physical server.
>>>>>        [Guest]->vhost_net->tap--(XDP_REDIRECT)-->i40e --(wire)-->
>>>>> [Server]
>>>> Just to confirm in this case since zerocopy is enabled, we are in fact
>>>> use the generic XDP datapath?
>>> For some reason zerocopy was not applied for most packets, so in most
>>> cases driver XDP was used. I was going to dig into it but not yet.
>> Right, just to confirm this. This is expected.
>>
>> In tuntap, we do native XDP only for small and non zerocopy packets. See
>> tun_can_build_skb(). The reason is XDP may adjust packet header which is
>> not supported by zercopy. We can only use XDP generic for zerocopy in
>> this case.
> I think I understand when driver XDP can be used. What I'm not sure and
> was going to narrow down is why zerocopy is mostly not applied.
>

I see, any touch to the zerocopy packet (clone, header expansion or 
segmentation) that lead a userspace copy will increase the error counter 
in vhost_net. Then vhost_net_tx_select_zcopy() may choose not to use 
zerocopy. So it was probably something in your setup or a bug somewhere.

Thanks