netdev - Re: [PATCH vhost] vhost_net: Fix too many vring kick on busypoll

lists.openwall.net		lists / announce owl-users owl-dev john-users john-dev passwdqc-users yescrypt popa3d-users / oss-security kernel-hardening musl sabotage tlsify passwords / crypt-dev xvendor / Bugtraq Full-Disclosure linux-kernel linux-netdev linux-ext4 linux-hardening linux-cve-announce PHC
Open Source and information security mailing list archives

Hash Suite: Windows password security audit tool. GUI, reports in PDF.

[<prev] [next>] [<thread-prev] [day] [month] [year] [list]

Message-ID: <97046866-71b8-1256-0836-f16c0bbf4c7b@redhat.com>
Date:   Mon, 2 Jul 2018 16:54:48 +0800
From:   Jason Wang <jasowang@...hat.com>
To:     Toshiaki Makita <makita.toshiaki@....ntt.co.jp>,
        "Michael S. Tsirkin" <mst@...hat.com>
Cc:     kvm@...r.kernel.org, virtualization@...ts.linux-foundation.org,
        netdev@...r.kernel.org,
        Tonghao Zhang <zhangtonghao@...ichuxing.com>
Subject: Re: [PATCH vhost] vhost_net: Fix too many vring kick on busypoll



On 2018年07月02日 16:05, Toshiaki Makita wrote:
> On 2018/07/02 16:52, Jason Wang wrote:
>> On 2018年07月02日 15:11, Toshiaki Makita wrote:
>>> On 2018/07/02 15:17, Jason Wang wrote:
>>>> On 2018年07月02日 12:37, Toshiaki Makita wrote:
>>>>> On 2018/07/02 11:54, Jason Wang wrote:
>>>>>> On 2018年07月02日 10:45, Toshiaki Makita wrote:
>>>>>>> Hi Jason,
>>>>>>>
>>>>>>> On 2018/06/29 18:30, Jason Wang wrote:
>>>>>>>> On 2018年06月29日 16:09, Toshiaki Makita wrote:
>>>>>>> ...
>>>>>>>>> To fix this, poll the work instead of enabling notification when
>>>>>>>>> busypoll is interrupted by something. IMHO signal_pending() and
>>>>>>>>> vhost_has_work() are kind of interruptions rather than signals to
>>>>>>>>> completely cancel the busypoll, so let's run busypoll after the
>>>>>>>>> necessary work is done. In order to avoid too long busyloop due to
>>>>>>>>> interruption, save the endtime in vq field and use it when
>>>>>>>>> reentering
>>>>>>>>> the work function.
>>>>>>>> I think we don't care long busyloop unless e.g tx can starve rx?
>>>>>>> I just want to keep it user-controllable. Unless memorizing it
>>>>>>> busypoll
>>>>>>> can run unexpectedly long.
>>>>>> I think the total amount of time for busy polling is bounded. If I was
>>>>>> wrong, it should be a bug somewhere.
>>>>> Consider this kind of scenario:
>>>>> 0. Set 100us busypoll for example.
>>>>> 1. handle_tx() runs busypoll.
>>>>> 2. Something like zerocopy queues tx_work within 100us.
>>>>> 3. busypoll exits and call handle_tx() again.
>>>>> 4. Repeat 1-3.
>>>>>
>>>>> In this case handle_tx() does not process packets but busypoll
>>>>> essentially runs beyond 100us without endtime memorized. This may be
>>>>> just a theoretical problem, but I was worried that more code to poll tx
>>>>> queue can be added in the future and it becomes realistic.
>>>> Yes, but consider zerocopy tends to batch 16 used packets and we will
>>>> finally finish all processing of packets. The above won't be endless, so
>>>> it was probably tolerable.
>>> Right. So endtime memorization is more like a future-proof thing.
>>> Would you like to keep it or change something?
>> I think we'd better introduce it only when we meet real bugs.
> I'll change it to a flag to indicate the previous busypoll is interrupted.
>
>>>>>>>>> Performance numbers:
>>>>>>>>>
>>>>>>>>> - Bulk transfer from guest to external physical server.
>>>>>>>>>          [Guest]->vhost_net->tap--(XDP_REDIRECT)-->i40e --(wire)-->
>>>>>>>>> [Server]
>>>>>>>> Just to confirm in this case since zerocopy is enabled, we are in
>>>>>>>> fact
>>>>>>>> use the generic XDP datapath?
>>>>>>> For some reason zerocopy was not applied for most packets, so in most
>>>>>>> cases driver XDP was used. I was going to dig into it but not yet.
>>>>>> Right, just to confirm this. This is expected.
>>>>>>
>>>>>> In tuntap, we do native XDP only for small and non zerocopy
>>>>>> packets. See
>>>>>> tun_can_build_skb(). The reason is XDP may adjust packet header
>>>>>> which is
>>>>>> not supported by zercopy. We can only use XDP generic for zerocopy in
>>>>>> this case.
>>>>> I think I understand when driver XDP can be used. What I'm not sure and
>>>>> was going to narrow down is why zerocopy is mostly not applied.
>>>>>
>>>> I see, any touch to the zerocopy packet (clone, header expansion or
>>>> segmentation) that lead a userspace copy will increase the error counter
>>>> in vhost_net. Then vhost_net_tx_select_zcopy() may choose not to use
>>>> zerocopy. So it was probably something in your setup or a bug somewhere.
>>> Thanks for the hint!
> Seems zerocopy packets are always nonlinear and
> netif_receive_generic_xdp() calls skb_linearize() in which
> __pskb_pull_tail() calls skb_zcopy_clear(). Looks like tx_zcopy_err is
> always counted when zerocopy is used with XDP in my env.
>

Right, since it needs do copy there, so we tend to disable zerocopy in 
this case.

Thanks