Message-ID: <c11c50db-4e66-f3c2-5ebb-519ad6dbc2fe@redhat.com>
Date:   Thu, 11 May 2017 10:43:58 +0800
From:   Jason Wang <jasowang@...hat.com>
To:     Anton Ivanov <anton.ivanov@...bridgegreys.com>,
        Stefan Hajnoczi <stefanha@...hat.com>
Cc:     "David S. Miller" <davem@...emloft.net>, netdev@...r.kernel.org,
        "Michael S. Tsirkin" <mst@...hat.com>
Subject: Re: DQL and TCQ_F_CAN_BYPASS destroy performance under virtualization
 (Was: "Re: net_sched strange in 4.11")



On 2017-05-10 17:42, Anton Ivanov wrote:
> On 10/05/17 09:56, Jason Wang wrote:
>>
>>
>> On 2017-05-10 13:28, Anton Ivanov wrote:
>>> On 10/05/17 03:18, Jason Wang wrote:
>>>>
>>>> On 2017-05-09 23:11, Stefan Hajnoczi wrote:
>>>>> On Tue, May 09, 2017 at 08:46:46AM +0100, Anton Ivanov wrote:
>>>>>> I have figured it out. Two issues.
>>>>>>
>>>>>> 1) skb->xmit_more is hardly ever set under virtualization because the
>>>>>> qdisc is usually bypassed because of TCQ_F_CAN_BYPASS. Once
>>>>>> TCQ_F_CAN_BYPASS is set, a virtual NIC driver is not likely to see
>>>>>> skb->xmit_more (this answers my "how does this work at all" question).
>>>>>>
>>>>>> 2) If that flag is turned off (I patched sched_generic to turn it off
>>>>>> in pfifo_fast while testing), DQL keeps xmit_more from being set. If
>>>>>> the driver is not DQL enabled, xmit_more is never ever set. If the
>>>>>> driver is DQL enabled, the queue is adjusted to ensure xmit_more stops
>>>>>> happening within 10-15 xmit cycles.
>>>>>>
>>>>>> That is plain *wrong* for virtual NICs - virtio, emulated NICs, etc.
>>>>>> There, the BIG cost is telling the hypervisor that it needs to "kick"
>>>>>> the packets. The cost of putting them into the vNIC buffers is
>>>>>> negligible. You want xmit_more to happen - it makes between 50% and
>>>>>> 300% (depending on vNIC design) difference. If there is no xmit_more
>>>>>> the vNIC will immediately "kick" the hypervisor and try to signal that
>>>>>> the packet needs to move straight away (as for example in virtio_net).
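
To make the cost argument above concrete, here is a minimal user-space
sketch of deferred kicks. It is a toy model only: struct toy_vnic,
toy_xmit(), toy_kick() and toy_send() are hypothetical names invented for
illustration and are not kernel or virtio_net code. The more_coming
argument merely plays the role that skb->xmit_more plays for a real driver.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for a vNIC transmit ring; not a kernel structure. */
struct toy_vnic {
    size_t queued;  /* packets placed in the ring since the last kick */
    size_t kicks;   /* number of (expensive) hypervisor notifications */
    size_t sent;    /* packets handed over to the hypervisor so far */
};

/* Models the costly part: notifying the hypervisor about queued packets. */
static void toy_kick(struct toy_vnic *nic)
{
    if (nic->queued == 0)
        return;
    nic->sent += nic->queued;
    nic->queued = 0;
    nic->kicks++;
}

/*
 * Queue one packet. more_coming stands in for skb->xmit_more: while it is
 * true the kick is deferred, so one notification covers several packets.
 */
static void toy_xmit(struct toy_vnic *nic, bool more_coming)
{
    nic->queued++;          /* cheap: place the packet in the ring */
    if (!more_coming)
        toy_kick(nic);      /* expensive: tell the hypervisor */
}

/*
 * Send n packets in bursts of `burst` (assumed >= 1) and return how many
 * kicks were needed. burst == 1 models the case where xmit_more is never set.
 */
static size_t toy_send(size_t n, size_t burst)
{
    struct toy_vnic nic = {0};

    for (size_t i = 0; i < n; i++) {
        bool last_in_burst = ((i + 1) % burst == 0) || (i + 1 == n);
        toy_xmit(&nic, !last_in_burst);
    }
    assert(nic.sent == n);  /* every packet must eventually be kicked */
    return nic.kicks;
}
```

In this model, toy_send(64, 1) needs 64 kicks while toy_send(64, 16) needs
only 4. The sketch only counts notifications and says nothing about the
actual throughput gain on any given hypervisor.
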
>>>> How do you measure the performance? TCP or just measure pps?
>>> In this particular case - tcp from guest. I have a couple of other
>>> benchmarks (forwarding, etc).
>>
>> One more question, is the number for virtio-net or other emulated vNIC?
>
> Other for now - you are cc-ed to keep you in the loop.
>
> Virtio is next on my list - I am revisiting the l2tpv3.c driver in
> QEMU and looking at how to preserve bulking by adding back sendmmsg
> (as well as a list of other features/transports).
>
> We had sendmmsg removed for the final inclusion in QEMU 2.1; it
> presently uses only recvmmsg, so for the time being it does not care.
> That will most likely change once it starts using sendmmsg as well.

One issue is that the QEMU net API does not support bulking. Do you plan
to add it?
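
For reference, bulking on the host side with sendmmsg() looks roughly like
the sketch below. This is a generic Linux example, not QEMU code: the
helper toy_send_batch() and the TOY_MAX_BATCH cap are hypothetical, and the
socket is assumed to be already connected. It only shows several datagrams
being handed to the kernel in one system call.

```c
#define _GNU_SOURCE         /* needed for sendmmsg() and struct mmsghdr */
#include <assert.h>
#include <string.h>
#include <sys/socket.h>
#include <sys/uio.h>

#define TOY_MAX_BATCH 16    /* hypothetical cap chosen for this sketch */

/*
 * Send up to TOY_MAX_BATCH datagrams on a connected socket with a single
 * sendmmsg() call. Returns the number of datagrams sent, or -1 on error
 * (errno is set by sendmmsg()).
 */
static int toy_send_batch(int fd, const char *pkts[], const size_t lens[],
                          unsigned int n)
{
    struct mmsghdr msgs[TOY_MAX_BATCH];
    struct iovec iov[TOY_MAX_BATCH];

    if (n > TOY_MAX_BATCH)
        n = TOY_MAX_BATCH;

    memset(msgs, 0, sizeof(msgs));
    for (unsigned int i = 0; i < n; i++) {
        /* One iovec per datagram; sendmmsg() does not modify the payload. */
        iov[i].iov_base = (void *)pkts[i];
        iov[i].iov_len = lens[i];
        msgs[i].msg_hdr.msg_iov = &iov[i];
        msgs[i].msg_hdr.msg_iovlen = 1;
    }

    /* One system call for the whole batch instead of n send() calls. */
    return sendmmsg(fd, msgs, n, 0);
}
```

A caller would collect the packets that are ready, pass them in one call,
and retry from the returned count if fewer than n were accepted.
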

Thanks