Message-ID: <16de49b9-dec1-d1df-aa59-dedd90fffb92@cambridgegreys.com>
Date: Wed, 10 May 2017 06:28:59 +0100
From: Anton Ivanov <anton.ivanov@...bridgegreys.com>
To: Jason Wang <jasowang@...hat.com>,
Stefan Hajnoczi <stefanha@...hat.com>
Cc: "David S. Miller" <davem@...emloft.net>, netdev@...r.kernel.org,
"Michael S. Tsirkin" <mst@...hat.com>
Subject: Re: DQL and TCQ_F_CAN_BYPASS destroy performance under virtualization
(Was: "Re: net_sched strange in 4.11")
On 10/05/17 03:18, Jason Wang wrote:
>
>
> On 2017-05-09 23:11, Stefan Hajnoczi wrote:
>> On Tue, May 09, 2017 at 08:46:46AM +0100, Anton Ivanov wrote:
>>> I have figured it out. Two issues.
>>>
>>> 1) skb->xmit_more is hardly ever set under virtualization because the
>>> qdisc is usually bypassed because of TCQ_F_CAN_BYPASS. Once
>>> TCQ_F_CAN_BYPASS is set, a virtual NIC driver is not likely to see
>>> skb->xmit_more (this answers my "how does this work at all" question).
>>>
>>> 2) If that flag is turned off (I patched sch_generic to turn it off in
>>> pfifo_fast while testing), DQL keeps xmit_more from being set. If the
>>> driver is not DQL enabled, xmit_more is never set. If the driver is DQL
>>> enabled, the queue is adjusted to ensure xmit_more stops happening
>>> within 10-15 xmit cycles.
>>>
>>> That is plain *wrong* for virtual NICs - virtio, emulated NICs, etc.
>>> There, the BIG cost is telling the hypervisor that it needs to "kick"
>>> the packets. The cost of putting them into the vNIC buffers is
>>> negligible. You want xmit_more to happen - it makes between 50% and
>>> 300% (depending on vNIC design) difference. If there is no xmit_more,
>>> the vNIC will immediately "kick" the hypervisor and try to signal that
>>> the packet needs to move straight away (as for example in virtio_net).
>
> How do you measure the performance? TCP or just measure pps?
In this particular case - TCP from the guest. I have a couple of other
benchmarks (forwarding, etc).
>
>>>
>>> In addition to that, the perceived line rate is proportional to this
>>> cost, so I am not sure that the current dql math holds. In fact, I
>>> think it does not - it is trying to adjust something which influences
>>> the perceived line rate.
>>>
>>> So - how do we turn off BOTH bypass and DQL adjustment while under
>>> virtualization and set them to be "always qdisc" + "always xmit_more
>>> allowed"?
>
> Virtio-net does not support BQL. Before commit ea7735d97ba9
> ("virtio-net: move free_old_xmit_skbs"), it's even impossible to
> support that since we don't have tx interrupt for each packet. I
> haven't measured the impact of xmit_more, maybe I was wrong but I
> think it may help in some cases since it may improve the batching on
> host more or less.
If you do not support BQL, you might as well look at the xmit_more part
of the kick code path. Line 1127.

bool kick = !skb->xmit_more; effectively means kick = true;

The xmit_more path will never be triggered. You will be kicking for each
and every packet. xmit_more is now set only as a result of BQL. If BQL is
not enabled you never get it. Now, will the current dql code work
correctly if you do not have a defined line rate and completion
interrupts - no idea. Probably not. IMHO instead of trying to fix it
there should be a way for a device or architecture to turn it off.
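
The effect of that one line can be sketched with a toy user-space model.
This is not the virtio_net code: struct toy_skb, toy_mark_burst() and
toy_count_kicks() are hypothetical names made up for illustration, and the
only thing borrowed from the driver is the kick = !xmit_more test quoted
above. It assumes that a working bulk dequeue marks every packet of a burst
except the last with xmit_more, and that without batching the flag is
never set.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for the single sk_buff field that matters here. */
struct toy_skb {
	bool xmit_more;	/* true: another packet follows right behind this one */
};

/*
 * Mark a burst the way a bulk dequeue is assumed to: every packet except
 * the last carries xmit_more. With batching unavailable (qdisc bypassed,
 * or no BQL), xmit_more stays false on every packet.
 */
static void toy_mark_burst(struct toy_skb *burst, size_t n, bool batching)
{
	assert(burst != NULL || n == 0);

	for (size_t i = 0; i < n; i++)
		burst[i].xmit_more = batching && (i + 1 < n);
}

/* Count hypervisor notifications using the same test as the quoted line. */
static size_t toy_count_kicks(const struct toy_skb *burst, size_t n)
{
	size_t kicks = 0;

	for (size_t i = 0; i < n; i++) {
		bool kick = !burst[i].xmit_more;

		if (kick)
			kicks++;
	}
	return kicks;
}
```

In this model a burst of 8 packets costs 8 kicks when batching is
unavailable and 1 kick when it is. It says nothing about the real cost of
a kick, which depends on the hypervisor and the vNIC design.
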
To be clear - I ran into this working on my own drivers for UML. You are
cc-ed because you are likely to be one of the most affected.
A.
>
> Thanks
>
>>>
>>> A.
>>>
>>> P.S. Cc-ing virtio maintainer
>> CCing Michael Tsirkin and Jason Wang, who are the core virtio and
>> virtio-net maintainers. (I maintain the vsock driver - it's unrelated
>> to this discussion.)
>>
>>> A.
>>>
>>>
>>> On 08/05/17 08:15, Anton Ivanov wrote:
>>>> Hi all,
>>>>
>>>> I was revising some of my old work for UML to prepare it for submission
>>>> and I noticed that skb->xmit_more does not seem to be set any more.
>>>>
>>>> I traced the issue as far as net/sched/sch_generic.c
>>>>
>>>> try_bulk_dequeue_skb() is never invoked (the drivers I am working on
>>>> are dql enabled so that is not the problem).
>>>>
>>>> More interestingly, if I put a breakpoint and debug output into
>>>> dequeue_skb() around line 147 - right before the bulk: label - the skb
>>>> there is always NULL. ???
>>>>
>>>> Similarly, debug in pfifo_fast_dequeue shows only NULLs being dequeued.
>>>> Again - ???
>>>>
>>>> First and foremost, I apologize for the silly question, but how can
>>>> this work at all? I see the skbs showing up at the driver level, why
>>>> are NULLs being returned at qdisc dequeue and where do the skbs at the
>>>> driver level come from?
>>>>
>>>> Second, where should I look to fix it?
>>>>
>>>> A.
>>>>
>>>
>>> --
>>> Anton R. Ivanov
>>>
>>> Cambridge Greys Limited, England company No 10273661
>>> http://www.cambridgegreys.com/
>>>
>
>
--
Anton R. Ivanov
Cambridgegreys Limited. Registered in England. Company Number 10273661