Message-ID: <27ae4e1c-7c6c-14c2-f3a4-9d0b1265d034@cambridgegreys.com>
Date:   Tue, 9 May 2017 08:46:46 +0100
From:   Anton Ivanov <anton.ivanov@...bridgegreys.com>
To:     "David S. Miller" <davem@...emloft.net>
Cc:     netdev@...r.kernel.org, Stefan Hajnoczi <stefanha@...hat.com>
Subject: DQL and TCQ_F_CAN_BYPASS destroy performance under virtualization
 (Was: "Re: net_sched strange in 4.11")

I have figured it out. Two issues.

1) skb->xmit_more is hardly ever set under virtualization because the 
qdisc is usually bypassed due to TCQ_F_CAN_BYPASS. Once 
TCQ_F_CAN_BYPASS is set, a virtual NIC driver is not likely to see 
skb->xmit_more (this answers my "how does this work at all" question).

2) If that flag is turned off (I patched sch_generic.c to turn it off in 
pfifo_fast while testing), DQL keeps xmit_more from being set. If the 
driver is not DQL enabled, xmit_more is never set. If the driver is 
DQL enabled, the queue limit is adjusted so that xmit_more stops 
happening within 10-15 xmit cycles.
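
To make point 2 concrete, here is a toy Python model of how a byte
budget caps a bulk dequeue. It is only a sketch, loosely modelled on
the loop in try_bulk_dequeue_skb(); the names bulk_dequeue,
xmit_more_flags and byte_budget are hypothetical, and the budget
stands in for whatever DQL currently allows on the tx queue.

```python
from collections import deque


def bulk_dequeue(queue, byte_budget):
    """Pull one batch of packet lengths off `queue` (a deque of ints).

    The first packet is always taken. Further packets are taken only
    while the remaining budget is still positive, which mirrors a
    "budget minus first packet length" style of check.
    All names here are illustrative, not kernel API.
    """
    if not queue:
        return []
    batch = [queue.popleft()]
    remaining = byte_budget - batch[0]
    while remaining > 0 and queue:
        nxt = queue.popleft()
        remaining -= nxt
        batch.append(nxt)
    return batch


def xmit_more_flags(batch):
    """xmit_more is true for every packet except the last of a batch."""
    last = len(batch) - 1
    return [i < last for i in range(len(batch))]
```

With a budget of zero (the assumed stand-in for a driver that is not
DQL enabled) every batch holds a single packet, so xmit_more is never
true. As the budget shrinks, batches shrink with it and xmit_more
becomes rarer.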

That is plain *wrong* for virtual NICs - virtio, emulated NICs, etc. 
There, the BIG cost is telling the hypervisor that it needs to "kick" 
the packets. The cost of putting them into the vNIC buffers is 
negligible. You want xmit_more to happen - it makes between 50% and 300% 
(depending on vNIC design) difference. If there is no xmit_more, the 
vNIC driver will immediately "kick" the hypervisor to signal that the 
packet needs to move straight away (as, for example, in virtio_net).
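
The cost argument above can be sketched with a toy Python model. The
unit costs (enqueue_cost=1, kick_cost=50) are made-up illustrative
numbers, not measurements, and count_kicks and total_cost are
hypothetical names. The model assumes one kick per packet without
xmit_more and one kick per burst with it.

```python
def count_kicks(burst_sizes, xmit_more_enabled):
    """Count hypervisor notifications for a sequence of tx bursts.

    Without xmit_more every packet triggers its own kick; with it,
    only the last packet of each burst does.
    """
    kicks = 0
    for burst in burst_sizes:
        if burst <= 0:
            continue
        kicks += 1 if xmit_more_enabled else burst
    return kicks


def total_cost(burst_sizes, xmit_more_enabled, enqueue_cost=1, kick_cost=50):
    """Total tx cost in arbitrary units (both unit costs are assumptions)."""
    packets = sum(b for b in burst_sizes if b > 0)
    kicks = count_kicks(burst_sizes, xmit_more_enabled)
    return packets * enqueue_cost + kicks * kick_cost
```

For three bursts of eight packets this gives 24 kicks without
xmit_more and 3 with it. How much that matters in practice depends
entirely on the real ratio of kick cost to enqueue cost for a given
vNIC.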

In addition to that, the perceived line rate is proportional to this 
cost, so I am not sure that the current dql math holds. In fact, I think 
it does not - it is trying to adjust something which influences the 
perceived line rate.

So - how do we turn off BOTH bypass and DQL adjustment while under 
virtualization, so that the behaviour is "always qdisc" + "always 
xmit_more allowed"?

A.

P.S. Cc-ing virtio maintainer

A.


On 08/05/17 08:15, Anton Ivanov wrote:
> Hi all,
>
> I was revising some of my old work for UML to prepare it for 
> submission and I noticed that skb->xmit_more does not seem to be set 
> any more.
>
> I traced the issue as far as net/sched/sch_generic.c
>
> try_bulk_dequeue_skb() is never invoked (the drivers I am working on 
> are dql enabled so that is not the problem).
>
> More interestingly, if I put a breakpoint and debug output into 
> dequeue_skb() around line 147 - right before the bulk: tag that skb 
> there is always NULL. ???
>
> Similarly, debug in pfifo_fast_dequeue shows only NULLs being 
> dequeued. Again - ???
>
> First and foremost, I apologize for the silly question, but how can 
> this work at all? I see the skbs showing up at the driver level, why 
> are NULLs being returned at qdisc dequeue and where do the skbs at the 
> driver level come from?
>
> Second, where should I look to fix it?
>
> A.
>


-- 
Anton R. Ivanov

Cambridge Greys Limited, England company No 10273661
http://www.cambridgegreys.com/
