Date: Tue, 27 Feb 2024 10:32:13 +0800
From: Xuan Zhuo <xuanzhuo@...ux.alibaba.com>
To: Jason Wang <jasowang@...hat.com>
Cc: Dave Taht <dave.taht@...il.com>,
 hengqi@...ux.alibaba.com,
 netdev@...r.kernel.org,
 "Michael S. Tsirkin" <mst@...hat.com>
Subject: Re: virtio-net + BQL

On Mon, 26 Feb 2024 13:03:12 +0800, Jason Wang <jasowang@...hat.com> wrote:
> On Mon, Feb 26, 2024 at 4:26 AM Michael S. Tsirkin <mst@...hat.com> wrote:
> >
> > On Sun, Feb 25, 2024 at 01:58:53PM -0500, Dave Taht wrote:
> > > On Sun, Feb 25, 2024 at 1:36 PM Michael S. Tsirkin <mst@...hat.com> wrote:
> > > >
> > > > On Fri, Feb 23, 2024 at 07:58:34AM -0500, Dave Taht wrote:
> > > > > On Fri, Feb 23, 2024 at 3:59 AM Xuan Zhuo <xuanzhuo@...ux.alibaba.com> wrote:
> > > > > >
> > > > > > Hi Dave,
> > > > > >
> > > > > > We have been studying BQL recently.
> > > > > >
> > > > > > For virtio-net, skb orphan mode is the problem for BQL. But now we have
> > > > > > netdim, so maybe it is time for a change. @Heng is working on netdim.
> > > > > >
> > > > > > But the performance numbers from https://lwn.net/Articles/469652/ do not
> > > > > > appeal to me.
> > > > > >
> > > > > > The numbers below are good, but that only works when the NIC is busy.
> > > > > >
> > > > > >         No BQL, tso on: 3000-3200K bytes in queue, 36 tps
> > > > > >         BQL, tso on: 156-194K bytes in queue, 535 tps
> > > > >
> > > > > That is data from 2011 against a gbit interface. Each of those BQL
> > > > > queues is additive.
> > > > >
> > > > > > Or am I missing something?
> > > > >
> > > > > What I see nowadays is 16+Mbytes vanishing into ring buffers and
> > > > > affecting packet pacing, and fair queue and QoS behaviors. Certainly
> > > > > my own efforts with eBPF and LibreQos are helping observability here,
> > > > > but it seems to me that the virtualized stack is not getting enough
> > > > > pushback from the underlying cloudy driver - be it this one, or nitro.
> > > > > Most of the time the packet shaping seems to take place in the cloud
> > > > > network or driver on a per-vm basis.
> > > > >
> > > > > I know that adding BQL to virtio has been tried before, and I keep
> > > > > hoping it gets tried again,
> > > > > measuring latency under load.
> > > > >
> > > > > BQL has sprouted some new latency issues since 2011 given the enormous
> > > > > number of hardware queues exposed which I talked about a bit in my
> > > > > netdevconf talk here:
> > > > >
> > > > > https://www.youtube.com/watch?v=rWnb543Sdk8&t=2603s
> > > > >
> > > > > I am also interested in how similar AI workloads are to the infamous
> > > > > rrul test in a virtualized environment.
> > > > >
> > > > > There is also misunderstood AFAP thinking, with a really
> > > > > mind-bogglingly wrong application of it documented over here, where
> > > > > 15ms of delay in the stack is considered good.
> > > > >
> > > > > https://github.com/cilium/cilium/issues/29083#issuecomment-1824756141
> > > > >
> > > > > So my overall concern is a bit broader than "just add bql", but in
> > > > > other drivers, it was only 6 lines of code....
> > > > >
> > > > > > Thanks.
> > > > > >
> > > > >
> > > > >
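
For reference, my understanding of what the "only 6 lines of code" above
look like in other drivers, using the generic BQL helpers from
include/linux/netdevice.h. This is a sketch only; where exactly the hooks
would sit in virtio-net's xmit and completion paths is an open question,
and "qnum", "pkts_done" and "bytes_done" below are placeholders:

	struct netdev_queue *txq = netdev_get_tx_queue(dev, qnum);

	/* 1) Transmit path, right after the skb is queued to the TX ring:
	 *    account the bytes in flight; BQL stops the queue once its
	 *    dynamic byte limit is reached. */
	netdev_tx_sent_queue(txq, skb->len);

	/* 2) TX completion path (interrupt / NAPI poll), after reclaiming
	 *    completed descriptors: report what finished so BQL can wake
	 *    the queue and adapt the limit. */
	netdev_tx_completed_queue(txq, pkts_done, bytes_done);

	/* 3) Queue reset / device down: clear the accounting. */
	netdev_tx_reset_queue(txq);
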
> > > >
> > > > It is less BQL and more TCP small queues, which do not
> > > > seem to work well when your kernel isn't running part of the
> > > > time because the hypervisor scheduled it out. Wireless has some
> > > > of the same problems, with huge variance in latency unrelated
> > > > to load, and IIRC worked around that by
> > > > tuning the socket queue size slightly differently.
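
(If I understand correctly, the wireless workaround mentioned here is the
sk_pacing_shift tuning in mac80211, roughly the following; illustrative
only, and the exact shift value is from memory:)

	/* TSQ allows roughly (pacing_rate >> sk_pacing_shift) bytes
	 * outstanding per socket; the default shift of 10 is about 1 ms
	 * of data at the pacing rate.  mac80211 lowers the shift so more
	 * data (about 4 ms) can sit below the driver, which keeps
	 * aggregation working despite the extra latency variance. */
	sk_pacing_shift_update(skb->sk, 8);
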
> > >
> > > Add that to the problems-with-virtualization list, then. :/
> >
> > yep
> >
> > for example, attempts to drop packets to fight bufferbloat do
> > not work well: as you start dropping packets you have less
> > work to do on the host, so the VM starts going even faster,
> > flooding you with even more packets.
> >
> > virtualization has to be treated more like userspace than like
> > a physical machine.
>
> Probably, but I think we need a new RFC with a benchmark for more
> information (there's no need to bother with the mode switching, so it
> should be a tiny patch).

YES.

We need to know which cases BQL can improve. Then I can do some
benchmarks on them.

I don't think the orphan mode is a problem. We can clarify that
no-orphan mode is the future, so we can skip orphan mode.
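
(To spell out why orphan mode and BQL do not mix well, as I understand
it; the snippet is illustrative, not actual virtio-net code:)

	/* In orphan mode the driver drops the socket reference in the
	 * xmit path: */
	skb_orphan(skb);	/* sk_wmem_alloc is released here, so TSQ /
				 * socket backpressure ends before the
				 * packet has actually left the ring */

	/* It also means completed skbs are not necessarily freed from a
	 * TX interrupt, so there is no timely place to call
	 * netdev_tx_completed_queue(), which BQL relies on.  With
	 * no-orphan mode (TX NAPI), completions arrive per interrupt and
	 * the BQL hooks fit naturally. */
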

Thanks.


>
> One interesting thing is that gve implements bql.
>
> Thanks
>
> >
> >
> > > I was
> > > aghast at a fix jakub put in to kick things at 7ms that went by
> > > recently.
> >
> > which one is it?
> >
> > > Wireless is kind of an overly broad topic. I was (6 years ago) pretty
> > > happy with all the fixes we put in there for WiFi softmac devices; the
> > > mt76 and the new mt79 seem to be performing rather well. Ath9k is
> > > still good, ath10k not horrible, I have no data about ath11k, and
> > > let's not talk about the Broadcom nightmare.
> > >
> > > This was still a pretty good day, in my memory:
> > > https://forum.openwrt.org/t/aql-and-the-ath10k-is-lovely/59002
> > >
> > > Is something else in wifi going to hell? There are still, oh, 200
> > > drivers left to fix. ENOFUNDING.
> > >
> > > And so far as I know the 3GPP (5g) work is entirely out of tree and
> > > almost entirely dpdk or ebpf?
> > >
> > > >
> > > >
> > > > --
> > > > MST
> > > >
> > >
> > >
> > > --
> > > https://blog.cerowrt.org/post/2024_predictions/
> > > Dave Täht CSO, LibreQos
> >
>
