Message-ID: <CACGkMEuFRQW6TFkF8KSHd7kGQH991pj_fCAT8BkMt8T51mEbWg@mail.gmail.com>
Date: Mon, 26 Feb 2024 13:03:12 +0800
From: Jason Wang <jasowang@...hat.com>
To: "Michael S. Tsirkin" <mst@...hat.com>
Cc: Dave Taht <dave.taht@...il.com>, Xuan Zhuo <xuanzhuo@...ux.alibaba.com>, 
	hengqi@...ux.alibaba.com, netdev@...r.kernel.org
Subject: Re: virtio-net + BQL

On Mon, Feb 26, 2024 at 4:26 AM Michael S. Tsirkin <mst@...hat.com> wrote:
>
> On Sun, Feb 25, 2024 at 01:58:53PM -0500, Dave Taht wrote:
> > On Sun, Feb 25, 2024 at 1:36 PM Michael S. Tsirkin <mst@...hat.com> wrote:
> > >
> > > On Fri, Feb 23, 2024 at 07:58:34AM -0500, Dave Taht wrote:
> > > > On Fri, Feb 23, 2024 at 3:59 AM Xuan Zhuo <xuanzhuo@...ux.alibaba.com> wrote:
> > > > >
> > > > > Hi Dave,
> > > > >
> > > > > We have been studying BQL recently.
> > > > >
> > > > > For virtio-net, skb orphan mode is the obstacle to BQL. But now that
> > > > > we have netdim, maybe it is time for a change. @Heng is working on
> > > > > the netdim support.
> > > > >
> > > > > But the performance numbers from https://lwn.net/Articles/469652/ do
> > > > > not appeal to me.
> > > > >
> > > > > The numbers below are good, but that only works when the NIC is busy.
> > > > >
> > > > >         No BQL, tso on: 3000-3200K bytes in queue: 36 tps
> > > > >         BQL, tso on: 156-194K bytes in queue, 535 tps
> > > >
> > > > That is data from 2011 against a gigabit interface, and the limits of
> > > > those per-queue BQL instances are additive.
> > > >
> > > > Or maybe I am missing something.
> > > >
> > > > What I see nowadays is 16+Mbytes vanishing into ring buffers and
> > > > affecting packet pacing, fair queuing, and QoS behaviors. Certainly
> > > > my own efforts with eBPF and LibreQos are helping observability here,
> > > > but it seems to me that the virtualized stack is not getting enough
> > > > pushback from the underlying cloudy driver - be it this one, or nitro.
> > > > Most of the time the packet shaping seems to take place in the cloud
> > > > network or driver on a per-vm basis.
> > > >
> > > > I know that adding BQL to virtio has been tried before, and I keep
> > > > hoping it gets tried again, measuring latency under load.
> > > >
> > > > BQL has sprouted some new latency issues since 2011, given the enormous
> > > > number of hardware queues now exposed, which I talked about a bit in my
> > > > netdevconf talk here:
> > > >
> > > > https://www.youtube.com/watch?v=rWnb543Sdk8&t=2603s
> > > >
> > > > I am also interested in how similar AI workloads are to the infamous
> > > > rrul test in a virtualized environment.
> > > >
> > > > There is also misunderstood AFAP ("as fast as possible") thinking, with
> > > > a really mind-bogglingly-wrong application of it documented over here,
> > > > where 15 ms of delay in the stack is considered good.
> > > >
> > > > https://github.com/cilium/cilium/issues/29083#issuecomment-1824756141
> > > >
> > > > So my overall concern is a bit broader than "just add bql", but in
> > > > other drivers, it was only 6 lines of code....
> > > >
> > > > > Thanks.
> > > > >
> > > >
> > > >
> > >
> > > It is less BQL and more TCP small queues, which do not seem to work
> > > well when your kernel isn't running part of the time because the
> > > hypervisor scheduled it out. Wireless has some of the same problem,
> > > with huge variance in latency unrelated to load, and IIRC worked
> > > around that by tuning socket queue size slightly differently.
> >
> > Add that to the problems-with-virtualization list, then. :/
>
> yep
>
> for example, attempts to drop packets to fight bufferbloat do not
> work well because, as you start dropping packets, you have less work
> to do on the host, so the VM starts going even faster, flooding you
> with even more packets.
>
> virtualization has to be treated more like userspace than like
> a physical machine.

Probably, but I think we need a new RFC with a benchmark for more
information (there's no need to bother with the mode switching, so it
should be a tiny patch).
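
For reference, the BQL wiring Dave mentions being "only 6 lines of code"
in other drivers looks roughly like the sketch below. This is a minimal
sketch only: the my_* names and the ring bookkeeping are illustrative
assumptions, not actual virtio-net code; the netdev_tx_*_queue() calls
are the real BQL API from include/linux/netdevice.h.

#include <linux/netdevice.h>

/* 1) In the xmit path, after posting the skb to the TX ring: */
static netdev_tx_t my_start_xmit(struct sk_buff *skb, struct net_device *dev)
{
	struct netdev_queue *txq =
		netdev_get_tx_queue(dev, skb_get_queue_mapping(skb));
	unsigned int bytes = skb->len;

	/* ... enqueue skb on the TX virtqueue (driver-specific) ... */

	netdev_tx_sent_queue(txq, bytes);	/* BQL: account bytes in flight */
	return NETDEV_TX_OK;
}

/* 2) In TX completion (NAPI poll), after reclaiming used descriptors: */
static void my_tx_clean(struct net_device *dev, unsigned int qidx,
			unsigned int pkts, unsigned int bytes)
{
	struct netdev_queue *txq = netdev_get_tx_queue(dev, qidx);

	/* BQL: report completed work; this adapts the queue limit and can
	 * restart a queue stopped because the limit was reached. */
	netdev_tx_completed_queue(txq, pkts, bytes);
}

/* 3) On queue reset (e.g. device down/up), clear the BQL state: */
/*	netdev_tx_reset_queue(txq);	*/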

One interesting thing is that gve implements BQL.
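
On Michael's TCP small queues point above: the wireless workaround he is
recalling is, as far as I know, mac80211 adjusting the per-socket TSQ
budget with sk_pacing_shift_update() from include/net/sock.h. A sketch
of what a similar experiment could look like in a driver's TX path (the
hook name is hypothetical, and the shift value 8 is what mac80211 uses,
not a recommendation for virtio-net):

#include <linux/skbuff.h>
#include <net/sock.h>

/* TSQ caps unacked bytes per socket at roughly
 * sk_pacing_rate >> sk_pacing_shift. The default shift of 10 allows
 * about 1 ms of data at the current pacing rate; lowering it to 8
 * allows ~4 ms, which wireless needs for aggregation. */
static void my_tx_adjust_tsq(struct sk_buff *skb)	/* hypothetical hook */
{
	if (skb->sk)
		sk_pacing_shift_update(skb->sk, 8);
}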

Thanks

>
>
> > I was aghast at a fix Jakub put in recently to kick things at 7 ms.
>
> which one is it?
>
> > Wireless is kind of an overly broad topic. I was (6 years ago) pretty
> > happy with all the fixes we put in there for WiFi softmac devices; the
> > mt76 and the new mt79 seem to be performing rather well. Ath9k is
> > still good, ath10k not horrible, I have no data about ath11k, and
> > let's not talk about the Broadcom nightmare.
> >
> > This was still a pretty good day, in my memory:
> > https://forum.openwrt.org/t/aql-and-the-ath10k-is-lovely/59002
> >
> > Is something else in wifi going to hell? There are still, oh, 200
> > drivers left to fix. ENOFUNDING.
> >
> > And so far as I know, the 3GPP (5G) work is entirely out of tree and
> > almost entirely DPDK or eBPF?
> >
> > >
> > >
> > > --
> > > MST
> > >
> >
> >
> > --
> > https://blog.cerowrt.org/post/2024_predictions/
> > Dave Täht CSO, LibreQos
>

