Message-ID: <1709000456.2609937-2-xuanzhuo@linux.alibaba.com>
Date: Tue, 27 Feb 2024 10:20:56 +0800
From: Xuan Zhuo <xuanzhuo@...ux.alibaba.com>
To: Dave Taht <dave.taht@...il.com>
Cc: Jason Wang <jasowang@...hat.com>,
 "Michael S. Tsirkin" <mst@...hat.com>,
 hengqi@...ux.alibaba.com,
 netdev@...r.kernel.org
Subject: Re: virtio-net + BQL

On Fri, 23 Feb 2024 07:58:34 -0500, Dave Taht <dave.taht@...il.com> wrote:
> On Fri, Feb 23, 2024 at 3:59 AM Xuan Zhuo <xuanzhuo@...ux.alibaba.com> wrote:
> >
> > Hi Dave,
> >
> > We have been studying BQL recently.
> >
> > For virtio-net, the skb orphan mode is the problem for BQL. But now that we have
> > netdim, maybe it is time for a change. @Heng is working on netdim.
> >
> > But the performance numbers from https://lwn.net/Articles/469652/ do not appeal
> > to me.
> >
> > The numbers below are good, but they only apply when the NIC is busy.
> >
> >         No BQL, tso on: 3000-3200K bytes in queue: 36 tps
> >         BQL, tso on: 156-194K bytes in queue, 535 tps
>
> That is data from 2011 against a gbit interface. Each of those BQL
> queues is additive.
>
> > Or am I missing something?
>
> What I see nowadays is 16+Mbytes vanishing into ring buffers and
> affecting packet pacing, and fair queue and QoS behaviors. Certainly
> my own efforts with eBPF and LibreQos are helping observability here,
> but it seems to me that the virtualized stack is not getting enough
> pushback from the underlying cloudy driver - be it this one, or nitro.
> Most of the time the packet shaping seems to take place in the cloud
> network or driver on a per-vm basis.

So by the virtualized stack, do you mean virtio-net + tap (host)?
But in the cloud now, the virtio-net devices are DPUs in most cases.
The DPU is passed through to the VM, so the virtio-net devices behave
more like hardware devices.

In this case, I can run some benchmarks, but I want to test
when the NIC is not saturated, to simulate normal use cases.

Can BQL help to reduce latency or increase throughput?
Or is there some other benefit?
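
For a rough sense of scale, the queue depths quoted above can be turned
into drain times. This is only a back-of-the-envelope sketch: it assumes
the 1 Gbit/s link Dave mentions for the 2011 data, that "K" means 1000
bytes, and that the queue drains at full line rate. The function name
and constants are made up for illustration.

```python
def drain_time_ms(queued_bytes: float, link_bps: float) -> float:
    """Milliseconds needed to drain queued_bytes at link_bps (bits/second)."""
    return queued_bytes * 8 / link_bps * 1000


GBIT = 1e9  # assumed link rate: "data from 2011 against a gbit interface"
K = 1000    # assumption: "K bytes" in the quoted numbers means 1000 bytes

# Queue depths in K bytes, taken from the quoted LWN numbers.
for label, low, high in [("No BQL, tso on", 3000, 3200),
                         ("BQL, tso on", 156, 194)]:
    print(f"{label}: {drain_time_ms(low * K, GBIT):.2f}"
          f"-{drain_time_ms(high * K, GBIT):.2f} ms to drain")
```

Under those assumptions, the no-BQL queue holds roughly 24 to 26 ms of
data, and the BQL queue about 1.2 to 1.6 ms. That is the delay a packet
at the tail of the queue would see, and it says nothing about the
not-busy case I want to measure.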

Thanks.

>
> I know that adding BQL to virtio has been tried before, and I keep
> hoping it gets tried again, measuring latency under load.
>
> BQL has sprouted some new latency issues since 2011 given the enormous
> number of hardware queues exposed which I talked about a bit in my
> netdevconf talk here:
>
> https://www.youtube.com/watch?v=rWnb543Sdk8&t=2603s
>
> I am also interested in how similar AI workloads are to the infamous
> rrul test in a virtualized environment.
>
> There is also misunderstood AFAP thinking, with a really
> mind-bogglingly-wrong application of it documented over here, where
> 15ms of delay in the stack is considered good.
>
> https://github.com/cilium/cilium/issues/29083#issuecomment-1824756141
>
> So my overall concern is a bit broader than "just add bql", but in
> other drivers, it was only 6 lines of code....
>
> > Thanks.
> >
>
>
> --
> https://blog.cerowrt.org/post/2024_predictions/
> Dave Täht CSO, LibreQos
