Message-ID: <CAA93jw7G5ukKv2fM3D3YQKUcAPs7A8cW46gRt6gJnYLYaRnNWg@mail.gmail.com>
Date: Fri, 23 Feb 2024 07:58:34 -0500
From: Dave Taht <dave.taht@...il.com>
To: Xuan Zhuo <xuanzhuo@...ux.alibaba.com>
Cc: Jason Wang <jasowang@...hat.com>, "Michael S. Tsirkin" <mst@...hat.com>, hengqi@...ux.alibaba.com,
netdev@...r.kernel.org
Subject: Re: virtio-net + BQL
On Fri, Feb 23, 2024 at 3:59 AM Xuan Zhuo <xuanzhuo@...ux.alibaba.com> wrote:
>
> Hi Dave,
>
> We have been studying BQL recently.
>
> For virtio-net, skb orphan mode is the problem for BQL. But now we have
> netdim, so maybe it is time for a change. @Heng is working on netdim.
>
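For context, the orphaning in question is roughly this pattern in
virtio_net's start_xmit() (a sketch from memory; exact code varies by
kernel version):

	/* Don't wait for transmitted skbs to be freed. */
	if (!use_napi) {
		/* Dropping the owning socket's reference before the
		 * packet has actually left the device releases
		 * sk_wmem_alloc immediately, so TSQ and other
		 * socket-level backpressure see an empty queue. And
		 * because orphaned skbs are reclaimed lazily, there is
		 * no timely TX completion event left to feed BQL's
		 * netdev_tx_completed_queue() accounting.
		 */
		skb_orphan(skb);
		nf_reset_ct(skb);
	}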
> But the performance numbers from https://lwn.net/Articles/469652/ do not
> appeal to me.
>
> The numbers below are good, but that only helps when the NIC is busy.
>
> No BQL, tso on: 3000-3200K bytes in queue, 36 tps
> BQL, tso on: 156-194K bytes in queue, 535 tps
That is data from 2011, against a 1Gbit interface. Also note that BQL
is per hardware queue, so the limits of all those queues are additive.
> Or am I missing something?
What I see nowadays is 16+ Mbytes vanishing into ring buffers,
affecting packet pacing, fair queuing, and QoS behaviors. Certainly
my own efforts with eBPF and LibreQos are helping observability here,
but it seems to me that the virtualized stack is not getting enough
pushback from the underlying cloudy driver, be it this one or nitro.
Most of the time the packet shaping seems to take place in the cloud
network or driver on a per-VM basis.
I know that adding BQL to virtio has been tried before, and I keep
hoping it gets tried again, this time measuring latency under load.
BQL has sprouted some new latency issues since 2011, given the enormous
number of hardware queues now exposed, which I talked about a bit in my
netdevconf talk here:
https://www.youtube.com/watch?v=rWnb543Sdk8&t=2603s
I am also interested in how similar AI workloads are to the infamous
rrul test in a virtualized environment.
There is also misunderstood AFAP ("as fast as possible") thinking, with
a really mind-bogglingly-wrong application of it documented over here,
where 15ms of delay in the stack is considered good:
https://github.com/cilium/cilium/issues/29083#issuecomment-1824756141
So my overall concern is a bit broader than "just add BQL", but in
other drivers it was only about 6 lines of code, roughly as sketched
below....
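For reference, the usual wiring against the generic BQL API in
include/linux/netdevice.h looks about like this (a sketch; txq, qnum,
pkts_done, and bytes_done are hypothetical names for this example):

	#include <linux/netdevice.h>

	/* Look up the per-queue BQL state on a multiqueue device: */
	struct netdev_queue *txq = netdev_get_tx_queue(dev, qnum);

	/* 1) In the xmit path, after the packet is posted to the ring: */
	netdev_tx_sent_queue(txq, skb->len);

	/* 2) In the TX completion/reclaim path, once per reclaimed batch: */
	netdev_tx_completed_queue(txq, pkts_done, bytes_done);

	/* 3) Whenever the ring is torn down or reset: */
	netdev_tx_reset_queue(txq);

The dql core underneath then grows and shrinks the permitted in-flight
byte count on its own, so there is nothing to tune per driver.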
> Thanks.
>
--
https://blog.cerowrt.org/post/2024_predictions/
Dave Täht CSO, LibreQos