[<prev] [next>] [<thread-prev] [day] [month] [year] [list]
Message-ID: <CAPpAL=yVvT4VYFMr6gykMuo3MYmfJMApWgW_naWWwTNKeTRGJA@mail.gmail.com>
Date: Wed, 6 Aug 2025 18:42:46 +0800
From: Lei Yang <leiyang@...hat.com>
To: Jason Wang <jasowang@...hat.com>
Cc: "Michael S. Tsirkin" <mst@...hat.com>, xuanzhuo@...ux.alibaba.com, eperezma@...hat.com,
virtualization@...ts.linux.dev, linux-kernel@...r.kernel.org
Subject: Re: [PATCH V5] virtio_ring: add in order support
Tested this series of patches with virtio-net regression tests,
everything works fine.
Tested-by: Lei Yang <leiyang@...hat.com>
On Tue, Jul 29, 2025 at 10:34 AM Jason Wang <jasowang@...hat.com> wrote:
>
> On Mon, Jul 28, 2025 at 6:17 PM Michael S. Tsirkin <mst@...hat.com> wrote:
> >
> > On Mon, Jul 28, 2025 at 02:41:29PM +0800, Jason Wang wrote:
> > > This patch implements in order support for both split virtqueue and
> > > packed virtqueue. Perfomance could be gained for the device where the
> > > memory access could be expensive (e.g vhost-net or a real PCI device):
> > >
> > > Benchmark with KVM guest:
> > >
> > > Vhost-net on the host: (pktgen + XDP_DROP):
> > >
> > > in_order=off | in_order=on | +%
> > > TX: 5.20Mpps | 6.20Mpps | +19%
> > > RX: 3.47Mpps | 3.61Mpps | + 4%
> > >
> > > Vhost-user(testpmd) on the host: (pktgen/XDP_DROP):
> > >
> > > For split virtqueue:
> > >
> > > in_order=off | in_order=on | +%
> > > TX: 5.60Mpps | 5.60Mpps | +0.0%
> > > RX: 9.16Mpps | 9.61Mpps | +4.9%
> > >
> > > For packed virtqueue:
> > >
> > > in_order=off | in_order=on | +%
> > > TX: 5.60Mpps | 5.70Mpps | +1.7%
> > > RX: 10.6Mpps | 10.8Mpps | +1.8%
> > >
> > > Benchmark also shows no performance impact for in_order=off for queue
> > > size with 256 and 1024.
> > >
> > > Signed-off-by: Jason Wang <jasowang@...hat.com>
> > > Signed-off-by: Michael S. Tsirkin <mst@...hat.com>
> > > ---
> > > Changes since V4:
> > > - Fix build error when DEBUG is enabled
> > > - Fix function duplications
> > > - Remove unnecessary new lines
> > > ---
> > > drivers/virtio/virtio_ring.c | 421 +++++++++++++++++++++++++++++++++--
> > > 1 file changed, 401 insertions(+), 20 deletions(-)
> > >
> > > diff --git a/drivers/virtio/virtio_ring.c b/drivers/virtio/virtio_ring.c
> > > index e675d8305dbf..c6558e271f97 100644
> > > --- a/drivers/virtio/virtio_ring.c
> > > +++ b/drivers/virtio/virtio_ring.c
> > > @@ -70,11 +70,14 @@
> > > enum vq_layout {
> > > SPLIT = 0,
> > > PACKED,
> > > + SPLIT_IN_ORDER,
> > > + PACKED_IN_ORDER,
> > > VQ_TYPE_MAX,
> > > };
> >
> >
> > how about specifying the #s here?
> > SPLIT = 0,
> > PACKED = 1,
> > IN_ORDER = 2,
> > SPLIT_IN_ORDER = 2,
> > PACKED_IN_ORDER = 3,
> > VQ_TYPE_MAX,
> >
> > and then
> >
> > static inline bool virtqueue_is_packed(const struct vring_virtqueue *vq)
> > {
> > return vq->layout & PACKED;
> > }
> >
> > static inline bool virtqueue_is_in_order(const struct vring_virtqueue *vq)
> > {
> > return vq->layout & IN_ORDER;
> > }
> >
> > which is a tiny bit less code.
> >
> > worth doing?
>
> Probably not, for example it would introduce branches. As we
> discussed, once we have sufficient optimizations, most of the branches
> could be saved.
>
> >
> > >
> > > struct vring_desc_state_split {
> > > void *data; /* Data for callback. */
> > > + u32 total_len; /* Buffer Length */
> > >
> > > /* Indirect desc table and extra table, if any. These two will be
> > > * allocated together. So we won't stress more to the memory allocator.
> >
> >
> > this is only used for in_order, and it increases the struct size
> > by half due to padding. why not a separate struct?
> > Or if you like, reuse vring_desc_state_packed - it is same
> > size with this addition.
> >
> >
> > > @@ -84,6 +87,7 @@ struct vring_desc_state_split {
> > >
> > > struct vring_desc_state_packed {
> > > void *data; /* Data for callback. */
> > > + u32 total_len; /* Buffer Length */
> > >
> > > /* Indirect desc table and extra table, if any. These two will be
> > > * allocated together. So we won't stress more to the memory allocator.
> >
> > there's empty space at the end of this struct.
> > struct vring_desc_state_packed {
> > void *data; /* Data for callback. */
> > u32 total_len; /* Buffer Length */
> >
> > /* Indirect desc table and extra table, if any. These two will be
> > * allocated together. So we won't stress more to the memory allocator.
> > */
> > struct vring_packed_desc *indir_desc;
> > u16 num; /* Descriptor list length. */
> > u16 last; /* The last desc state in a list. */
> > };
> >
> > why not put it there?
> >
>
> Fine.
>
> Thanks
>
>
Powered by blists - more mailing lists