Message-ID: <CACGkMEtFRyzafYqrfuT6gYeosAADL94T5-abEwQ3ThTMn7HQkw@mail.gmail.com>
Date: Wed, 24 Sep 2025 13:38:03 +0800
From: Jason Wang <jasowang@...hat.com>
To: "Michael S. Tsirkin" <mst@...hat.com>
Cc: xuanzhuo@...ux.alibaba.com, eperezma@...hat.com,
virtualization@...ts.linux.dev, linux-kernel@...r.kernel.org
Subject: Re: [PATCH V6 19/19] virtio_ring: add in order support
On Mon, Sep 22, 2025 at 2:24 AM Michael S. Tsirkin <mst@...hat.com> wrote:
>
> On Fri, Sep 19, 2025 at 03:31:54PM +0800, Jason Wang wrote:
> > This patch implements in order support for both split virtqueue and
> > packed virtqueue. Performance could be gained for the device where the
> > memory access could be expensive (e.g. vhost-net or a real PCI device):
> >
> > Benchmark with KVM guest:
> >
> > Vhost-net on the host: (pktgen + XDP_DROP):
> >
> > in_order=off | in_order=on | +%
> > TX: 5.20Mpps | 6.20Mpps | +19%
> > RX: 3.47Mpps | 3.61Mpps | + 4%
> >
> > Vhost-user(testpmd) on the host: (pktgen/XDP_DROP):
> >
> > For split virtqueue:
> >
> > in_order=off | in_order=on | +%
> > TX: 5.60Mpps | 5.60Mpps | +0.0%
> > RX: 9.16Mpps | 9.61Mpps | +4.9%
> >
> > For packed virtqueue:
> >
> > in_order=off | in_order=on | +%
> > TX: 5.60Mpps | 5.70Mpps | +1.7%
> > RX: 10.6Mpps | 10.8Mpps | +1.8%
> >
> > Benchmark also shows no performance impact for in_order=off with queue
> > sizes of 256 and 1024.
> >
> > Signed-off-by: Jason Wang <jasowang@...hat.com>
> > Signed-off-by: Michael S. Tsirkin <mst@...hat.com>
> > ---
> > drivers/virtio/virtio_ring.c | 421 +++++++++++++++++++++++++++++++++--
> > 1 file changed, 401 insertions(+), 20 deletions(-)
> >
> > diff --git a/drivers/virtio/virtio_ring.c b/drivers/virtio/virtio_ring.c
> > index b700aa3e56c3..c00b5e57f2fc 100644
> > --- a/drivers/virtio/virtio_ring.c
> > +++ b/drivers/virtio/virtio_ring.c
> > @@ -70,6 +70,8 @@
> > enum vq_layout {
> > SPLIT = 0,
> > PACKED,
> > + SPLIT_IN_ORDER,
> > + PACKED_IN_ORDER,
> > VQ_TYPE_MAX,
> > };
> >
> > @@ -80,6 +82,7 @@ struct vring_desc_state_split {
> > * allocated together. So we won't stress more to the memory allocator.
> > */
> > struct vring_desc *indir_desc;
> > + u32 total_len; /* Buffer Length */
> > };
> >
> > struct vring_desc_state_packed {
> > @@ -91,6 +94,7 @@ struct vring_desc_state_packed {
> > struct vring_packed_desc *indir_desc;
> > u16 num; /* Descriptor list length. */
> > u16 last; /* The last desc state in a list. */
> > + u32 total_len; /* Buffer Length */
> > };
> >
> > struct vring_desc_extra {
> > @@ -206,6 +210,17 @@ struct vring_virtqueue {
> >
> > /* Head of free buffer list. */
> > unsigned int free_head;
> > +
> > + /*
> > + * With IN_ORDER, devices write a single used ring entry with
> > + * the id corresponding to the head entry of the descriptor chain
> > + * describing the last buffer in the batch
> > + */
> > + struct used_entry {
> > + u32 id;
> > + u32 len;
> > + } batch_last;
> > +
> > /* Number we've added since last sync. */
> > unsigned int num_added;
> >
> > @@ -258,7 +273,12 @@ static void vring_free(struct virtqueue *_vq);
> >
> > static inline bool virtqueue_is_packed(const struct vring_virtqueue *vq)
> > {
> > - return vq->layout == PACKED;
> > + return vq->layout == PACKED || vq->layout == PACKED_IN_ORDER;
> > +}
> > +
> > +static inline bool virtqueue_is_in_order(const struct vring_virtqueue *vq)
> > +{
> > + return vq->layout == SPLIT_IN_ORDER || vq->layout == PACKED_IN_ORDER;
> > }
> >
> > static bool virtqueue_use_indirect(const struct vring_virtqueue *vq,
> > @@ -575,6 +595,8 @@ static inline int virtqueue_add_split(struct vring_virtqueue *vq,
> > struct scatterlist *sg;
> > struct vring_desc *desc;
> > unsigned int i, n, avail, descs_used, err_idx, c = 0;
> > + /* Total length for in-order */
> > + unsigned int total_len = 0;
> > int head;
> > bool indirect;
> >
> > @@ -646,6 +668,7 @@ static inline int virtqueue_add_split(struct vring_virtqueue *vq,
> > ++c == total_sg ?
> > 0 : VRING_DESC_F_NEXT,
> > premapped);
> > + total_len += len;
> > }
> > }
> > for (; n < (out_sgs + in_sgs); n++) {
> > @@ -663,6 +686,7 @@ static inline int virtqueue_add_split(struct vring_virtqueue *vq,
> > i, addr, len,
> > (++c == total_sg ? 0 : VRING_DESC_F_NEXT) |
> > VRING_DESC_F_WRITE, premapped);
> > + total_len += len;
> > }
> > }
> >
> > @@ -685,7 +709,12 @@ static inline int virtqueue_add_split(struct vring_virtqueue *vq,
> > vq->vq.num_free -= descs_used;
> >
> > /* Update free pointer */
> > - if (indirect)
> > + if (virtqueue_is_in_order(vq)) {
> > + vq->free_head += descs_used;
> > + if (vq->free_head >= vq->split.vring.num)
> > + vq->free_head -= vq->split.vring.num;
> > + vq->split.desc_state[head].total_len = total_len;
> > + } else if (indirect)
> > vq->free_head = vq->split.desc_extra[head].next;
> > else
> > vq->free_head = i;
> > @@ -858,6 +887,14 @@ static bool more_used_split(const struct vring_virtqueue *vq)
> > return virtqueue_poll_split(vq, vq->last_used_idx);
> > }
> >
> > +static bool more_used_split_in_order(const struct vring_virtqueue *vq)
> > +{
> > + if (vq->batch_last.id != vq->packed.vring.num)
> > + return true;
>
> Hmm why ->packed?
Right, it's a bug. Let me fix that.
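For illustration only, a minimal self-contained sketch of what the corrected check could look like. The toy_* names are hypothetical stand-ins and not the kernel code. It assumes that batch_last.id equal to the ring size means "no batch pending", and it models the fall-through to the used-index poll as a plain index comparison, since the rest of the function is not shown in the quoted hunk:

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical, simplified stand-in for struct vring_virtqueue. */
struct toy_vq {
	unsigned int split_num;   /* stands in for vq->split.vring.num */
	uint16_t last_used_idx;   /* next used index the driver expects */
	uint16_t device_used_idx; /* stands in for the used index read from the ring */
	struct {
		uint32_t id;
		uint32_t len;
	} batch_last;
};

static void toy_vq_init(struct toy_vq *vq, unsigned int num)
{
	assert(num != 0);
	vq->split_num = num;
	vq->last_used_idx = 0;
	vq->device_used_idx = 0;
	/* Assumption: id == ring size is the "no batch pending" sentinel. */
	vq->batch_last.id = num;
	vq->batch_last.len = 0;
}

/* Split in-order variant: only split state is consulted. */
static bool toy_more_used_split_in_order(const struct toy_vq *vq)
{
	if (vq->batch_last.id != vq->split_num)
		return true;

	/* Assumed fall-through: compare against the device's used index. */
	return vq->last_used_idx != vq->device_used_idx;
}
```

The point of the sketch is only that the split variant reads the split ring size; everything else is simplified.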
>
> This is actually a problem in this approach, kinda easy to get confused
> which variant to call where.
Probably, but we have been doing this since the introduction of packed
virtqueue.
>
> Worth thinking how to fix this.
>
Yes, but I think this series improves this by introducing the
virtqueue ops. Further cleanup could be done on top, for example by
moving packed and split into separate files, each with a private structure.
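As a rough sketch of that direction (not the code in this series), an ops table indexed by layout lets each variant touch only its own state. All toy_* names are hypothetical, and the split and packed sizes are kept as separate fields here, rather than a union, purely so that a mix-up would be visible:

```c
#include <assert.h>

/* Hypothetical mirror of the four layouts in the patch. */
enum toy_layout {
	TOY_SPLIT = 0,
	TOY_PACKED,
	TOY_SPLIT_IN_ORDER,
	TOY_PACKED_IN_ORDER,
	TOY_LAYOUT_MAX,
};

struct toy_vq {
	enum toy_layout layout;
	unsigned int split_num;  /* private to the split variants */
	unsigned int packed_num; /* private to the packed variants */
};

struct toy_vq_ops {
	unsigned int (*ring_size)(const struct toy_vq *vq);
};

static unsigned int toy_ring_size_split(const struct toy_vq *vq)
{
	return vq->split_num;
}

static unsigned int toy_ring_size_packed(const struct toy_vq *vq)
{
	return vq->packed_num;
}

/* One entry per layout; in-order variants reuse the matching ring type. */
static const struct toy_vq_ops toy_ops[TOY_LAYOUT_MAX] = {
	[TOY_SPLIT]           = { .ring_size = toy_ring_size_split },
	[TOY_PACKED]          = { .ring_size = toy_ring_size_packed },
	[TOY_SPLIT_IN_ORDER]  = { .ring_size = toy_ring_size_split },
	[TOY_PACKED_IN_ORDER] = { .ring_size = toy_ring_size_packed },
};

/* Callers go through the table and never pick split/packed by hand. */
static unsigned int toy_ring_size(const struct toy_vq *vq)
{
	assert(vq->layout < TOY_LAYOUT_MAX);
	return toy_ops[vq->layout].ring_size(vq);
}
```

With separate files, each variant's helpers and private structure would not even be visible to the other, so the compiler could catch this class of mistake.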
Thanks