linux-kernel - Re: [PATCH V5] virtio_ring: add in order support

Message-ID: <CAPpAL=yVvT4VYFMr6gykMuo3MYmfJMApWgW_naWWwTNKeTRGJA@mail.gmail.com>
Date: Wed, 6 Aug 2025 18:42:46 +0800
From: Lei Yang <leiyang@...hat.com>
To: Jason Wang <jasowang@...hat.com>
Cc: "Michael S. Tsirkin" <mst@...hat.com>, xuanzhuo@...ux.alibaba.com, eperezma@...hat.com, 
	virtualization@...ts.linux.dev, linux-kernel@...r.kernel.org
Subject: Re: [PATCH V5] virtio_ring: add in order support

Tested this series of patches with virtio-net regression tests,
everything works fine.

Tested-by: Lei Yang <leiyang@...hat.com>

On Tue, Jul 29, 2025 at 10:34 AM Jason Wang <jasowang@...hat.com> wrote:
>
> On Mon, Jul 28, 2025 at 6:17 PM Michael S. Tsirkin <mst@...hat.com> wrote:
> >
> > On Mon, Jul 28, 2025 at 02:41:29PM +0800, Jason Wang wrote:
> > > > This patch implements in order support for both split virtqueue and
> > > > packed virtqueue. Performance could be gained for the device where the
> > > > memory access could be expensive (e.g. vhost-net or a real PCI device):
> > >
> > > Benchmark with KVM guest:
> > >
> > > Vhost-net on the host: (pktgen + XDP_DROP):
> > >
> > >          in_order=off | in_order=on | +%
> > >     TX:  5.20Mpps     | 6.20Mpps    | +19%
> > >     RX:  3.47Mpps     | 3.61Mpps    | + 4%
> > >
> > > Vhost-user(testpmd) on the host: (pktgen/XDP_DROP):
> > >
> > > For split virtqueue:
> > >
> > >          in_order=off | in_order=on | +%
> > >     TX:  5.60Mpps     | 5.60Mpps    | +0.0%
> > >     RX:  9.16Mpps     | 9.61Mpps    | +4.9%
> > >
> > > For packed virtqueue:
> > >
> > >          in_order=off | in_order=on | +%
> > >     TX:  5.60Mpps     | 5.70Mpps    | +1.7%
> > >     RX:  10.6Mpps     | 10.8Mpps    | +1.8%
> > >
> > > > Benchmark also shows no performance impact for in_order=off for queue
> > > > sizes of 256 and 1024.
> > >
> > > Signed-off-by: Jason Wang <jasowang@...hat.com>
> > > Signed-off-by: Michael S. Tsirkin <mst@...hat.com>
> > > ---
> > > Changes since V4:
> > > - Fix build error when DEBUG is enabled
> > > - Fix function duplications
> > > - Remove unnecessary new lines
> > > ---
> > > >  drivers/virtio/virtio_ring.c | 421 +++++++++++++++++++++++++++++++++--
> > >  1 file changed, 401 insertions(+), 20 deletions(-)
> > >
> > > diff --git a/drivers/virtio/virtio_ring.c b/drivers/virtio/virtio_ring.c
> > > index e675d8305dbf..c6558e271f97 100644
> > > --- a/drivers/virtio/virtio_ring.c
> > > +++ b/drivers/virtio/virtio_ring.c
> > > @@ -70,11 +70,14 @@
> > >  enum vq_layout {
> > >       SPLIT = 0,
> > >       PACKED,
> > > +     SPLIT_IN_ORDER,
> > > +     PACKED_IN_ORDER,
> > >       VQ_TYPE_MAX,
> > >  };
> >
> >
> > how about specifying the #s here?
> >         SPLIT = 0,
> >         PACKED = 1,
> >         IN_ORDER = 2,
> >         SPLIT_IN_ORDER = 2,
> >         PACKED_IN_ORDER = 3,
> >         VQ_TYPE_MAX,
> >
> > and then
> >
> > static inline bool virtqueue_is_packed(const struct vring_virtqueue *vq)
> > {
> >         return vq->layout & PACKED;
> > }
> >
> > static inline bool virtqueue_is_in_order(const struct vring_virtqueue *vq)
> > {
> >         return vq->layout & IN_ORDER;
> > }
> >
> > which is a tiny bit less code.
> >
> > worth doing?
>
> Probably not, for example it would introduce branches. As we
> discussed, once we have sufficient optimizations, most of the branches
> could be saved.
>
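
For readers following the enum exchange above: the suggestion amounts to
treating the layout value as two bit flags, so that each predicate is a
single mask test. Below is a minimal userspace sketch of that encoding. It
assumes a hypothetical struct vq_sketch in place of struct vring_virtqueue
and shortened helper names (vq_is_packed, vq_is_in_order); it illustrates
the proposed numbering only and is not the code in the patch.

```c
#include <assert.h>
#include <stdbool.h>

/* Numbering as proposed in the review: bit 0 = packed, bit 1 = in order. */
enum vq_layout {
	SPLIT = 0,
	PACKED = 1,
	IN_ORDER = 2,		/* flag bit, same value as SPLIT_IN_ORDER */
	SPLIT_IN_ORDER = 2,
	PACKED_IN_ORDER = 3,
	VQ_TYPE_MAX,		/* implicitly 4 */
};

/* Hypothetical stand-in for the kernel's struct vring_virtqueue. */
struct vq_sketch {
	enum vq_layout layout;
};

/* True for PACKED and PACKED_IN_ORDER. */
static inline bool vq_is_packed(const struct vq_sketch *vq)
{
	return vq->layout & PACKED;
}

/* True for SPLIT_IN_ORDER and PACKED_IN_ORDER. */
static inline bool vq_is_in_order(const struct vq_sketch *vq)
{
	return vq->layout & IN_ORDER;
}

/* The four layouts still occupy the contiguous range 0..3. */
static_assert(VQ_TYPE_MAX == 4, "layout values must stay contiguous");
```

IN_ORDER and SPLIT_IN_ORDER share the value 2 on purpose, which C permits
for enumerators. Whether a mask test is preferable to distinct per-layout
values is the trade-off raised in the reply above.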
> >
> > >
> > >  struct vring_desc_state_split {
> > >       void *data;                     /* Data for callback. */
> > > +     u32 total_len;                  /* Buffer Length */
> > >
> > >       /* Indirect desc table and extra table, if any. These two will be
> > >        * allocated together. So we won't stress more to the memory allocator.
> >
> >
> > this is only used for in_order, and it increases the struct size
> > by half due to padding. why not a separate struct?
> > Or if you like, reuse vring_desc_state_packed - it is same
> > size with this addition.
> >
> >
> > > @@ -84,6 +87,7 @@ struct vring_desc_state_split {
> > >
> > >  struct vring_desc_state_packed {
> > >       void *data;                     /* Data for callback. */
> > > +     u32 total_len;                  /* Buffer Length */
> > >
> > >       /* Indirect desc table and extra table, if any. These two will be
> > >        * allocated together. So we won't stress more to the memory allocator.
> >
> > there's empty space at the end of this struct.
> > struct vring_desc_state_packed {
> >         void *data;                     /* Data for callback. */
> >         u32 total_len;                  /* Buffer Length */
> >
> >         /* Indirect desc table and extra table, if any. These two will be
> >          * allocated together. So we won't stress more to the memory allocator.
> >          */
> >         struct vring_packed_desc *indir_desc;
> >         u16 num;                        /* Descriptor list length. */
> >         u16 last;                       /* The last desc state in a list. */
> > };
> >
> > why not put it there?
> >
>
> Fine.
>
> Thanks
>
>
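
The padding points in the review above can be checked with a small
userspace sketch. The struct names below (split_before, split_after,
packed_mid, packed_tail) and the opaque struct desc_sketch are hypothetical
stand-ins, and uint16_t/uint32_t replace the kernel's u16/u32. The sizes
depend on the ABI, so treat the numbers in the comments as what a typical
LP64 target such as x86-64 Linux would give, not as a guarantee.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

struct desc_sketch;	/* opaque stand-in for the descriptor table type */

/* Split state without the new field: two pointers (16 bytes on LP64). */
struct split_before {
	void *data;
	struct desc_sketch *indir_desc;
};

/* Split state with total_len added: 4 bytes of data plus 4 of padding
 * (24 bytes on LP64, i.e. half as large again). */
struct split_after {
	void *data;
	uint32_t total_len;
	struct desc_sketch *indir_desc;
};

/* Packed state with total_len after data, as in the patch
 * (32 bytes on LP64: padding after total_len and at the tail). */
struct packed_mid {
	void *data;
	uint32_t total_len;
	struct desc_sketch *indir_desc;
	uint16_t num;
	uint16_t last;
};

/* Packed state with total_len moved to the end, as suggested
 * (24 bytes on LP64: it fills the former tail padding). */
struct packed_tail {
	void *data;
	struct desc_sketch *indir_desc;
	uint16_t num;
	uint16_t last;
	uint32_t total_len;
};

/* Bytes saved per descriptor state by placing total_len last. */
static inline size_t tail_saves_bytes(void)
{
	return sizeof(struct packed_mid) - sizeof(struct packed_tail);
}

/* Moving the field to the end can never make the struct larger. */
static_assert(sizeof(struct packed_tail) <= sizeof(struct packed_mid),
	      "tail placement must not grow the struct");
```

On such a target the packed state is the same size as the split state with
total_len added, which matches the remark that the packed struct could be
reused.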