Message-ID: <CAJaqyWc-vGUF_T6_okai40SK_xMYzUe7WHR60W2q6jW0BAjMVw@mail.gmail.com>
Date: Tue, 12 Aug 2025 08:03:28 +0200
From: Eugenio Perez Martin <eperezma@...hat.com>
To: Jason Wang <jasowang@...hat.com>
Cc: Yongji Xie <xieyongji@...edance.com>, Cindy Lu <lulu@...hat.com>,
linux-kernel@...r.kernel.org, Stefano Garzarella <sgarzare@...hat.com>,
Stefan Hajnoczi <stefanha@...hat.com>, Maxime Coquelin <mcoqueli@...hat.com>,
"Michael S. Tsirkin" <mst@...hat.com>, virtualization@...ts.linux.dev,
Xuan Zhuo <xuanzhuo@...ux.alibaba.com>, Laurent Vivier <lvivier@...hat.com>
Subject: Re: [RFC 1/6] vduse: add v1 API definition
On Tue, Aug 12, 2025 at 4:55 AM Jason Wang <jasowang@...hat.com> wrote:
>
> On Mon, Aug 11, 2025 at 5:02 PM Eugenio Perez Martin
> <eperezma@...hat.com> wrote:
> >
> > On Mon, Aug 11, 2025 at 4:58 AM Jason Wang <jasowang@...hat.com> wrote:
> > >
> > > On Sun, Aug 10, 2025 at 6:18 PM Eugenio Perez Martin
> > > <eperezma@...hat.com> wrote:
> > > >
> > > > On Fri, Aug 8, 2025 at 2:50 AM Jason Wang <jasowang@...hat.com> wrote:
> > > > >
> > > > > On Thu, Aug 7, 2025 at 6:56 PM Eugenio Perez Martin <eperezma@...hat.com> wrote:
> > > > > >
> > > > > > On Tue, Jun 10, 2025 at 10:36 AM Jason Wang <jasowang@...hat.com> wrote:
> > > > > > >
> > > > > > > On Mon, Jun 9, 2025 at 2:11 PM Eugenio Perez Martin <eperezma@...hat.com> wrote:
> > > > > > > >
> > > > > > > > On Mon, Jun 9, 2025 at 3:50 AM Jason Wang <jasowang@...hat.com> wrote:
> > > > > > > > >
> > > > > > > > > On Mon, Jun 9, 2025 at 9:41 AM Jason Wang <jasowang@...hat.com> wrote:
> > > > > > > > > >
> > > > > > > > > > On Fri, Jun 6, 2025 at 7:50 PM Eugenio Pérez <eperezma@...hat.com> wrote:
> > > > > > > > > > >
> > > > > > > > > > > This allows defining all the functions that check the API version
> > > > > > > > > > > set by the userland device.
> > > > > > > > > > >
> > > > > > > > > > > Signed-off-by: Eugenio Pérez <eperezma@...hat.com>
> > > > > > > > > >
> > > > > > > > > > It might be worth clarifying how it works.
> > > > > > > > > >
> > > > > > > > > > For example,
> > > > > > > > > >
> > > > > > > > > > 1) would VDUSE behave differently, or is it just some new ioctls?
> > > > > > > >
> > > > > > > > I'd like to test more in depth, but a device can just bump the
> > > > > > > > version ID and then implement the replies to the VDUSE messages.
> > > > > > > > There is no need to implement new ioctls. If the VDUSE device sets 0
> > > > > > > > as its number of ASIDs or vq groups, the kernel assumes 1.
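> > > > > > > >
> > > > > > > > Something like this on the kernel side; the field and helper names
> > > > > > > > are only illustrative, not the actual uAPI:
> > > > > > > >
> > > > > > > > static u32 vduse_dev_nas(const struct vduse_dev *dev)
> > > > > > > > {
> > > > > > > >         /* V0 devices leave this as 0: assume a single ASID */
> > > > > > > >         return dev->nas ?: 1;
> > > > > > > > }
> > > > > > > >
> > > > > > > > static u32 vduse_dev_ngroups(const struct vduse_dev *dev)
> > > > > > > > {
> > > > > > > >         /* same rule for virtqueue groups */
> > > > > > > >         return dev->ngroups ?: 1;
> > > > > > > > }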
> > > > > > >
> > > > > > > Right, this is the way we use it now, and I think maybe we should
> > > > > > > document this somewhere.
> > > > > > >
> > > > > > > >
> > > > > > > > But you have a very good point here: I think it is wise to evaluate
> > > > > > > > shortcutting these messages in the VDUSE kernel module. If a VDUSE
> > > > > > > > device only has one vq group and one ASID, the kernel can always
> > > > > > > > return group 0 and asid 0 for everything, and fail every attempt to
> > > > > > > > set asid != 0.
> > > > > > >
> > > > > > > Yes, and vhost-vDPA needs to guard against the misconfiguration.
> > > > > > >
> > > > > > > > This way, the update is transparent for the VDUSE device, and future
> > > > > > > > devices do not need to implement the replies to these messages. What
> > > > > > > > do you think?
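> > > > > > > >
> > > > > > > > To sketch what I mean (the vduse_dev_msg_*() helpers are made up
> > > > > > > > here just to illustrate the fallback to userspace):
> > > > > > > >
> > > > > > > > static u32 vduse_get_vq_group(struct vduse_dev *dev, u32 idx)
> > > > > > > > {
> > > > > > > >         /* single-group devices can be answered by the kernel */
> > > > > > > >         if (vduse_dev_ngroups(dev) == 1)
> > > > > > > >                 return 0;
> > > > > > > >
> > > > > > > >         return vduse_dev_msg_get_vq_group(dev, idx);
> > > > > > > > }
> > > > > > > >
> > > > > > > > static int vduse_set_group_asid(struct vduse_dev *dev,
> > > > > > > >                                 u32 group, u32 asid)
> > > > > > > > {
> > > > > > > >         /* only asid 0 exists if the device did not ask for more */
> > > > > > > >         if (vduse_dev_nas(dev) == 1)
> > > > > > > >                 return asid ? -EINVAL : 0;
> > > > > > > >
> > > > > > > >         return vduse_dev_msg_set_group_asid(dev, group, asid);
> > > > > > > > }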
> > > > > > >
> > > > > > > This should work.
> > > > > > >
> > > > > > > >
> > > > > > > > > > 2) If VDUSE behaves differently, do we need an ioctl to set the API
> > > > > > > > > > version for backward compatibility?
> > > > > > > > >
> > > > > > > > > I spoke too fast, there's a VDUSE_SET_API_VERSION ioctl actually.
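> > > > > > > > >
> > > > > > > > > For reference, userspace negotiates it on the control device
> > > > > > > > > before creating a device, along these lines (assuming this series
> > > > > > > > > bumps the version to 1):
> > > > > > > > >
> > > > > > > > > #include <fcntl.h>
> > > > > > > > > #include <stdint.h>
> > > > > > > > > #include <sys/ioctl.h>
> > > > > > > > > #include <linux/vduse.h>
> > > > > > > > >
> > > > > > > > > static int vduse_negotiate_v1(void)
> > > > > > > > > {
> > > > > > > > >         int ctrl = open("/dev/vduse/control", O_RDWR);
> > > > > > > > >         uint64_t api = 1;
> > > > > > > > >
> > > > > > > > >         if (ctrl < 0)
> > > > > > > > >                 return -1;
> > > > > > > > >
> > > > > > > > >         /* an old kernel rejects this: fall back to V0 behavior */
> > > > > > > > >         return ioctl(ctrl, VDUSE_SET_API_VERSION, &api);
> > > > > > > > > }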
> > > > > > > > >
> > > > > > > > > I think we need to consider whether it complicates migration
> > > > > > > > > compatibility or not.
> > > > > > > > >
> > > > > > > >
> > > > > > > > Do you mean migration as in "increase the VDUSE version number", not
> > > > > > > > "VM live migration from vduse version 0 to vduse version 1", right?
> > > > > > > > The second should not have any problem, but I haven't tested it.
> > > > > > >
> > > > > > > I mean that if we bump the version, we can't migrate from version 1
> > > > > > > to version 0. Or we could offload this to the management layer (do we
> > > > > > > need to extend the vdpa tool for this)?
> > > > > > >
> > > > > >
> > > > > > I just noticed I left this unanswered. But I still do not get what
> > > > > > migrate means here :).
> > > > > >
> > > > > > If migrate means running current VDUSE devices on a kernel with this
> > > > > > series applied, these devices don't set the V1 API, so they have one
> > > > > > vq group and one asid. I'm actually testing this with my libfuse+VDUSE
> > > > > > modifications, which don't use V1 at all. I'll add this explanation to
> > > > > > the patch, as it is a very good point indeed.
> > > > >
> > > > > Right.
> > > > >
> > > > > >
> > > > > > If it means migrating a guest from a V1 VDUSE device to a V0 device,
> > > > > > "it should work", as it is just a backend implementation detail.
> > > > >
> > > > > For example, the src is a VDUSE device with multiqueue support (v1)
> > > > > but the dest doesn't have this support (v0). I think QEMU should fail
> > > > > to launch on the dest.
> > > > >
> > > > > > If we migrate from or to a vdpa device backed by hardware, for
> > > > > > example, one of the devices does not even have the concept of VDUSE
> > > > > > API version.
> > > > > >
> > > > > > In the case of net, it does not work at the moment because the only
> > > > > > way to set features like mq is through the shadow CVQ.
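> > > > > >
> > > > > > For context, the queue pair count is not set through config space;
> > > > > > it has to be restored by replaying the control command on the CVQ,
> > > > > > something like:
> > > > > >
> > > > > > struct virtio_net_ctrl_mq mq = {
> > > > > >         .virtqueue_pairs = cpu_to_le16(4), /* example value */
> > > > > > };
> > > > > > /* sent as class VIRTIO_NET_CTRL_MQ,
> > > > > >  * command VIRTIO_NET_CTRL_MQ_VQ_PAIRS_SET */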
> > > > >
> > > > > I think you mean QEMU should fail; I'm not sure this is friendly to
> > > > > libvirt.
> > > > >
> > > >
> > > > No, I think QEMU should not transmit vdpa backend properties that are
> > > > not visible to the guest, so we don't get an explosion of properties
> > > > that are hard to handle. Expanding on this, QEMU is not even able to
> > > > know if it is talking with VDUSE, vdpa_sim, or a vdpa device backed by
> > > > hardware, and I think we don't want QEMU to have this information. So
> > > > sending the VDUSE version implies removing a lot of useful
> > > > abstractions.
> > > >
> > > > In the case of net, the destination QEMU should fail if it is not able
> > > > to restore the device state. At this moment, this implies having at
> > > > least two ASIDs if the net device has CVQ, with the CVQ in its own
> > > > ASID, but that may not be true in the future.
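> > > >
> > > > Something with this shape on the destination side (all names invented
> > > > for illustration):
> > > >
> > > > static int vdpa_net_check_restore(const struct vdpa_net_caps *caps)
> > > > {
> > > >         /* CVQ state can only be replayed safely from its own ASID */
> > > >         if (caps->has_cvq && (caps->nas < 2 || !caps->cvq_isolated))
> > > >                 return -ENOTSUP;
> > > >
> > > >         return 0;
> > > > }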
> > > >
> > > > But QEMU does not check whether that is the case when migrating
> > > > between two vdpa_net_sim devices where one supports ASID but the other
> > > > doesn't.
> > >
> > > Ok, I think I must be missing something; I need some context here. For
> > > example, is the shadow cvq option used by libvirt or not? (Or is it
> > > enabled by default if QEMU detects that the cvq has its own group?)
> > >
> > > If shadow cvq is neither used by libvirt nor automatically enabled, we
> > > need to make it work for libvirt first.
> > >
> > > If it relies on libvirt to enable it explicitly, is libvirt expected
> > > to detect the vDPA capability (I guess that is not what libvirt
> > > wants)?
> > >
> > > If shadow cvq is enabled automatically, migrating from V2 to V1 is
> > > fine, but not the reverse.
> > >
> >
> > QEMU uses the shadow CVQ automatically if all the conditions (ASID
> > support, the proper set of features, etc.) are met.
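> >
> > Conceptually something like this (names invented, not the actual QEMU
> > code):
> >
> >     bool use_shadow_cvq = has_cvq_feature &&
> >                           cvq_has_own_group &&
> >                           backend_supports_asid;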
> >
>
> Ok, so V1 implies non-migratable, and I'm still confused about how to
> deal with migration compatibility.
>
> For example, ping-pong migration between V1 and V2.
>
If the device does not have CVQ, everything works.

If the device has CVQ, it is not possible at the moment. Everything
breaks the same way as if the destination vhost_vdpa did not have the
ASID capability, for example if the destination device is an Intel,
Pensando, or SolidRun vdpa device.