[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <20250406113926-mutt-send-email-mst@kernel.org>
Date: Sun, 6 Apr 2025 11:40:33 -0400
From: "Michael S. Tsirkin" <mst@...hat.com>
To: David Hildenbrand <david@...hat.com>
Cc: Halil Pasic <pasic@...ux.ibm.com>, linux-kernel@...r.kernel.org,
linux-s390@...r.kernel.org, virtualization@...ts.linux.dev,
kvm@...r.kernel.org, Chandra Merla <cmerla@...hat.com>,
Stable@...r.kernel.org, Cornelia Huck <cohuck@...hat.com>,
Thomas Huth <thuth@...hat.com>, Eric Farman <farman@...ux.ibm.com>,
Heiko Carstens <hca@...ux.ibm.com>,
Vasily Gorbik <gor@...ux.ibm.com>,
Alexander Gordeev <agordeev@...ux.ibm.com>,
Christian Borntraeger <borntraeger@...ux.ibm.com>,
Sven Schnelle <svens@...ux.ibm.com>,
Wei Wang <wei.w.wang@...el.com>
Subject: Re: [PATCH v1] s390/virtio_ccw: don't allocate/assign airqs for
non-existing queues
On Fri, Apr 04, 2025 at 12:55:09PM +0200, David Hildenbrand wrote:
> On 04.04.25 12:00, David Hildenbrand wrote:
> > On 04.04.25 06:36, Halil Pasic wrote:
> > > On Thu, 3 Apr 2025 16:28:31 +0200
> > > David Hildenbrand <david@...hat.com> wrote:
> > >
> > > > > Sorry I have to have a look at that discussion. Maybe it will answer
> > > > > some my questions.
> > > >
> > > > Yes, I think so.
> > > >
> > > > > > Let's fix it without affecting existing setups for now by properly
> > > > > > ignoring the non-existing queues, so the indicator bits will match
> > > > > > the queue indexes.
> > > > >
> > > > > Just one question. My understanding is that the crux is that Linux
> > > > > and QEMU (or the driver and the device) disagree at which index
> > > > > reporting_vq is actually sitting. Is that right?
> > > >
> > > > I thought I made it clear: this is only about the airq indicator bit.
> > > > That's where both disagree.
> > > >
> > > > Not the actual queue index (see above).
> > >
> > > I did some more research including having a look at that discussion. Let
> > > me try to sum up how did we end up here.
> >
> > Let me add some more details after digging as well:
> >
> > >
> > > Before commit a229989d975e ("virtio: don't allocate vqs when names[i] =
> > > NULL") the kernel behavior used to be in spec, but QEMU and possibly
> > > other hypervisor were out of spec and things did not work.
> >
> > It all started with VIRTIO_BALLOON_F_FREE_PAGE_HINT. Before that,
> > we only had the single optional VIRTIO_BALLOON_F_STATS_VQ queue at the very
> > end. So there was no possibility for holes "in-between".
> >
> > In the Linux driver, we created the stats queue only if the feature bit
> > VIRTIO_BALLOON_F_STATS_VQ was actually around:
> >
> > nvqs = virtio_has_feature(vb->vdev, VIRTIO_BALLOON_F_STATS_VQ) ? 3 : 2;
> > err = virtio_find_vqs(vb->vdev, nvqs, vqs, callbacks, names, NULL);
> >
> > That changed with VIRTIO_BALLOON_F_FREE_PAGE_HINT, because we would
> > unconditionally create 4 queues. QEMU always supported the first 3 queues
> > unconditionally, but old QEMU did obviously not support the (new)
> > VIRTIO_BALLOON_F_FREE_PAGE_HINT queue.
> >
> > 390x didn't particularly like getting queried for non-existing
> > queues. [1] So the fix was not for a hypervisor that was out of spec, but
> > because quering non-existing queues didn't work.
> >
> > The fix implied that if VIRTIO_BALLOON_F_STATS_VQ is missing, suddenly the queue
> > index of VIRTIO_BALLOON_F_FREE_PAGE_HINT changed as well.
> >
> > Again, as QEMU always implemented the 3 first queues unconditionally, this was
> > not a problem.
> >
> > [1] https://lore.kernel.org/all/c6746307-fae5-7652-af8d-19f560fc31d9@de.ibm.com/#t
> >
> > >
> > > Possibly because of the complexity of fixing the hypervisor(s) commit
> > > a229989d975e ("virtio: don't allocate vqs when names[i] = NULL") opted
> > > for changing the guest side so that it does not fit the spec but fits
> > > the hypervisor(s). It unfortunately also broke notifiers (for the with
> > > holes) scenario for virtio-ccw only.
> >
> > Yes, it broke the notifiers.
> >
> > But note that everything was in spec at that point, because we only documented
> > "free_page_vq == 3" in the spec *2 years later*, in 2020:
> >
> > commit 38448268eba0c105200d131c3f7f660129a4d673
> > Author: Alexander Duyck <alexander.h.duyck@...ux.intel.com>
> > Date: Tue Aug 25 07:45:02 2020 -0700
> >
> > content: Document balloon feature free page hints
> > Free page hints allow the balloon driver to provide information on what
> > pages are not currently in use so that we can avoid the cost of copying
> > them in migration scenarios. Add a feature description for free page hints
> > describing basic functioning and requirements.
> > At that point, what we documented in the spec *did not match reality* in
> > Linux. QEMU was fully compatible, because VIRTIO_BALLOON_F_STATS_VQ is
> > unconditionally set.
> >
> >
> > QEMU and Linux kept using that queue index assignment model, and the spec
> > was wrong (out of sync?) at that point. The spec got more wrong with
> >
> > commit d917d4a8d552c003e046b0e3b1b529d98f7e695b
> > Author: Alexander Duyck <alexander.h.duyck@...ux.intel.com>
> > Date: Tue Aug 25 07:45:17 2020 -0700
> >
> > content: Document balloon feature free page reporting
> > Free page reporting is a feature that allows the guest to proactively
> > report unused pages to the host. By making use of this feature is is
> > possible to reduce the overall memory footprint of the guest in cases where
> > some significant portion of the memory is idle. Add documentation for the
> > free page reporting feature describing the functionality and requirements.
> >
> > Where we documented VIRTIO_BALLOON_F_REPORTING after the changes were added to
> > QEMU+Linux implementation, so the spec did not reflect reality.
> >
> > I'll note also cloud-hypervisor [2] today follows that model.
> >
> > In particular, it *only* supports VIRTIO_BALLOON_F_REPORTING, turning
> > the queue index of VIRTIO_BALLOON_F_REPORTING into *2* instead of documented
> > in the spec to be *4*.
> >
> > So in reality, we can see VIRTIO_BALLOON_F_REPORTING to be either 2/3/4, depending
> > on the availability of the other two features/queues.
> >
> > [2] https://github.com/cloud-hypervisor/cloud-hypervisor/blob/main/virtio-devices/src/balloon.rs
> >
> >
> > >
> > > Now we had another look at this, and have concluded that fixing the
> > > hypervisor(s) and fixing the kernel, and making sure that the fixed
> > > kernel can tolerate the old broken hypervisor(s) is way to complicated
> > > if possible at all. So we decided to give the spec a reality check and
> > > fix the notifier bit assignment for virtio-ccw which is broken beyond
> > > doubt if we accept that the correct virtqueue index is the one that the
> > > hypervisor(s) use and not the one that the spec says they should use.
> >
> > In case of virtio-balloon, it's unfortunate that it went that way, but the
> > spec simply did not / does not reflect reality when it was added to the spec.
> >
> > >
> > > With the spec fixed, the whole notion of "holes" will be something that
> > > does not make sense any more. With that the merit of the kernel interface
> > > virtio_find_vqs() supporting "holes" is quite questionable. Now we need
> > > it because the drivers within the Linux kernel still think of the queues
> > > in terms of the current spec, i.e. they try to have the "holes" as
> > > mandated by the spec, and the duty of making it work with the broken
> > > device implementations falls to the transports.
> > >
> >
> > Right, the "holes" only exist in the input array.
> >
> > > Under the assumption that the spec is indeed going to be fixed:
>
> For virito-balloon, we should probably do the following:
>
> From 38e340c2bb53c2a7cc7c675f5dfdd44ecf7701d9 Mon Sep 17 00:00:00 2001
> From: David Hildenbrand <david@...hat.com>
> Date: Fri, 4 Apr 2025 12:53:16 +0200
> Subject: [PATCH] virtio-balloon: Fix queue index assignment for
> non-existing queues
>
> Signed-off-by: David Hildenbrand <david@...hat.com>
> ---
> device-types/balloon/description.tex | 22 ++++++++++++++++------
> 1 file changed, 16 insertions(+), 6 deletions(-)
>
> diff --git a/device-types/balloon/description.tex b/device-types/balloon/description.tex
> index a1d9603..a7396ff 100644
> --- a/device-types/balloon/description.tex
> +++ b/device-types/balloon/description.tex
> @@ -16,6 +16,21 @@ \subsection{Device ID}\label{sec:Device Types / Memory Balloon Device / Device I
> 5
> \subsection{Virtqueues}\label{sec:Device Types / Memory Balloon Device / Virtqueues}
> +
> +\begin{description}
> +\item[inflateq] Exists unconditionally.
> +\item[deflateq] Exists unconditionally.
> +\item[statsq] Only exists if VIRTIO_BALLOON_F_STATS_VQ is set.
> +\item[free_page_vq] Only exists if VIRTIO_BALLOON_F_FREE_PAGE_HINT is set.
> +\item[reporting_vq] Only exists if VIRTIO_BALLOON_F_PAGE_REPORTING is set.
> +\end{description}
> +
> +\begin{note}
> +Virtqueue indexes are assigned sequentially for existing queues, starting
> +with index 0; consequently, if a virtqueue does not exist, it does not get
> +an index assigned. Assuming all virtqueues exist for a device, the indexes
> +are:
> +
> \begin{description}
> \item[0] inflateq
> \item[1] deflateq
> @@ -23,12 +38,7 @@ \subsection{Virtqueues}\label{sec:Device Types / Memory Balloon Device / Virtque
> \item[3] free_page_vq
> \item[4] reporting_vq
> \end{description}
> -
> - statsq only exists if VIRTIO_BALLOON_F_STATS_VQ is set.
> -
> - free_page_vq only exists if VIRTIO_BALLOON_F_FREE_PAGE_HINT is set.
> -
> - reporting_vq only exists if VIRTIO_BALLOON_F_PAGE_REPORTING is set.
> +\end{note}
> \subsection{Feature bits}\label{sec:Device Types / Memory Balloon Device / Feature bits}
> \begin{description}
> --
> 2.48.1
>
>
> If something along these lines sounds reasonable, I can send a proper patch to the
> proper audience.
Indeed, but do we want to add a note about previous spec versions
saying something different? Maybe, with a hint how devices following
old spec can be detected?
> --
> Cheers,
>
> David / dhildenb
Powered by blists - more mailing lists