lists.openwall.net   lists  /  announce  owl-users  owl-dev  john-users  john-dev  passwdqc-users  yescrypt  popa3d-users  /  oss-security  kernel-hardening  musl  sabotage  tlsify  passwords  /  crypt-dev  xvendor  /  Bugtraq  Full-Disclosure  linux-kernel  linux-netdev  linux-ext4  linux-hardening  linux-cve-announce  PHC 
Open Source and information security mailing list archives
 
Hash Suite: Windows password security audit tool. GUI, reports in PDF.
[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <20250406113926-mutt-send-email-mst@kernel.org>
Date: Sun, 6 Apr 2025 11:40:33 -0400
From: "Michael S. Tsirkin" <mst@...hat.com>
To: David Hildenbrand <david@...hat.com>
Cc: Halil Pasic <pasic@...ux.ibm.com>, linux-kernel@...r.kernel.org,
	linux-s390@...r.kernel.org, virtualization@...ts.linux.dev,
	kvm@...r.kernel.org, Chandra Merla <cmerla@...hat.com>,
	Stable@...r.kernel.org, Cornelia Huck <cohuck@...hat.com>,
	Thomas Huth <thuth@...hat.com>, Eric Farman <farman@...ux.ibm.com>,
	Heiko Carstens <hca@...ux.ibm.com>,
	Vasily Gorbik <gor@...ux.ibm.com>,
	Alexander Gordeev <agordeev@...ux.ibm.com>,
	Christian Borntraeger <borntraeger@...ux.ibm.com>,
	Sven Schnelle <svens@...ux.ibm.com>,
	Wei Wang <wei.w.wang@...el.com>
Subject: Re: [PATCH v1] s390/virtio_ccw: don't allocate/assign airqs for
 non-existing queues

On Fri, Apr 04, 2025 at 12:55:09PM +0200, David Hildenbrand wrote:
> On 04.04.25 12:00, David Hildenbrand wrote:
> > On 04.04.25 06:36, Halil Pasic wrote:
> > > On Thu, 3 Apr 2025 16:28:31 +0200
> > > David Hildenbrand <david@...hat.com> wrote:
> > > 
> > > > > Sorry I have to have a look at that discussion. Maybe it will answer
> > > > > some my questions.
> > > > 
> > > > Yes, I think so.
> > > > 
> > > > > > Let's fix it without affecting existing setups for now by properly
> > > > > > ignoring the non-existing queues, so the indicator bits will match
> > > > > > the queue indexes.
> > > > > 
> > > > > Just one question. My understanding is that the crux is that Linux
> > > > > and QEMU (or the driver and the device) disagree at which index
> > > > > reporting_vq is actually sitting. Is that right?
> > > > 
> > > > I thought I made it clear: this is only about the airq indicator bit.
> > > > That's where both disagree.
> > > > 
> > > > Not the actual queue index (see above).
> > > 
> > > I did some more research including having a look at that discussion. Let
> > > me try to sum up how did we end up here.
> > 
> > Let me add some more details after digging as well:
> > 
> > > 
> > > Before commit a229989d975e ("virtio: don't allocate vqs when names[i] =
> > > NULL") the kernel behavior used to be in spec, but QEMU and possibly
> > > other hypervisor were out of spec and things did not work.
> > 
> > It all started with VIRTIO_BALLOON_F_FREE_PAGE_HINT. Before that,
> > we only had the single optional VIRTIO_BALLOON_F_STATS_VQ queue at the very
> > end. So there was no possibility for holes "in-between".
> > 
> > In the Linux driver, we created the stats queue only if the feature bit
> > VIRTIO_BALLOON_F_STATS_VQ was actually around:
> > 
> > 	nvqs = virtio_has_feature(vb->vdev, VIRTIO_BALLOON_F_STATS_VQ) ? 3 : 2;
> > 	err = virtio_find_vqs(vb->vdev, nvqs, vqs, callbacks, names, NULL);
> > 
> > That changed with VIRTIO_BALLOON_F_FREE_PAGE_HINT, because we would
> > unconditionally create 4 queues. QEMU always supported the first 3 queues
> > unconditionally, but old QEMU did obviously not support the (new)
> > VIRTIO_BALLOON_F_FREE_PAGE_HINT queue.
> > 
> > 390x didn't particularly like getting queried for non-existing
> > queues. [1] So the fix was not for a hypervisor that was out of spec, but
> > because quering non-existing queues didn't work.
> > 
> > The fix implied that if VIRTIO_BALLOON_F_STATS_VQ is missing, suddenly the queue
> > index of VIRTIO_BALLOON_F_FREE_PAGE_HINT changed as well.
> > 
> > Again, as QEMU always implemented the 3 first queues unconditionally, this was
> > not a problem.
> > 
> > [1] https://lore.kernel.org/all/c6746307-fae5-7652-af8d-19f560fc31d9@de.ibm.com/#t
> > 
> > > 
> > > Possibly because of the complexity of fixing the hypervisor(s) commit
> > > a229989d975e ("virtio: don't allocate vqs when names[i] = NULL") opted
> > > for changing the guest side so that it does not fit the spec but fits
> > > the hypervisor(s). It unfortunately also broke notifiers (for the with
> > > holes) scenario for virtio-ccw only.
> > 
> > Yes, it broke the notifiers.
> > 
> > But note that everything was in spec at that point, because we only documented
> > "free_page_vq == 3" in the spec *2 years later*, in 2020:
> > 
> > commit 38448268eba0c105200d131c3f7f660129a4d673
> > Author: Alexander Duyck <alexander.h.duyck@...ux.intel.com>
> > Date:   Tue Aug 25 07:45:02 2020 -0700
> > 
> >       content: Document balloon feature free page hints
> >       Free page hints allow the balloon driver to provide information on what
> >       pages are not currently in use so that we can avoid the cost of copying
> >       them in migration scenarios. Add a feature description for free page hints
> >       describing basic functioning and requirements.
> > At that point, what we documented in the spec *did not match reality* in
> > Linux. QEMU was fully compatible, because VIRTIO_BALLOON_F_STATS_VQ is
> > unconditionally set.
> > 
> > 
> > QEMU and Linux kept using that queue index assignment model, and the spec
> > was wrong (out of sync?) at that point. The spec got more wrong with
> > 
> > commit d917d4a8d552c003e046b0e3b1b529d98f7e695b
> > Author: Alexander Duyck <alexander.h.duyck@...ux.intel.com>
> > Date:   Tue Aug 25 07:45:17 2020 -0700
> > 
> >       content: Document balloon feature free page reporting
> >       Free page reporting is a feature that allows the guest to proactively
> >       report unused pages to the host. By making use of this feature is is
> >       possible to reduce the overall memory footprint of the guest in cases where
> >       some significant portion of the memory is idle. Add documentation for the
> >       free page reporting feature describing the functionality and requirements.
> > 
> > Where we documented VIRTIO_BALLOON_F_REPORTING after the changes were added to
> > QEMU+Linux implementation, so the spec did not reflect reality.
> > 
> > I'll note also cloud-hypervisor [2] today follows that model.
> > 
> > In particular, it *only* supports VIRTIO_BALLOON_F_REPORTING, turning
> > the queue index of VIRTIO_BALLOON_F_REPORTING into *2* instead of documented
> > in the spec to be *4*.
> > 
> > So in reality, we can see VIRTIO_BALLOON_F_REPORTING to be either 2/3/4, depending
> > on the availability of the other two features/queues.
> > 
> > [2] https://github.com/cloud-hypervisor/cloud-hypervisor/blob/main/virtio-devices/src/balloon.rs
> > 
> > 
> > > 
> > > Now we had another look at this, and have concluded that fixing the
> > > hypervisor(s) and fixing the kernel, and making sure that the fixed
> > > kernel can tolerate the old broken hypervisor(s) is way to complicated
> > > if possible at all. So we decided to give the spec a reality check and
> > > fix the notifier bit assignment for virtio-ccw which is broken beyond
> > > doubt if we accept that the correct virtqueue index is the one that the
> > > hypervisor(s) use and not the one that the spec says they should use.
> > 
> > In case of virtio-balloon, it's unfortunate that it went that way, but the
> > spec simply did not / does not reflect reality when it was added to the spec.
> > 
> > > 
> > > With the spec fixed, the whole notion of "holes" will be something that
> > > does not make sense any more. With that the merit of the kernel interface
> > > virtio_find_vqs() supporting "holes" is quite questionable. Now we need
> > > it because the drivers within the Linux kernel still think of the queues
> > > in terms of the current spec, i.e. they try to have the "holes" as
> > > mandated by the spec, and the duty of making it work with the broken
> > > device implementations falls to the transports.
> > > 
> > 
> > Right, the "holes" only exist in the input array.
> > 
> > > Under the assumption that the spec is indeed going to be fixed:
> 
> For virito-balloon, we should probably do the following:
> 
> From 38e340c2bb53c2a7cc7c675f5dfdd44ecf7701d9 Mon Sep 17 00:00:00 2001
> From: David Hildenbrand <david@...hat.com>
> Date: Fri, 4 Apr 2025 12:53:16 +0200
> Subject: [PATCH] virtio-balloon: Fix queue index assignment for
>  non-existing queues
> 
> Signed-off-by: David Hildenbrand <david@...hat.com>
> ---
>  device-types/balloon/description.tex | 22 ++++++++++++++++------
>  1 file changed, 16 insertions(+), 6 deletions(-)
> 
> diff --git a/device-types/balloon/description.tex b/device-types/balloon/description.tex
> index a1d9603..a7396ff 100644
> --- a/device-types/balloon/description.tex
> +++ b/device-types/balloon/description.tex
> @@ -16,6 +16,21 @@ \subsection{Device ID}\label{sec:Device Types / Memory Balloon Device / Device I
>    5
>  \subsection{Virtqueues}\label{sec:Device Types / Memory Balloon Device / Virtqueues}
> +
> +\begin{description}
> +\item[inflateq] Exists unconditionally.
> +\item[deflateq] Exists unconditionally.
> +\item[statsq] Only exists if VIRTIO_BALLOON_F_STATS_VQ is set.
> +\item[free_page_vq] Only exists if VIRTIO_BALLOON_F_FREE_PAGE_HINT is set.
> +\item[reporting_vq] Only exists if VIRTIO_BALLOON_F_PAGE_REPORTING is set.
> +\end{description}
> +
> +\begin{note}
> +Virtqueue indexes are assigned sequentially for existing queues, starting
> +with index 0; consequently, if a virtqueue does not exist, it does not get
> +an index assigned. Assuming all virtqueues exist for a device, the indexes
> +are:
> +
>  \begin{description}
>  \item[0] inflateq
>  \item[1] deflateq
> @@ -23,12 +38,7 @@ \subsection{Virtqueues}\label{sec:Device Types / Memory Balloon Device / Virtque
>  \item[3] free_page_vq
>  \item[4] reporting_vq
>  \end{description}
> -
> -  statsq only exists if VIRTIO_BALLOON_F_STATS_VQ is set.
> -
> -  free_page_vq only exists if VIRTIO_BALLOON_F_FREE_PAGE_HINT is set.
> -
> -  reporting_vq only exists if VIRTIO_BALLOON_F_PAGE_REPORTING is set.
> +\end{note}
>  \subsection{Feature bits}\label{sec:Device Types / Memory Balloon Device / Feature bits}
>  \begin{description}
> -- 
> 2.48.1
> 
> 
> If something along these lines sounds reasonable, I can send a proper patch to the
> proper audience.


Indeed, but do we want to add a note about previous spec versions
saying something different? Maybe, with a hint how devices following
old spec can be detected?



> -- 
> Cheers,
> 
> David / dhildenb


Powered by blists - more mailing lists

Powered by Openwall GNU/*/Linux Powered by OpenVZ