Message-ID: <BN9PR11MB5276543D7363AE07C309008D8C9FA@BN9PR11MB5276.namprd11.prod.outlook.com>
Date: Fri, 30 Jan 2026 03:10:26 +0000
From: "Tian, Kevin" <kevin.tian@...el.com>
To: Leon Romanovsky <leon@...nel.org>
CC: Jason Gunthorpe <jgg@...pe.ca>, Pranjal Shrivastava <praan@...gle.com>,
	Sumit Semwal <sumit.semwal@...aro.org>, Christian König
	<christian.koenig@....com>, Alex Deucher <alexander.deucher@....com>, "David
 Airlie" <airlied@...il.com>, Simona Vetter <simona@...ll.ch>, Gerd Hoffmann
	<kraxel@...hat.com>, Dmitry Osipenko <dmitry.osipenko@...labora.com>,
	Gurchetan Singh <gurchetansingh@...omium.org>, Chia-I Wu <olvaffe@...il.com>,
	Maarten Lankhorst <maarten.lankhorst@...ux.intel.com>, Maxime Ripard
	<mripard@...nel.org>, Thomas Zimmermann <tzimmermann@...e.de>, "Lucas De
 Marchi" <lucas.demarchi@...el.com>, Thomas Hellström
	<thomas.hellstrom@...ux.intel.com>, "Vivi, Rodrigo" <rodrigo.vivi@...el.com>,
	Joerg Roedel <joro@...tes.org>, Will Deacon <will@...nel.org>, Robin Murphy
	<robin.murphy@....com>, Felix Kuehling <Felix.Kuehling@....com>, "Alex
 Williamson" <alex@...zbot.org>, Ankit Agrawal <ankita@...dia.com>,
	"Kasireddy, Vivek" <vivek.kasireddy@...el.com>, "linux-media@...r.kernel.org"
	<linux-media@...r.kernel.org>, "dri-devel@...ts.freedesktop.org"
	<dri-devel@...ts.freedesktop.org>, "linaro-mm-sig@...ts.linaro.org"
	<linaro-mm-sig@...ts.linaro.org>, "linux-kernel@...r.kernel.org"
	<linux-kernel@...r.kernel.org>, "amd-gfx@...ts.freedesktop.org"
	<amd-gfx@...ts.freedesktop.org>, "virtualization@...ts.linux.dev"
	<virtualization@...ts.linux.dev>, "intel-xe@...ts.freedesktop.org"
	<intel-xe@...ts.freedesktop.org>, "linux-rdma@...r.kernel.org"
	<linux-rdma@...r.kernel.org>, "iommu@...ts.linux.dev"
	<iommu@...ts.linux.dev>, "kvm@...r.kernel.org" <kvm@...r.kernel.org>
Subject: RE: [PATCH v5 4/8] vfio: Wait for dma-buf invalidation to complete

> From: Leon Romanovsky <leon@...nel.org>
> Sent: Thursday, January 29, 2026 4:42 PM
> 
> On Thu, Jan 29, 2026 at 08:13:18AM +0000, Tian, Kevin wrote:
> > > From: Leon Romanovsky <leon@...nel.org>
> > > Sent: Thursday, January 29, 2026 3:34 PM
> > >
> > > On Thu, Jan 29, 2026 at 07:06:37AM +0000, Tian, Kevin wrote:
> > > > > From: Jason Gunthorpe <jgg@...pe.ca>
> > > > > Sent: Wednesday, January 28, 2026 12:28 AM
> > > > >
> > > > > On Tue, Jan 27, 2026 at 10:58:35AM +0200, Leon Romanovsky wrote:
> > > > > > > > @@ -333,7 +359,37 @@ void vfio_pci_dma_buf_move(struct
> > > > > vfio_pci_core_device *vdev, bool revoked)
> > > > > > > >  			dma_resv_lock(priv->dmabuf->resv, NULL);
> > > > > > > >  			priv->revoked = revoked;
> > > > > > > >  			dma_buf_invalidate_mappings(priv-
> > > >dmabuf);
> > > > > > > > +			dma_resv_wait_timeout(priv->dmabuf->resv,
> > > > > > > > +
> > > DMA_RESV_USAGE_BOOKKEEP,
> > > > > false,
> > > > > > > > +
> > > MAX_SCHEDULE_TIMEOUT);
> > > > > > > >  			dma_resv_unlock(priv->dmabuf->resv);
> > > > > > > > +			if (revoked) {
> > > > > > > > +				kref_put(&priv->kref,
> > > > > vfio_pci_dma_buf_done);
> > > > > > > > +				/* Let's wait till all DMA unmap are
> > > > > completed. */
> > > > > > > > +				wait = wait_for_completion_timeout(
> > > > > > > > +					&priv->comp,
> > > secs_to_jiffies(1));
> > > > > > >
> > > > > > > > Is the 1-second constant sufficient for all hardware, or should the
> > > > > > > > invalidate_mappings() contract require the callback to block until
> > > > > > > > speculative reads are strictly fenced? I'm wondering about a case
> > > > > > > > where a device's firmware has a high response latency, perhaps due
> > > > > > > > to internal management tasks like error recovery or thermal, and it
> > > > > > > > exceeds the 1s timeout.
> > > > > > > >
> > > > > > > > If the device is in the middle of a large DMA burst and the firmware
> > > > > > > > is slow to flush the internal pipelines to a fully "quiesced"
> > > > > > > > read-and-discard state, reclaiming the memory at exactly 1.001
> > > > > > > > seconds risks triggering platform-level faults.
> > > > > > >
> > > > > > > > Since we explicitly permit these speculative reads until unmap is
> > > > > > > > complete, relying on a hardcoded timeout in the exporter seems to
> > > > > > > > introduce a hardware-dependent race condition that could compromise
> > > > > > > > system stability via IOMMU errors or AER faults.
> > > > > > >
> > > > > > > Should the importer instead be required to guarantee that all
> > > > > > > speculative access has ceased before the invalidation call returns?
> > > > > >
> > > > > > > It is guaranteed by the dma_resv_wait_timeout() call above. That call
> > > > > > > ensures that the hardware has completed all pending operations. The
> > > > > > > 1-second delay is meant to catch cases where an in-kernel DMA unmap
> > > > > > > call is missing, which should not trigger any DMA activity at that
> > > > > > > point.
> > > > >
> > > > > Christian may know actual examples, but my general feeling is he was
> > > > > worrying about drivers that have pushed the DMABUF to visibility on
> > > > > > the GPU and the move notify & fences only shoot down some access. So
> > > > > > it has to wait until the DMABUF is finally unmapped.
> > > > >
> > > > > Pranjal's example should be covered by the driver adding a fence and
> > > > > then the unbounded fence wait will complete it.
> > > > >
> > > >
> > > > > Bear with me if it's an ignorant question.
> > > > >
> > > > > The commit msg of patch 6 says that VFIO doesn't tolerate unbounded
> > > > > wait, which is the reason behind the 2nd timeout wait here.
> > >
> > > > It is not accurate. A second timeout is present both in the
> > > > description of patch 6 and in the VFIO implementation. The difference
> > > > is that the timeout is enforced within VFIO.
> > >
> > > >
> > > > > Then why is "the unbounded fence wait" not a problem in the same
> > > > > code path? The use of MAX_SCHEDULE_TIMEOUT implies a worst-case
> > > > > timeout in hundreds of years...
> > >
> > > "An unbounded fence wait" is a different class of wait. It indicates broken
> > > hardware that continues to issue DMA transactions even after it has been
> > > told to
> > > stop.
> > >
> > > The second wait exists to catch software bugs or misuse, where the dma-
> buf
> > > importer has misrepresented its capabilities.
> > >
> >
> > Okay I see.
> >
> > > >
> > > > and it'd be helpful to put some words in the code based on what's
> > > > discussed here.
> > >
> > > > We've documented as much as we can in dma_buf_attach_revocable() and
> > > > dma_buf_invalidate_mappings(). Do you have any suggestions on what else
> > > > should be added here?
> > >
> >
> > the selection of 1s?
> 
> It is indirectly written in the description of WARN_ON(), but let's add
> more. What about the following?
> 
> diff --git a/drivers/vfio/pci/vfio_pci_dmabuf.c b/drivers/vfio/pci/vfio_pci_dmabuf.c
> index 93795ad2e025..948ba75288c6 100644
> --- a/drivers/vfio/pci/vfio_pci_dmabuf.c
> +++ b/drivers/vfio/pci/vfio_pci_dmabuf.c
> @@ -357,7 +357,13 @@ void vfio_pci_dma_buf_move(struct vfio_pci_core_device *vdev, bool revoked)
>                         dma_resv_unlock(priv->dmabuf->resv);
>                         if (revoked) {
>                                 kref_put(&priv->kref, vfio_pci_dma_buf_done);
> -                               /* Let's wait till all DMA unmap are completed. */
> +                               /*
> +                                * Let's wait for 1 second till all DMA unmap
> +                                * are completed. It is supposed to catch dma-buf
> +                                * importers which lied about their support
> +                                * of dmabuf revoke. See dma_buf_invalidate_mappings()
> +                                * for the expected behaviour,
> +                                */
>                                 wait = wait_for_completion_timeout(
>                                         &priv->comp, secs_to_jiffies(1));
>                                 /*
> 

Looks good. Just replace the trailing "," with ".".
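
For anyone reading the archive later, here is a rough userspace sketch of
the bounded second wait discussed above. It is not the VFIO code. It
assumes that each importer mapping holds a reference, and that dropping the
last reference signals completion, as kref_put() with
vfio_pci_dma_buf_done suggests. The names model_buf, model_map, model_put,
model_wait_timeout and model_revoke are hypothetical, and the polling loop
only stands in for wait_for_completion_timeout().

```c
#ifndef _POSIX_C_SOURCE
#define _POSIX_C_SOURCE 200809L	/* for clock_gettime() and nanosleep() */
#endif
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <time.h>

/* Hypothetical userspace stand-in for the exporter's per-dmabuf state. */
struct model_buf {
	atomic_int refs;	/* models priv->kref */
	atomic_bool done;	/* models priv->comp */
};

static void model_init(struct model_buf *b)
{
	atomic_init(&b->refs, 1);	/* the exporter's own reference */
	atomic_init(&b->done, false);
}

/* Assumption: every importer mapping takes one reference. */
static void model_map(struct model_buf *b)
{
	atomic_fetch_add(&b->refs, 1);
}

/* Dropping the last reference signals completion. */
static void model_put(struct model_buf *b)
{
	int old = atomic_fetch_sub(&b->refs, 1);

	assert(old > 0);
	if (old == 1)
		atomic_store(&b->done, true);
}

static long long now_ms(void)
{
	struct timespec ts;

	clock_gettime(CLOCK_MONOTONIC, &ts);
	return (long long)ts.tv_sec * 1000 + ts.tv_nsec / 1000000;
}

/* Bounded wait: true if completion was signalled before the deadline. */
static bool model_wait_timeout(struct model_buf *b, long timeout_ms)
{
	const struct timespec tick = { .tv_sec = 0, .tv_nsec = 1000000 };
	long long deadline = now_ms() + timeout_ms;

	while (!atomic_load(&b->done)) {
		if (now_ms() >= deadline)
			return false;
		nanosleep(&tick, NULL);
	}
	return true;
}

/* Revoke: drop the exporter's reference, then wait a bounded time. */
static bool model_revoke(struct model_buf *b, long timeout_ms)
{
	model_put(b);
	return model_wait_timeout(b, timeout_ms);
}
```

In this model, model_revoke() returns false while an importer still holds
its mapping and true once every mapping has been dropped with model_put().
A false return corresponds to the case the 1 second timeout is meant to
flag: an importer that claimed revoke support but did not unmap.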
