Message-ID:
 <CH3PR12MB7548D2FCEE9A3FA1F4210BDEABDEA@CH3PR12MB7548.namprd12.prod.outlook.com>
Date: Wed, 26 Nov 2025 09:03:19 +0000
From: Shameer Kolothum <skolothumtho@...dia.com>
To: Ankit Agrawal <ankita@...dia.com>, "jgg@...pe.ca" <jgg@...pe.ca>, Yishai
 Hadas <yishaih@...dia.com>, "kevin.tian@...el.com" <kevin.tian@...el.com>,
	"alex@...zbot.org" <alex@...zbot.org>, Aniket Agashe <aniketa@...dia.com>,
	Vikram Sethi <vsethi@...dia.com>, Matt Ochs <mochs@...dia.com>
CC: "Yunxiang.Li@....com" <Yunxiang.Li@....com>, "yi.l.liu@...el.com"
	<yi.l.liu@...el.com>, "zhangdongdong@...incomputing.com"
	<zhangdongdong@...incomputing.com>, Avihai Horon <avihaih@...dia.com>,
	"bhelgaas@...gle.com" <bhelgaas@...gle.com>, "peterx@...hat.com"
	<peterx@...hat.com>, "pstanner@...hat.com" <pstanner@...hat.com>, Alistair
 Popple <apopple@...dia.com>, "kvm@...r.kernel.org" <kvm@...r.kernel.org>,
	"linux-kernel@...r.kernel.org" <linux-kernel@...r.kernel.org>, Neo Jia
	<cjia@...dia.com>, Kirti Wankhede <kwankhede@...dia.com>, "Tarun Gupta
 (SW-GPU)" <targupta@...dia.com>, Zhi Wang <zhiw@...dia.com>, Dan Williams
	<danw@...dia.com>, Dheeraj Nigam <dnigam@...dia.com>, Krishnakant Jaju
	<kjaju@...dia.com>
Subject: RE: [PATCH v7 6/6] vfio/nvgrace-gpu: wait for the GPU mem to be ready



> -----Original Message-----
> From: Ankit Agrawal <ankita@...dia.com>
> Sent: 26 November 2025 05:26
> To: Ankit Agrawal <ankita@...dia.com>; jgg@...pe.ca; Yishai Hadas
> <yishaih@...dia.com>; Shameer Kolothum <skolothumtho@...dia.com>;
> kevin.tian@...el.com; alex@...zbot.org; Aniket Agashe
> <aniketa@...dia.com>; Vikram Sethi <vsethi@...dia.com>; Matt Ochs
> <mochs@...dia.com>
> Cc: Yunxiang.Li@....com; yi.l.liu@...el.com;
> zhangdongdong@...incomputing.com; Avihai Horon <avihaih@...dia.com>;
> bhelgaas@...gle.com; peterx@...hat.com; pstanner@...hat.com; Alistair
> Popple <apopple@...dia.com>; kvm@...r.kernel.org; linux-
> kernel@...r.kernel.org; Neo Jia <cjia@...dia.com>; Kirti Wankhede
> <kwankhede@...dia.com>; Tarun Gupta (SW-GPU) <targupta@...dia.com>;
> Zhi Wang <zhiw@...dia.com>; Dan Williams <danw@...dia.com>; Dheeraj
> Nigam <dnigam@...dia.com>; Krishnakant Jaju <kjaju@...dia.com>
> Subject: [PATCH v7 6/6] vfio/nvgrace-gpu: wait for the GPU mem to be ready
> 
> From: Ankit Agrawal <ankita@...dia.com>
> 
> Speculative prefetches from CPU to GPU memory until the GPU is
> ready after reset can cause harmless corrected RAS events to
> be logged on Grace systems. It is thus preferred that the
> mapping not be re-established until the GPU is ready post reset.
> 
> The GPU readiness can be checked through BAR0 registers similar
> to the checking at the time of device probe.
> 
> It can take several seconds for the GPU to be ready. So it is
> desirable that the time overlaps as much of the VM startup as
> possible to reduce impact on the VM bootup time. The GPU
> readiness state is thus checked on the first fault/huge_fault
> request or read/write access which amortizes the GPU readiness
> time.
> 
> The first fault and read/write check the GPU state when the
> reset_done flag, which denotes whether the GPU has just been
> reset, is set. The memory_lock is taken across map/access to avoid
> races with GPU reset.
> 
> Cc: Shameer Kolothum <skolothumtho@...dia.com>
> Cc: Alex Williamson <alex@...zbot.org>
> Cc: Jason Gunthorpe <jgg@...pe.ca>
> Cc: Vikram Sethi <vsethi@...dia.com>
> Suggested-by: Alex Williamson <alex@...zbot.org>
> Signed-off-by: Ankit Agrawal <ankita@...dia.com>
> ---
>  drivers/vfio/pci/nvgrace-gpu/main.c | 81 +++++++++++++++++++++++++----
>  1 file changed, 72 insertions(+), 9 deletions(-)
> 
> diff --git a/drivers/vfio/pci/nvgrace-gpu/main.c b/drivers/vfio/pci/nvgrace-gpu/main.c
> index b46984e76be7..3064f8aca858 100644
> --- a/drivers/vfio/pci/nvgrace-gpu/main.c
> +++ b/drivers/vfio/pci/nvgrace-gpu/main.c
> @@ -104,6 +104,17 @@ static int nvgrace_gpu_open_device(struct vfio_device *core_vdev)
>  		mutex_init(&nvdev->remap_lock);
>  	}
> 
> +	/*
> +	 * GPU readiness is checked by reading the BAR0 registers.
> +	 *
> +	 * ioremap BAR0 to ensure that the BAR0 mapping is present before
> +	 * register reads on first fault before establishing any GPU
> +	 * memory mapping.
> +	 */
> +	ret = vfio_pci_core_setup_barmap(vdev, 0);
> +	if (ret)
> +		return ret;

The error path above should make sure vfio_pci_core_disable() is called
before returning.
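
Something along these lines, shown as a userspace sketch of the unwind
pattern only, not as the actual fix. All mock_* names and struct mock_dev
are made up for illustration; they stand in for the enable step,
vfio_pci_core_setup_barmap() and vfio_pci_core_disable(), and the real
ordering inside nvgrace_gpu_open_device() may differ:

```c
#include <stdbool.h>

/* Hypothetical stand-in for the device state; not a kernel structure. */
struct mock_dev {
	bool enabled;     /* set once the mock enable step has run */
	bool bar0_mapped; /* set once the mock BAR0 mapping exists */
	int barmap_err;   /* error injected into mock_setup_barmap() */
};

/* Assumed analogue of the enable step that precedes the new code. */
static int mock_core_enable(struct mock_dev *d)
{
	d->enabled = true;
	return 0;
}

/* Assumed analogue of vfio_pci_core_disable(): undo the enable step. */
static void mock_core_disable(struct mock_dev *d)
{
	d->bar0_mapped = false;
	d->enabled = false;
}

/* Assumed analogue of vfio_pci_core_setup_barmap(vdev, 0). */
static int mock_setup_barmap(struct mock_dev *d)
{
	if (d->barmap_err)
		return d->barmap_err;
	d->bar0_mapped = true;
	return 0;
}

static int mock_open_device(struct mock_dev *d)
{
	int ret;

	ret = mock_core_enable(d);
	if (ret)
		return ret;

	ret = mock_setup_barmap(d);
	if (ret)
		goto err_disable; /* do not return with the device enabled */

	return 0;

err_disable:
	mock_core_disable(d);
	return ret;
}
```

The point is only that a failure after the enable step unwinds through a
label that disables the device, instead of returning ret directly.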

With that,
Reviewed-by: Shameer Kolothum <skolothumtho@...dia.com>


