linux-kernel - Re: [PATCH v8 6/6] vfio/nvgrace-gpu: wait for the GPU mem to be ready

Message-ID:
 <SA1PR12MB719964BD04F7DDA29D570A8AB0DFA@SA1PR12MB7199.namprd12.prod.outlook.com>
Date: Thu, 27 Nov 2025 02:39:12 +0000
From: Ankit Agrawal <ankita@...dia.com>
To: Alex Williamson <alex@...zbot.org>
CC: "jgg@...pe.ca" <jgg@...pe.ca>, Yishai Hadas <yishaih@...dia.com>, Shameer
 Kolothum <skolothumtho@...dia.com>, "kevin.tian@...el.com"
	<kevin.tian@...el.com>, Aniket Agashe <aniketa@...dia.com>, Vikram Sethi
	<vsethi@...dia.com>, Matt Ochs <mochs@...dia.com>, "Yunxiang.Li@....com"
	<Yunxiang.Li@....com>, "yi.l.liu@...el.com" <yi.l.liu@...el.com>,
	"zhangdongdong@...incomputing.com" <zhangdongdong@...incomputing.com>, Avihai
 Horon <avihaih@...dia.com>, "bhelgaas@...gle.com" <bhelgaas@...gle.com>,
	"peterx@...hat.com" <peterx@...hat.com>, "pstanner@...hat.com"
	<pstanner@...hat.com>, Alistair Popple <apopple@...dia.com>,
	"kvm@...r.kernel.org" <kvm@...r.kernel.org>, "linux-kernel@...r.kernel.org"
	<linux-kernel@...r.kernel.org>, Neo Jia <cjia@...dia.com>, Kirti Wankhede
	<kwankhede@...dia.com>, "Tarun Gupta (SW-GPU)" <targupta@...dia.com>, Zhi
 Wang <zhiw@...dia.com>, Dan Williams <danw@...dia.com>, Dheeraj Nigam
	<dnigam@...dia.com>, Krishnakant Jaju <kjaju@...dia.com>
Subject: Re: [PATCH v8 6/6] vfio/nvgrace-gpu: wait for the GPU mem to be ready

>> +/*
>> + * If the GPU memory is accessed by the CPU while the GPU is not ready
>> + * after reset, it can cause harmless corrected RAS events to be logged.
>> + * Make sure the GPU is ready before establishing the mappings.
>> + */
>> +static int
>> +nvgrace_gpu_check_device_ready(struct nvgrace_gpu_pci_core_device *nvdev)
>> +{
>> +     struct vfio_pci_core_device *vdev = &nvdev->core_device;
>> +     int ret;
>> +
>> +     lockdep_assert_held_read(&vdev->memory_lock);
>> +
>> +     if (!nvdev->reset_done)
>> +             return 0;
>> +
>> +     ret = nvgrace_gpu_wait_device_ready(vdev->barmap[0]);
>> +     if (ret)
>> +             return ret;
>> +
>> +     nvdev->reset_done = false;
>> +
>> +     return 0;
>> +}
>
> It seems like we can call wait_device_ready here, generating ioread
> accesses to BAR0, without knowing the memory-enable state of the device
> in the command register.  Is there anything special about this device
> relative to BAR0 accesses regardless of the memory-enable bit that
> allows us to ignore that?

Yes, BAR0 accesses on this device are independent of the memory-enable bit.

> If not, do we need to test before wait_device_ready, such as:
>
>         if (vdev->pm_runtime_engaged || !__vfio_pci_memory_enabled(vdev))
>                 return -EIO;

No, that test isn't actually required.

Other than that, Alex, would you be able to apply this to the next branch?
If so, could you remove that check from the common code while applying?
Otherwise, I can send an update to the first patch that moves the check out
of the common helper and into vfio_pci_mmap_huge_fault.
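
The control flow of the quoted nvgrace_gpu_check_device_ready() can be
sketched as a small user-space model. This is only an illustration under
stated assumptions: every model_* name, the status value, and the poll
limit are hypothetical stand-ins, and the register read is simulated
rather than taken from the driver. The point it shows is that reset_done
is cleared only after a successful wait, so a timed-out wait is retried
on the next call.

```c
#include <errno.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical stand-ins; none of these names or values come from the driver. */
#define MODEL_STATUS_READY 0xFFu
#define MODEL_MAX_POLLS    8

struct model_dev {
	bool reset_done;       /* set by the reset path, cleared once the device is seen ready */
	int polls_until_ready; /* simulation knob: reads that still report "not ready" */
};

/* Simulates one readiness-register read. */
static uint32_t model_read_status(struct model_dev *dev)
{
	if (dev->polls_until_ready > 0) {
		dev->polls_until_ready--;
		return 0;
	}
	return MODEL_STATUS_READY;
}

/* Polls a bounded number of times; a real driver would sleep between reads. */
static int model_wait_device_ready(struct model_dev *dev)
{
	for (int i = 0; i < MODEL_MAX_POLLS; i++) {
		if (model_read_status(dev) == MODEL_STATUS_READY)
			return 0;
	}
	return -ETIMEDOUT;
}

/* Mirrors the shape of the quoted helper: skip, wait, then clear the flag. */
static int model_check_device_ready(struct model_dev *dev)
{
	int ret;

	/* No reset since the last successful check: nothing to wait for. */
	if (!dev->reset_done)
		return 0;

	ret = model_wait_device_ready(dev);
	if (ret)
		return ret; /* flag stays set, so the next caller waits again */

	dev->reset_done = false;
	return 0;
}
```

In this model, a device that becomes ready within the poll limit returns 0
and clears reset_done, while one that does not returns -ETIMEDOUT and
leaves the flag set.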