linux-kernel - Re: [PATCH v1 5/6] vfio-pci/zdev: Perform platform specific function reset for zPCI

lists.openwall.net		lists / announce owl-users owl-dev john-users john-dev passwdqc-users yescrypt popa3d-users / oss-security kernel-hardening musl sabotage tlsify passwords / crypt-dev xvendor / Bugtraq Full-Disclosure linux-kernel linux-netdev linux-ext4 linux-hardening linux-cve-announce PHC
Open Source and information security mailing list archives
Hash Suite: Windows password security audit tool. GUI, reports in PDF.
[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <576abfa6-effb-48b7-b15a-c964fad6ddea@linux.ibm.com>
Date: Thu, 14 Aug 2025 15:33:19 -0700
From: Farhan Ali <alifm@...ux.ibm.com>
To: Alex Williamson <alex.williamson@...hat.com>
Cc: Niklas Schnelle <schnelle@...ux.ibm.com>,
        Bjorn Helgaas <helgaas@...nel.org>, linux-s390@...r.kernel.org,
        kvm@...r.kernel.org, linux-kernel@...r.kernel.org,
        mjrosato@...ux.ibm.com
Subject: Re: [PATCH v1 5/6] vfio-pci/zdev: Perform platform specific function
 reset for zPCI


On 8/14/2025 1:57 PM, Alex Williamson wrote:
> On Thu, 14 Aug 2025 09:33:47 -0700
> Farhan Ali <alifm@...ux.ibm.com> wrote:
>
>> On 8/14/2025 6:12 AM, Niklas Schnelle wrote:
>>> On Wed, 2025-08-13 at 16:56 -0600, Alex Williamson wrote:
>>>> On Wed, 13 Aug 2025 14:52:24 -0700
>>>> Farhan Ali <alifm@...ux.ibm.com> wrote:
>>>>   
>>>>> On 8/13/2025 1:30 PM, Alex Williamson wrote:
>>>>>> On Wed, 13 Aug 2025 10:08:19 -0700
>>>>>> Farhan Ali <alifm@...ux.ibm.com> wrote:
>>>>>>      
>>>>>>> For zPCI devices we should drive a platform specific function reset
>>>>>>> as part of VFIO_DEVICE_RESET. This reset is needed recover a zPCI device
>>>>>>> in error state.
>>>>>>>
>>>>>>> Signed-off-by: Farhan Ali <alifm@...ux.ibm.com>
>>>>>>> ---
>>>>>>>     arch/s390/pci/pci.c              |  1 +
>>>>>>>     drivers/vfio/pci/vfio_pci_core.c |  4 ++++
>>>>>>>     drivers/vfio/pci/vfio_pci_priv.h |  5 ++++
>>>>>>>     drivers/vfio/pci/vfio_pci_zdev.c | 39 ++++++++++++++++++++++++++++++++
>>>>>>>     4 files changed, 49 insertions(+)
>>> --- snip ---
>>>>>>>     
>>>>>>> +int vfio_pci_zdev_reset(struct vfio_pci_core_device *vdev)
>>>>>>> +{
>>>>>>> +	struct zpci_dev *zdev = to_zpci(vdev->pdev);
>>>>>>> +	int rc = -EIO;
>>>>>>> +
>>>>>>> +	if (!zdev)
>>>>>>> +		return -ENODEV;
>>>>>>> +
>>>>>>> +	/*
>>>>>>> +	 * If we can't get the zdev->state_lock the device state is
>>>>>>> +	 * currently undergoing a transition and we bail out - just
>>>>>>> +	 * the same as if the device's state is not configured at all.
>>>>>>> +	 */
>>>>>>> +	if (!mutex_trylock(&zdev->state_lock))
>>>>>>> +		return rc;
>>>>>>> +
>>>>>>> +	/* We can reset only if the function is configured */
>>>>>>> +	if (zdev->state != ZPCI_FN_STATE_CONFIGURED)
>>>>>>> +		goto out;
>>>>>>> +
>>>>>>> +	rc = zpci_hot_reset_device(zdev);
>>>>>>> +	if (rc != 0)
>>>>>>> +		goto out;
>>>>>>> +
>>>>>>> +	if (!vdev->pci_saved_state) {
>>>>>>> +		pci_err(vdev->pdev, "No saved available for the device");
>>>>>>> +		rc = -EIO;
>>>>>>> +		goto out;
>>>>>>> +	}
>>>>>>> +
>>>>>>> +	pci_dev_lock(vdev->pdev);
>>>>>>> +	pci_load_saved_state(vdev->pdev, vdev->pci_saved_state);
>>>>>>> +	pci_restore_state(vdev->pdev);
>>>>>>> +	pci_dev_unlock(vdev->pdev);
>>>>>>> +out:
>>>>>>> +	mutex_unlock(&zdev->state_lock);
>>>>>>> +	return rc;
>>>>>>> +}
>>>>>> This looks like it should be a device or arch specific reset
>>>>>> implemented in drivers/pci, not vfio.  Thanks,
>>>>>>
>>>>>> Alex
>>>>> Are you suggesting to move this to an arch specific function? One thing
>>>>> we need to do after the zpci_hot_reset_device, is to correctly restore
>>>>> the config space of the device. And for vfio-pci bound devices we want
>>>>> to restore the state of the device to when it was initially opened.
>>>> We generally rely on the abstraction of pci_reset_function() to select
>>>> the correct type of reset for a function scope reset.  We've gone to
>>>> quite a bit of effort to implement all device specific resets and
>>>> quirks in the PCI core to be re-used across the kernel.
>>>>
>>>> Calling zpci_hot_reset_device() directly seems contradictory to those
>>>> efforts.  Should pci_reset_function() call this universally on s390x
>>>> rather than providing access to FLR/PM/SBR reset?
>>>>   
>>> I agree with you Alex. Still trying to figure out what's needed for
>>> this. We already do zpci_hot_reset_device() in reset_slot() from the
>>> s390_pci_hpc.c hotplug slot driver and that does get called via
>>> pci_reset_hotplug_slot() and pci_reset_function(). There are a few
>>> problems though that meant it didn't work for Farhan but I agree maybe
>>> we can fix them for the general case. For one pci_reset_function()
>>> via DEVICE_RESET first tries FLR but that won't work with the device in
>>> the error state and MMIO blocked. Sadly __pci_reset_function_locked()
>>> then concludes that other resets also won't work. So that's something
>>> we might want to improve in general, for example maybe we need
>>> something more like pci_dev_acpi_reset() with higher priority than FLR.
>> Yeah I did think of adding something like s390x CLP reset as part of the
>> reset methods. AFAIU the s390x CLP reset is similar to ACPI _RST. But
>> that would introduce s390x specific code in pci core common code.
>>
>>> Now for pci_reset_hotplug_slot() via VFIO_DEVICE_PCI_HOT_RESET I'm not
>>> sure why that won't work as is. @Farhan do you know?
>> VFIO_DEVICE_PCI_HOT_RESET would have been sufficient interface for
>> majority of PCI devices on s390x as that would drive a bus reset. It was
>> sufficient as most devices were single bus devices. But in the latest
>> generation of machines (z17) we expose true SR-IOV and an OS can have
>> access to both PF and VFs and so these are on the same bus and can have
>> different ownership based on what is bound to vfio-pci.
>>
>> My thinking for extending VFIO_DEVICE_RESET is because AFAIU its a per
>> function reset mechanism, which maps well with what our architecture
>> provides. On s390x we can drive a per function reset (via firmware)
>> through the CLP instruction driven by the zpci_hot_reset_device(). And
>> doing it as vfio zpci specific function would confine the s390x logic.
>>
>>>>    Why is it
>>>> universally correct here given the ioctl previously made use of
>>>> standard reset mechanisms?
>>>>
>>>> The DEVICE_RESET ioctl is simply an in-place reset of the device,
>>>> without restoring the original device state.  So we're also subtly
>>>> changing that behavior here, presumably because we're targeting the
>>>> specific error recovery case.  Have you considered how this might
>>>> break non-error-recovery use cases?
>>>>
>>>> I wonder if we want a different reset mechanism for this use case
>>>> rather than these subtle semantic changes.
>>> I think an alternative to that, which Farhan actually had in the
>>> previous internal version, is to implement
>>> pci_error_handlers::reset_done() and do the pci_load_saved_state()
>>> there. That would only affect the error recovery case leaving other
>>> cases alone.
>>>
>>>
>>> Thanks,
>>> Niklas
>> The reason I abandoned reset_done() callback idea is because its not
>> sufficient to recover the device correctly. Today before driving a reset
>> we save the state of the device. When a device is in error state, any
>> pci load/store (on s390x they are actual instructions :)) to config
>> space would return an error value (0xffffffff). We don't have any checks
>> in pci_save_state to prevent storing error values. And after a reset
>> when we try to restore the config space (pci_dev_restore) we try to
>> write the error value and this can be problematic. By the time the
>> reset_done() callback is invoked, its already too late.
> It's too late because we've re-written the error value back to config
> space and as a result the device is broken?
>
>
Yes, exactly.

>   What if
> pci_restore_state() were a little smarter to detect that it has bad
> read data from pci_save_state() and only restores state based on kernel
> data?  Would that leave the device in a functional state that
> reset_done() could restore the original saved state and push it out to
> the device?

Yeah I think this could work. I can try something like this.

>> @Alex,
>> I am open to ideas/suggestions on this. Do we think we need a separate
>> VFIO ioctl to drive this or a new reset mechanism as Niklas suggested?
> Unfortunately I was short sighted on VFIO_DEVICE_RESET and it's the one
> ioctl that doesn't have any flags, so it's not very extensible.
>
> Can we do more of the above, ie. enlighten the FLR/PM reset callbacks to
> return -ENOTTY if the device is in an error state and config space is
> returning -1 such that we fall through to a slot reset that doesn't
> care how broken the device is and you auto-magically get the zpci
> function you want?  Follow-up with pushing the original state in
> reset_done()?  Thanks,
>
> Alex
>
Yeah I can do that. I think adding some validation checks to the FLR/PM 
callbacks wouldn't be a bad idea if its acceptable for PCI maintainers.

If you are okay with a reset_done() callback, I will try to incorporate 
your suggestions and spin a new series.

Thanks
Farhan