[<prev] [next>] [<thread-prev] [day] [month] [year] [list]
Message-ID: <576abfa6-effb-48b7-b15a-c964fad6ddea@linux.ibm.com>
Date: Thu, 14 Aug 2025 15:33:19 -0700
From: Farhan Ali <alifm@...ux.ibm.com>
To: Alex Williamson <alex.williamson@...hat.com>
Cc: Niklas Schnelle <schnelle@...ux.ibm.com>,
Bjorn Helgaas <helgaas@...nel.org>, linux-s390@...r.kernel.org,
kvm@...r.kernel.org, linux-kernel@...r.kernel.org,
mjrosato@...ux.ibm.com
Subject: Re: [PATCH v1 5/6] vfio-pci/zdev: Perform platform specific function
reset for zPCI
On 8/14/2025 1:57 PM, Alex Williamson wrote:
> On Thu, 14 Aug 2025 09:33:47 -0700
> Farhan Ali <alifm@...ux.ibm.com> wrote:
>
>> On 8/14/2025 6:12 AM, Niklas Schnelle wrote:
>>> On Wed, 2025-08-13 at 16:56 -0600, Alex Williamson wrote:
>>>> On Wed, 13 Aug 2025 14:52:24 -0700
>>>> Farhan Ali <alifm@...ux.ibm.com> wrote:
>>>>
>>>>> On 8/13/2025 1:30 PM, Alex Williamson wrote:
>>>>>> On Wed, 13 Aug 2025 10:08:19 -0700
>>>>>> Farhan Ali <alifm@...ux.ibm.com> wrote:
>>>>>>
>>>>>>> For zPCI devices we should drive a platform specific function reset
>>>>>>> as part of VFIO_DEVICE_RESET. This reset is needed recover a zPCI device
>>>>>>> in error state.
>>>>>>>
>>>>>>> Signed-off-by: Farhan Ali <alifm@...ux.ibm.com>
>>>>>>> ---
>>>>>>> arch/s390/pci/pci.c | 1 +
>>>>>>> drivers/vfio/pci/vfio_pci_core.c | 4 ++++
>>>>>>> drivers/vfio/pci/vfio_pci_priv.h | 5 ++++
>>>>>>> drivers/vfio/pci/vfio_pci_zdev.c | 39 ++++++++++++++++++++++++++++++++
>>>>>>> 4 files changed, 49 insertions(+)
>>> --- snip ---
>>>>>>>
>>>>>>> +int vfio_pci_zdev_reset(struct vfio_pci_core_device *vdev)
>>>>>>> +{
>>>>>>> + struct zpci_dev *zdev = to_zpci(vdev->pdev);
>>>>>>> + int rc = -EIO;
>>>>>>> +
>>>>>>> + if (!zdev)
>>>>>>> + return -ENODEV;
>>>>>>> +
>>>>>>> + /*
>>>>>>> + * If we can't get the zdev->state_lock the device state is
>>>>>>> + * currently undergoing a transition and we bail out - just
>>>>>>> + * the same as if the device's state is not configured at all.
>>>>>>> + */
>>>>>>> + if (!mutex_trylock(&zdev->state_lock))
>>>>>>> + return rc;
>>>>>>> +
>>>>>>> + /* We can reset only if the function is configured */
>>>>>>> + if (zdev->state != ZPCI_FN_STATE_CONFIGURED)
>>>>>>> + goto out;
>>>>>>> +
>>>>>>> + rc = zpci_hot_reset_device(zdev);
>>>>>>> + if (rc != 0)
>>>>>>> + goto out;
>>>>>>> +
>>>>>>> + if (!vdev->pci_saved_state) {
>>>>>>> + pci_err(vdev->pdev, "No saved available for the device");
>>>>>>> + rc = -EIO;
>>>>>>> + goto out;
>>>>>>> + }
>>>>>>> +
>>>>>>> + pci_dev_lock(vdev->pdev);
>>>>>>> + pci_load_saved_state(vdev->pdev, vdev->pci_saved_state);
>>>>>>> + pci_restore_state(vdev->pdev);
>>>>>>> + pci_dev_unlock(vdev->pdev);
>>>>>>> +out:
>>>>>>> + mutex_unlock(&zdev->state_lock);
>>>>>>> + return rc;
>>>>>>> +}
>>>>>> This looks like it should be a device or arch specific reset
>>>>>> implemented in drivers/pci, not vfio. Thanks,
>>>>>>
>>>>>> Alex
>>>>> Are you suggesting to move this to an arch specific function? One thing
>>>>> we need to do after the zpci_hot_reset_device, is to correctly restore
>>>>> the config space of the device. And for vfio-pci bound devices we want
>>>>> to restore the state of the device to when it was initially opened.
>>>> We generally rely on the abstraction of pci_reset_function() to select
>>>> the correct type of reset for a function scope reset. We've gone to
>>>> quite a bit of effort to implement all device specific resets and
>>>> quirks in the PCI core to be re-used across the kernel.
>>>>
>>>> Calling zpci_hot_reset_device() directly seems contradictory to those
>>>> efforts. Should pci_reset_function() call this universally on s390x
>>>> rather than providing access to FLR/PM/SBR reset?
>>>>
>>> I agree with you Alex. Still trying to figure out what's needed for
>>> this. We already do zpci_hot_reset_device() in reset_slot() from the
>>> s390_pci_hpc.c hotplug slot driver and that does get called via
>>> pci_reset_hotplug_slot() and pci_reset_function(). There are a few
>>> problems though that meant it didn't work for Farhan but I agree maybe
>>> we can fix them for the general case. For one pci_reset_function()
>>> via DEVICE_RESET first tries FLR but that won't work with the device in
>>> the error state and MMIO blocked. Sadly __pci_reset_function_locked()
>>> then concludes that other resets also won't work. So that's something
>>> we might want to improve in general, for example maybe we need
>>> something more like pci_dev_acpi_reset() with higher priority than FLR.
>> Yeah I did think of adding something like s390x CLP reset as part of the
>> reset methods. AFAIU the s390x CLP reset is similar to ACPI _RST. But
>> that would introduce s390x specific code in pci core common code.
>>
>>> Now for pci_reset_hotplug_slot() via VFIO_DEVICE_PCI_HOT_RESET I'm not
>>> sure why that won't work as is. @Farhan do you know?
>> VFIO_DEVICE_PCI_HOT_RESET would have been sufficient interface for
>> majority of PCI devices on s390x as that would drive a bus reset. It was
>> sufficient as most devices were single bus devices. But in the latest
>> generation of machines (z17) we expose true SR-IOV and an OS can have
>> access to both PF and VFs and so these are on the same bus and can have
>> different ownership based on what is bound to vfio-pci.
>>
>> My thinking for extending VFIO_DEVICE_RESET is because AFAIU its a per
>> function reset mechanism, which maps well with what our architecture
>> provides. On s390x we can drive a per function reset (via firmware)
>> through the CLP instruction driven by the zpci_hot_reset_device(). And
>> doing it as vfio zpci specific function would confine the s390x logic.
>>
>>>> Why is it
>>>> universally correct here given the ioctl previously made use of
>>>> standard reset mechanisms?
>>>>
>>>> The DEVICE_RESET ioctl is simply an in-place reset of the device,
>>>> without restoring the original device state. So we're also subtly
>>>> changing that behavior here, presumably because we're targeting the
>>>> specific error recovery case. Have you considered how this might
>>>> break non-error-recovery use cases?
>>>>
>>>> I wonder if we want a different reset mechanism for this use case
>>>> rather than these subtle semantic changes.
>>> I think an alternative to that, which Farhan actually had in the
>>> previous internal version, is to implement
>>> pci_error_handlers::reset_done() and do the pci_load_saved_state()
>>> there. That would only affect the error recovery case leaving other
>>> cases alone.
>>>
>>>
>>> Thanks,
>>> Niklas
>> The reason I abandoned reset_done() callback idea is because its not
>> sufficient to recover the device correctly. Today before driving a reset
>> we save the state of the device. When a device is in error state, any
>> pci load/store (on s390x they are actual instructions :)) to config
>> space would return an error value (0xffffffff). We don't have any checks
>> in pci_save_state to prevent storing error values. And after a reset
>> when we try to restore the config space (pci_dev_restore) we try to
>> write the error value and this can be problematic. By the time the
>> reset_done() callback is invoked, its already too late.
> It's too late because we've re-written the error value back to config
> space and as a result the device is broken?
>
>
Yes, exactly.
> What if
> pci_restore_state() were a little smarter to detect that it has bad
> read data from pci_save_state() and only restores state based on kernel
> data? Would that leave the device in a functional state that
> reset_done() could restore the original saved state and push it out to
> the device?
Yeah I think this could work. I can try something like this.
>> @Alex,
>> I am open to ideas/suggestions on this. Do we think we need a separate
>> VFIO ioctl to drive this or a new reset mechanism as Niklas suggested?
> Unfortunately I was short sighted on VFIO_DEVICE_RESET and it's the one
> ioctl that doesn't have any flags, so it's not very extensible.
>
> Can we do more of the above, ie. enlighten the FLR/PM reset callbacks to
> return -ENOTTY if the device is in an error state and config space is
> returning -1 such that we fall through to a slot reset that doesn't
> care how broken the device is and you auto-magically get the zpci
> function you want? Follow-up with pushing the original state in
> reset_done()? Thanks,
>
> Alex
>
Yeah I can do that. I think adding some validation checks to the FLR/PM
callbacks wouldn't be a bad idea if its acceptable for PCI maintainers.
If you are okay with a reset_done() callback, I will try to incorporate
your suggestions and spin a new series.
Thanks
Farhan
Powered by blists - more mailing lists