lists.openwall.net   lists  /  announce  owl-users  owl-dev  john-users  john-dev  passwdqc-users  yescrypt  popa3d-users  /  oss-security  kernel-hardening  musl  sabotage  tlsify  passwords  /  crypt-dev  xvendor  /  Bugtraq  Full-Disclosure  linux-kernel  linux-netdev  linux-ext4  linux-hardening  linux-cve-announce  PHC 
Open Source and information security mailing list archives
 
Hash Suite: Windows password security audit tool. GUI, reports in PDF.
[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <350a9bd5-c2a9-4206-98fd-8a7913d36112@linux.ibm.com>
Date: Thu, 14 Aug 2025 09:33:47 -0700
From: Farhan Ali <alifm@...ux.ibm.com>
To: Niklas Schnelle <schnelle@...ux.ibm.com>,
        Alex Williamson <alex.williamson@...hat.com>,
        Bjorn Helgaas <helgaas@...nel.org>
Cc: linux-s390@...r.kernel.org, kvm@...r.kernel.org,
        linux-kernel@...r.kernel.org, mjrosato@...ux.ibm.com
Subject: Re: [PATCH v1 5/6] vfio-pci/zdev: Perform platform specific function
 reset for zPCI


On 8/14/2025 6:12 AM, Niklas Schnelle wrote:
> On Wed, 2025-08-13 at 16:56 -0600, Alex Williamson wrote:
>> On Wed, 13 Aug 2025 14:52:24 -0700
>> Farhan Ali <alifm@...ux.ibm.com> wrote:
>>
>>> On 8/13/2025 1:30 PM, Alex Williamson wrote:
>>>> On Wed, 13 Aug 2025 10:08:19 -0700
>>>> Farhan Ali <alifm@...ux.ibm.com> wrote:
>>>>   
>>>>> For zPCI devices we should drive a platform specific function reset
>>>>> as part of VFIO_DEVICE_RESET. This reset is needed recover a zPCI device
>>>>> in error state.
>>>>>
>>>>> Signed-off-by: Farhan Ali <alifm@...ux.ibm.com>
>>>>> ---
>>>>>    arch/s390/pci/pci.c              |  1 +
>>>>>    drivers/vfio/pci/vfio_pci_core.c |  4 ++++
>>>>>    drivers/vfio/pci/vfio_pci_priv.h |  5 ++++
>>>>>    drivers/vfio/pci/vfio_pci_zdev.c | 39 ++++++++++++++++++++++++++++++++
>>>>>    4 files changed, 49 insertions(+)
> --- snip ---
>>>>>    
>>>>> +int vfio_pci_zdev_reset(struct vfio_pci_core_device *vdev)
>>>>> +{
>>>>> +	struct zpci_dev *zdev = to_zpci(vdev->pdev);
>>>>> +	int rc = -EIO;
>>>>> +
>>>>> +	if (!zdev)
>>>>> +		return -ENODEV;
>>>>> +
>>>>> +	/*
>>>>> +	 * If we can't get the zdev->state_lock the device state is
>>>>> +	 * currently undergoing a transition and we bail out - just
>>>>> +	 * the same as if the device's state is not configured at all.
>>>>> +	 */
>>>>> +	if (!mutex_trylock(&zdev->state_lock))
>>>>> +		return rc;
>>>>> +
>>>>> +	/* We can reset only if the function is configured */
>>>>> +	if (zdev->state != ZPCI_FN_STATE_CONFIGURED)
>>>>> +		goto out;
>>>>> +
>>>>> +	rc = zpci_hot_reset_device(zdev);
>>>>> +	if (rc != 0)
>>>>> +		goto out;
>>>>> +
>>>>> +	if (!vdev->pci_saved_state) {
>>>>> +		pci_err(vdev->pdev, "No saved available for the device");
>>>>> +		rc = -EIO;
>>>>> +		goto out;
>>>>> +	}
>>>>> +
>>>>> +	pci_dev_lock(vdev->pdev);
>>>>> +	pci_load_saved_state(vdev->pdev, vdev->pci_saved_state);
>>>>> +	pci_restore_state(vdev->pdev);
>>>>> +	pci_dev_unlock(vdev->pdev);
>>>>> +out:
>>>>> +	mutex_unlock(&zdev->state_lock);
>>>>> +	return rc;
>>>>> +}
>>>> This looks like it should be a device or arch specific reset
>>>> implemented in drivers/pci, not vfio.  Thanks,
>>>>
>>>> Alex
>>> Are you suggesting to move this to an arch specific function? One thing
>>> we need to do after the zpci_hot_reset_device, is to correctly restore
>>> the config space of the device. And for vfio-pci bound devices we want
>>> to restore the state of the device to when it was initially opened.
>> We generally rely on the abstraction of pci_reset_function() to select
>> the correct type of reset for a function scope reset.  We've gone to
>> quite a bit of effort to implement all device specific resets and
>> quirks in the PCI core to be re-used across the kernel.
>>
>> Calling zpci_hot_reset_device() directly seems contradictory to those
>> efforts.  Should pci_reset_function() call this universally on s390x
>> rather than providing access to FLR/PM/SBR reset?
>>
> I agree with you Alex. Still trying to figure out what's needed for
> this. We already do zpci_hot_reset_device() in reset_slot() from the
> s390_pci_hpc.c hotplug slot driver and that does get called via
> pci_reset_hotplug_slot() and pci_reset_function(). There are a few
> problems though that meant it didn't work for Farhan but I agree maybe
> we can fix them for the general case. For one pci_reset_function()
> via DEVICE_RESET first tries FLR but that won't work with the device in
> the error state and MMIO blocked. Sadly __pci_reset_function_locked()
> then concludes that other resets also won't work. So that's something
> we might want to improve in general, for example maybe we need
> something more like pci_dev_acpi_reset() with higher priority than FLR.

Yeah I did think of adding something like s390x CLP reset as part of the 
reset methods. AFAIU the s390x CLP reset is similar to ACPI _RST. But 
that would introduce s390x specific code in pci core common code.

>
> Now for pci_reset_hotplug_slot() via VFIO_DEVICE_PCI_HOT_RESET I'm not
> sure why that won't work as is. @Farhan do you know?

VFIO_DEVICE_PCI_HOT_RESET would have been sufficient interface for 
majority of PCI devices on s390x as that would drive a bus reset. It was 
sufficient as most devices were single bus devices. But in the latest 
generation of machines (z17) we expose true SR-IOV and an OS can have 
access to both PF and VFs and so these are on the same bus and can have 
different ownership based on what is bound to vfio-pci.

My thinking for extending VFIO_DEVICE_RESET is because AFAIU its a per 
function reset mechanism, which maps well with what our architecture 
provides. On s390x we can drive a per function reset (via firmware) 
through the CLP instruction driven by the zpci_hot_reset_device(). And 
doing it as vfio zpci specific function would confine the s390x logic.

>>   Why is it
>> universally correct here given the ioctl previously made use of
>> standard reset mechanisms?
>>
>> The DEVICE_RESET ioctl is simply an in-place reset of the device,
>> without restoring the original device state.  So we're also subtly
>> changing that behavior here, presumably because we're targeting the
>> specific error recovery case.  Have you considered how this might
>> break non-error-recovery use cases?
>>
>> I wonder if we want a different reset mechanism for this use case
>> rather than these subtle semantic changes.
> I think an alternative to that, which Farhan actually had in the
> previous internal version, is to implement
> pci_error_handlers::reset_done() and do the pci_load_saved_state()
> there. That would only affect the error recovery case leaving other
> cases alone.
>
>
> Thanks,
> Niklas

The reason I abandoned reset_done() callback idea is because its not 
sufficient to recover the device correctly. Today before driving a reset 
we save the state of the device. When a device is in error state, any 
pci load/store (on s390x they are actual instructions :)) to config 
space would return an error value (0xffffffff). We don't have any checks 
in pci_save_state to prevent storing error values. And after a reset 
when we try to restore the config space (pci_dev_restore) we try to 
write the error value and this can be problematic. By the time the 
reset_done() callback is invoked, its already too late.

@Alex,
I am open to ideas/suggestions on this. Do we think we need a separate 
VFIO ioctl to drive this or a new reset mechanism as Niklas suggested?

Thanks
Farhan


Powered by blists - more mailing lists

Powered by Openwall GNU/*/Linux Powered by OpenVZ