Message-ID: <20250825171226.1602-1-alifm@linux.ibm.com>
Date: Mon, 25 Aug 2025 10:12:17 -0700
From: Farhan Ali <alifm@...ux.ibm.com>
To: linux-s390@...r.kernel.org, kvm@...r.kernel.org,
linux-kernel@...r.kernel.org
Cc: alex.williamson@...hat.com, helgaas@...nel.org, alifm@...ux.ibm.com,
schnelle@...ux.ibm.com, mjrosato@...ux.ibm.com
Subject: [PATCH v2 0/9] Error recovery for vfio-pci devices on s390x
Hi,
This Linux kernel patch series introduces support for error recovery for
passthrough PCI devices on System Z (s390x).
Background
----------
For PCI devices on s390x, an operating system receives platform-specific
error events from firmware rather than through AER. Today, for
passthrough/userspace devices, we don't attempt any error recovery and
ignore any error events for the devices. The passthrough/userspace devices
are managed by the vfio-pci driver. The driver does register error handling
callbacks (error_detected) and, on an error, triggers an eventfd to
userspace. But we need a mechanism to notify userspace
(QEMU/guest/userspace drivers) about the error event.
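For readers unfamiliar with the existing eventfd path, here is a minimal
userspace sketch of how an error eventfd can be registered on
VFIO_PCI_ERR_IRQ_INDEX through the existing VFIO_DEVICE_SET_IRQS ioctl. It is
not part of this series; the helper names (example_err_irq_set_alloc,
example_register_err_eventfd) are made up for illustration, and device_fd is
assumed to be an already opened vfio-pci device file descriptor.

```c
#include <errno.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>
#include <unistd.h>
#include <sys/eventfd.h>
#include <sys/ioctl.h>
#include <linux/vfio.h>

/* Illustrative helper (not from the series): build a VFIO_DEVICE_SET_IRQS
 * request that attaches one eventfd to the error IRQ index. */
static struct vfio_irq_set *example_err_irq_set_alloc(int32_t efd)
{
    size_t argsz = sizeof(struct vfio_irq_set) + sizeof(efd);
    struct vfio_irq_set *set = calloc(1, argsz);

    if (!set)
        return NULL;

    set->argsz = (uint32_t)argsz;
    set->flags = VFIO_IRQ_SET_DATA_EVENTFD | VFIO_IRQ_SET_ACTION_TRIGGER;
    set->index = VFIO_PCI_ERR_IRQ_INDEX;
    set->start = 0;
    set->count = 1;
    memcpy(set->data, &efd, sizeof(efd)); /* payload is the eventfd itself */
    return set;
}

/* Illustrative helper: returns the new eventfd on success, -errno on failure.
 * The caller would then poll()/read() the eventfd to learn about errors. */
static int example_register_err_eventfd(int device_fd)
{
    struct vfio_irq_set *set;
    int efd, ret;

    efd = eventfd(0, EFD_CLOEXEC);
    if (efd < 0)
        return -errno;

    set = example_err_irq_set_alloc(efd);
    if (!set) {
        close(efd);
        return -ENOMEM;
    }

    if (ioctl(device_fd, VFIO_DEVICE_SET_IRQS, set) < 0) {
        ret = -errno;
        close(efd);
    } else {
        ret = efd;
    }

    free(set);
    return ret;
}
```

The eventfd only signals that an error occurred; it carries no error details.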
Proposal
--------
We can expose this error information (currently only the PCI Error Code)
via a device feature. Userspace can then obtain the error information
via VFIO_DEVICE_FEATURE ioctl and take appropriate actions such as driving
a device reset.
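To make the proposed flow concrete, here is a rough userspace sketch of a
VFIO_DEVICE_FEATURE "get" request. The ioctl, struct vfio_device_feature and
the VFIO_DEVICE_FEATURE_GET/MASK flags are existing UAPI; everything prefixed
with EXAMPLE_/example_ is a placeholder assumed for illustration only. In
particular, the feature index and the payload layout below are NOT the ones
defined by patch 7; the real definitions live in include/uapi/linux/vfio.h of
the series.

```c
#include <assert.h>
#include <errno.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>
#include <sys/ioctl.h>
#include <linux/vfio.h>

/* HYPOTHETICAL placeholder index, not the value used by the series. */
#define EXAMPLE_FEATURE_ZPCI_ERROR 0xffffu

/* HYPOTHETICAL payload layout; the cover letter only says the feature
 * currently exposes the PCI Error Code. */
struct example_zpci_err {
    uint16_t pec; /* PCI Error Code */
};

/* Allocate a zeroed VFIO_DEVICE_FEATURE request with room for a payload. */
static struct vfio_device_feature *
example_feature_alloc(uint32_t feature, uint32_t op, size_t payload_size)
{
    size_t argsz = sizeof(struct vfio_device_feature) + payload_size;
    struct vfio_device_feature *req = calloc(1, argsz);

    if (!req)
        return NULL;

    req->argsz = (uint32_t)argsz;
    req->flags = (feature & VFIO_DEVICE_FEATURE_MASK) | op;
    return req;
}

/* Fetch the error code for the device; returns 0 or -errno. */
static int example_read_error_code(int device_fd, uint16_t *pec)
{
    struct vfio_device_feature *req;
    struct example_zpci_err err;
    int ret = 0;

    assert(pec != NULL);

    req = example_feature_alloc(EXAMPLE_FEATURE_ZPCI_ERROR,
                                VFIO_DEVICE_FEATURE_GET, sizeof(err));
    if (!req)
        return -ENOMEM;

    if (ioctl(device_fd, VFIO_DEVICE_FEATURE, req) < 0) {
        ret = -errno;
    } else {
        /* On success the kernel fills the payload after the header. */
        memcpy(&err, req->data, sizeof(err));
        *pec = err.pec;
    }

    free(req);
    return ret;
}
```

Based on the returned code, userspace would then decide on a follow-up action
such as driving a device reset.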
I would appreciate some feedback on this series.
Thanks
Farhan
ChangeLog
---------
v1 series https://lore.kernel.org/all/20250813170821.1115-1-alifm@linux.ibm.com/
v1 -> v2
- Patches 1 and 2 add some additional checks for FLR/PM reset to
try other function reset methods (suggested by Alex).
- Patch 3 fixes a bug in s390 for resetting PCI devices with multiple
functions.
- Patch 7 adds a new device feature for zPCI devices for the VFIO_DEVICE_FEATURE
ioctl. The ioctl is used by userspace to retrieve any PCI error
information for the device (suggested by Alex).
- Patch 8 adds a reset_done() callback for the vfio-pci driver, to
restore the state of the device after a reset.
- Patch 9 removes the PCIe check for triggering VFIO_PCI_ERR_IRQ_INDEX.
Farhan Ali (9):
  PCI: Avoid restoring error values in config space
  PCI: Add additional checks for flr and pm reset
  PCI: Allow per function PCI slots for hypervisor isolated functions
  s390/pci: Restore airq unconditionally for the zPCI device
  s390/pci: Update the logic for detecting passthrough device
  s390/pci: Store PCI error information for passthrough devices
  vfio-pci/zdev: Add a device feature for error information
  vfio: Add a reset_done callback for vfio-pci driver
  vfio: Remove the pcie check for VFIO_PCI_ERR_IRQ_INDEX
 arch/s390/include/asm/pci.h       |  30 ++++++++-
 arch/s390/pci/pci.c               |   1 +
 arch/s390/pci/pci_event.c         | 107 +++++++++++++++++-------------
 arch/s390/pci/pci_irq.c           |   9 +--
 drivers/pci/pci.c                 |  10 +++
 drivers/pci/slot.c                |  19 +++++-
 drivers/vfio/pci/vfio_pci_core.c  |  20 ++++--
 drivers/vfio/pci/vfio_pci_intrs.c |   3 +-
 drivers/vfio/pci/vfio_pci_priv.h  |   8 +++
 drivers/vfio/pci/vfio_pci_zdev.c  |  45 ++++++++++++-
 include/uapi/linux/vfio.h         |  14 ++++
 11 files changed, 200 insertions(+), 66 deletions(-)
--
2.43.0