Message-ID: <1ee79c53-4c29-475f-b44e-6839b1feef78@linux.ibm.com>
Date: Thu, 16 Oct 2025 14:00:22 -0700
From: Farhan Ali <alifm@...ux.ibm.com>
To: Niklas Schnelle <schnelle@...ux.ibm.com>, Lukas Wunner <lukas@...ner.de>
Cc: Benjamin Block <bblock@...ux.ibm.com>, linux-s390@...r.kernel.org,
kvm@...r.kernel.org, linux-kernel@...r.kernel.org,
linux-pci@...r.kernel.org, alex.williamson@...hat.com,
helgaas@...nel.org, clg@...hat.com, mjrosato@...ux.ibm.com
Subject: Re: [PATCH v4 01/10] PCI: Avoid saving error values for config space
On 10/14/2025 5:07 AM, Niklas Schnelle wrote:
> On Sun, 2025-10-12 at 08:34 +0200, Lukas Wunner wrote:
>> On Thu, Oct 09, 2025 at 11:12:03AM +0200, Niklas Schnelle wrote:
>>> On Wed, 2025-10-08 at 20:14 +0200, Lukas Wunner wrote:
>>>> And yet you're touching the device by trying to reset it.
>>>>
>>>> The code you're introducing in patch [01/10] only becomes necessary
>>>> because you're not following the above-quoted protocol. If you
>>>> follow the protocol, patch [01/10] becomes unnecessary.
>>> I agree with your point above error_detected() should not touch the
>>> device. My understanding of Farhan's series though is that it follows
>>> that rule. As I understand it error_detected() is only used to inject
>>> the s390 specific PCI error event into the VM using the information
>>> stored in patch 7. As before vfio-pci returns
>>> PCI_ERS_RESULT_CAN_RECOVER from error_detected() but then with patch 7
>>> the pass-through case is detected and this gets turned into
>>> PCI_ERS_RESULT_RECOVERED and the rest of the s390 recovery code gets
>>> skipped. And yeah, writing it down I'm not super happy with this part,
>>> maybe it would be better to have an explicit
>>> PCI_ERS_RESULT_LEAVE_AS_IS.
>> Thanks, that's the high-level overview I was looking for.
>>
>> It would be good to include something like this at least
>> in the cover letter or additionally in the commit messages
>> so that it's easier for reviewers to connect the dots.
>>
>> I was expecting paravirtualized error handling, i.e. the
>> VM is aware it's virtualized and vfio essentially proxies
>> the pci_ers_result return value of the driver (e.g. nvme)
>> back to the host, thereby allowing the host to drive error
>> recovery normally. I'm not sure if there are technical
>> reasons preventing such an approach.
> It does sound technically feasible but sticking to the already
> architected error reporting and recovery has clear advantages. For one
> it will work with existing Linux versions without guest changes and it
> also has precedent with it working already in the z/VM hypervisor for
> years. I agree that there is some level of mismatch with Linux'
> recovery support but I don't think that outweighs having a clean
> virtualization support where the host and guest use the same interface.
>
>> If you do want to stick with your alternative approach,
>> maybe doing the error handling in the ->mmio_enabled() phase
>> instead of ->error_detected() would make more sense.
>> In that phase you're allowed to access the device,
>> you can also attempt a local reset and return
>> PCI_ERS_RESULT_RECOVERED on success.
>>
>> You'd have to return PCI_ERS_RESULT_CAN_RECOVER though
>> from the ->error_detected() callback in order to progress
>> to the ->mmio_enabled() step.
>>
>> Does that make sense?
>>
>> Thanks,
>>
>> Lukas
> The problem with using ->mmio_enabled() is two fold. For one we
> sometimes have to do a reset instead of clearing the error state, for
> example if the device was not only put in the error state but also
> disabled, or if the guest driver wants it, so we would also have to use
> ->slot_reset() and could end up with two resets. Second and more
> importantly this would break the guests assumption that the device will
> be in the error state with MMIO and DMA blocked when it gets an error
> event. On the other hand, that's exactly the state it is in if we
> report the error in the ->error_detected() callback and then skip the
> rest of recovery so it can be done in the guest, likely with the exact
> same Linux code. I'd assume this should be similar if QEMU/KVM wanted
> to virtualize AER+DPC except that there MMIO remains accessible?
>
> Here's an idea. Could it be an option to detect the pass-through in the
> vfio-pci driver's ->error_detected() callback, possibly with feedback
> from QEMU (@Alex?), and then return PCI_ERS_RESULT_RECOVERED from there
> skipping the rest of recovery?
>
> The skipping would be in-line with the below section of the
> documentation i.e. "no further intervention":
>
> - PCI_ERS_RESULT_RECOVERED
> Driver returns this if it thinks the device is usable despite
> the error and does not need further intervention.
>
> It's just that in this case the device really remains with MMIO and DMA
> blocked, usable only in the sense that the vfio-pci + guest VM combo
> knows how to use a device with MMIO and DMA blocked with the guest
> recovery.
>
> Thanks,
> Niklas
Hi Lukas,
I hope this helps clarify why we still need patch [01/10], or at least
the check in pci_save_state() for whether the device responds with an
error value, if we move forward with your patch series "PCI: Universal
error recoverability of devices". We can discuss moving that check
somewhere else if there is concern about overhead in pci_save_state().

After discussing with Niklas (off list), we were wondering whether it
makes sense for vfio_pci_core_aer_err_detected() to return
PCI_ERS_RESULT_RECOVERED when no further intervention from platform
recovery is needed, to align more closely with the pcie-error-recovery
documentation. One proposal would be to add a flag to struct
vfio_pci_core_device (e.g. vdev->mediated_recovery) and return
PCI_ERS_RESULT_RECOVERED from vfio_pci_core_aer_err_detected() if the
flag is set. The flag could be set by userspace using
VFIO_DEVICE_FEATURE_SET for the device feature
VFIO_DEVICE_FEATURE_ZPCI_ERROR (I would like to hear Alex's thoughts on
this proposal).
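
To make the flag proposal concrete, here is a rough userspace model of
the return-value decision. It is only a sketch and not the actual patch:
the struct is a reduced stand-in for struct vfio_pci_core_device, the
model_* function names are hypothetical, and mediated_recovery is the
proposed (not yet existing) field. Only the two pci_ers_result_t values
mentioned in this thread are modeled.

```c
#include <assert.h>
#include <stdbool.h>

/* Userspace stand-ins for the two pci_ers_result_t values used below. */
enum pci_ers_result {
	PCI_ERS_RESULT_CAN_RECOVER = 2,
	PCI_ERS_RESULT_RECOVERED = 5,
};

/* Reduced model of struct vfio_pci_core_device: only the proposed flag. */
struct vfio_pci_core_device_model {
	bool mediated_recovery;	/* hypothetical field, set on userspace opt-in */
};

/*
 * Models userspace opting in, e.g. via VFIO_DEVICE_FEATURE_SET for
 * VFIO_DEVICE_FEATURE_ZPCI_ERROR as proposed in this mail.
 */
void model_set_mediated_recovery(struct vfio_pci_core_device_model *vdev,
				 bool enable)
{
	assert(vdev);
	vdev->mediated_recovery = enable;
}

/* Models the decision proposed for vfio_pci_core_aer_err_detected(). */
enum pci_ers_result
model_err_detected(const struct vfio_pci_core_device_model *vdev)
{
	assert(vdev);

	/* Guest handles recovery: no further platform intervention needed. */
	if (vdev->mediated_recovery)
		return PCI_ERS_RESULT_RECOVERED;

	/* Flag not set: keep the existing return value. */
	return PCI_ERS_RESULT_CAN_RECOVER;
}
```

With the flag clear the model returns PCI_ERS_RESULT_CAN_RECOVER as
vfio-pci does today, so only devices where userspace opted in would skip
the rest of host recovery.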
Thanks
Farhan