linux-kernel - Re: [PATCH v4 01/10] PCI: Avoid saving error values for config space

Message-ID: <aOtN_r_Mp-nQ6Ckj@wunner.de>
Date: Sun, 12 Oct 2025 08:43:10 +0200
From: Lukas Wunner <lukas@...ner.de>
To: Farhan Ali <alifm@...ux.ibm.com>
Cc: Benjamin Block <bblock@...ux.ibm.com>, linux-s390@...r.kernel.org,
	kvm@...r.kernel.org, linux-kernel@...r.kernel.org,
	linux-pci@...r.kernel.org, alex.williamson@...hat.com,
	helgaas@...nel.org, clg@...hat.com, schnelle@...ux.ibm.com,
	mjrosato@...ux.ibm.com
Subject: Re: [PATCH v4 01/10] PCI: Avoid saving error values for config space

On Thu, Oct 09, 2025 at 10:02:12AM -0700, Farhan Ali wrote:
> On 10/8/2025 9:52 PM, Lukas Wunner wrote:
> > On Wed, Oct 08, 2025 at 02:55:56PM -0700, Farhan Ali wrote:
> > > > > On 10/8/2025 6:34 AM, Lukas Wunner wrote:
> > > > > > I also don't quite understand why the VM needs to perform a reset.
> > > > > > Why can't you just let the VM tell the host that a reset is needed
> > > > > > (PCI_ERS_RESULT_NEED_RESET) and then the host resets the device on
> > > > > > behalf of the VM?
> > > The reset is not performed by the VM, reset is still done by the host. My
> > > approach for a VM to let the host know that reset was needed, was to
> > > intercept any reset instructions for the PCI device in QEMU. QEMU would
> > > then drive a reset via VFIO_DEVICE_RESET. Maybe I am missing something,
> > > but based on what we have today in vfio driver, we don't have a mechanism
> > > for userspace to reset a device other than VFIO_DEVICE_RESET and
> > > VFIO_PCI_DEVICE_HOT_RESET ioctls.
> > The ask is for the host to notify the VM of the ->error_detected() event
> > and the VM then responding with one of the "enum pci_ers_result" values.
> 
> Maybe there is some confusion here. Could you clarify what do you mean by VM
> responding with "enum pci_ers_result" values? Is it a device driver (for
> example an NVMe driver) running in the VM that should do that? Or is it
> something else you are suggesting?

My expectation was that the host notifies the VM of the error,
the kernel in the VM notifies the nvme driver of the error,
the nvme driver returns a pci_ers_result value from its
pci_error_handlers, the VM passes that value back to the host,
and the host then drives error recovery normally.
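
To make that flow concrete, here is a rough userspace model of it.
This is only a sketch, not kernel code: the enum is a local stand-in
for a subset of "enum pci_ers_result", and the struct and function
names (guest_driver, guest_kernel_notify, host_recover,
nvme_like_error_detected) are hypothetical and exist only for
illustration.  It also stops after the reset and leaves out the later
recovery callbacks.

```c
#include <assert.h>
#include <stddef.h>

/* Local stand-in for a subset of the kernel's enum pci_ers_result. */
enum ers_result {
	ERS_RESULT_CAN_RECOVER,
	ERS_RESULT_NEED_RESET,
	ERS_RESULT_DISCONNECT,
	ERS_RESULT_RECOVERED,
};

/* Hypothetical guest driver with an ->error_detected()-style callback. */
struct guest_driver {
	enum ers_result (*error_detected)(void *drvdata);
	void *drvdata;
};

/* Hypothetical host-side bookkeeping for this sketch. */
struct host_state {
	int resets_done;
};

/* Guest kernel: forward the host's notification to the guest driver. */
enum ers_result guest_kernel_notify(const struct guest_driver *drv)
{
	assert(drv != NULL && drv->error_detected != NULL);
	return drv->error_detected(drv->drvdata);
}

/*
 * Host: notify the VM, take the verdict it passes back, and reset the
 * device on the VM's behalf if the verdict asks for a reset.
 */
enum ers_result host_recover(struct host_state *host,
			     const struct guest_driver *drv)
{
	enum ers_result verdict = guest_kernel_notify(drv);

	if (verdict == ERS_RESULT_NEED_RESET)
		host->resets_done++;	/* host performs the reset */

	return verdict;
}

/* Example guest driver callback that always requests a reset. */
enum ers_result nvme_like_error_detected(void *drvdata)
{
	(void)drvdata;
	return ERS_RESULT_NEED_RESET;
}
```

Calling host_recover() with a guest_driver whose callback is
nvme_like_error_detected() returns ERS_RESULT_NEED_RESET and bumps
resets_done once.  The VM only reports a verdict and the host does
the actual reset.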

I was missing the high-level architectural overview that Niklas
subsequently provided.  You should provide it as part of your
series because otherwise it's difficult for reviewers to understand
what the individual patches are trying to achieve as a whole.

Thanks,

Lukas