lists.openwall.net   lists  /  announce  owl-users  owl-dev  john-users  john-dev  passwdqc-users  yescrypt  popa3d-users  /  oss-security  kernel-hardening  musl  sabotage  tlsify  passwords  /  crypt-dev  xvendor  /  Bugtraq  Full-Disclosure  linux-kernel  linux-netdev  linux-ext4  linux-hardening  linux-cve-announce  PHC 
Open Source and information security mailing list archives
 
Hash Suite: Windows password security audit tool. GUI, reports in PDF.
[<prev] [next>] [<thread-prev] [day] [month] [year] [list]
Message-ID: <aPT26UZ41DsN5C01@wunner.de>
Date: Sun, 19 Oct 2025 16:34:17 +0200
From: Lukas Wunner <lukas@...ner.de>
To: Niklas Schnelle <schnelle@...ux.ibm.com>
Cc: Farhan Ali <alifm@...ux.ibm.com>, Benjamin Block <bblock@...ux.ibm.com>,
	linux-s390@...r.kernel.org, kvm@...r.kernel.org,
	linux-kernel@...r.kernel.org, linux-pci@...r.kernel.org,
	alex.williamson@...hat.com, helgaas@...nel.org, clg@...hat.com,
	mjrosato@...ux.ibm.com
Subject: Re: [PATCH v4 01/10] PCI: Avoid saving error values for config space

On Tue, Oct 14, 2025 at 02:07:57PM +0200, Niklas Schnelle wrote:
> On Sun, 2025-10-12 at 08:34 +0200, Lukas Wunner wrote:
> > If you do want to stick with your alternative approach,
> > maybe doing the error handling in the ->mmio_enabled() phase
> > instead of ->error_detected() would make more sense.
> > In that phase you're allowed to access the device,
> > you can also attempt a local reset and return
> > PCI_ERS_RESULT_RECOVERED on success.
> > 
> > You'd have to return PCI_ERS_RESULT_CAN_RECOVER though
> > from the ->error_detected() callback in order to progress
> > to the ->mmio_enabled() step.
> 
> The problem with using ->mmio_enabled() is two fold. For one we
> sometimes have to do a reset instead of clearing the error state, for
> example if the device was not only put in the error state but also
> disabled, or if the guest driver wants it,

Well in that case you could reset the device in the ->mmio_enabled() step
from the guest using the vfio reset ioctl.

> Second and more
> importantly this would break the guests assumption that the device will
> be in the error state with MMIO and DMA blocked when it gets an error
> event. On the other hand, that's exactly the state it is in if we
> report the error in the ->error_detected() callback

At the risk of continuously talking past each other:

How about this, the host notifies the guest of the error in the
->error_detected() callback.  The guest notifies the driver and
collects the result (whether a reset is requested or not), then
returns PCI_ERS_RESULT_CAN_RECOVER to the host.

The host re-enables I/O to the device, invokes the ->mmio_detected()
callback.  The guest then resets the device based on the result it
collected earlier or invokes the driver's ->mmio_enabled() callback.

If the driver returns PCI_ERS_RESULT_NEED_RESET from the
->mmio_enabled() callback, you can likewise reset the device from
the guest using the ioctl method.

My concern is that by insisting that you handle device recovery
completely in the ->error_detected() phase, you're not complying
with the protocol specified in Documentation/PCI/pci-error-recovery.rst
and as a result, you have to amend the reset code in the PCI core
because it assumes that all arches adheres to the protocol.
In my view, that suggests that the approach needs to be reworked
to comply with the protocol.  Then the workarounds for performing
a reset while I/O is blocked become unnecessary.

Thanks,

Lukas

Powered by blists - more mailing lists

Powered by Openwall GNU/*/Linux Powered by OpenVZ