Message-ID: <6703760a502d146909482f3aeb4333bf33cb431b.camel@linux.ibm.com>
Date: Tue, 16 Sep 2025 12:54:30 +0200
From: Niklas Schnelle <schnelle@...ux.ibm.com>
To: Farhan Ali <alifm@...ux.ibm.com>, linux-s390@...r.kernel.org,
        kvm@...r.kernel.org, linux-kernel@...r.kernel.org,
        linux-pci@...r.kernel.org
Cc: alex.williamson@...hat.com, helgaas@...nel.org, mjrosato@...ux.ibm.com
Subject: Re: [PATCH v3 07/10] s390/pci: Store PCI error information for
 passthrough devices

On Mon, 2025-09-15 at 11:12 -0700, Farhan Ali wrote:
> On 9/15/2025 4:42 AM, Niklas Schnelle wrote:
> > On Thu, 2025-09-11 at 11:33 -0700, Farhan Ali wrote:
> > > For a passthrough device we need cooperation from user space to recover
> > > the device. This requires bubbling up any error information to user
> > > space.  Let's store this error information for passthrough devices, so it
> > > can be retrieved later.
> > > 
> > > Signed-off-by: Farhan Ali <alifm@...ux.ibm.com>
> > > ---
> > > 
--- snip ---
> > > +	mutex_unlock(&zdev->pending_errs_lock);
> > > +}
> > > +
> > > +void zpci_cleanup_pending_errors(struct zpci_dev *zdev)
> > > +{
> > > +	struct pci_dev *pdev = NULL;
> > > +
> > > +	mutex_lock(&zdev->pending_errs_lock);
> > > +	pdev = pci_get_slot(zdev->zbus->bus, zdev->devfn);
> > > +	if (zdev->pending_errs.count)
> > > +		pr_err("%s: Unhandled PCI error events count=%zu",
> > > +				pci_name(pdev), zdev->pending_errs.count);
> > I think this could be a zpci_dbg(). That way you also don't need the
> > pci_get_slot() which is also buggy as it misses a pci_dev_put(). The
> > message also doesn't seem useful for the user. As I understand it this
> > would happen if a vfio-pci user dies without handling all the error
> > events but then vfio-pci will also reset the slot on closing of the
> > fds, no? So the device will get reset anyway.
> 
> Right, the device will reset anyway. But I wanted to at least give an 
> indication to the user that some events were not handled correctly. 
> Maybe pr_err is a little extreme, so I can convert it to a warn? This should 
> be rare as well-behaved applications shouldn't do this. I am fine with 
> zpci_dbg as well, it's just that the kernel needs to be in debug mode for us 
> to get this info.

No, zpci_dbg() logs to /sys/kernel/debug/s390dbf/pci_msg/sprintf
without need for debug mode. I'm also ok with a pr_warn() or maybe even
pr_info(). I can see your argument that this may be useful to have in
dmesg e.g. when debugging a user-space driver without having to know
about s390 specific debug aids.

> 
> > 
> > > +	memset(&zdev->pending_errs, 0, sizeof(struct zpci_ccdf_pending));
> > If this goes wrong and we subsequently crash or take a live memory dump
> > I'd prefer to have bread crumbs such as the errors that weren't cleaned
> > up. Wouldn't it be enough to just set the count to zero? For debugging,
> > the original count will be in s390dbf.
> 
> I think setting count to zero should be enough, but I am wary about 
> keeping stale state around. How about just logging the count that was 
> not handled, in s390dbf? I think we already dump the ccdf in s390dbf if 
> we get any error event. So it should be enough for us to trace back the 
> unhandled error events?
> 
> > Also maybe it would make sense
> > to pull the zdev->mediated_recovery clearing in here?
> 
> I would like to keep the mediated_recovery flag separate from just 
> cleaning up the errors. The flag gets initialized when we open the vfio 
> device and so having the flag cleared on close makes it easier to track 
> this IMHO.

Ok yeah I can see the symmetry argument.
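
For illustration, the cleanup variant discussed above (log the unhandled
count, then only zero the count instead of doing a memset() so the stale
entries stay around as bread crumbs) could be modelled in plain userspace C
roughly as below. This is only a sketch under assumptions: struct
mock_pending_errs, MOCK_MAX_PENDING and mock_cleanup_pending_errors() are
hypothetical stand-ins, not the patch's struct zpci_ccdf_pending or
zpci_cleanup_pending_errors(). Locking is omitted, and the message is
formatted into a caller-supplied buffer rather than sent through pr_warn()
or zpci_dbg().

```c
#include <stddef.h>
#include <stdio.h>

#define MOCK_MAX_PENDING 4	/* hypothetical capacity, not taken from the patch */

/* Hypothetical userspace stand-in for the pending error store. */
struct mock_pending_errs {
	size_t count;				/* number of unhandled events */
	unsigned int err[MOCK_MAX_PENDING];	/* placeholder for stored error data */
};

/*
 * Report how many events were left unhandled and reset only the count.
 * The err[] entries are intentionally not cleared so that they remain
 * visible in a later dump. Returns the number of events dropped.
 */
static size_t mock_cleanup_pending_errors(struct mock_pending_errs *p,
					  char *log, size_t len)
{
	size_t unhandled = p->count;

	if (unhandled)
		snprintf(log, len, "Unhandled PCI error events count=%zu",
			 unhandled);

	p->count = 0;	/* no memset(): old entries are kept as bread crumbs */
	return unhandled;
}
```

A caller would pass the store together with a small character buffer. After
the call the count is zero while the old entries are still readable, which
is the trade-off against the stale-state concern raised above.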
> 
