Message-ID: <20190424175758.GC244134@google.com>
Date:   Wed, 24 Apr 2019 12:57:58 -0500
From:   Bjorn Helgaas <helgaas@...nel.org>
To:     Alex Williamson <alex.williamson@...hat.com>
Cc:     mr.nuke.me@...il.com, linux-pci@...r.kernel.org,
        austin_bolen@...l.com, alex_gagniuc@...lteam.com,
        keith.busch@...el.com, Shyam_Iyer@...l.com, lukas@...ner.de,
        okaya@...nel.org, torvalds@...ux-foundation.org,
        linux-kernel@...r.kernel.org
Subject: Re: [PATCH] PCI: Add link_change error handler and vfio-pci user

On Tue, Apr 23, 2019 at 04:42:28PM -0600, Alex Williamson wrote:
> The PCIe bandwidth notification service generates logging any time a
> link changes speed or width to a state that is considered downgraded.
> Unfortunately, it cannot differentiate signal integrity related link
> changes from those intentionally initiated by an endpoint driver,
> including drivers that may live in userspace or VMs when making use
> of vfio-pci.  Therefore, allow the driver to have a say in whether
> the link is indeed downgraded and worth noting in the log, or if the
> change is perhaps intentional.
> 
> For vfio-pci, we don't know the intentions of the user/guest driver
> either, but we do know that GPU drivers in guests actively manage
> the link state and therefore trigger the bandwidth notification for
> what appear to be entirely intentional link changes.
> 
> Fixes: e8303bb7a75c PCI/LINK: Report degraded links via link bandwidth notification
> Link: https://lore.kernel.org/linux-pci/155597243666.19387.1205950870601742062.stgit@gimli.home/T/#u
> Signed-off-by: Alex Williamson <alex.williamson@...hat.com>
> ---
> 
> Changing to pci_dbg() logging is not super usable, so let's try the
> previous idea of letting the driver handle link change events as they
> see fit.  Ideally this might be two patches, but for easier handling,
> folding the pci and vfio-pci bits together.  Comments?  Thanks,

I'm a little uneasy about the bandwidth notification logging as a
whole.  Messages in dmesg don't seem like a solid base for building
management tools.

I assume the eventual goal would be some sort of digested notification
along the lines of "hey mr/ms administrator, the link to device X
unexpectedly became slower, you might want to check that out."

If I were building that, I don't think I would use dmesg.  I might
write a daemon that polls /sys/.../current_link_{speed,width}, or
maybe use some sort of netlink event.  Maybe it would be useful to
have the admin designate devices of interest.
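
A minimal userspace sketch of the polling idea, in C.  Everything here
is an assumption for illustration: the names read_attr(),
read_link_state(), link_state_changed() and watch_link() are
hypothetical, the device's sysfs directory is passed in by the caller
rather than guessed, and the attribute contents are compared as opaque
strings.  A real daemon would need a designated device list, rate
limiting, and a proper notification channel instead of printf().

```c
/* Sketch only: poll a PCI device's current_link_{speed,width} in sysfs. */
#include <stdio.h>
#include <string.h>
#include <unistd.h>

#define ATTR_LEN 64

struct link_state {
	char speed[ATTR_LEN];
	char width[ATTR_LEN];
};

/*
 * Read one attribute file from dev_dir into buf and strip the trailing
 * newline.  Returns 0 on success, -1 on any failure.
 */
static int read_attr(const char *dev_dir, const char *name,
		     char *buf, size_t len)
{
	char path[512];
	FILE *f;
	int n;

	n = snprintf(path, sizeof(path), "%s/%s", dev_dir, name);
	if (n < 0 || (size_t)n >= sizeof(path))
		return -1;

	f = fopen(path, "r");
	if (!f)
		return -1;
	if (!fgets(buf, (int)len, f)) {
		fclose(f);
		return -1;
	}
	fclose(f);
	buf[strcspn(buf, "\n")] = '\0';
	return 0;
}

/* Snapshot both link attributes of the device at dev_dir. */
static int read_link_state(const char *dev_dir, struct link_state *st)
{
	if (read_attr(dev_dir, "current_link_speed", st->speed,
		      sizeof(st->speed)))
		return -1;
	return read_attr(dev_dir, "current_link_width", st->width,
			 sizeof(st->width));
}

/* Nonzero if either speed or width differs between the two snapshots. */
static int link_state_changed(const struct link_state *a,
			      const struct link_state *b)
{
	return strcmp(a->speed, b->speed) != 0 ||
	       strcmp(a->width, b->width) != 0;
}

/*
 * Poll dev_dir every interval_sec seconds, max_polls times (0 = forever).
 * Returns the number of changes seen, or -1 if an attribute is unreadable.
 */
static int watch_link(const char *dev_dir, unsigned int interval_sec,
		      unsigned long max_polls)
{
	struct link_state prev, cur;
	unsigned long i;
	int changes = 0;

	if (read_link_state(dev_dir, &prev))
		return -1;

	for (i = 0; max_polls == 0 || i < max_polls; i++) {
		sleep(interval_sec);
		if (read_link_state(dev_dir, &cur))
			return -1;
		if (link_state_changed(&prev, &cur)) {
			printf("%s: link changed: %s x%s -> %s x%s\n",
			       dev_dir, prev.speed, prev.width,
			       cur.speed, cur.width);
			changes++;
			prev = cur;
		}
	}
	return changes;
}
```

A caller would pass the sysfs directory of a device the admin cares
about, e.g. watch_link(argv[1], 5, 0) from main().  Note that this only
reports that the link changed; deciding whether a change was
"unexpected" is still left to whoever consumes the output.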

I'm hesitant about adding a .link_change() handler.  If there's
something useful a driver could do with it, that's one thing.  But
using it merely to suppress a message doesn't really seem worth the
trouble, and it seems unfriendly to ask drivers to add it when they
didn't ask for it and get no benefit from it.

>  drivers/pci/probe.c         |   13 +++++++++++++
>  drivers/vfio/pci/vfio_pci.c |   10 ++++++++++
>  include/linux/pci.h         |    3 +++
>  3 files changed, 26 insertions(+)
> 
> diff --git a/drivers/pci/probe.c b/drivers/pci/probe.c
> index 7e12d0163863..233cd4b5b6e8 100644
> --- a/drivers/pci/probe.c
> +++ b/drivers/pci/probe.c
> @@ -2403,6 +2403,19 @@ void pcie_report_downtraining(struct pci_dev *dev)
>  	if (PCI_FUNC(dev->devfn) != 0 || dev->is_virtfn)
>  		return;
>  
> +	/*
> +	 * If driver handles link_change event, defer to driver.  PCIe drivers
> +	 * can call pcie_print_link_status() to print current link info.
> +	 */
> +	device_lock(&dev->dev);
> +	if (dev->driver && dev->driver->err_handler &&
> +	    dev->driver->err_handler->link_change) {
> +		dev->driver->err_handler->link_change(dev);
> +		device_unlock(&dev->dev);
> +		return;
> +	}
> +	device_unlock(&dev->dev);
> +
>  	/* Print link status only if the device is constrained by the fabric */
>  	__pcie_print_link_status(dev, false);
>  }
> diff --git a/drivers/vfio/pci/vfio_pci.c b/drivers/vfio/pci/vfio_pci.c
> index cab71da46f4a..c9ffc0ccabb3 100644
> --- a/drivers/vfio/pci/vfio_pci.c
> +++ b/drivers/vfio/pci/vfio_pci.c
> @@ -1418,8 +1418,18 @@ static pci_ers_result_t vfio_pci_aer_err_detected(struct pci_dev *pdev,
>  	return PCI_ERS_RESULT_CAN_RECOVER;
>  }
>  
> +/*
> + * Ignore link change notification, we can't differentiate signal related
> + * link changes from user driver power management type operations, so do
> + * nothing.  Potentially this could be routed out to the user.
> + */
> +static void vfio_pci_link_change(struct pci_dev *pdev)
> +{
> +}
> +
>  static const struct pci_error_handlers vfio_err_handlers = {
>  	.error_detected = vfio_pci_aer_err_detected,
> +	.link_change = vfio_pci_link_change,
>  };
>  
>  static struct pci_driver vfio_pci_driver = {
> diff --git a/include/linux/pci.h b/include/linux/pci.h
> index 27854731afc4..e9194bc03f9e 100644
> --- a/include/linux/pci.h
> +++ b/include/linux/pci.h
> @@ -763,6 +763,9 @@ struct pci_error_handlers {
>  
>  	/* Device driver may resume normal operations */
>  	void (*resume)(struct pci_dev *dev);
> +
> +	/* PCIe link change notification */
> +	void (*link_change)(struct pci_dev *dev);
>  };
>  
>  
> 
