lists.openwall.net   lists  /  announce  owl-users  owl-dev  john-users  john-dev  passwdqc-users  yescrypt  popa3d-users  /  oss-security  kernel-hardening  musl  sabotage  tlsify  passwords  /  crypt-dev  xvendor  /  Bugtraq  Full-Disclosure  linux-kernel  linux-netdev  linux-ext4  linux-hardening  linux-cve-announce  PHC 
Open Source and information security mailing list archives
 
Hash Suite: Windows password security audit tool. GUI, reports in PDF.
[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <CAErSpo7VvWreoTj7MUE3cMGaM1dVMfZymO=1P-m8E-joRSgE7g@mail.gmail.com>
Date:	Wed, 28 Nov 2012 13:15:21 -0700
From:	Bjorn Helgaas <bhelgaas@...gle.com>
To:	"Pandarathil, Vijaymohan R" <vijaymohan.pandarathil@...com>
Cc:	"linux-kernel@...r.kernel.org" <linux-kernel@...r.kernel.org>,
	"linux-pci@...r.kernel.org" <linux-pci@...r.kernel.org>,
	"linasvepstas@...il.com" <linasvepstas@...il.com>,
	Myron Stowe <myron.stowe@...il.com>,
	"Ortiz, Lance E" <lance.oritz@...com>,
	Huang Ying <ying.huang@...el.com>,
	Hidetoshi Seto <seto.hidetoshi@...fujitsu.com>,
	"Patterson, Andrew D (LeftHand Networks)" <andrew.patterson@...com>,
	Zhang Yanmin <yanmin.zhang@...el.com>
Subject: Re: [PATCH v3] PCI-AER: Do not report successful error recovery for
 devices with AER-unaware drivers

On Sat, Nov 17, 2012 at 4:47 AM, Pandarathil, Vijaymohan R
<vijaymohan.pandarathil@...com> wrote:
> When an error is detected on a PCIe device which does not have an
> AER-aware driver, prevent AER infrastructure from reporting
> successful error recovery.
>
> This is because the report_error_detected() function that gets
> called in the first phase of recovery process allows forward
> progress even when the driver for the device does not have AER
> capabilities. It seems that all callbacks (in pci_error_handlers
> structure) registered by drivers that gets called during error
> recovery are not mandatory. So the intention of the infrastructure
> design seems to be to allow forward progress even when a specific
> callback has not been registered by a driver. However, if error
> handler structure itself has not been registered, it doesn't make
> sense to allow forward progress.
>
> As a result of the current design, in the case of a single device
> having an AER-unaware driver or in the case of any function in a
> multi-function card having an AER-unaware driver, a successful
> recovery is reported.
>
> Typical scenario this happens is when a PCI device is detached
> from a KVM host and the pci-stub driver on the host claims the
> device. The pci-stub driver does not have error handling capabilities
> but the AER infrastructure still reports that the device recovered
> successfully.
>
> The changes proposed here leaves the device(s)in an unrecovered state
> if the driver for the device or for any device in the subtree
> does not have error handler structure registered. This reflects
> the true state of the device and prevents any partial recovery (or no
> recovery at all) reported as successful.
>
>
> v3:
>   - Fixed a checkpatch/build issue
>
> v2:
>   - Made changes so that all devices in the subtree have the error
>     state set correctly.
>
>
> Reviewed-by: Linas Vepstas <linasvepstas <at> gmail.com>
> Reviewed-by: Myron Stowe <mstowe <at> redhat.com>
> Reviewed-by: Bjorn Helgaas <bhelgaas <at> google.com>
> Signed-off-by: Vijay Mohan Pandarathil <vijaymohan.pandarathil@...com>

I added this to my -next branch, which will be merged for v3.8.

Please double-check that I resolved the merge conflict correctly.  It
should show up here in the next few minutes:

http://git.kernel.org/?p=linux/kernel/git/helgaas/pci.git;a=shortlog;h=refs/heads/next

>  drivers/pci/pcie/aer/aerdrv.h      |  5 ++++-
>  drivers/pci/pcie/aer/aerdrv_core.c | 21 ++++++++++++++++++---
>  include/linux/pci.h                |  3 +++
>  3 files changed, 25 insertions(+), 4 deletions(-)
>
> diff --git a/drivers/pci/pcie/aer/aerdrv.h b/drivers/pci/pcie/aer/aerdrv.h
> index 94a7598..22f840f 100644
> --- a/drivers/pci/pcie/aer/aerdrv.h
> +++ b/drivers/pci/pcie/aer/aerdrv.h
> @@ -87,6 +87,9 @@ struct aer_broadcast_data {
>  static inline pci_ers_result_t merge_result(enum pci_ers_result orig,
>                 enum pci_ers_result new)
>  {
> +       if (new == PCI_ERS_RESULT_NO_AER_DRIVER)
> +               return PCI_ERS_RESULT_NO_AER_DRIVER;
> +
>         if (new == PCI_ERS_RESULT_NONE)
>                 return orig;
>
> @@ -97,7 +100,7 @@ static inline pci_ers_result_t merge_result(enum pci_ers_result orig,
>                 break;
>         case PCI_ERS_RESULT_DISCONNECT:
>                 if (new == PCI_ERS_RESULT_NEED_RESET)
> -                       orig = new;
> +                       orig = PCI_ERS_RESULT_NEED_RESET;
>                 break;
>         default:
>                 break;
> diff --git a/drivers/pci/pcie/aer/aerdrv_core.c b/drivers/pci/pcie/aer/aerdrv_core.c
> index 9f80cb0..b67f91a 100644
> --- a/drivers/pci/pcie/aer/aerdrv_core.c
> +++ b/drivers/pci/pcie/aer/aerdrv_core.c
> @@ -232,11 +232,26 @@ static int report_error_detected(struct pci_dev *dev, void *data)
>                                    dev->driver ?
>                                    "no AER-aware driver" : "no driver");
>                 }
> -               return 0;
> +
> +               /*
> +                * If there's any device in the subtree that does not
> +                * have an error_detected callback, returning
> +                * PCI_ERS_RESULT_NO_AER_DRIVER prevents calling of
> +                * the subsequent mmio_enabled/slot_reset/resume
> +                * callbacks of "any" device in the subtree. All the
> +                * devices in the subtree are left in the error state
> +                * without recovery.
> +                */
> +
> +               if (!(dev->hdr_type & PCI_HEADER_TYPE_BRIDGE))
> +                       vote = PCI_ERS_RESULT_NO_AER_DRIVER;
> +               else
> +                       vote = PCI_ERS_RESULT_NONE;
> +       } else {
> +               err_handler = dev->driver->err_handler;
> +               vote = err_handler->error_detected(dev, result_data->state);
>         }
>
> -       err_handler = dev->driver->err_handler;
> -       vote = err_handler->error_detected(dev, result_data->state);
>         result_data->result = merge_result(result_data->result, vote);
>         return 0;
>  }
> diff --git a/include/linux/pci.h b/include/linux/pci.h
> index ab17a08..c458602 100644
> --- a/include/linux/pci.h
> +++ b/include/linux/pci.h
> @@ -540,6 +540,9 @@ enum pci_ers_result {
>
>         /* Device driver is fully recovered and operational */
>         PCI_ERS_RESULT_RECOVERED = (__force pci_ers_result_t) 5,
> +
> +       /* No AER capabilities registered for the driver */
> +       PCI_ERS_RESULT_NO_AER_DRIVER = (__force pci_ers_result_t) 6,
>  };
>
>  /* PCI bus error event callbacks */
> --
> 1.7.11.3
>
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@...r.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

Powered by blists - more mailing lists

Powered by Openwall GNU/*/Linux Powered by OpenVZ