lists.openwall.net   lists  /  announce  owl-users  owl-dev  john-users  john-dev  passwdqc-users  yescrypt  popa3d-users  /  oss-security  kernel-hardening  musl  sabotage  tlsify  passwords  /  crypt-dev  xvendor  /  Bugtraq  Full-Disclosure  linux-kernel  linux-netdev  linux-ext4  linux-hardening  linux-cve-announce  PHC 
Open Source and information security mailing list archives
 
Hash Suite: Windows password security audit tool. GUI, reports in PDF.
[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Date:   Fri, 11 May 2018 17:40:39 +0200
From:   Borislav Petkov <bp@...en8.de>
To:     Alexandru Gagniuc <mr.nuke.me@...il.com>
Cc:     alex_gagniuc@...lteam.com, austin_bolen@...l.com,
        shyam_iyer@...l.com, "Rafael J. Wysocki" <rjw@...ysocki.net>,
        Len Brown <lenb@...nel.org>, Tony Luck <tony.luck@...el.com>,
        Mauro Carvalho Chehab <mchehab@...nel.org>,
        Robert Moore <robert.moore@...el.com>,
        Erik Schmauss <erik.schmauss@...el.com>,
        Tyler Baicar <tbaicar@...eaurora.org>,
        Will Deacon <will.deacon@....com>,
        James Morse <james.morse@....com>,
        Shiju Jose <shiju.jose@...wei.com>,
        "Jonathan (Zhixiong) Zhang" <zjzhang@...eaurora.org>,
        Dongjiu Geng <gengdongjiu@...wei.com>,
        linux-acpi@...r.kernel.org, linux-kernel@...r.kernel.org,
        linux-edac@...r.kernel.org, devel@...ica.org
Subject: Re: [RFC PATCH v4 3/3] acpi: apei: Do not panic() on PCIe errors
 reported through GHES

On Mon, Apr 30, 2018 at 04:33:52PM -0500, Alexandru Gagniuc wrote:
> The policy was to panic() when GHES said that an error is "Fatal".
> This logic is wrong for several reasons, as it doesn't take into
> account what caused the error.
> 
> PCIe fatal errors indicate that the link to a device is either
> unstable or unusable. They don't indicate that the machine is on fire,
> and they are not severe enough that we need to panic(). Instead of
> relying on crackmonkey firmware, evaluate the error severity based on
	     ^^^^^^^^^^^^

Please keep the smartass formulations for the ML only and do not let
them leak into commit messages.

> Signed-off-by: Alexandru Gagniuc <mr.nuke.me@...il.com>
> ---
>  drivers/acpi/apei/ghes.c | 45 ++++++++++++++++++++++++++++++++++++++++++---
>  1 file changed, 42 insertions(+), 3 deletions(-)
> 
> diff --git a/drivers/acpi/apei/ghes.c b/drivers/acpi/apei/ghes.c
> index c9f1971333c1..49318fba409c 100644
> --- a/drivers/acpi/apei/ghes.c
> +++ b/drivers/acpi/apei/ghes.c
> @@ -425,8 +425,7 @@ static void ghes_handle_memory_failure(struct acpi_hest_generic_data *gdata, int
>   * GHES_SEV_RECOVERABLE -> AER_NONFATAL
>   * GHES_SEV_RECOVERABLE && CPER_SEC_RESET -> AER_FATAL
>   *     These both need to be reported and recovered from by the AER driver.
> - * GHES_SEV_PANIC does not make it to this handling since the kernel must
> - *     panic.
> + * GHES_SEV_PANIC -> AER_FATAL
>   */
>  static void ghes_handle_aer(struct acpi_hest_generic_data *gdata)
>  {
> @@ -459,6 +458,46 @@ static void ghes_handle_aer(struct acpi_hest_generic_data *gdata)
>  #endif
>  }
>  
> +/* PCIe errors should not cause a panic. */
> +static int ghes_sec_pcie_severity(struct acpi_hest_generic_data *gdata)
> +{
> +	struct cper_sec_pcie *pcie_err = acpi_hest_get_payload(gdata);
> +
> +	if (pcie_err->validation_bits & CPER_PCIE_VALID_DEVICE_ID &&
> +	    pcie_err->validation_bits & CPER_PCIE_VALID_AER_INFO &&
> +	    IS_ENABLED(CONFIG_ACPI_APEI_PCIEAER))

How is PCIe error severity dependent on whether the AER error reporting
driver is enabled (and possibly not even loaded) on the system?

> +		return CPER_SEV_RECOVERABLE;
> +
> +	return ghes_cper_severity(gdata->error_severity);
> +}
> +/*

-- 
Regards/Gruss,
    Boris.

Good mailing practices for 400: avoid top-posting and trim the reply.

Powered by blists - more mailing lists

Powered by Openwall GNU/*/Linux Powered by OpenVZ