lists.openwall.net   lists  /  announce  owl-users  owl-dev  john-users  john-dev  passwdqc-users  yescrypt  popa3d-users  /  oss-security  kernel-hardening  musl  sabotage  tlsify  passwords  /  crypt-dev  xvendor  /  Bugtraq  Full-Disclosure  linux-kernel  linux-netdev  linux-ext4  linux-hardening  linux-cve-announce  PHC 
Open Source and information security mailing list archives
 
Hash Suite: Windows password security audit tool. GUI, reports in PDF.
[<prev] [next>] [<thread-prev] [day] [month] [year] [list]
Message-ID: <CAJZ5v0h4Z_gND9nTBFeWmW+y5TMG-FjxmaFXPWPX7hRjWe9UPw@mail.gmail.com>
Date: Thu, 29 Feb 2024 18:35:58 +0100
From: "Rafael J. Wysocki" <rafael@...nel.org>
To: Avadhut Naik <avadhut.naik@....com>
Cc: linux-acpi@...r.kernel.org, rafael@...nel.org, lenb@...nel.org, 
	james.morse@....com, tony.luck@...el.com, bp@...en8.de, 
	linux-kernel@...r.kernel.org, yazen.ghannam@....com, avadnaik@....com
Subject: Re: [PATCH v4] ACPI: APEI: Skip initialization of GHES_ASSIST
 structures for Machine Check Architecture

On Thu, Feb 29, 2024 at 7:22 AM Avadhut Naik <avadhut.naik@....com> wrote:
>
> To support GHES_ASSIST on Machine Check Architecture (MCA) error sources,
> a set of GHES structures is provided by the system firmware for each MCA
> error source. Each of these sets consists of a GHES structure for each MCA
> bank on each logical CPU, with all structures of a set sharing a common
> Related Source ID, equal to the Source ID of one of the MCA error source
> structures.[1] On SOCs with large core counts, this typically equates to
> tens of thousands of GHES_ASSIST structures for MCA under
> "/sys/bus/platform/drivers/GHES".
>
> Support for GHES_ASSIST however, hasn't been implemented in the kernel. As
> such, the information provided through these structures is not consumed by
> Linux. Moreover, these GHES_ASSIST structures for MCA, which are supposed
> to provide supplemental information in context of an error reported by
> hardware, are setup as independent error sources by the kernel during HEST
> initialization.
>
> Additionally, if the Type field of the Notification structure, associated
> with these GHES_ASSIST structures for MCA, is set to Polled, the kernel
> sets up a timer for each individual structure. The duration of the timer
> is derived from the Poll Interval field of the Notification structure. On
> SOCs with high core counts, this will result in tens of thousands of
> timers expiring periodically causing unnecessary preemptions and wastage
> of CPU cycles. The problem will particularly intensify if Poll Interval
> duration is not sufficiently high.
>
> Since GHES_ASSIST support is not present in kernel, skip initialization
> of GHES_ASSIST structures for MCA to eliminate their performance impact.
>
> [1] ACPI specification 6.5, section 18.7
>
> Signed-off-by: Avadhut Naik <avadhut.naik@....com>
> Reviewed-by: Yazen Ghannam <yazen.ghannam@....com>
> Reviewed-by: Tony Luck <tony.luck@...el.com>
> ---
> Changes in v2:
> 1. Since is_ghes_assist_struct() returns if any of the conditions is hit
> if-else-if chain is redundant. Replace it with just if statements.
> 2. Fix formatting errors.
> 3. Add Reviewed-by: Yazen Ghannam <yazen.ghannam@....com>
>
> Changes in v3:
> 1. Modify structure (mces) comment, per Tony's recommendation, to better
> reflect the structure's usage.
>
> Changes in v4:
> 1. No changes within the patch. Just sending out to gather more attention.
> 2. Add Reviewed-by: Tony Luck <tony.luck@...el.com>
> ---
>  drivers/acpi/apei/hest.c | 51 ++++++++++++++++++++++++++++++++++++++++
>  1 file changed, 51 insertions(+)
>
> diff --git a/drivers/acpi/apei/hest.c b/drivers/acpi/apei/hest.c
> index 6aef1ee5e1bd..20d757687e3d 100644
> --- a/drivers/acpi/apei/hest.c
> +++ b/drivers/acpi/apei/hest.c
> @@ -37,6 +37,20 @@ EXPORT_SYMBOL_GPL(hest_disable);
>
>  static struct acpi_table_hest *__read_mostly hest_tab;
>
> +/*
> + * Since GHES_ASSIST is not supported, skip initialization of GHES_ASSIST
> + * structures for MCA.
> + * During HEST parsing, detected MCA error sources are cached from early
> + * table entries so that the Flags and Source Id fields from these cached
> + * values are then referred to in later table entries to determine if the
> + * encountered GHES_ASSIST structure should be initialized.
> + */
> +static struct {
> +       struct acpi_hest_ia_corrected *cmc;
> +       struct acpi_hest_ia_machine_check *mc;
> +       struct acpi_hest_ia_deferred_check *dmc;
> +} mces;
> +
>  static const int hest_esrc_len_tab[ACPI_HEST_TYPE_RESERVED] = {
>         [ACPI_HEST_TYPE_IA32_CHECK] = -1,       /* need further calculation */
>         [ACPI_HEST_TYPE_IA32_CORRECTED_CHECK] = -1,
> @@ -70,22 +84,54 @@ static int hest_esrc_len(struct acpi_hest_header *hest_hdr)
>                 cmc = (struct acpi_hest_ia_corrected *)hest_hdr;
>                 len = sizeof(*cmc) + cmc->num_hardware_banks *
>                         sizeof(struct acpi_hest_ia_error_bank);
> +               mces.cmc = cmc;
>         } else if (hest_type == ACPI_HEST_TYPE_IA32_CHECK) {
>                 struct acpi_hest_ia_machine_check *mc;
>                 mc = (struct acpi_hest_ia_machine_check *)hest_hdr;
>                 len = sizeof(*mc) + mc->num_hardware_banks *
>                         sizeof(struct acpi_hest_ia_error_bank);
> +               mces.mc = mc;
>         } else if (hest_type == ACPI_HEST_TYPE_IA32_DEFERRED_CHECK) {
>                 struct acpi_hest_ia_deferred_check *mc;
>                 mc = (struct acpi_hest_ia_deferred_check *)hest_hdr;
>                 len = sizeof(*mc) + mc->num_hardware_banks *
>                         sizeof(struct acpi_hest_ia_error_bank);
> +               mces.dmc = mc;
>         }
>         BUG_ON(len == -1);
>
>         return len;
>  };
>
> +/*
> + * GHES and GHESv2 structures share the same format, starting from
> + * Source Id and ending in Error Status Block Length (inclusive).
> + */
> +static bool is_ghes_assist_struct(struct acpi_hest_header *hest_hdr)
> +{
> +       struct acpi_hest_generic *ghes;
> +       u16 related_source_id;
> +
> +       if (hest_hdr->type != ACPI_HEST_TYPE_GENERIC_ERROR &&
> +           hest_hdr->type != ACPI_HEST_TYPE_GENERIC_ERROR_V2)
> +               return false;
> +
> +       ghes = (struct acpi_hest_generic *)hest_hdr;
> +       related_source_id = ghes->related_source_id;
> +
> +       if (mces.cmc && mces.cmc->flags & ACPI_HEST_GHES_ASSIST &&
> +           related_source_id == mces.cmc->header.source_id)
> +               return true;
> +       if (mces.mc && mces.mc->flags & ACPI_HEST_GHES_ASSIST &&
> +           related_source_id == mces.mc->header.source_id)
> +               return true;
> +       if (mces.dmc && mces.dmc->flags & ACPI_HEST_GHES_ASSIST &&
> +           related_source_id == mces.dmc->header.source_id)
> +               return true;
> +
> +       return false;
> +}
> +
>  typedef int (*apei_hest_func_t)(struct acpi_hest_header *hest_hdr, void *data);
>
>  static int apei_hest_parse(apei_hest_func_t func, void *data)
> @@ -114,6 +160,11 @@ static int apei_hest_parse(apei_hest_func_t func, void *data)
>                         return -EINVAL;
>                 }
>
> +               if (is_ghes_assist_struct(hest_hdr)) {
> +                       hest_hdr = (void *)hest_hdr + len;
> +                       continue;
> +               }
> +
>                 rc = func(hest_hdr, data);
>                 if (rc)
>                         return rc;
>
> base-commit: 07a90c3d91505e44a38e74e1588b304131ad8028
> --

Applied as 6.9 material, thanks!

Powered by blists - more mailing lists

Powered by Openwall GNU/*/Linux Powered by OpenVZ