Message-ID: <CAPcxDJ7iHshyZdCp9ngrfFV4j2=HTvXF5naN+E+D_AvcH8n4wg@mail.gmail.com>
Date:   Fri, 11 Feb 2022 12:08:46 -0800
From:   Jue Wang <juew@...gle.com>
To:     Tony Luck <tony.luck@...el.com>, Borislav Petkov <bp@...en8.de>
Cc:     x86@...nel.org, linux-kernel@...r.kernel.org,
        patches@...ts.linux.dev
Subject: Re: [PATCH] x86/mce: Add workaround for SKX/CLX/CPX spurious machine checks

Tony and Borislav,

Gentle ping?

Thanks,
-Jue

On Tue, Feb 8, 2022 at 7:09 AM Jue Wang <juew@...gle.com> wrote:
>
> The fast string copy instructions ("rep movs*") could consume an
> uncorrectable memory error in the cache line _right after_ the
> desired region to copy and raise an MCE.
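>
> For illustration, a hypothetical layout that can trigger this
> (assuming the copy is carried out with "rep movs*"):
>
>     char dst[64], src[128] __attribute__((aligned(64)));
>     /* src bytes   0..63 : the cache line we actually want to copy */
>     /* src bytes 64..127 : the next cache line, holding poison     */
>     memcpy(dst, src, 64);  /* the fast string microcode may also
>                             * touch src[64..] and consume the poison,
>                             * raising an MCE with RIP pointing after
>                             * the "rep movs*" */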
>
> Bit 0 of MSR_IA32_MISC_ENABLE can be cleared to disable fast string
> copy, which avoids such spurious machine checks. However, that is less
> preferable due to the permanent performance impact. Considering that
> memory poison is rare, it's desirable to keep fast string copy enabled
> until an MCE is seen.
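>
> For reference, bit 0 is IA32_MISC_ENABLE.FAST_STRING_ENABLE (MSR
> 0x1a0). A minimal sketch of clearing it with the kernel helper this
> patch also uses:
>
>     /* Clear IA32_MISC_ENABLE[0] (fast-string enable) on this CPU.
>      * msr_clear_bit() reads the MSR and writes it back only if the
>      * bit was actually set. */
>     msr_clear_bit(MSR_IA32_MISC_ENABLE,
>                   MSR_IA32_MISC_ENABLE_FAST_STRING_BIT);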
>
> Intel has confirmed the following:
> 1. The fast string copy CPU erratum applies only to the
> SKX/CLX/CPX generations.
> 2. Directly returning from the MCE handler will result in complete
> execution of the fast string copy (rep movs*) with no data loss or
> corruption.
> 3. Directly returning from the MCE handler will not result in another
> MCE firing on the next poisoned cache line due to rep movs*.
> 4. Directly returning from the MCE handler will resume execution from
> a correct point in the code.
> 5. Directly returning from the MCE handler for any other SRAR MCE will
> result in the same instruction that triggered the MCE firing a second
> MCE immediately (the expected status signature is sketched after this
> list).
> 6. It's not safe to directly return without disabling the fast string
> copy, as the next fast string copy of the same buffer on the same CPU
> would result in a PANIC MCE.
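>
> For reference, the bank 1 status signature the quirk below tests for
> is a recoverable SRAR data error; annotated bit by bit (hypothetical
> helper, not part of the patch):
>
>     static bool is_srar_repmov_signature(u64 status)
>     {
>             const u64 mask = MCI_STATUS_VAL | MCI_STATUS_OVER |
>                              MCI_STATUS_UC | MCI_STATUS_EN |
>                              MCI_STATUS_ADDRV | MCI_STATUS_MISCV |
>                              MCI_STATUS_PCC | MCI_STATUS_AR |
>                              MCI_STATUS_S;
>             const u64 want = MCI_STATUS_VAL |   /* registers valid   */
>                              MCI_STATUS_UC |    /* uncorrected error */
>                              MCI_STATUS_EN |    /* signaling enabled */
>                              MCI_STATUS_ADDRV | /* MCi_ADDR valid    */
>                              MCI_STATUS_MISCV | /* MCi_MISC valid    */
>                              MCI_STATUS_S |     /* signaled via MCE  */
>                              MCI_STATUS_AR;     /* action required   */
>             /* OVER and PCC must be clear: no overflow and no
>              * processor context corruption, so recoverable. */
>             return (status & mask) == want;
>     }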
>
> The mitigation in this patch should address the erratum completely,
> with the only caveat that fast string copy is disabled on the affected
> hyper-thread, causing a performance degradation.
>
> This is still better than the OS crashing on MCEs raised in an
> irrelevant process due to 'rep movs*' accesses in a kernel context,
> e.g., copy_page.
>
> Since a host drain / fail-over usually starts right after the first
> MCE is signaled, which results in VM migration or termination, the
> performance degradation is a transient effect.
>
> Tested:
>
> Injected errors on the 1st cache line of 8 anonymous pages of process
> 'proc1' and observed MCE consumption from 'proc2' with no panic
> (directly returned).
>
> Without the fix, the host panicked within a few minutes on a
> random 'proc2' process due to kernel access from copy_page.
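>
> (The injection mechanism is not spelled out above. For a software
> approximation, madvise(MADV_HWPOISON) can inject poison at page
> granularity, coarser than the cache line injection used here; it
> needs CAP_SYS_ADMIN and CONFIG_MEMORY_FAILURE.)
>
>     #include <stdlib.h>
>     #include <string.h>
>     #include <sys/mman.h>
>
>     int main(void)
>     {
>             size_t len = 4096;
>             char *buf = mmap(NULL, len, PROT_READ | PROT_WRITE,
>                              MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
>             if (buf == MAP_FAILED)
>                     return 1;
>             memset(buf, 0xaa, len);  /* fault the page in */
>             /* Mark the page hardware-poisoned; later consumption is
>              * handled by the kernel memory-failure machinery. */
>             if (madvise(buf, len, MADV_HWPOISON))
>                     return 1;
>             return 0;
>     }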
>
> Signed-off-by: Jue Wang <juew@...gle.com>
> ---
>  arch/x86/kernel/cpu/mce/core.c     | 53 ++++++++++++++++++++++++++++++
>  arch/x86/kernel/cpu/mce/internal.h |  5 ++-
>  2 files changed, 57 insertions(+), 1 deletion(-)
>
> diff --git a/arch/x86/kernel/cpu/mce/core.c b/arch/x86/kernel/cpu/mce/core.c
> index 5818b837fd4d..abbd4936dfa8 100644
> --- a/arch/x86/kernel/cpu/mce/core.c
> +++ b/arch/x86/kernel/cpu/mce/core.c
> @@ -834,6 +834,49 @@ static void quirk_sandybridge_ifu(int bank, struct mce *m, struct pt_regs *regs)
>         m->cs = regs->cs;
>  }
>
> +/*
> + * Disable fast string copy and return from the MCE handler upon the first SRAR
> + * MCE on bank 1 due to a CPU erratum on Intel SKX/CLX/CPX CPUs.
> + * The fast string copy instructions ("rep movs*") could consume an
> + * uncorrectable memory error in the cache line _right after_ the
> + * desired region to copy and raise an MCE with RIP pointing to the
> + * instruction _after_ the "rep movs*".
> + * This mitigation addresses the issue completely with the caveat of
> + * performance degradation on the affected CPU. This is still better
> + * than the OS crashing on MCEs raised in an irrelevant process due to
> + * 'rep movs*' accesses in a kernel context (e.g., copy_page).
> + * Since a host drain / fail-over usually starts right after the first
> + * MCE is signaled, which results in VM migration or termination, the
> + * performance degradation is a transient effect.
> + *
> + * Returns true when fast string copy on this CPU has been disabled.
> + */
> +static bool quirk_skylake_repmov(void)
> +{
> +       u64 mcgstatus = mce_rdmsrl(MSR_IA32_MCG_STATUS);
> +       u64 misc_enable = __rdmsr(MSR_IA32_MISC_ENABLE);
> +
> +       if ((mcgstatus & MCG_STATUS_LMCES) &&
> +           unlikely(misc_enable & MSR_IA32_MISC_ENABLE_FAST_STRING)) {
> +               u64 mc1_status = mce_rdmsrl(MSR_IA32_MCx_STATUS(1));
> +
> +               if ((mc1_status &
> +                    (MCI_STATUS_VAL|MCI_STATUS_OVER|MCI_STATUS_UC|MCI_STATUS_EN|
> +                     MCI_STATUS_ADDRV|MCI_STATUS_MISCV|MCI_STATUS_PCC|
> +                     MCI_STATUS_AR|MCI_STATUS_S)) ==
> +                   (MCI_STATUS_VAL|MCI_STATUS_UC|MCI_STATUS_EN|MCI_STATUS_ADDRV|
> +                    MCI_STATUS_MISCV|MCI_STATUS_AR|MCI_STATUS_S)) {
> +                       msr_clear_bit(MSR_IA32_MISC_ENABLE,
> +                                     MSR_IA32_MISC_ENABLE_FAST_STRING_BIT);
> +                       mce_wrmsrl(MSR_IA32_MCG_STATUS, 0);
> +                       mce_wrmsrl(MSR_IA32_MCx_STATUS(1), 0);
> +                       pr_err_once("Erratum detected, disable fast string copy instructions.\n");
> +                       return true;
> +               }
> +       }
> +       return false;
> +}
> +
>  /*
>   * Do a quick check if any of the events requires a panic.
>   * This decides if we keep the events around or clear them.
> @@ -1403,6 +1446,9 @@ noinstr void do_machine_check(struct pt_regs *regs)
>         else if (unlikely(!mca_cfg.initialized))
>                 return unexpected_machine_check(regs);
>
> +       if (mce_flags.skx_repmov_quirk && quirk_skylake_repmov())
> +               return;
> +
>         /*
>          * Establish sequential order between the CPUs entering the machine
>          * check handler.
> @@ -1858,6 +1904,13 @@ static int __mcheck_cpu_apply_quirks(struct cpuinfo_x86 *c)
>
>                 if (c->x86 == 6 && c->x86_model == 45)
>                         mce_flags.snb_ifu_quirk = 1;
> +
> +               /*
> +                * Skylake, Cascade Lake and Cooper Lake require a quirk on
> +                * rep movs.
> +                */
> +               if (c->x86 == 6 && c->x86_model == INTEL_FAM6_SKYLAKE_X)
> +                       mce_flags.skx_repmov_quirk = 1;
>         }
>
>         if (c->x86_vendor == X86_VENDOR_ZHAOXIN) {
> diff --git a/arch/x86/kernel/cpu/mce/internal.h b/arch/x86/kernel/cpu/mce/internal.h
> index 52c633950b38..cec227c25138 100644
> --- a/arch/x86/kernel/cpu/mce/internal.h
> +++ b/arch/x86/kernel/cpu/mce/internal.h
> @@ -170,7 +170,10 @@ struct mce_vendor_flags {
>         /* SandyBridge IFU quirk */
>         snb_ifu_quirk           : 1,
>
> -       __reserved_0            : 57;
> +       /* Skylake, Cascade Lake, Cooper Lake rep movs quirk */
> +       skx_repmov_quirk                : 1,
> +
> +       __reserved_0            : 56;
>  };
>
>  extern struct mce_vendor_flags mce_flags;
> --
> 2.35.0.263.gb82422642f-goog
>
