linux-kernel - Re: [PATCH] x86/AMD: Apply Erratum 688 fix when BIOS doesn't

Message-ID: <20171022110438.jt3erskmxrvz52us@gmail.com>
Date:   Sun, 22 Oct 2017 13:04:38 +0200
From:   Ingo Molnar <mingo@...nel.org>
To:     Borislav Petkov <bp@...en8.de>
Cc:     X86 ML <x86@...nel.org>, Sherry Hurwitz <sherry.hurwitz@....com>,
        Yazen Ghannam <Yazen.Ghannam@....com>, mirh@...tonmail.ch,
        LKML <linux-kernel@...r.kernel.org>
Subject: Re: [PATCH] x86/AMD: Apply Erratum 688 fix when BIOS doesn't


* Borislav Petkov <bp@...en8.de> wrote:

> From: Borislav Petkov <bp@...e.de>
> 
> Some F14h machines have an erratum which, "under a highly specific
> and detailed set of internal timing conditions", can lead to skipping
> instructions and rIP corruption. Add the fix for those machines when
> their BIOS doesn't apply it or there simply isn't a BIOS update for them.
> 
> Signed-off-by: Borislav Petkov <bp@...e.de>
> Tested-by: <mirh@...tonmail.ch>
> Link: https://bugzilla.kernel.org/show_bug.cgi?id=197285
> Cc: Sherry Hurwitz <sherry.hurwitz@....com>
> Cc: Yazen Ghannam <Yazen.Ghannam@....com>
> Cc: <stable@...r.kernel.org>
> ---
>  arch/x86/kernel/amd_nb.c | 39 +++++++++++++++++++++++++++++++++++++++
>  1 file changed, 39 insertions(+)
> 
> diff --git a/arch/x86/kernel/amd_nb.c b/arch/x86/kernel/amd_nb.c
> index 458da8509b75..7ad1dfc8f40e 100644
> --- a/arch/x86/kernel/amd_nb.c
> +++ b/arch/x86/kernel/amd_nb.c
> @@ -27,6 +27,8 @@ static const struct pci_device_id amd_root_ids[] = {
>  	{}
>  };
>  
> +#define PCI_DEVICE_ID_AMD_CNB17H_F4     0x1704
> +
>  const struct pci_device_id amd_nb_misc_ids[] = {
>  	{ PCI_DEVICE(PCI_VENDOR_ID_AMD, PCI_DEVICE_ID_AMD_K8_NB_MISC) },
>  	{ PCI_DEVICE(PCI_VENDOR_ID_AMD, PCI_DEVICE_ID_AMD_10H_NB_MISC) },
> @@ -37,6 +39,7 @@ const struct pci_device_id amd_nb_misc_ids[] = {
>  	{ PCI_DEVICE(PCI_VENDOR_ID_AMD, PCI_DEVICE_ID_AMD_16H_NB_F3) },
>  	{ PCI_DEVICE(PCI_VENDOR_ID_AMD, PCI_DEVICE_ID_AMD_16H_M30H_NB_F3) },
>  	{ PCI_DEVICE(PCI_VENDOR_ID_AMD, PCI_DEVICE_ID_AMD_17H_DF_F3) },
> +	{ PCI_DEVICE(PCI_VENDOR_ID_AMD, PCI_DEVICE_ID_AMD_CNB17H_F3) },
>  	{}
>  };
>  EXPORT_SYMBOL_GPL(amd_nb_misc_ids);
> @@ -48,6 +51,7 @@ static const struct pci_device_id amd_nb_link_ids[] = {
>  	{ PCI_DEVICE(PCI_VENDOR_ID_AMD, PCI_DEVICE_ID_AMD_16H_NB_F4) },
>  	{ PCI_DEVICE(PCI_VENDOR_ID_AMD, PCI_DEVICE_ID_AMD_16H_M30H_NB_F4) },
>  	{ PCI_DEVICE(PCI_VENDOR_ID_AMD, PCI_DEVICE_ID_AMD_17H_DF_F4) },
> +	{ PCI_DEVICE(PCI_VENDOR_ID_AMD, PCI_DEVICE_ID_AMD_CNB17H_F4) },
>  	{}
>  };
>  
> @@ -402,11 +406,46 @@ void amd_flush_garts(void)
>  }
>  EXPORT_SYMBOL_GPL(amd_flush_garts);
>  
> +static void __fix_erratum_688(void *info)
> +{
> +#define MSR_AMD64_IC_CFG 0xC0011021
> +
> +	msr_set_bit(MSR_AMD64_IC_CFG, 3);
> +	msr_set_bit(MSR_AMD64_IC_CFG, 14);
> +}
> +
> +/* Apply erratum 688 fix so machines without a BIOS fix work. */
> +static __init void fix_erratum_688(void)
> +{
> +	struct pci_dev *F4;
> +	u32 val;
> +
> +	if (boot_cpu_data.x86 != 0x14)
> +		return;
> +
> +	if (!amd_northbridges.num)
> +		return;
> +
> +	F4 = node_to_amd_nb(0)->link;
> +	if (!F4)
> +		return;
> +
> +	if (pci_read_config_dword(F4, 0x164, &val))
> +		return;
> +
> +	if (val & BIT(2))
> +		return;
> +
> +	on_each_cpu(__fix_erratum_688, NULL, 0);
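The quoted fix_erratum_688() returns early unless all of these hold: the boot CPU is family 14h, a northbridge with a link (F4) device exists, the config read of F4x164 succeeds, and bit 2 of that register is clear. Only then does it set bits 3 and 14 of MSR 0xC0011021 (IC_CFG) on every CPU. The user-space C model below restates those checks as plain functions so the decision can be exercised without hardware. It is a sketch under assumptions. erratum_688_needs_fix() and erratum_688_apply() are hypothetical names, and reading bit 2 as "the BIOS already applied the fix" is inferred from the commit message rather than stated in the patch.

```c
#include <stdbool.h>
#include <stdint.h>

/* Constants taken from the quoted patch. */
#define AMD_FAMILY_14H          0x14
#define MSR_AMD64_IC_CFG        0xC0011021u /* MSR the patch modifies */
#define F4_ERRATUM_688_REG      0x164       /* F4 config offset the patch reads */
#define F4_ERRATUM_688_DONE     (1u << 2)   /* presumably: BIOS already applied the fix */
#define IC_CFG_ERRATUM_688_BITS ((1ull << 3) | (1ull << 14))

/*
 * Hypothetical helper mirroring the early returns in fix_erratum_688().
 * have_f4 stands for "amd_northbridges.num != 0 and node 0 has a link
 * device"; read_ok stands for pci_read_config_dword() returning 0.
 */
static bool erratum_688_needs_fix(unsigned int family, bool have_f4,
                                  bool read_ok, uint32_t f4x164)
{
	if (family != AMD_FAMILY_14H)
		return false;
	if (!have_f4 || !read_ok)
		return false;
	/* Apply the workaround only when bit 2 of F4x164 is clear. */
	return (f4x164 & F4_ERRATUM_688_DONE) == 0;
}

/* Hypothetical helper: the effect of the two msr_set_bit() calls on IC_CFG. */
static uint64_t erratum_688_apply(uint64_t ic_cfg)
{
	return ic_cfg | IC_CFG_ERRATUM_688_BITS;
}
```

Because the update only ORs bits in, applying it a second time leaves the value unchanged (erratum_688_apply(0x4008) is still 0x4008), which fits the patch simply running __fix_erratum_688() on every CPU via on_each_cpu().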

Any objections to me adding a printk message that we applied a fix?

	pr_info("x86/cpu/AMD: CPU erratum 688 worked around\n");

or so?

That would also create some pressure for customers to prod manufacturers to prod 
BIOS makers to fix the erratum in a BIOS update or so.

Plus, in the unlikely event that the erratum fix was not applied due to some other 
erratum, or the erratum was mis-documented, we'd eventually discover that as well.
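Whether the workaround actually took effect can also be checked from user space. The sketch below is an assumption, not part of the patch: it reads IC_CFG through the Linux msr driver (/dev/cpu/N/msr, where the file offset selects the MSR; this needs the msr module loaded and root privileges) and tests whether both bits the patch sets are present. The helper names read_ic_cfg() and erratum_688_bits_present() are hypothetical.

```c
#define _XOPEN_SOURCE 700
#define _FILE_OFFSET_BITS 64 /* MSR number 0xC0011021 is used as a file offset */

#include <fcntl.h>
#include <stdbool.h>
#include <stdint.h>
#include <stdio.h>
#include <unistd.h>

#define MSR_AMD64_IC_CFG        0xC0011021u
#define IC_CFG_ERRATUM_688_BITS ((1ull << 3) | (1ull << 14))

/* Hypothetical helper: true only if both bits set by the patch are present. */
static bool erratum_688_bits_present(uint64_t ic_cfg)
{
	return (ic_cfg & IC_CFG_ERRATUM_688_BITS) == IC_CFG_ERRATUM_688_BITS;
}

/*
 * Hypothetical helper: read IC_CFG on one CPU via /dev/cpu/N/msr.
 * The msr driver returns the 8-byte MSR value when reading at an
 * offset equal to the MSR number. Returns 0 on success, -1 on failure
 * (device missing, no permission, or MSR not readable on this CPU).
 */
static int read_ic_cfg(int cpu, uint64_t *val)
{
	char path[64];
	int fd;
	ssize_t n;

	snprintf(path, sizeof(path), "/dev/cpu/%d/msr", cpu);
	fd = open(path, O_RDONLY);
	if (fd < 0)
		return -1;
	n = pread(fd, val, sizeof(*val), (off_t)MSR_AMD64_IC_CFG);
	close(fd);
	return n == (ssize_t)sizeof(*val) ? 0 : -1;
}
```

On an affected family 14h system one would call read_ic_cfg() for each online CPU and pass the value to erratum_688_bits_present(); on other CPUs the MSR may be unreadable or mean something else, so the result is only meaningful on the parts the patch targets.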

Thanks,

	Ingo