linux-kernel - Re: [PATCH 1/2] x86/MCE/AMD, EDAC/mce_amd: Use AMD NodeId for Family17h+ DRAM Decode

lists.openwall.net		lists / announce owl-users owl-dev john-users john-dev passwdqc-users yescrypt popa3d-users / oss-security kernel-hardening musl sabotage tlsify passwords / crypt-dev xvendor / Bugtraq Full-Disclosure linux-kernel linux-netdev linux-ext4 linux-hardening linux-cve-announce PHC
Open Source and information security mailing list archives

Hash Suite: Windows password security audit tool. GUI, reports in PDF.

[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]

Message-ID: <20200815084212.GA2444151@gmail.com>
Date:   Sat, 15 Aug 2020 10:42:12 +0200
From:   Ingo Molnar <mingo@...nel.org>
To:     Yazen Ghannam <Yazen.Ghannam@....com>
Cc:     linux-edac@...r.kernel.org, linux-kernel@...r.kernel.org,
        bp@...e.de, tony.luck@...el.com, x86@...nel.org,
        Smita.KoralahalliChannabasappa@....com
Subject: Re: [PATCH 1/2] x86/MCE/AMD, EDAC/mce_amd: Use AMD NodeId for
 Family17h+ DRAM Decode


* Yazen Ghannam <Yazen.Ghannam@....com> wrote:

> From: Yazen Ghannam <yazen.ghannam@....com>
> 
> The edac_mce_amd module calls decode_dram_ecc() on AMD Family17h and
> later systems. This function is used in amd64_edac_mod to do
> system-specific decoding for DRAM ECC errors. The function takes a
> "NodeId" as a parameter.
> 
> In AMD documentation, NodeId is used to identify a physical die in a
> system. This can be used to identify a node in the AMD_NB code and also
> it is used with umc_normaddr_to_sysaddr().
> 
> However, the input used for decode_dram_ecc() is currently the NUMA node
> of a logical CPU. In the default configuration, the NUMA node and
> physical die will be equivalent, so this doesn't have an impact. But the
> NUMA node configuration can be adjusted with optional memory
> interleaving schemes. This will cause the NUMA node enumeration to not
> match the physical die enumeration. The mismatch will cause the address
> translation function to fail or report incorrect results.
> 
> Save the "NodeId" as a percpu value during init in AMD MCE code. Export
> a function to return the value which can be used from modules like
> edac_mce_amd.
> 
> Fixes: fbe63acf62f5 ("EDAC, mce_amd: Use cpu_to_node() to find the node ID")
> Signed-off-by: Yazen Ghannam <yazen.ghannam@....com>
> ---
>  arch/x86/include/asm/mce.h    |  2 ++
>  arch/x86/kernel/cpu/mce/amd.c | 11 +++++++++++
>  drivers/edac/mce_amd.c        |  2 +-
>  3 files changed, 14 insertions(+), 1 deletion(-)
> 
> diff --git a/arch/x86/include/asm/mce.h b/arch/x86/include/asm/mce.h
> index cf503824529c..92527cc9ed06 100644
> --- a/arch/x86/include/asm/mce.h
> +++ b/arch/x86/include/asm/mce.h
> @@ -343,6 +343,8 @@ extern struct smca_bank smca_banks[MAX_NR_BANKS];
>  extern const char *smca_get_long_name(enum smca_bank_types t);
>  extern bool amd_mce_is_memory_error(struct mce *m);
>  
> +extern u8 amd_cpu_to_node(unsigned int cpu);
> +
>  extern int mce_threshold_create_device(unsigned int cpu);
>  extern int mce_threshold_remove_device(unsigned int cpu);
>  
> diff --git a/arch/x86/kernel/cpu/mce/amd.c b/arch/x86/kernel/cpu/mce/amd.c
> index 99be063fcb1b..524edf81e287 100644
> --- a/arch/x86/kernel/cpu/mce/amd.c
> +++ b/arch/x86/kernel/cpu/mce/amd.c
> @@ -202,6 +202,9 @@ static DEFINE_PER_CPU(unsigned int, bank_map);
>  /* Map of banks that have more than MCA_MISC0 available. */
>  static DEFINE_PER_CPU(u32, smca_misc_banks_map);
>  
> +/* CPUID_Fn8000001E_ECX[NodeId] used to identify a physical node/die. */
> +static DEFINE_PER_CPU(u8, node_id);
> +
>  static void amd_threshold_interrupt(void);
>  static void amd_deferred_error_interrupt(void);
>  
> @@ -233,6 +236,12 @@ static void smca_set_misc_banks_map(unsigned int bank, unsigned int cpu)
>  
>  }
>  
> +u8 amd_cpu_to_node(unsigned int cpu)
> +{
> +	return per_cpu(node_id, cpu);
> +}
> +EXPORT_SYMBOL_GPL(amd_cpu_to_node);
> +
>  static void smca_configure(unsigned int bank, unsigned int cpu)
>  {
>  	unsigned int i, hwid_mcatype;
> @@ -240,6 +249,8 @@ static void smca_configure(unsigned int bank, unsigned int cpu)
>  	u32 high, low;
>  	u32 smca_config = MSR_AMD64_SMCA_MCx_CONFIG(bank);
>  
> +	this_cpu_write(node_id, cpuid_ecx(0x8000001e) & 0xFF);

So we already have this magic number used for a similar purpose, in 
amd_get_topology():

                cpuid(0x8000001e, &eax, &ebx, &ecx, &edx);

                node_id  = ecx & 0xff;

Firstly, could we please at least give 0x8000001e a proper symbolic 
name, use it in hygon.c too (which AFAIK is derived from AMD anyway), 
and then use it in these new patches?

Secondly, why not stick node_id into struct cpuinfo_x86, where the MCA 
code can then use it without having to introduce a new percpu data 
structure?

There's also the underlying assumption that there's only ever going to 
be 256 nodes, which limitation I'm sure we'll hear about in a couple 
of years as not being quite enough. ;-)

So less hardcoding and more generalizations please.

Thanks,

	Ingo