linux-kernel - RE: [PATCH v3 0/5] AMD64 EDAC: Check for nodes without memory, etc.

lists.openwall.net		lists / announce owl-users owl-dev john-users john-dev passwdqc-users yescrypt popa3d-users / oss-security kernel-hardening musl sabotage tlsify passwords / crypt-dev xvendor / Bugtraq Full-Disclosure linux-kernel linux-netdev linux-ext4 linux-hardening linux-cve-announce PHC
Open Source and information security mailing list archives

Hash Suite: Windows password security audit tool. GUI, reports in PDF.

[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]

Message-ID: <SN6PR12MB263984026F57EC7F3B33B2BFF8780@SN6PR12MB2639.namprd12.prod.outlook.com>
Date:   Thu, 7 Nov 2019 13:47:53 +0000
From:   "Ghannam, Yazen" <Yazen.Ghannam@....com>
To:     Borislav Petkov <bp@...en8.de>
CC:     "linux-edac@...r.kernel.org" <linux-edac@...r.kernel.org>,
        "linux-kernel@...r.kernel.org" <linux-kernel@...r.kernel.org>
Subject: RE: [PATCH v3 0/5] AMD64 EDAC: Check for nodes without memory, etc.

> -----Original Message-----
> From: linux-edac-owner@...r.kernel.org <linux-edac-owner@...r.kernel.org> On Behalf Of Borislav Petkov
> Sent: Thursday, November 7, 2019 5:39 AM
> To: Ghannam, Yazen <Yazen.Ghannam@....com>
> Cc: linux-edac@...r.kernel.org; linux-kernel@...r.kernel.org
> Subject: Re: [PATCH v3 0/5] AMD64 EDAC: Check for nodes without memory, etc.
> 
> On Wed, Nov 06, 2019 at 08:54:17PM +0100, Borislav Petkov wrote:
> > which are also two attempts.
> >
> > Anyway, I'll queue your set and I'll try to debug that thing because it
> > is getting on my nerves slowly...
> 
> Yah, the problem is that because we have:
> 
> MODULE_DEVICE_TABLE(x86cpu, amd64_cpuids);
> 
> it gets tried on each CPU because an uevent gets dispatched for each
> device, and each CPU is a device.
> 
> That's why I see it twice on this box - it has two CPUs.
> 

Okay, that's makes sense.

BTW, what do you think about loading based on PCI devices? The module
used to do this. I ask because I'm starting to see that future systems may
re-use PCI IDs, and this indicates the same level of hardware support.

> And Greg says making it attempt once per system can't be done. Unless we
> start doing hacks with sending uevents per BSP only which is too much.
> Or we can remember the previous return value of the module init function
> into edac_core but that's nasty too.
> 
> I'm thinking we should simply kill this fat ecc_msg thing which is not
> very useful and be done with it:
> 
> [    5.697275] EDAC MC: Ver: 3.0.0
> [    5.909530] EDAC amd64: F10h detected (node 0).
> [    6.345231] EDAC amd64: Node 0: DRAM ECC disabled.
> [    6.370815] EDAC amd64: F10h detected (node 0).
> [    6.370929] EDAC amd64: Node 0: DRAM ECC disabled.
> 
> That's probably still a bit annoying on a large machine but better than
> nothing.
> 

Yeah, I agree.

> ---
> diff --git a/drivers/edac/amd64_edac.c b/drivers/edac/amd64_edac.c
> index 3aeb5173e200..0738237e3f09 100644
> --- a/drivers/edac/amd64_edac.c
> +++ b/drivers/edac/amd64_edac.c
> @@ -3188,18 +3188,6 @@ static void restore_ecc_error_reporting(struct ecc_settings *s, u16 nid,
>  		amd64_warn("Error restoring NB MCGCTL settings!\n");
>  }
> 
> -/*
> - * EDAC requires that the BIOS have ECC enabled before
> - * taking over the processing of ECC errors. A command line
> - * option allows to force-enable hardware ECC later in
> - * enable_ecc_error_reporting().
> - */
> -static const char *ecc_msg =
> -	"ECC disabled in the BIOS or no ECC capability, module will not load.\n"
> -	" Either enable ECC checking or force module loading by setting "
> -	"'ecc_enable_override'.\n"
> -	" (Note that use of the override may cause unknown side effects.)\n";
> -
>  static bool ecc_enabled(struct amd64_pvt *pvt)
>  {
>  	u16 nid = pvt->mc_node_id;
> @@ -3246,11 +3234,10 @@ static bool ecc_enabled(struct amd64_pvt *pvt)
>  	amd64_info("Node %d: DRAM ECC %s.\n",
>  		   nid, (ecc_en ? "enabled" : "disabled"));
> 
> -	if (!ecc_en || !nb_mce_en) {
> -		amd64_info("%s", ecc_msg);
> +	if (!ecc_en || !nb_mce_en)
>  		return false;
> -	}
> -	return true;
> +	else
> +		return true;
>  }

Just a nit, but this else seems unnecessary right?

Thanks,
Yazen